Optimizing Convolution Operations for YOLOv4-based Object Detection on GPU

Real-time object detection is crucial for autonomous vehicles, and YOLO (You Only Look Once) algorithms have demonstrated their effectiveness for this purpose. This study examines the performance of YOLOv4 [3] for real-time object detection on an embedded architecture. We focus on optimizing the com...

Bibliographic Details
Main Authors: Guerrouj Fatima Zahra, Rodríguez Flórez Sergio, El Ouardi Abdelhafid, Abouzahir Mohamed, Ramzi Mustapha
Format: Article
Language: English
Published: EDP Sciences 2024-01-01
Series: ITM Web of Conferences
Online Access: https://www.itm-conferences.org/articles/itmconf/pdf/2024/12/itmconf_maih2024_04008.pdf
collection DOAJ
description Real-time object detection is crucial for autonomous vehicles, and YOLO (You Only Look Once) algorithms have demonstrated their effectiveness for this purpose. This study examines the performance of YOLOv4 [3] for real-time object detection on an embedded architecture. We focus on optimizing the computationally intensive convolution operations by employing the cuDNN library to achieve efficient inference. The evaluation assesses critical performance metrics, including object detection accuracy in terms of Mean Average Precision (mAP) and inference latency on the embedded architecture. We conduct a comparative analysis using the publicly available KITTI [7] database. The reported results establish a benchmark between the parallelized YOLOv4 model and the baseline implementation, assessing the advantages of cuDNN acceleration for real-time object detection on resource-constrained devices.
id doaj-art-4f03804a24c54c9bbabc91badaaaad48
institution Kabale University
issn 2271-2097
spelling doaj-art-4f03804a24c54c9bbabc91badaaaad48
2025-01-08T10:58:54Z
eng
EDP Sciences
ITM Web of Conferences
2271-2097
2024-01-01
69
04008
10.1051/itmconf/20246904008
itmconf_maih2024_04008
Optimizing Convolution Operations for YOLOv4-based Object Detection on GPU
Guerrouj Fatima Zahra (0) https://orcid.org/0009-0004-1714-5027
Rodríguez Flórez Sergio (1) https://orcid.org/0000-0003-3029-7020
El Ouardi Abdelhafid (2) https://orcid.org/0000-0003-3665-2185
Abouzahir Mohamed (3) https://orcid.org/0000-0002-9743-2402
Ramzi Mustapha (4) https://orcid.org/0000-0002-7905-0734
Université Paris-Saclay, ENS Paris-Saclay, CNRS, SATIE
Université Paris-Saclay, ENS Paris-Saclay, CNRS, SATIE
Université Paris-Saclay, ENS Paris-Saclay, CNRS, SATIE
Systems Analysis, Information Processing and Industrial Management Laboratory, Higher School of Technology of Sale, Mohamed V University
Systems Analysis, Information Processing and Industrial Management Laboratory, Higher School of Technology of Sale, Mohamed V University