Optimizing Convolution Operations for YOLOv4-based Object Detection on GPU
Real-time object detection is crucial for autonomous vehicles, and YOLO (You Only Look Once) algorithms have demonstrated their effectiveness for this purpose. This study examines the performance of YOLOv4 [3] for real-time object detection on an embedded architecture. We focus on optimizing the com...
Main Authors: | Guerrouj Fatima Zahra, Rodríguez Flórez Sergio, El Ouardi Abdelhafid, Abouzahir Mohamed, Ramzi Mustapha |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences, 2024-01-01 |
Series: | ITM Web of Conferences |
Online Access: | https://www.itm-conferences.org/articles/itmconf/pdf/2024/12/itmconf_maih2024_04008.pdf |
_version_ | 1841554700377260032 |
---|---|
author | Guerrouj Fatima Zahra; Rodríguez Flórez Sergio; El Ouardi Abdelhafid; Abouzahir Mohamed; Ramzi Mustapha |
author_sort | Guerrouj Fatima Zahra |
collection | DOAJ |
description | Real-time object detection is crucial for autonomous vehicles, and YOLO (You Only Look Once) algorithms have demonstrated their effectiveness for this purpose. This study examines the performance of YOLOv4 [3] for real-time object detection on an embedded architecture. We focus on optimizing the computationally intensive convolution operations by employing the cuDNN library to achieve efficient inference. The evaluation assesses critical performance metrics, including object detection accuracy in terms of Mean Average Precision (mAP) and inference latency on the embedded architecture. We conduct a comparative analysis using the publicly available KITTI [7] database. The reported results establish a benchmark between the parallelized YOLOv4 model and the baseline implementation, assessing the advantages of cuDNN acceleration for real-time object detection on resource-constrained devices. |
format | Article |
id | doaj-art-4f03804a24c54c9bbabc91badaaaad48 |
institution | Kabale University |
issn | 2271-2097 |
language | English |
publishDate | 2024-01-01 |
publisher | EDP Sciences |
record_format | Article |
series | ITM Web of Conferences |
spelling | doaj-art-4f03804a24c54c9bbabc91badaaaad48; 2025-01-08T10:58:54Z; eng; EDP Sciences; ITM Web of Conferences; 2271-2097; 2024-01-01; 69; 04008; 10.1051/itmconf/20246904008; itmconf_maih2024_04008; Optimizing Convolution Operations for YOLOv4-based Object Detection on GPU; Guerrouj Fatima Zahra [0] (https://orcid.org/0009-0004-1714-5027); Rodríguez Flórez Sergio [1] (https://orcid.org/0000-0003-3029-7020); El Ouardi Abdelhafid [2] (https://orcid.org/0000-0003-3665-2185); Abouzahir Mohamed [3] (https://orcid.org/0000-0002-9743-2402); Ramzi Mustapha [4] (https://orcid.org/0000-0002-7905-0734); [0] Université Paris-Saclay, ENS Paris-Saclay, CNRS, SATIE; [1] Université Paris-Saclay, ENS Paris-Saclay, CNRS, SATIE; [2] Université Paris-Saclay, ENS Paris-Saclay, CNRS, SATIE; [3] Systems Analysis, Information Processing and Industrial Management Laboratory, Higher School of Technology of Sale, Mohamed V University; [4] Systems Analysis, Information Processing and Industrial Management Laboratory, Higher School of Technology of Sale, Mohamed V University; https://www.itm-conferences.org/articles/itmconf/pdf/2024/12/itmconf_maih2024_04008.pdf |
title | Optimizing Convolution Operations for YOLOv4-based Object Detection on GPU |
url | https://www.itm-conferences.org/articles/itmconf/pdf/2024/12/itmconf_maih2024_04008.pdf |