Improving Performance of Real-Time Object Detection in Edge Device Through Concurrent Multi-Frame Processing


Bibliographic Details
Main Authors: Seunghwan Kim, Changjong Kim, Sunggon Kim
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10807180/
Description
Summary: As the performance and accuracy of machine learning and AI algorithms improve, the demand for adopting computer vision techniques to solve various problems, such as autonomous driving and AI robots, increases. To meet this demand, IoT and edge devices, which are small enough to be deployed in various environments while having sufficient computing capabilities, are being widely adopted. However, because devices in IoT and edge environments face harsh restrictions compared to traditional server environments, they are often limited by low computational and memory resources, in addition to a limited electrical power supply. This necessitates a unique approach for small IoT devices that are required to run complex tasks. In this paper, we propose a concurrent multi-frame processing scheme for real-time object detection algorithms. To do this, we first divide the video into individual frames and group the frames according to the number of cores in the device. Then, we allocate a group of frames per core to perform the object detection, resulting in parallel detection of multiple frames. We implement our scheme in YOLO (You Only Look Once), one of the most popular real-time object detection algorithms, on a state-of-the-art, resource-constrained IoT edge device, the Nvidia Jetson Orin Nano, using real-world video and image datasets, including MS-COCO, ImageNet, PascalVOC, DOTA, animal videos, and car-traffic videos. Our evaluation results show that our proposed scheme can improve diverse aspects of edge performance, improving runtime, memory consumption, and power usage by up to 445%, 69%, and 73%, respectively. Additionally, it demonstrates improvements of $2.10\times$ over state-of-the-art model optimization.
ISSN:2169-3536
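
The scheme described in the abstract — splitting a video into frames, grouping the frames by the number of available cores, and running detection on each group in parallel — can be sketched as follows. This is a minimal illustration only, not the authors' implementation: `detect_objects` is a hypothetical stand-in for a per-frame YOLO inference call, and integers stand in for decoded frames.

```python
# Sketch of concurrent multi-frame processing: one frame group per core,
# with groups detected in parallel worker processes.
from multiprocessing import Pool
import os


def detect_objects(frame):
    # Hypothetical placeholder for a YOLO inference call on one frame.
    return {"frame": frame, "detections": []}


def group_frames(frames, num_cores):
    # Split the frame list into at most num_cores contiguous groups.
    size = -(-len(frames) // num_cores)  # ceiling division
    return [frames[i:i + size] for i in range(0, len(frames), size)]


def detect_group(group):
    # Each worker process detects objects in its whole group of frames.
    return [detect_objects(f) for f in group]


def process_video(frames, num_cores=None):
    num_cores = num_cores or os.cpu_count()
    groups = group_frames(frames, num_cores)
    with Pool(processes=len(groups)) as pool:
        per_group = pool.map(detect_group, groups)
    # Flatten per-group results back into original frame order.
    return [result for group in per_group for result in group]


if __name__ == "__main__":
    frames = list(range(16))  # stand-ins for decoded video frames
    results = process_video(frames, num_cores=4)
    print(len(results))
```

Contiguous grouping keeps each core's frames adjacent in time, which preserves any locality a detector's pre/post-processing might exploit; a round-robin assignment would balance load more evenly for variable-cost frames.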