Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification

Bibliographic Details
Main Authors: Ahmad Mouri Zadeh Khaki, Ahyoung Choi
Affiliation: Department of AI and Software, Gachon University, Seongnam-si 13120, Republic of Korea
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Applied Sciences, vol. 15, no. 1, article 422
ISSN: 2076-3417
DOI: 10.3390/app15010422
Subjects: AI hardware acceleration; convolutional neural network (CNN); deep learning; field-programmable gate array (FPGA); transfer learning
Online Access: https://www.mdpi.com/2076-3417/15/1/422
Description: Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Using the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% for VGG16 and 87.47% for VGG19, while reducing inference latency by 7.29× and 6.6×, respectively, compared to CPU-based alternatives. These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices.
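
The article's code is not part of this record; as a rough, hypothetical illustration of the transfer-learning setup the abstract describes, the sketch below fine-tunes an ImageNet-pretrained VGG16 on CIFAR-10 with TensorFlow 2/Keras. The classifier head, the frozen base, the 32x32 input size, and the hyperparameters are assumptions, not the authors' reported configuration.

    # Hypothetical sketch: transfer learning VGG16 on CIFAR-10 with TensorFlow 2.
    # Head layers, frozen base, and hyperparameters are illustrative assumptions.
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train = tf.keras.applications.vgg16.preprocess_input(x_train.astype("float32"))
    x_test = tf.keras.applications.vgg16.preprocess_input(x_test.astype("float32"))

    # ImageNet-pretrained convolutional base with the classifier head removed.
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(32, 32, 3))
    base.trainable = False  # freeze pretrained features; only the new head trains

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation="softmax"),  # 10 CIFAR-10 classes
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=10, batch_size=128,
              validation_data=(x_test, y_test))
    model.save("vgg16_cifar10.h5")  # float model handed to the quantizer next

The same pattern applies to VGG19 via tf.keras.applications.VGG19; freezing the convolutional base first, then optionally unfreezing the top blocks for a second pass at a lower learning rate, is the usual transfer-learning recipe.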
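
Downstream of training, the abstract names quantization and hardware-specific optimization with Xilinx Vitis-AI. A minimal sketch of the documented Vitis AI TensorFlow 2 post-training quantization flow follows; it assumes the Vitis AI environment (which ships Xilinx's fork of tensorflow_model_optimization), and all file names are placeholders.

    # Hypothetical sketch of Vitis AI post-training quantization for TensorFlow 2.
    # Requires the Vitis AI environment; file names are placeholders.
    import tensorflow as tf
    from tensorflow_model_optimization.quantization.keras import vitis_quantize

    float_model = tf.keras.models.load_model("vgg16_cifar10.h5")

    # A small calibration set drives activation-range estimation for int8.
    (x_train, _), _ = tf.keras.datasets.cifar10.load_data()
    calib = tf.keras.applications.vgg16.preprocess_input(
        x_train[:1000].astype("float32"))

    quantizer = vitis_quantize.VitisQuantizer(float_model)
    quantized_model = quantizer.quantize_model(calib_dataset=calib)
    quantized_model.save("vgg16_cifar10_quant.h5")

    # The quantized model is then compiled for a specific DPU target, e.g.
    # (the arch.json path below is a board-specific assumption, here ZCU102):
    #   vai_c_tensorflow2 --model vgg16_cifar10_quant.h5 \
    #       --arch /opt/vitis_ai/compiler/arch/DPUCZDX8G/ZCU102/arch.json \
    #       --output_dir compiled --net_name vgg16_cifar10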
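
The reported 7.29x and 6.6x latency reductions imply a timing comparison between the DPU and a CPU baseline. One hypothetical way to time the compiled model on the board with the Vitis AI Runtime (VART) is sketched below; the .xmodel path, iteration count, and raw int8 dummy inputs are assumptions, and a faithful benchmark would also apply each tensor's fixed-point scaling to real images.

    # Hypothetical on-board latency measurement with the Vitis AI Runtime (VART).
    # Paths and loop count are placeholders, not the authors' benchmark protocol.
    import time
    import numpy as np
    import vart
    import xir

    # Load the compiled model and pick out the DPU subgraph.
    graph = xir.Graph.deserialize("compiled/vgg16_cifar10.xmodel")
    subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
    dpu = next(s for s in subgraphs
               if s.has_attr("device") and s.get_attr("device").upper() == "DPU")
    runner = vart.Runner.create_runner(dpu, "run")

    in_t = runner.get_input_tensors()[0]
    out_t = runner.get_output_tensors()[0]
    inp = [np.zeros(tuple(in_t.dims), dtype=np.int8)]   # dummy int8 input batch
    out = [np.zeros(tuple(out_t.dims), dtype=np.int8)]

    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        job_id = runner.execute_async(inp, out)
        runner.wait(job_id)
    elapsed = time.perf_counter() - start
    print(f"mean DPU latency: {1000.0 * elapsed / runs:.2f} ms")

Timing the same float model with model.predict on a CPU would give the baseline against which speedup ratios of this kind are computed.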