Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification
Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29× and 6.6× compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices.
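The description outlines the workflow: transfer learning with TensorFlow2 on CIFAR-10, followed by Vitis-AI quantization for FPGA deployment. A minimal sketch of that pipeline, under stated assumptions, is shown below; the classifier head, training hyperparameters, and calibration-set size are illustrative choices, not details taken from the paper, and the quantization step requires the Vitis-AI TensorFlow2 environment.

```python
# Sketch: VGG16 transfer learning on CIFAR-10 (TensorFlow 2), then Vitis-AI
# post-training quantization. Hyperparameters and the classifier head are
# assumptions for illustration, not the paper's exact configuration.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = tf.keras.applications.vgg16.preprocess_input(x_train.astype("float32"))
x_test = tf.keras.applications.vgg16.preprocess_input(x_test.astype("float32"))

# ImageNet-pretrained backbone; Keras accepts 32x32 inputs when include_top=False.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(32, 32, 3))
base.trainable = False  # assumption: freeze the convolutional backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

# Vitis-AI post-training quantization (only available inside the Vitis-AI
# TensorFlow2 environment, e.g. the vitis-ai-tensorflow2 conda env).
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantizer = vitis_quantize.VitisQuantizer(model)
quantized_model = quantizer.quantize_model(calib_dataset=x_train[:1000])
quantized_model.save("vgg16_cifar10_quantized.h5")
# The quantized model is then typically compiled for the target DPU with
# the vai_c_tensorflow2 compiler before deployment.
```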
Main Authors: | Ahmad Mouri Zadeh Khaki, Ahyoung Choi |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Applied Sciences |
Subjects: | AI hardware acceleration; convolutional neural network (CNN); deep learning; field-programmable gate array (FPGA); transfer learning |
Online Access: | https://www.mdpi.com/2076-3417/15/1/422 |
_version_ | 1841549368808701952 |
---|---|
author | Ahmad Mouri Zadeh Khaki; Ahyoung Choi |
collection | DOAJ |
description | Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29× and 6.6× compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices. |
format | Article |
id | doaj-art-a7c8c734d1db47789af2b49c41dc3c78 |
institution | Kabale University |
issn | 2076-3417 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj-art-a7c8c734d1db47789af2b49c41dc3c78; 2025-01-10T13:15:30Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-01-01; vol. 15, no. 1, art. 422; 10.3390/app15010422; Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification; Ahmad Mouri Zadeh Khaki (Department of AI and Software, Gachon University, Seongnam-si 13120, Republic of Korea); Ahyoung Choi (Department of AI and Software, Gachon University, Seongnam-si 13120, Republic of Korea); https://www.mdpi.com/2076-3417/15/1/422 |
title | Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification |
topic | AI hardware acceleration; convolutional neural network (CNN); deep learning; field-programmable gate array (FPGA); transfer learning |
url | https://www.mdpi.com/2076-3417/15/1/422 |