Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification
Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29× and 6.6× compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices.
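The description outlines the workflow: transfer learning with TensorFlow2 on CIFAR-10, followed by Vitis-AI quantization for FPGA deployment. A minimal sketch of that pipeline, under stated assumptions, is shown below; the classifier head, training hyperparameters, and calibration-set size are illustrative choices, not details taken from the paper, and the quantization step requires the Vitis-AI TensorFlow2 environment.

```python
# Sketch: VGG16 transfer learning on CIFAR-10 (TensorFlow 2), then Vitis-AI
# post-training quantization. Hyperparameters and the classifier head are
# assumptions for illustration, not the paper's exact configuration.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = tf.keras.applications.vgg16.preprocess_input(x_train.astype("float32"))
x_test = tf.keras.applications.vgg16.preprocess_input(x_test.astype("float32"))

# ImageNet-pretrained backbone; Keras accepts 32x32 inputs when include_top=False.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(32, 32, 3))
base.trainable = False  # assumption: freeze the convolutional backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

# Vitis-AI post-training quantization (only available inside the Vitis-AI
# TensorFlow2 environment, e.g. the vitis-ai-tensorflow2 conda env).
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantizer = vitis_quantize.VitisQuantizer(model)
quantized_model = quantizer.quantize_model(calib_dataset=x_train[:1000])
quantized_model.save("vgg16_cifar10_quantized.h5")
# The quantized model is then typically compiled for the target DPU with
# the vai_c_tensorflow2 compiler before deployment.
```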
Main Authors: | Ahmad Mouri Zadeh Khaki, Ahyoung Choi |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Applied Sciences |
Subjects: | AI hardware acceleration; convolutional neural network (CNN); deep learning; field-programmable gate array (FPGA); transfer learning |
Online Access: | https://www.mdpi.com/2076-3417/15/1/422 |
_version_ | 1841549368808701952 |
---|---|
author | Ahmad Mouri Zadeh Khaki; Ahyoung Choi |
collection | DOAJ |
description | Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29× and 6.6× compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices. |
format | Article |
id | doaj-art-a7c8c734d1db47789af2b49c41dc3c78 |
institution | Kabale University |
issn | 2076-3417 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj-art-a7c8c734d1db47789af2b49c41dc3c78; 2025-01-10T13:15:30Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-01-01; vol. 15, no. 1, art. 422; 10.3390/app15010422; Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification; Ahmad Mouri Zadeh Khaki (Department of AI and Software, Gachon University, Seongnam-si 13120, Republic of Korea); Ahyoung Choi (Department of AI and Software, Gachon University, Seongnam-si 13120, Republic of Korea); https://www.mdpi.com/2076-3417/15/1/422 |
title | Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification |
topic | AI hardware acceleration; convolutional neural network (CNN); deep learning; field-programmable gate array (FPGA); transfer learning |
url | https://www.mdpi.com/2076-3417/15/1/422 |