Acceleration of Urdu Optical Character Recognition on Zynq UltraScale+ MPSoC Using Deep Convolutional Neural Network
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11098840/ |
| Summary: | Deploying deep learning–based optical character recognition (OCR) systems for low-resource, complex-script languages like Urdu remains a major challenge due to high computational costs, lack of annotated datasets, and limited hardware support for real-time applications. Existing FPGA-based OCR implementations have primarily focused on simplified datasets such as MNIST digits, limiting their generalizability to scripts like Urdu that exhibit extensive intra-class variability, contextual shaping, and diacritics. This study presents a hardware-accelerated Urdu OCR framework using a custom-designed Convolutional Neural Network (CNN) optimized for deployment on the Xilinx Zynq UltraScale+ MPSoC (ZCU104). The proposed CNN is trained on a novel large-scale dataset of 336,000 labeled images spanning 48 Urdu characters across 230 font styles. Compared to MNIST-based FPGA implementations, our approach addresses significantly higher script complexity while achieving a classification accuracy of 96.73% (FP32) and 94.06% (INT8). Hardware-aware quantization and deployment using the Vitis AI toolchain enabled 75% model compression with minimal accuracy loss, achieving real-time inference of 0.189 ms per character and 4,886.95 FPS, while consuming only 1.32 W. Benchmarking against CPU and GPU platforms confirmed substantial improvements in speed and energy efficiency. This work establishes a high-performance, scalable, and energy-efficient FPGA-based OCR framework for Urdu and sets the foundation for extending such solutions to other cursive, low-resource languages like Arabic, Pashto, and Persian. |
| ISSN: | 2169-3536 |
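
The figures quoted in the summary can be related to one another with a little arithmetic. The sketch below is illustrative only and is not taken from the article: the function names are hypothetical, and it assumes that the 75% compression comes purely from storing each weight in 8 bits instead of 32, and that the 1.32 W figure is average power at the reported throughput. The record does not state how latency and throughput were measured, so the FPS implied by the per-character latency need not equal the reported FPS.

```python
# Figures quoted in the record's summary.
FP32_BITS = 32          # bit width of the floating-point model
INT8_BITS = 8           # bit width after quantization
LATENCY_MS = 0.189      # reported inference time per character
THROUGHPUT_FPS = 4886.95  # reported throughput
POWER_W = 1.32          # reported power consumption


def weight_compression(bits_before: int, bits_after: int) -> float:
    """Fraction of weight storage saved when each weight shrinks in width.

    Assumption: all weights are quantized uniformly and no other
    compression (pruning, encoding) is applied.
    """
    return 1.0 - bits_after / bits_before


def energy_per_inference_mj(power_w: float, fps: float) -> float:
    """Energy per classified character in millijoules.

    Assumption: power_w is the average power drawn while sustaining fps.
    """
    return power_w / fps * 1000.0


def fps_from_latency(latency_ms: float) -> float:
    """Throughput implied by a per-item latency with no batching or overlap."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    print(f"compression: {weight_compression(FP32_BITS, INT8_BITS):.0%}")
    print(f"energy per character: "
          f"{energy_per_inference_mj(POWER_W, THROUGHPUT_FPS):.4f} mJ")
    print(f"FPS implied by latency: {fps_from_latency(LATENCY_MS):.1f}")
```

Under these assumptions, moving from 32-bit to 8-bit weights accounts for the stated 75% reduction, and the reported power and throughput correspond to roughly 0.27 mJ per character.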