Acceleration of Urdu Optical Character Recognition on Zynq UltraScale+ MPSoC Using Deep Convolutional Neural Network

Bibliographic Details
Main Authors: Fauzia Yasir, Majida Kazmi
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11098840/
Description
Summary: Deploying deep learning–based optical character recognition (OCR) systems for low-resource, complex-script languages like Urdu remains a major challenge due to high computational costs, lack of annotated datasets, and limited hardware support for real-time applications. Existing FPGA-based OCR implementations have primarily focused on simplified datasets such as MNIST digits, limiting their generalizability to scripts like Urdu that exhibit extensive intra-class variability, contextual shaping, and diacritics. This study presents a hardware-accelerated Urdu OCR framework using a custom-designed Convolutional Neural Network (CNN) optimized for deployment on the Xilinx Zynq UltraScale+ MPSoC (ZCU104). The proposed CNN is trained on a novel large-scale dataset of 336,000 labeled images spanning 48 Urdu characters across 230 font styles. Compared to MNIST-based FPGA implementations, our approach addresses significantly higher script complexity while achieving a classification accuracy of 96.73% (FP32) and 94.06% (INT8). Hardware-aware quantization and deployment using the Vitis AI toolchain enabled 75% model compression with minimal accuracy loss, achieving real-time inference of 0.189 ms per character and 4,886.95 FPS, while consuming only 1.32 W. Benchmarking against CPU and GPU platforms confirmed substantial improvements in speed and energy efficiency. This work establishes a high-performance, scalable, and energy-efficient FPGA-based OCR framework for Urdu and sets the foundation for extending such solutions to other cursive, low-resource languages like Arabic, Pashto, and Persian.
ISSN: 2169-3536
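
The figures quoted in the summary can be cross-checked with simple arithmetic. The Python sketch below is illustrative only and is not taken from the article: the function names are hypothetical, and it assumes that the 75% compression corresponds to storing weights in 8 bits instead of 32, and that the reported 1.32 W applies while the device sustains the reported 4,886.95 FPS.

```python
# Back-of-envelope checks on the numbers reported in the summary.
# All helper names are hypothetical; the assumptions are noted per function.

def compression_ratio(bits_before: int, bits_after: int) -> float:
    """Fraction of storage saved when each weight shrinks from
    bits_before to bits_after (assumes weights dominate model size)."""
    return 1.0 - bits_after / bits_before


def energy_per_inference_mj(power_w: float, fps: float) -> float:
    """Energy per classified character in millijoules, assuming the
    stated power is drawn at the stated sustained throughput."""
    return power_w / fps * 1000.0


def fps_from_latency(latency_ms: float) -> float:
    """Throughput implied by a per-item latency if items were processed
    strictly one after another with no other overhead."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # FP32 -> INT8 weights: 1 - 8/32
    print(f"compression: {compression_ratio(32, 8):.0%}")
    # Accuracy difference between FP32 and INT8, in percentage points
    print(f"accuracy drop: {96.73 - 94.06:.2f} points")
    # Energy per character at 1.32 W and 4,886.95 FPS
    print(f"energy: {energy_per_inference_mj(1.32, 4886.95):.3f} mJ per character")
    # Reciprocal of the 0.189 ms latency, for comparison with the reported FPS
    print(f"1 / latency: {fps_from_latency(0.189):.0f} FPS")
```

Under these assumptions, the 75% figure matches a 32-bit to 8-bit weight reduction, and the energy cost works out to roughly 0.27 mJ per character. The reciprocal of the reported latency is not identical to the reported throughput, so the two values were presumably measured under different conditions; the summary does not say which.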