Handwritten Text Recognition of Ukrainian Manuscripts in the 21st Century: Possibilities, Challenges, and the Future of the First Generic AI-based Model

Bibliographic Details
Main Authors: Aleksej Tikhonov, Achim Rabus
Format: Article
Language: English
Published: National University of Kyiv-Mohyla Academy 2024-12-01
Series: Kyiv-Mohyla Humanities Journal
Online Access: http://kmhj.ukma.edu.ua/article/view/320422
Description
Summary: This article reports on the development and evaluation of a generic Handwritten Text Recognition (HTR) model for the automatic, computer-assisted transcription of Ukrainian handwriting. The model is publicly available via the HTR platform Transkribus. Its training data encompass diverse datasets, including historical manuscripts by the renowned poets Taras Shevchenko and Lesya Ukrainka, private correspondence used for the General Regionally Annotated Corpus of Ukrainian (GRAC), and a diary from the Holodomor Museum collection. We evaluate the model’s performance by comparing its theoretical accuracy, a character error rate (CER) of 4.2%, against its practical efficacy when augmented with an AI-based language model for Ukrainian and a Large Language Model. The model is versatile and functional and can thus be applied to the mass digitization of Ukrainian cultural heritage. In our outlook section, we identify possibilities for further improving the model.
ISSN: 2313-4895
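
Note on the metric: the character error rate (CER) cited in the summary is conventionally defined as the character-level edit distance between a reference transcription and the recognized text, divided by the length of the reference. The sketch below illustrates that conventional definition only. This record does not describe the article's exact evaluation procedure (for example, any text normalization or whitespace handling), so the function names `levenshtein` and `cer` and the sample strings are illustrative assumptions, not the authors' implementation.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the first i-1 characters of
    # ref and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character of ref missing from hyp
                curr[j - 1] + 1,     # extra character in hyp
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Illustrative strings (not taken from the article's test set):
# the hypothesis differs from the 24-character reference by one letter.
reference = "Садок вишневий коло хати"
hypothesis = "Садок вишневий кодо хати"
print(f"CER = {cer(reference, hypothesis):.1%}")
```

Under this definition, a CER of 4.2% corresponds to roughly four character-level errors per hundred reference characters.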