Deep Aramaic: Towards a synthetic data paradigm enabling machine learning in epigraphy.

Epigraphy is witnessing a growing integration of artificial intelligence, notably through its subfield of machine learning (ML), especially in tasks like extracting insights from ancient inscriptions. However, scarce labeled data for training ML algorithms severely limits current techniques, especia...

Bibliographic Details
Main Authors:	Andrei C Aioanei, Regine R Hunziker-Rodewald, Konstantin M Klein, Dominik L Michels
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2024-01-01
Series:	PLoS ONE
Online Access:	https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0299297&type=printable

_version_	1846129419798708224
author	Andrei C Aioanei Regine R Hunziker-Rodewald Konstantin M Klein Dominik L Michels
author_sort	Andrei C Aioanei
collection	DOAJ
description	Epigraphy is witnessing a growing integration of artificial intelligence, notably through its subfield of machine learning (ML), especially in tasks like extracting insights from ancient inscriptions. However, scarce labeled data for training ML algorithms severely limits current techniques, especially for ancient scripts like Old Aramaic. Our research pioneers an innovative methodology for generating synthetic training data tailored to Old Aramaic letters. Our pipeline synthesizes photo-realistic Aramaic letter datasets, incorporating textural features, lighting, damage, and augmentations to mimic real-world inscription diversity. Despite minimal real examples, we engineer a dataset of 250 000 training and 25 000 validation images covering the 22 letter classes in the Aramaic alphabet. This comprehensive corpus provides a robust volume of data for training a residual neural network (ResNet) to classify highly degraded Aramaic letters. The ResNet model demonstrates 95% accuracy in classifying real images from the 8th century BCE Hadad statue inscription. Additional experiments validate performance on varying materials and styles, proving effective generalization. Our results validate the model's capabilities in handling diverse real-world scenarios, proving the viability of our synthetic data approach and avoiding the dependence on scarce training data that has constrained epigraphic analysis. Our innovative framework elevates interpretation accuracy on damaged inscriptions, thus enhancing knowledge extraction from these historical resources.
format	Article
id	doaj-art-c9f2d671c8f14dd18a347f3071c92f76
institution	Kabale University
issn	1932-6203
language	English
publishDate	2024-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj-art-c9f2d671c8f14dd18a347f3071c92f76 | 2024-12-10T05:32:51Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2024-01-01 | 19 | 4 | e0299297 | 10.1371/journal.pone.0299297 | Deep Aramaic: Towards a synthetic data paradigm enabling machine learning in epigraphy. | Andrei C Aioanei; Regine R Hunziker-Rodewald; Konstantin M Klein; Dominik L Michels | https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0299297&type=printable
title	Deep Aramaic: Towards a synthetic data paradigm enabling machine learning in epigraphy.
url	https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0299297&type=printable

Similar Items