Deep Aramaic: Towards a synthetic data paradigm enabling machine learning in epigraphy.

Bibliographic Details
Main Authors: Andrei C Aioanei, Regine R Hunziker-Rodewald, Konstantin M Klein, Dominik L Michels
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0299297&type=printable
author Andrei C Aioanei
Regine R Hunziker-Rodewald
Konstantin M Klein
Dominik L Michels
collection DOAJ
description Epigraphy is witnessing a growing integration of artificial intelligence, notably through its subfield of machine learning (ML), especially in tasks like extracting insights from ancient inscriptions. However, scarce labeled data for training ML algorithms severely limits current techniques, especially for ancient scripts like Old Aramaic. Our research pioneers an innovative methodology for generating synthetic training data tailored to Old Aramaic letters. Our pipeline synthesizes photo-realistic Aramaic letter datasets, incorporating textural features, lighting, damage, and augmentations to mimic real-world inscription diversity. Despite minimal real examples, we engineer a dataset of 250 000 training and 25 000 validation images covering the 22 letter classes in the Aramaic alphabet. This comprehensive corpus provides a robust volume of data for training a residual neural network (ResNet) to classify highly degraded Aramaic letters. The ResNet model demonstrates 95% accuracy in classifying real images from the 8th century BCE Hadad statue inscription. Additional experiments validate performance on varying materials and styles, proving effective generalization. Our results validate the model's capabilities in handling diverse real-world scenarios, proving the viability of our synthetic data approach and avoiding the dependence on scarce training data that has constrained epigraphic analysis. Our innovative framework elevates interpretation accuracy on damaged inscriptions, thus enhancing knowledge extraction from these historical resources.
format Article
id doaj-art-c9f2d671c8f14dd18a347f3071c92f76
institution Kabale University
issn 1932-6203
language English
publishDate 2024-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-c9f2d671c8f14dd18a347f3071c92f76 | 2024-12-10T05:32:51Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2024-01-01 | 19 | 4 | e0299297 | 10.1371/journal.pone.0299297
title Deep Aramaic: Towards a synthetic data paradigm enabling machine learning in epigraphy.
url https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0299297&type=printable