The lexicographic potential of artificial intelligence: a case study of English loanwords in the Croatian language

Bibliographic Details
Main Authors:	Katica Balenović, Jakov Proroković
Format:	Article
Language:	English
Published:	Miroslav Krleža Institute of Lexicography 2025-06-01
Series:	Studia Lexicographica
Subjects:	ChatGPT; lexicography in language contact; overgeneralisation errors; corpus-based sampling; loanwords
Online Access:	https://studialexicographica.lzmk.hr/sl/article/view/461

Description
Summary:	The advent of generative artificial intelligence (AI) and large language models (LLMs) has introduced new possibilities in lexicography, particularly in defining dictionary entries with precision, while reducing the time cost compared to more traditional methods or software tools. To test AI’s linguistic capabilities, our study goes beyond monolingual dictionary compilation and investigates the potential of the ChatGPT model in distinguishing between specific senses of loanwords in an L2 context. A corpus-based sampling of target English words was used to assess ChatGPT’s ability to delineate different word senses in which regularly occurring loanwords can be realised in the Croatian language context. The findings indicate that AI demonstrates notable proficiency in providing definitions in general, albeit with observable flaws when responding to prompts that specifically inquire about the possible senses or word classes of targeted loanwords in their L2 setting. Its accuracy diminishes when dealing with less frequently used loanwords, often exhibiting overgeneralisation from English (L1) to Croatian (L2). The AI’s tendency to produce erroneous examples, with suggested usages that lack attestation in language corpora, is discussed in detail, with the results supporting the notion that the model primarily interprets loanwords from an English perspective, regardless of the language used in the prompt. A comparison between AI responses from early 2024 and early 2025 suggests an improvement in the 2025 model, which exhibits a more nuanced handling of ambiguous cases. However, inconsistencies persist, particularly in how frequency of use correlates with the number of senses, much of which is interpreted as ChatGPT’s tendency to sometimes prioritise generating a response at the cost of accuracy.
ISSN:	1846-6745, 2459-5578

Similar Items