ATTRIBUTION OF MEDIA TEXTS BASED ON A TRAINED NATURAL LANGUAGE MODEL AND LINGUISTIC ASSESSMENT OF IDENTIFICATION QUALITY
Main Authors: | Vladimir A. Klyachin, Ekaterina V. Khizhnyakova |
---|---|
Format: | Article |
Language: | English |
Published: | Volgograd State University, 2024-11-01 |
Series: | Vestnik Volgogradskogo Gosudarstvennogo Universiteta. Seriâ 2. Âzykoznanie |
Subjects: | |
Online Access: | https://l.jvolsu.com/index.php/en/archive-en/928-science-journal-of-volsu-linguistics-2024-vol-23-no-5/artificial-intelligence-potential-in-natural-language-processing-and-machine-translation/2840-klyachin-v-a-khizhnyakova-e-v-attribution-of-media-texts-based-on-a-trained-natural-language-model-and-linguistic-assessment-of-identification-quality |
Field | Value |
---|---|
author | Vladimir A. Klyachin, Ekaterina V. Khizhnyakova |
collection | DOAJ |
description | The creation of effective systems for filtering media texts is driven by the need to develop artificial intelligence systems, in particular large language models, which should be trained on “correct” text samples free of signs of disinformation, infodemic, and unreliability. The article presents the results of automatically detecting high-quality media texts, as well as text samples with infodemic features, using a natural language model trained on a manually labeled corpus. The corpus was labeled by experts based on a parameterization of text content. The goal of our work is to build a model of the language of media messages, assess its quality, and identify detection errors caused by the linguistic characteristics of texts. Creating a model of the language of media messages is a prerequisite for increasing the efficiency and quality of artificial intelligence systems. Test use of the trained natural language model showed that it filters media texts with fairly high accuracy; the support vector machine method proved the most effective. The share of misclassified informative texts (those meeting the criteria of reliability and novelty) is low, at 6.2 percent, and the share of misclassified uninformative texts is approximately 3.9 percent, which indicates fairly high efficiency of the developed model. Errors in detecting informative texts are associated with the use of proper names (anthroponyms, toponyms) and numerals in headlines. Misclassified texts containing signs of fakes and misinformation are characterized by statements with speech verbs that are often used in informative texts. |
id | doaj-art-2fa95fe0548242c3bda28e88517cb78a |
institution | Kabale University |
issn | 1998-9911, 2409-1979 |
doi | 10.15688/jvolsu2.2024.5.3 |
citation | Vol. 23, No. 5, pp. 31-46 |
affiliation | Volgograd State University, Volgograd, Russia (both authors) |
orcid | Vladimir A. Klyachin: https://orcid.org/0000-0003-1922-7849; Ekaterina V. Khizhnyakova: https://orcid.org/0000-0002-7914-9988 |
title | ATTRIBUTION OF MEDIA TEXTS BASED ON A TRAINED NATURAL LANGUAGE MODEL AND LINGUISTIC ASSESSMENT OF IDENTIFICATION QUALITY |
topic | media text; neural network; language model; machine learning method; corpus; automatic detection |
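The abstract reports that a support vector machine performed best and gives the share of misclassified texts for each class. The Python sketch below illustrates, under stated assumptions, what such a pipeline and metric might look like: bag-of-words term counts, a linear SVM trained with the Pegasos stochastic sub-gradient method, and a per-class misclassification share. The features, the Pegasos solver, the hyperparameters (`lam`, `epochs`), the +1/-1 label encoding, all function names, and the toy headlines are our own illustrative assumptions; the record does not describe the authors' features, kernel, or implementation.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (a deliberately simple stand-in for real preprocessing)."""
    return re.findall(r"\w+", text.lower())


def build_vocab(texts):
    """Map each distinct token to a feature index in order of first appearance."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(text, vocab):
    """Sparse term-count vector {index: count}; unknown tokens are ignored.
    A constant bias feature is stored at index len(vocab)."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    x = {vocab[tok]: float(c) for tok, c in counts.items()}
    x[len(vocab)] = 1.0
    return x


def dot(w, x):
    return sum(w[j] * v for j, v in x.items())


def train_linear_svm(X, y, dim, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    Labels must be +1 or -1. For simplicity the bias weight is regularized too."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])  # margin under the current weights
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularizer step
            if margin < 1.0:  # hinge loss is active: step toward y_i * x_i
                for j, v in X[i].items():
                    w[j] += eta * y[i] * v
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


def misclassification_share(y_true, y_pred, label):
    """Fraction of texts whose true class is `label` that were assigned another class."""
    idx = [i for i, t in enumerate(y_true) if t == label]
    if not idx:
        return 0.0
    return sum(1 for i in idx if y_pred[i] != label) / len(idx)


if __name__ == "__main__":
    # Invented toy headlines: +1 = informative, -1 = uninformative / infodemic.
    train = [
        ("Regional ministry publishes 2024 budget report", 1),
        ("City council approves new school construction plan", 1),
        ("Shocking secret doctors will never tell you", -1),
        ("You will not believe what happened next", -1),
    ]
    vocab = build_vocab(text for text, _ in train)
    X = [featurize(text, vocab) for text, _ in train]
    y = [label for _, label in train]
    w = train_linear_svm(X, y, dim=len(vocab) + 1)
    y_pred = [predict(w, x) for x in X]
    print("informative misclassified:", misclassification_share(y, y_pred, 1))
    print("uninformative misclassified:", misclassification_share(y, y_pred, -1))
```

Running the module prints the two per-class shares for the toy set. Assuming the reported 6.2 and 3.9 percent figures are per-class error rates on held-out texts, `misclassification_share` applied to a test split would yield numbers of the same kind, though not the same values; a production setup would more likely use an established SVM library and richer features.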