ATTRIBUTION OF MEDIA TEXTS BASED ON A TRAINED NATURAL LANGUAGE MODEL AND LINGUISTIC ASSESSMENT OF IDENTIFICATION QUALITY

The creation of effective systems for filtering media texts is driven by the need to develop artificial intelligence systems, in particular large language models, which should be trained on “correct” text samples free of signs of disinformation, infodemic, and unreliability. The article present...

Bibliographic Details
Main Authors:	Vladimir A. Klyachin, Ekaterina V. Khizhnyakova
Format:	Article
Language:	English
Published:	Volgograd State University 2024-11-01
Series:	Vestnik Volgogradskogo Gosudarstvennogo Universiteta. Seriâ 2. Âzykoznanie
Subjects:	media text; neural network; language model; machine learning method; corpus; automatic detection
Online Access:	https://l.jvolsu.com/index.php/en/archive-en/928-science-journal-of-volsu-linguistics-2024-vol-23-no-5/artificial-intelligence-potential-in-natural-language-processing-and-machine-translation/2840-klyachin-v-a-khizhnyakova-e-v-attribution-of-media-texts-based-on-a-trained-natural-language-model-and-linguistic-assessment-of-identification-quality

collection	DOAJ
description	The creation of effective systems for filtering media texts is driven by the need to develop artificial intelligence systems, in particular large language models, which should be trained on “correct” text samples free of signs of disinformation, infodemic, and unreliability. The article presents the results of automatic detection of high-quality media texts and of text samples with infodemic features, carried out with a natural language model trained on a manually labeled corpus. Experts labeled the corpus manually, based on a parameterization of the text content. The goal of the work is to build a model of the language of media messages, assess its quality, and identify detection errors caused by the linguistic characteristics of the texts. Such a model is a prerequisite for increasing the efficiency and quality of artificial intelligence systems. Test use of the trained model shows that media texts can be filtered with fairly high accuracy, and the support vector machine method proved the most effective. The share of incorrectly recognized informative texts, that is, texts meeting the criteria of reliability and novelty, is low at 6.2 percent. The share of incorrectly recognized uninformative texts is approximately 3.9 percent, which indicates fairly high efficiency of the developed model. Errors in detecting informative texts are associated with proper names (anthroponyms, toponyms) and numerals in the headings. Misclassified texts containing signs of fakes and misinformation typically include statements with speech verbs, which are also common in informative texts.
id	doaj-art-2fa95fe0548242c3bda28e88517cb78a
institution	Kabale University
issn	1998-9911 2409-1979
spelling	doaj-art-2fa95fe0548242c3bda28e88517cb78a; 2025-01-11T16:09:17Z; eng; Volgograd State University; Vestnik Volgogradskogo Gosudarstvennogo Universiteta. Seriâ 2. Âzykoznanie; ISSN 1998-9911, 2409-1979; 2024-11-01; vol. 23, no. 5, pp. 31-46; DOI 10.15688/jvolsu2.2024.5.3; Vladimir A. Klyachin (https://orcid.org/0000-0003-1922-7849), Volgograd State University, Volgograd, Russia; Ekaterina V. Khizhnyakova (https://orcid.org/0000-0002-7914-9988), Volgograd State University, Volgograd, Russia

Similar Items