EFFICIENCY ASSESSMENT OF EUCLIDEAN AND MAKHALANOBIS DISTANCES FOR SOLVING A MAJOR TEXT CLASSIFICATION PROBLEM

Abstract. Objectives: The aim is to compare the efficiency of the Euclidean and Mahalanobis metrics in determining the category of potential text recipients. The task is relevant because of the need to develop a means of identifying the recipients of electronic d...

Bibliographic Details
Main Author: Anna V. Glazkova
Format: Article
Language: Russian
Published: Dagestan State Technical University 2017-07-01
Series: Вестник Дагестанского государственного технического университета: Технические науки
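Subjects: euclidean distance; mahalanobis distance; document classification; natural language processing; text characteristics; text; classification feature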
Online Access: https://vestnik.dgtu.ru/jour/article/view/370
collection DOAJ
description Abstract. Objectives: The aim is to compare the efficiency of the Euclidean and Mahalanobis metrics in determining the category of potential text recipients. The task is relevant because of the need to develop a means of identifying the recipients of electronic documents. This task has been complicated by the introduction of age restrictions on the content of Internet webpages and text resources. Moreover, the issue has received little coverage in the works of Russian researchers. Method: The relative efficiencies of the Euclidean and Mahalanobis distances were compared as part of the implementation of an intelligent system for the automatic classification of texts by the age category of their recipients. Results: The main approaches to establishing proximity measures for objects represented as sets of classification features are discussed, and the choice of the Euclidean and Mahalanobis metrics for numerical comparison of classification results is justified. The sample texts and the characteristics of the category designations used in the computational experiment are described. The experiment was carried out on texts included in the National Corpus of the Russian language. Conclusion: The computational experiment makes it possible to select the most effective method for determining the age category of potential text recipients. The results showed that both the Euclidean and Mahalanobis metrics can be used to solve text classification problems, and confirmed that the Mahalanobis metric is preferable for estimating distances between objects represented by correlated features.
id doaj-art-b817f8ef93724466ac20cc771fe125ea
institution Kabale University
issn 2073-6185
2542-095X
spelling doaj-art-b817f8ef93724466ac20cc771fe125ea | 2025-08-20T03:57:21Z | rus | Dagestan State Technical University | Вестник Дагестанского государственного технического университета: Технические науки | 2073-6185 | 2542-095X | 2017-07-01 | 44 | 1 | 86 | 93 | 10.21822/2073-6185-2017-44-1-86-93 | 330 | EFFICIENCY ASSESSMENT OF EUCLIDEAN AND MAKHALANOBIS DISTANCES FOR SOLVING A MAJOR TEXT CLASSIFICATION PROBLEM | Anna V. Glazkova | 0 | Tyumen State University | https://vestnik.dgtu.ru/jour/article/view/370 | euclidean distance | mahalanobis distance | document classification | natural language processing | text characteristics | text | classification feature
title EFFICIENCY ASSESSMENT OF EUCLIDEAN AND MAKHALANOBIS DISTANCES FOR SOLVING A MAJOR TEXT CLASSIFICATION PROBLEM
topic euclidean distance
mahalanobis distance
document classification
natural language processing
text characteristics
text
classification feature