EFFICIENCY ASSESSMENT OF EUCLIDEAN AND MAKHALANOBIS DISTANCES FOR SOLVING A MAJOR TEXT CLASSIFICATION PROBLEM

| Main Author: | Anna V. Glazkova |
|---|---|
| Format: | Article |
| Language: | Russian |
| Published: | Dagestan State Technical University, 2017-07-01 |
| Series: | Вестник Дагестанского государственного технического университета: Технические науки (Herald of Dagestan State Technical University. Technical Sciences) |
| Subjects: | Euclidean distance, Mahalanobis distance, document classification, natural language processing, text characteristics, text classification feature |
| Online Access: | https://vestnik.dgtu.ru/jour/article/view/370 |


| author | Anna V. Glazkova |
|---|---|
| collection | DOAJ |
| description | Abstract. Objectives. The aim is to compare the efficiency of the Euclidean and Mahalanobis metrics for determining the category of potential recipients of a text. The task is relevant because tools are needed to identify the intended recipients of electronic documents, a need that has become more pressing since age restrictions were introduced for the content of Internet pages and text resources. Moreover, the issue has received little coverage in the work of Russian researchers. Method. The relative efficiency of the Euclidean and Mahalanobis distances was compared within the framework of an intelligent system for automatic classification of texts by the age category of their recipients. Results. The main approaches to defining proximity measures for objects represented as sets of classification features are discussed, and the choice of the Euclidean and Mahalanobis metrics for a numerical comparison of classification results is justified. The text sample and the characteristics of the category labels used in the computational experiment are described. The experiment was carried out on texts from the Russian National Corpus. Conclusion. The computational experiment makes it possible to select the most effective method for determining the age category of potential text recipients. Its results showed that both the Euclidean and the Mahalanobis metrics can be used to solve text classification problems, and confirmed that the Mahalanobis metric is preferable for estimating distances between objects described by correlated features. |
| format | Article |
| id | doaj-art-b817f8ef93724466ac20cc771fe125ea |
| institution | Kabale University |
| issn | 2073-6185, 2542-095X |
| language | Russian |
| publishDate | 2017-07-01 |
| publisher | Dagestan State Technical University |
| record_format | Article |
| series | Вестник Дагестанского государственного технического университета: Технические науки (Herald of Dagestan State Technical University. Technical Sciences) |
| doi | 10.21822/2073-6185-2017-44-1-86-93 |
| volume | 44 |
| issue | 1 |
| pages | 86–93 |
| affiliation | Tyumen State University |
| title | EFFICIENCY ASSESSMENT OF EUCLIDEAN AND MAKHALANOBIS DISTANCES FOR SOLVING A MAJOR TEXT CLASSIFICATION PROBLEM |
| topic | Euclidean distance, Mahalanobis distance, document classification, natural language processing, text characteristics, text classification feature |
| url | https://vestnik.dgtu.ru/jour/article/view/370 |
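
The abstract contrasts the two metrics only in words. As a hedged illustration (not the author's system, whose features and decision rule this record does not describe), the sketch below assigns a feature vector to the class with the nearest centroid under each metric. Euclidean distance is d_E(x, y) = sqrt(sum_i (x_i - y_i)^2); Mahalanobis distance is d_M(x, y) = sqrt((x - y)^T S^-1 (x - y)), where S is a covariance matrix. The nearest-centroid rule, the choice of S as the pooled within-class covariance, every function name, and the two-feature toy data with hypothetical categories "A" and "B" are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def euclidean(x: Vector, y: Vector) -> float:
    """d_E(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_vector(rows: List[Vector]) -> List[float]:
    """Component-wise mean of the feature vectors (the class centroid)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows: List[Vector]) -> Matrix:
    """Sample covariance matrix with the unbiased (n - 1) denominator; needs n >= 2."""
    mu = mean_vector(rows)
    n, d = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def pooled_covariance(classes: Dict[str, List[Vector]]) -> Matrix:
    """Within-class covariance pooled over all classes, weighted by n_k - 1 (an assumption)."""
    d = len(next(iter(classes.values()))[0])
    total = [[0.0] * d for _ in range(d)]
    dof = 0
    for rows in classes.values():
        cov, k = covariance(rows), len(rows) - 1
        for i in range(d):
            for j in range(d):
                total[i][j] += k * cov[i][j]
        dof += k
    return [[v / dof for v in row] for row in total]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting; raises if m is (near) singular."""
    d = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; Mahalanobis distance is undefined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis(x: Vector, y: Vector, cov_inv: Matrix) -> float:
    """d_M(x, y) = sqrt((x - y)^T S^-1 (x - y)), given the inverse covariance S^-1."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    q = sum(diff[i] * cov_inv[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative round-off


def classify(x: Vector, classes: Dict[str, List[Vector]], metric: str = "mahalanobis") -> str:
    """Hypothetical nearest-centroid rule: return the label whose centroid is closest to x."""
    centroids = {label: mean_vector(rows) for label, rows in classes.items()}
    if metric == "euclidean":
        scores = {label: euclidean(x, c) for label, c in centroids.items()}
    elif metric == "mahalanobis":
        s_inv = invert(pooled_covariance(classes))
        scores = {label: mahalanobis(x, c, s_inv) for label, c in centroids.items()}
    else:
        raise ValueError(f"unknown metric: {metric}")
    return min(scores, key=scores.__getitem__)


# Made-up two-feature data for two hypothetical categories; within each class
# the two features rise and fall together (strong positive correlation).
TOY_CLASSES: Dict[str, List[Vector]] = {
    "A": [(12.0, 12.0), (8.0, 8.0), (10.5, 9.5), (9.5, 10.5)],
    "B": [(15.0, 12.0), (11.0, 8.0), (13.5, 9.5), (12.5, 10.5)],
}

if __name__ == "__main__":
    query = (13.0, 13.0)
    print(classify(query, TOY_CLASSES, "euclidean"))    # B
    print(classify(query, TOY_CLASSES, "mahalanobis"))  # A
```

On this toy data the query (13, 13) is 3.0 from centroid B and about 4.24 from centroid A in Euclidean terms, so the Euclidean rule picks B. It lies along A's main direction of spread, however, and the Mahalanobis distance, which discounts variation along correlated directions, is about 1.84 to A versus about 3.79 to B, so it picks A. This only mirrors in miniature the abstract's point about correlated features; it does not reproduce the paper's experiment. With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean one.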