Enhanced Category-Feature Association Measure

Bibliographic Details
Main Authors: Soran S. Badawi, Ari M. Saeed, Sara A. Ahmed, Diyari A. Hassan
Format: Article
Language: English
Published: Koya University 2025-08-01
Series: ARO-The Scientific Journal of Koya University
Online Access: https://aro.koyauniversity.org/index.php/aro/article/view/2034
Collection: DOAJ
Abstract: Accurately and efficiently categorizing large, high-dimensional text data is one of the most serious challenges in text classification. Many features confuse the classification process, so feature selection (FS) strategies are needed to address the problem of high dimensionality. This paper proposes a novel FS technique based on an enhanced category-feature association measure (ECFAM). ECFAM exploits the presence and absence of terms, together with the complex relationships among terms across different sections. The approach emphasizes the key role of ancillary terms in classifying and distinguishing categories. Comparisons are conducted on two benchmark datasets, Reuters-21578 and 20-Newsgroups, using two widely employed supervised machine learning classifiers and one deep learning algorithm. The experiments examine nine feature-set sizes, ranging from 50 to 4000 features. The experimental results show that ECFAM outperforms the other methods in every setting in terms of both accuracy and computational cost.
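The abstract does not give ECFAM's formula, so the sketch below does not reproduce it. As a hedged illustration of the general filter-style FS setting the abstract describes, it scores each term with the standard chi-square statistic over a 2x2 term presence/absence by category table (a swapped-in, widely used baseline, not ECFAM), takes each term's maximum score over all categories, and keeps the top k terms, analogous to choosing one of the feature-set sizes between 50 and 4000 studied in the paper. The function names `chi_square` and `select_top_k` and the toy corpus are hypothetical.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical sketch: chi-square filter feature selection (NOT the ECFAM measure).


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no association can be measured
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(docs: Sequence[Set[str]], labels: Sequence[str], k: int) -> List[str]:
    """Rank terms by their maximum chi-square score over all categories; keep the top k."""
    categories = sorted(set(labels))
    vocab = sorted(set().union(*docs))
    n_docs = len(docs)
    cat_sizes = {cat: sum(1 for y in labels if y == cat) for cat in categories}
    scores: Dict[str, float] = {}
    for term in vocab:
        df = sum(1 for doc in docs if term in doc)  # document frequency of the term
        best = 0.0
        for cat in categories:
            a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == cat)
            b = df - a
            c = cat_sizes[cat] - a
            d = n_docs - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy usage: each document is represented as a set of terms (presence/absence only).
docs = [
    {"oil", "price", "barrel"},
    {"oil", "crude", "price"},
    {"hockey", "game", "score"},
    {"baseball", "game", "score"},
]
labels = ["energy", "energy", "sport", "sport"]
print(select_top_k(docs, labels, 3))
```

On this toy corpus, terms that occur in every document of exactly one category (for example "oil" or "game") receive the highest score, while terms seen in only one document rank lower. Taking the maximum over categories is one common way to turn per-category scores into a single global ranking; averaging over categories is another.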
Record ID: doaj-art-a038ff00cef24f04b959da7b2c67bb9d
Institution: Kabale University
ISSN: 2410-9355, 2307-549X
Volume/Issue: 13(2)
DOI: 10.14500/aro.12034
Authors and affiliations:
Soran S. Badawi (https://orcid.org/0000-0001-9117-3078), Language Center, Charmo University, Chamchamal, Kurdistan Region – F.R. Iraq
Ari M. Saeed (https://orcid.org/0000-0003-1350-9386), Department of Computer Science, University of Halabja, Halabja, Kurdistan Region – F.R. Iraq
Sara A. Ahmed (https://orcid.org/0000-0001-7330-6105), Department of Computer Engineering, Komar University of Science and Technology, Sulaimaniyah, Kurdistan Region – F.R. Iraq
Diyari A. Hassan (https://orcid.org/0000-0003-0710-1923), Department of Biomedical Engineering, Faculty of Engineering and Computer Science, Qaiwan International University, Sulaimaniyah, Kurdistan Region – F.R. Iraq
Subjects: Dimension reduction; Feature selection; Long short-term memory; Multinomial Naive Bayes; Support vector machines; Text classification