CFP-AL: Combining Model Features and Prediction for Active Learning in Sentence Classification

Active learning has long been studied across many domains, from traditional machine learning to recent deep learning research. In particular, obtaining high-quality labeled datasets for supervised learning requires human annotation, and an effective active learning...


Bibliographic Details
Main Authors:	Keuntae Kim, Yong Suk Choi
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Applied Sciences
Subjects:	NLP (Natural Language Processing); supervised fine-tuning; active learning; sentence classification
Online Access:	https://www.mdpi.com/2076-3417/15/1/482

_version_	1841549337628246016
author	Keuntae Kim, Yong Suk Choi
collection	DOAJ
description	Active learning has long been studied across many domains, from traditional machine learning to recent deep learning research. In particular, obtaining high-quality labeled datasets for supervised learning requires human annotation, and an effective active learning strategy can greatly reduce annotation costs. In this study, we present a new insight, CFP-AL (Combining model Features and Prediction for Active Learning), developed from the perspective of feature space by analyzing and diagnosing methods that have performed well in NLP (Natural Language Processing) sentence classification. According to our analysis, previous active learning strategies that look for data near the decision boundary to facilitate classifier tuning are effective, but very few data points lie near the decision boundary. A more fine-grained active learning strategy is therefore needed, one that goes beyond simply finding data near the decision boundary or data with high uncertainty. Based on this analysis, we propose CFP-AL, which considers the model's feature space; it achieved the best performance across six tasks and also outperformed other methods on three Out-Of-Domain (OOD) tasks. The results suggest that data sampling through CFP-AL provides the most discriminative classification criterion, and the work is novel in suggesting a method to overcome the anisotropy phenomenon of supervised models. Additionally, through various comparative experiments with basic methods, we analyzed which data are most beneficial or harmful for model training. We expect this research to help researchers extend active learning to take features into account, which has been difficult so far.
format	Article
id	doaj-art-384e5c21430a4151bd70fd9d2e573da6
institution	Kabale University
issn	2076-3417
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj-art-384e5c21430a4151bd70fd9d2e573da6; 2025-01-10T13:15:43Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-01-01; 15; 1; 482; 10.3390/app15010482; CFP-AL: Combining Model Features and Prediction for Active Learning in Sentence Classification; Keuntae Kim (Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea); Yong Suk Choi (Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea); https://www.mdpi.com/2076-3417/15/1/482; NLP (Natural Language Processing); supervised fine-tuning; active learning; sentence classification
title	CFP-AL: Combining Model Features and Prediction for Active Learning in Sentence Classification
topic	NLP (Natural Language Processing); supervised fine-tuning; active learning; sentence classification
url	https://www.mdpi.com/2076-3417/15/1/482

Similar Items