Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis

As Artificial Intelligence assistants like OpenAI’s Chat-GPT or Google’s Gemini become increasingly integrated into our daily lives, their ability to understand and respond to human emotions expressed in natural language becomes essential. Affective computing, including text em...

Bibliographic Details
Main Authors:	Alejandro de Leon Langure, Mahdi Zareei
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Affective computing; natural language processing; sentiment analysis; text emotion detection; text emotion recognition
Online Access:	https://ieeexplore.ieee.org/document/10744050/

_version_	1846163809341800448
author	Alejandro de Leon Langure; Mahdi Zareei
author_facet	Alejandro de Leon Langure; Mahdi Zareei
author_sort	Alejandro de Leon Langure
collection	DOAJ
description	As Artificial Intelligence assistants like OpenAI’s Chat-GPT or Google’s Gemini become increasingly integrated into our daily lives, their ability to understand and respond to human emotions expressed in natural language becomes essential. Affective computing, including text emotion detection (TED), has become crucial for human-computer interaction. However, the quality of datasets used for training supervised machine learning algorithms in TED often receives insufficient attention, potentially impacting model performance and comparability. This study addresses this gap by proposing a comprehensive framework for assessing dataset quality in TED. We introduce 14 quantitative metrics across four dimensions: representativity, readability, structure, and part-of-speech tag distribution, and investigate their impact on model performance. We conduct experiments on datasets with varying quality characteristics using Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT) models. Our findings demonstrate that changes in these quality metrics can lead to statistically significant variations in model performance, with most metrics showing over 5% impact on prediction accuracy. Notably, pre-trained models like BERT exhibit more robustness to dataset quality variations than models trained from scratch. These results underscore the importance of considering and reporting dataset quality metrics in TED research, as they significantly influence model performance and generalizability. Our study lays the groundwork for more rigorous dataset quality assessment in affective computing, potentially leading to more reliable and comparable TED models in the future.
format	Article
id	doaj-art-07c32e39fbcf4b80aaedd4e5bb7b13dd
institution	Kabale University
issn	2169-3536
language	English
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-07c32e39fbcf4b80aaedd4e5bb7b13dd 2024-11-19T00:01:58Z eng IEEE IEEE Access 2169-3536 2024-01-01 12 166512 166536 10.1109/ACCESS.2024.3491856 10744050 Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis Alejandro de Leon Langure 0 https://orcid.org/0000-0002-8362-2045 Mahdi Zareei 1 https://orcid.org/0000-0001-6623-1758 School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico As Artificial Intelligence assistants like OpenAI’s Chat-GPT or Google’s Gemini become increasingly integrated into our daily lives, their ability to understand and respond to human emotions expressed in natural language becomes essential. Affective computing, including text emotion detection (TED), has become crucial for human-computer interaction. However, the quality of datasets used for training supervised machine learning algorithms in TED often receives insufficient attention, potentially impacting model performance and comparability. This study addresses this gap by proposing a comprehensive framework for assessing dataset quality in TED. We introduce 14 quantitative metrics across four dimensions: representativity, readability, structure, and part-of-speech tag distribution, and investigate their impact on model performance. We conduct experiments on datasets with varying quality characteristics using Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT) models. Our findings demonstrate that changes in these quality metrics can lead to statistically significant variations in model performance, with most metrics showing over 5% impact on prediction accuracy. Notably, pre-trained models like BERT exhibit more robustness to dataset quality variations than models trained from scratch. These results underscore the importance of considering and reporting dataset quality metrics in TED research, as they significantly influence model performance and generalizability. Our study lays the groundwork for more rigorous dataset quality assessment in affective computing, potentially leading to more reliable and comparable TED models in the future. https://ieeexplore.ieee.org/document/10744050/ Affective computing natural language processing sentiment analysis text emotion detection text emotion recognition
spellingShingle	Alejandro de Leon Langure Mahdi Zareei Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis IEEE Access Affective computing natural language processing sentiment analysis text emotion detection text emotion recognition
title	Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis
title_full	Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis
title_fullStr	Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis
title_full_unstemmed	Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis
title_short	Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis
title_sort	improving text emotion detection through comprehensive dataset quality analysis
topic	Affective computing natural language processing sentiment analysis text emotion detection text emotion recognition
url	https://ieeexplore.ieee.org/document/10744050/
work_keys_str_mv	AT alejandrodeleonlangure improvingtextemotiondetectionthroughcomprehensivedatasetqualityanalysis AT mahdizareei improvingtextemotiondetectionthroughcomprehensivedatasetqualityanalysis


Similar Items