Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis

As Artificial Intelligence assistants like OpenAI’s Chat-GPT or Google’s Gemini become increasingly integrated into our daily lives, their ability to understand and respond to human emotions expressed in natural language becomes essential. Affective computing, including text em...

Bibliographic Details
Main Authors: Alejandro de Leon Langure, Mahdi Zareei
Format: Article
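Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Affective computing; natural language processing; sentiment analysis; text emotion detection; text emotion recognition
Online Access: https://ieeexplore.ieee.org/document/10744050/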
author Alejandro de Leon Langure
Mahdi Zareei
collection DOAJ
description As Artificial Intelligence assistants like OpenAI’s Chat-GPT or Google’s Gemini become increasingly integrated into our daily lives, their ability to understand and respond to human emotions expressed in natural language becomes essential. Affective computing, including text emotion detection (TED), has become crucial for human-computer interaction. However, the quality of datasets used for training supervised machine learning algorithms in TED often receives insufficient attention, potentially impacting model performance and comparability. This study addresses this gap by proposing a comprehensive framework for assessing dataset quality in TED. We introduce 14 quantitative metrics across four dimensions: representativity, readability, structure, and part-of-speech tag distribution, and investigate their impact on model performance. We conduct experiments on datasets with varying quality characteristics using Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT) models. Our findings demonstrate that changes in these quality metrics can lead to statistically significant variations in model performance, with most metrics showing over 5% impact on prediction accuracy. Notably, pre-trained models like BERT exhibit more robustness to dataset quality variations than models trained from scratch. These results underscore the importance of considering and reporting dataset quality metrics in TED research, as they significantly influence model performance and generalizability. Our study lays the groundwork for more rigorous dataset quality assessment in affective computing, potentially leading to more reliable and comparable TED models in the future.
format Article
id doaj-art-07c32e39fbcf4b80aaedd4e5bb7b13dd
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-07c32e39fbcf4b80aaedd4e5bb7b13dd; 2024-11-19T00:01:58Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 166512-166536; DOI 10.1109/ACCESS.2024.3491856; IEEE document 10744050
Alejandro de Leon Langure (https://orcid.org/0000-0002-8362-2045), School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico
Mahdi Zareei (https://orcid.org/0000-0001-6623-1758), School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico
title Improving Text Emotion Detection Through Comprehensive Dataset Quality Analysis
topic Affective computing
natural language processing
sentiment analysis
text emotion detection
text emotion recognition
url https://ieeexplore.ieee.org/document/10744050/