Enhancing Moroccan Dialect Sentiment Analysis Through Optimized Preprocessing and Transfer Learning Techniques

Bibliographic Details
Main Authors: Yassir Matrane, Faouzia Benabbou, Zineb Ellaky
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10788699/
Description
Summary: This work investigates the challenges of sentiment analysis for Moroccan Arabic dialect (MD), where the lack of dialect-specific preprocessing methods complicates natural language processing (NLP) tasks and degrades sentiment classification performance. The research evaluates various preprocessing techniques, including stemming and feature extraction, using two main transfer learning approaches: feature extraction with deep learning models and fine-tuning of pre-trained models. Experiments were conducted on four MD datasets to assess combinations of stemmers, feature extractors, and architectures. In the feature extraction approach, omitting stemming and employing the QARiB feature extractor with a BiGRU model yielded the highest accuracy on the FB and MAC datasets, reaching 90.45% and 75.50%, respectively. In the fine-tuning approach, DarijaBERT excelled on the FB dataset with an accuracy of 93.37% and an F1-score of 88.55%, while QARiB and AraBERT performed comparably well on the MAC and MSAC datasets. The results suggest that excluding base form reduction methods, such as stemming and lemmatization, during fine-tuning enhances sentiment analysis performance in MD, highlighting the limitations of Modern Standard Arabic techniques for MD processing. This study provides insights for improving NLP applications in Arabic dialects, particularly sentiment analysis, by optimizing model performance without relying on standard preprocessing methods.
ISSN: 2169-3536