Empowering sentiment analysis in social media: a comprehensive approach to enhance the classification of abusive Tamil comments using transformer models
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2025-08-01 |
| Series: | Journal of Big Data |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s40537-025-01268-6 |
| Summary: | Abstract Targeting individuals with abusive language based on social group membership such as gender, religion, or sexual orientation is a severe form of online verbal abuse. In today’s digital world, the extensive spread of harmful and abusive content on social media has escalated, frequently fueling real-world violence against marginalized groups. Recognizing such content is especially difficult in code-mixed contexts and low-resource languages such as Tamil, where linguistic complexity and data scarcity create substantial challenges. This study proposes a novel use of adapter-based tuning in multilingual transformer models, specifically mBERT, MuRIL (base and large), XLM-RoBERTa (base and large), and mDeBERTa, for classifying abusive comments in Tamil. We used the dataset provided by the DravidianLangTech@ACL 2022 shared task and increased the number of comments in the training set from 2240 to 3742 using preprocessing and text augmentation methods. Unlike previous approaches that rely heavily on feature extraction and fine-tuning, this research shows that adapter-based models not only reduce the number of trainable parameters but also outperform fine-tuned and feature-extraction-based models in terms of classification accuracy, especially for imbalanced and under-resourced text data. Of the models explored, the adapter-based mDeBERTa achieved the highest accuracy of 75.393%, validating the effectiveness of parameter-efficient transfer learning for abusive language detection in low-resource, code-mixed contexts. |
| ISSN: | 2196-1115 |
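The adapter-based tuning the abstract describes inserts small bottleneck modules into a frozen transformer and trains only those modules. As a rough illustration of why this is parameter-efficient, the sketch below implements a single bottleneck adapter (down-projection, ReLU, up-projection, residual connection) in plain numpy. The hidden size, bottleneck dimension, and layer count are assumptions chosen to resemble a BERT-base-scale encoder; they are not taken from the paper, and the actual models used (mBERT, MuRIL, XLM-RoBERTa, mDeBERTa) differ in size.

```python
import numpy as np

# Hypothetical dimensions resembling a BERT-base-scale encoder (assumptions,
# not values from the paper).
HIDDEN = 768       # transformer hidden size
BOTTLENECK = 64    # adapter bottleneck dimension
N_LAYERS = 12      # number of encoder layers

rng = np.random.default_rng(0)

def make_adapter(hidden: int, bottleneck: int) -> dict:
    """One bottleneck adapter: down-projection, nonlinearity, up-projection."""
    return {
        "W_down": rng.normal(0.0, 0.02, (hidden, bottleneck)),
        "b_down": np.zeros(bottleneck),
        "W_up": rng.normal(0.0, 0.02, (bottleneck, hidden)),
        "b_up": np.zeros(hidden),
    }

def adapter_forward(x: np.ndarray, p: dict) -> np.ndarray:
    """Residual bottleneck transform: x + ReLU(x W_down + b_down) W_up + b_up."""
    h = np.maximum(x @ p["W_down"] + p["b_down"], 0.0)
    return x + h @ p["W_up"] + p["b_up"]

adapter = make_adapter(HIDDEN, BOTTLENECK)
x = rng.normal(size=(4, HIDDEN))   # a toy batch of 4 token representations
y = adapter_forward(x, adapter)
print(y.shape)                     # (4, 768): shape is preserved by the residual

# Trainable parameters if one adapter is added per layer and the base
# encoder stays frozen: a small fraction of a ~110M-parameter model.
adapter_params = N_LAYERS * sum(p.size for p in adapter.values())
print(adapter_params)              # ~1.2M trainable parameters
```

Because the base model's weights stay frozen, only these roughly one million adapter parameters are updated during training, which is what lets adapter tuning work well on small, imbalanced datasets like the one described here.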