AI's effectiveness in language testing and feedback provision

Bibliographic Details
Main Author: Ahmed Alshehri
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Social Sciences and Humanities Open
Online Access: http://www.sciencedirect.com/science/article/pii/S2590291125006205
Description
Summary: Background: Language assessment is essential for evaluating proficiency, but traditional assessment methods often lack scalability and objectivity. While Artificial Intelligence (AI) enables automated solutions, its pedagogical impact remains understudied.

Objectives: This study evaluated the effectiveness of the three most widely used AI-based tools (Grammarly, Duolingo, and ELSA Speak) for language assessment and feedback provision. It explores the efficacy of AI, how users perceive AI accuracy, and the influence of AI on language learning outcomes.

Methods: The study employed a mixed-methods design combining quantitative and qualitative approaches with 50 participants from diverse language learning backgrounds. Surveys and questionnaires collected the quantitative data; semi-structured interviews, focus groups, and observational studies captured in-depth user experiences. Quantitative data were analyzed using ANOVA and regression analysis, and qualitative data using thematic analysis.

Results: Quantitative analysis revealed a weak but statistically significant association between AI feedback and perceived learning outcomes (R² = 0.08, p < 0.05). Users valued the immediate error correction AI provides, especially in grammar and pronunciation, while expressing skepticism about the tools' handling of coherence, creativity, and discourse-level analysis. The qualitative component revealed mixed user trust and underscored technical and contextual limitations. Notably, feedback utilization varied across participants, with structured assessments favored for AI use.

Conclusion: AI tools, despite their limited contextual judgment, provide useful support for basic language corrections. The findings rest on general user perceptions rather than tool-specific efficacy. The study recommends a hybrid AI-human assessment model that balances efficiency with comprehensive assessment. Future research should pursue longitudinal studies of tool-specific effects and adaptive feedback mechanisms.
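As context for the regression statistic reported above (R² = 0.08, p < 0.05), below is a minimal, illustrative Python sketch of how such a fit could be computed with SciPy. The variable names and simulated survey data are hypothetical; this is not the study's actual analysis code or data.

```python
# Illustrative sketch only: a simple linear regression of perceived
# learning outcomes on AI feedback ratings, producing the kind of
# R^2 and p-value statistics the abstract reports. All data here are
# simulated (hypothetical), not the study's survey responses.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50  # matches the study's sample size

# Hypothetical 1-5 Likert-style ratings of AI feedback quality.
ai_feedback = rng.integers(1, 6, size=n).astype(float)
# Perceived outcomes only weakly coupled to feedback (assumption).
outcomes = 0.3 * ai_feedback + rng.normal(0.0, 1.5, size=n)

result = stats.linregress(ai_feedback, outcomes)
print(f"R^2 = {result.rvalue**2:.2f}, p = {result.pvalue:.3f}")
```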
ISSN: 2590-2911