AI's effectiveness in language testing and feedback provision
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-01-01 |
| Series: | Social Sciences and Humanities Open |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2590291125006205 |
| Summary: | Background: Language assessment is essential for evaluating proficiency, but traditional assessment methods often lack scalability and objectivity. While Artificial Intelligence (AI) enables automated solutions, its pedagogical impact remains understudied. Objectives: This study evaluated the effectiveness of the three most widely used AI-based tools (Grammarly, Duolingo, and ELSA Speak) for language assessment and feedback provision. It explored the efficacy of AI, how AI accuracy is perceived, and the influence of AI on language learning outcomes. Methods: The study employed a mixed-methods design combining quantitative and qualitative approaches with 50 participants from diverse language-learning backgrounds. Surveys and questionnaires were used to collect quantitative data; semi-structured interviews, focus groups, and observational studies captured in-depth user experiences. Quantitative data were analyzed using ANOVA and regression analysis, and qualitative data using thematic analysis. Results: Quantitative analysis revealed a weak but statistically significant correlation between AI feedback and perceived learning outcomes (R² = 0.08, p < 0.05). Users appreciated AI's immediate error correction, especially in grammar and pronunciation, while expressing skepticism about the tools' handling of coherence, creativity, and discourse-level analysis. The qualitative component revealed mixed user trust and underscored technical and contextual limitations. Notably, feedback utilization varied across participants, with AI use favored for structured assessments. Conclusion: AI tools, though limited in contextual judgment, provide useful support for basic language corrections. Findings rest on general user perceptions rather than tool-specific efficacy. The study recommends a hybrid AI-human assessment model that balances efficiency with comprehensive evaluation. Future research should use longitudinal designs to examine tool-specific effects and adaptive feedback mechanisms. |
| ISSN: | 2590-2911 |