Fighting Evaluation Inflation: Concentrated Datasets for Grammatical Error Correction Task

Bibliographic Details
Main Authors: Vladimir Starchenko, Darya Kharlamova, Elizaveta Klykova, Anastasia Shavrina, Aleksey Starchenko, Olga Vinogradova, Olga Lyashevskaya
Format: Article
Language: English
Published: National Research University Higher School of Economics 2024-12-01
Series: Journal of Language and Education
Subjects: Grammatical error correction; L2 errors; ESL; concentrated datasets; cross-sentence GEC
Online Access: https://jle.hse.ru/article/view/22272
author Vladimir Starchenko
Darya Kharlamova
Elizaveta Klykova
Anastasia Shavrina
Aleksey Starchenko
Olga Vinogradova
Olga Lyashevskaya
collection DOAJ
description Background: Grammatical error correction (GEC) systems have advanced considerably over the past decade. According to common metrics, they often match or surpass human experts. Nevertheless, they perform poorly on several kinds of errors that humans correct effortlessly. Having reached this resolution limit, current evaluation algorithms and datasets no longer support further improvement of GEC systems. Purpose: To address the problem of the resolution limit in GEC. The suggested approach is to evaluate on concentrated datasets, which have a higher density of errors that are difficult for modern GEC systems to handle. Method: To test the suggested solution, we look at distant-context-sensitive errors, which have been acknowledged as challenging for GEC systems. We create a concentrated dataset for English with a higher density of errors of various types by semi-manually aggregating pre-annotated examples from four existing datasets and further expanding the annotation of distant-context-sensitive errors. Two GEC systems are evaluated on this dataset with both traditional scoring algorithms and a novel approach modified for longer contexts. Results: The concentrated dataset includes 1,014 examples sampled manually from FCE, CoNLL-2014, BEA-2019, and REALEC. It is annotated for types of context-sensitive errors such as pronouns, verb tense, punctuation, referential device, and linking device. GEC systems score lower when evaluated on the dataset with a higher density of challenging errors than on a random dataset with otherwise the same parameters. Conclusion: The lower scores registered on concentrated datasets confirm that such datasets provide a way toward future improvement of GEC models. The dataset can be used for further studies focusing on distant-context-sensitive GEC.
format Article
id doaj-art-348b1f1486e64c499484938a91c9c4a4
institution Kabale University
issn 2411-7390
language English
publishDate 2024-12-01
publisher National Research University Higher School of Economics
record_format Article
series Journal of Language and Education
spelling doaj-art-348b1f1486e64c499484938a91c9c4a4
2025-01-07T16:17:16Z
eng
National Research University Higher School of Economics
Journal of Language and Education
2411-7390
2024-12-01
Volume 10, Issue 4
DOI: 10.17323/jle.2024.22272
Fighting Evaluation Inflation: Concentrated Datasets for Grammatical Error Correction Task
Vladimir Starchenko (HSE University, Moscow, Russia)
Darya Kharlamova (HSE University, Moscow, Russia)
Elizaveta Klykova (independent researcher)
Anastasia Shavrina (HSE University, Moscow, Russia)
Aleksey Starchenko (HSE University, Moscow, Russia)
Olga Vinogradova (independent researcher)
Olga Lyashevskaya (HSE University, Moscow, Russia; Vinogradov Russian Language Institute, Russian Academy of Sciences, Moscow, Russia)
https://jle.hse.ru/article/view/22272
Grammatical error correction
L2 errors
ESL
concentrated datasets
cross-sentence GEC
title Fighting Evaluation Inflation: Concentrated Datasets for Grammatical Error Correction Task
topic Grammatical error correction
L2 errors
ESL
concentrated datasets
cross-sentence GEC
url https://jle.hse.ru/article/view/22272