Fighting Evaluation Inflation: Concentrated Datasets for Grammatical Error Correction Task
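Main Authors: | Vladimir Starchenko, Darya Kharlamova, Elizaveta Klykova, Anastasia Shavrina, Aleksey Starchenko, Olga Vinogradova, Olga Lyashevskaya |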
---|---|
Format: | Article |
Language: | English |
Published: | National Research University Higher School of Economics, 2024-12-01 |
Series: | Journal of Language and Education |
Online Access: | https://jle.hse.ru/article/view/22272 |
author | Vladimir Starchenko, Darya Kharlamova, Elizaveta Klykova, Anastasia Shavrina, Aleksey Starchenko, Olga Vinogradova, Olga Lyashevskaya |
---|---|
collection | DOAJ |
description |
Background: Grammatical error correction (GEC) systems have developed considerably over the past decade. According to common metrics, they often match or even surpass human experts. Nevertheless, they perform poorly on several kinds of errors that humans correct effortlessly. Evaluation algorithms and datasets have thus reached their resolution limit: they no longer allow for further enhancement of GEC systems.
Purpose: To address the resolution-limit problem in GEC. The proposed approach is to evaluate systems on concentrated datasets, which contain a higher density of errors that are difficult for modern GEC systems to handle.
Method: To test the proposed solution, we focus on distant-context-sensitive errors, which have been acknowledged as challenging for GEC systems. We create a concentrated dataset for English with a higher density of errors of various types by semi-manually aggregating pre-annotated examples from four existing datasets and further expanding the annotation of distant-context-sensitive errors. Two GEC systems are evaluated on this dataset using both traditional scoring algorithms and a novel approach adapted to longer contexts.
Results: The concentrated dataset includes 1,014 examples sampled manually from FCE, CoNLL-2014, BEA-2019, and REALEC. It is annotated for context-sensitive error types such as pronouns, verb tense, punctuation, referential devices, and linking devices. GEC systems score lower on the dataset with a higher density of challenging errors than on a random dataset with otherwise identical parameters.
Conclusion: The lower scores registered on concentrated datasets confirm that such datasets provide a way to drive further improvement of GEC models. The dataset can be used in further studies of distant-context-sensitive GEC.
|
institution | Kabale University |
issn | 2411-7390 |
volume | 10 |
issue | 4 |
doi | 10.17323/jle.2024.22272 |
affiliations | Vladimir Starchenko, Darya Kharlamova, Anastasia Shavrina, Aleksey Starchenko: HSE University, Moscow, Russia; Elizaveta Klykova, Olga Vinogradova: independent researchers; Olga Lyashevskaya: HSE University, Moscow, Russia, and Vinogradov Russian Language Institute, Russian Academy of Sciences, Moscow, Russia |
topic | Grammatical error correction; L2 errors; ESL; concentrated datasets; cross-sentence GEC |