Fighting Evaluation Inflation: Concentrated Datasets for Grammatical Error Correction Task

Bibliographic Details
Main Authors:	Vladimir Starchenko, Darya Kharlamova, Elizaveta Klykova, Anastasia Shavrina, Aleksey Starchenko, Olga Vinogradova, Olga Lyashevskaya
Format:	Article
Language:	English
Published:	National Research University Higher School of Economics 2024-12-01
Series:	Journal of Language and Education
Subjects:	Grammatical error correction; L2 errors; ESL; concentrated datasets; cross-sentence GEC
Online Access:	https://jle.hse.ru/article/view/22272

author	Vladimir Starchenko, Darya Kharlamova, Elizaveta Klykova, Anastasia Shavrina, Aleksey Starchenko, Olga Vinogradova, Olga Lyashevskaya
collection	DOAJ
description	Background: Grammatical error correction (GEC) systems have developed greatly over the past decade. According to common metrics, they often match or surpass human experts. Nevertheless, they perform poorly on several kinds of errors that humans correct effortlessly. Evaluation algorithms and datasets have thus reached their resolution limit and no longer allow for further enhancement of GEC systems. Purpose: To solve the problem of the resolution limit in GEC. The suggested approach is to evaluate on concentrated datasets with a higher density of errors that are difficult for modern GEC systems to handle. Method: To test the suggested solution, we look at distant-context-sensitive errors, which have been acknowledged as challenging for GEC systems. We create a concentrated dataset for English with a higher density of errors of various types by semi-manually aggregating pre-annotated examples from four existing datasets and further expanding the annotation of distant-context-sensitive errors. Two GEC systems are evaluated on this dataset using traditional scoring algorithms and a novel approach modified for longer contexts. Results: The concentrated dataset includes 1,014 examples sampled manually from FCE, CoNLL-2014, BEA-2019, and REALEC. It is annotated for types of context-sensitive errors such as pronouns, verb tense, punctuation, referential device, and linking device. GEC systems show lower scores when evaluated on the dataset with a higher density of challenging errors than on a random dataset with otherwise the same parameters. Conclusion: The lower scores registered on concentrated datasets confirm that they provide a way for future improvement of GEC models. The dataset can be used for further studies focusing on distant-context-sensitive GEC.
format	Article
id	doaj-art-348b1f1486e64c499484938a91c9c4a4
institution	Kabale University
issn	2411-7390
language	English
publishDate	2024-12-01
publisher	National Research University Higher School of Economics
record_format	Article
series	Journal of Language and Education
spelling	doaj-art-348b1f1486e64c499484938a91c9c4a4 | 2025-01-07T16:17:16Z | eng | National Research University Higher School of Economics | Journal of Language and Education | ISSN 2411-7390 | 2024-12-01 | Vol. 10, No. 4 | DOI 10.17323/jle.2024.22272 | Fighting Evaluation Inflation: Concentrated Datasets for Grammatical Error Correction Task | Vladimir Starchenko (HSE University, Moscow, Russia), Darya Kharlamova (HSE University, Moscow, Russia), Elizaveta Klykova (independent researcher), Anastasia Shavrina (HSE University, Moscow, Russia), Aleksey Starchenko (HSE University, Moscow, Russia), Olga Vinogradova (independent researcher), Olga Lyashevskaya (HSE University, Moscow, Russia; Vinogradov Russian Language Institute, Russian Academy of Sciences, Moscow, Russia) | https://jle.hse.ru/article/view/22272 | Grammatical error correction; L2 errors; ESL; concentrated datasets; cross-sentence GEC
title	Fighting Evaluation Inflation: Concentrated Datasets for Grammatical Error Correction Task
topic	Grammatical error correction; L2 errors; ESL; concentrated datasets; cross-sentence GEC
url	https://jle.hse.ru/article/view/22272

Similar Items