Language Model Based Grammatical Error Correction without Annotated Training Data

Christopher Bryant, Ted Briscoe


Abstract
Since the end of the CoNLL-2014 shared task on grammatical error correction (GEC), research into language model (LM) based approaches to GEC has largely stagnated. In this paper, we re-examine LMs in GEC and show that it is entirely possible to build a simple system that not only requires minimal annotated data (∼1000 sentences), but is also fairly competitive with several state-of-the-art systems. This approach should be of particular interest for languages where very little annotated training data exists, although we also hope to use it as a baseline to motivate future research.
Anthology ID:
W18-0529
Volume:
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Joel Tetreault, Jill Burstein, Ekaterina Kochmar, Claudia Leacock, Helen Yannakoudakis
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
247–253
URL:
https://aclanthology.org/W18-0529
DOI:
10.18653/v1/W18-0529
Cite (ACL):
Christopher Bryant and Ted Briscoe. 2018. Language Model Based Grammatical Error Correction without Annotated Training Data. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 247–253, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Language Model Based Grammatical Error Correction without Annotated Training Data (Bryant & Briscoe, BEA 2018)
PDF:
https://aclanthology.org/W18-0529.pdf
Data
CoNLL-2014 Shared Task: Grammatical Error Correction, FCE, JFLEG