Noisy Channel for Low Resource Grammatical Error Correction

Simon Flachs, Ophélie Lacroix, Anders Søgaard


Abstract
This paper describes our contribution to the low-resource track of the BEA 2019 shared task on Grammatical Error Correction (GEC). Our approach to GEC builds on the noisy channel model, combining a channel model with a language model. We generate confusion sets from the Wikipedia edit history and use the edit frequencies to estimate the channel model. For the language model, we use two pre-trained models: 1) Google’s BERT model, which we fine-tune for specific error types, and 2) OpenAI’s GPT-2 model, which can take preceding sentences into account as context. Finally, we use beam search to find the best-scoring combination of corrections.
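The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the edit pairs, the bigram table standing in for BERT/GPT-2, and all function names (`estimate_channel`, `log_lm`, `correct`) are hypothetical and chosen only for illustration. Each candidate correction is scored as log P(observed | intended) + log P_LM(intended | context), and beam search keeps the highest-scoring partial sentences.

```python
import math
from collections import Counter, defaultdict

def estimate_channel(edit_pairs):
    """Build confusion sets and a channel model from (intended, observed) pairs.

    Assumption: the pairs include unchanged words, i.e. (w, w), so that
    keeping a word as written has a non-zero probability.
    """
    counts = defaultdict(Counter)   # intended -> Counter of observed forms
    confusion = defaultdict(set)    # observed -> set of possible intended forms
    for intended, observed in edit_pairs:
        counts[intended][observed] += 1
        confusion[observed].add(intended)

    def candidates(observed):
        # The observed word itself is always a candidate (no correction).
        return confusion.get(observed, set()) | {observed}

    def log_channel(observed, intended):
        seen = counts.get(intended)
        if not seen:
            return 0.0  # no edit statistics: assume the word is never corrupted
        if seen[observed] == 0:
            return float("-inf")
        return math.log(seen[observed] / sum(seen.values()))

    return candidates, log_channel

# Toy bigram table standing in for a pre-trained language model (assumption).
BIGRAMS = {
    ("<s>", "their"): 0.1, ("<s>", "there"): 0.1,
    ("their", "house"): 0.2, ("there", "house"): 0.001,
    ("there", "is"): 0.3, ("their", "is"): 0.001,
    ("house", "is"): 0.3, ("is", "big"): 0.1,
}

def log_lm(prev, word):
    return math.log(BIGRAMS.get((prev, word), 1e-4))

def correct(tokens, candidates, log_channel, beam_size=3):
    """Beam search over per-token candidates, scored by channel + LM."""
    beams = [(0.0, [])]  # (log score, corrected prefix)
    for observed in tokens:
        expanded = []
        for score, prefix in beams:
            prev = prefix[-1] if prefix else "<s>"
            for cand in candidates(observed):
                new_score = score + log_channel(observed, cand) + log_lm(prev, cand)
                expanded.append((new_score, prefix + [cand]))
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_size]
    return beams[0][1]

# Hypothetical edit statistics: (intended, observed) pairs.
EDITS = (
    [("their", "there")] * 3 + [("their", "their")] * 7
    + [("there", "their")] * 2 + [("there", "there")] * 8
)
candidates, log_channel = estimate_channel(EDITS)
print(correct(["there", "house", "is", "big"], candidates, log_channel))
```

In this toy setting the channel model alone prefers leaving "there" unchanged, but the language model score for "their house" outweighs it, so the combined score selects the correction. A real system would replace `log_lm` with scores from a pre-trained model.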
Anthology ID:
W19-4420
Volume:
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | WS
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
191–196
URL:
https://www.aclweb.org/anthology/W19-4420
DOI:
10.18653/v1/W19-4420
PDF:
https://www.aclweb.org/anthology/W19-4420.pdf