Noisy Channel for Low Resource Grammatical Error Correction

Simon Flachs, Ophélie Lacroix, Anders Søgaard


Abstract
This paper describes our contribution to the low-resource track of the BEA 2019 shared task on Grammatical Error Correction (GEC). Our approach to GEC builds on the theory of the noisy channel by combining a channel model and a language model. We generate confusion sets from the Wikipedia edit history and use the frequencies of edits to estimate the channel model. Additionally, we use two pre-trained language models: 1) Google’s BERT model, which we fine-tune for specific error types, and 2) OpenAI’s GPT-2 model, which can use previous sentences as context. Finally, we use beam search to find the optimal combination of corrections.
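To make the noisy-channel idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It scores each candidate correction by adding a channel log-probability (estimated from edit frequencies) to a language-model log-probability, and searches over combinations with beam search. The edit counts, the bigram table that stands in for BERT/GPT-2, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical edit counts: intended word -> {observed word: frequency}.
# In the paper these frequencies come from the Wikipedia edit history.
EDIT_COUNTS = {
    "their": {"their": 90, "there": 10},
    "there": {"there": 95, "their": 5},
}

# Toy bigram log-probabilities standing in for a pre-trained language model
# (assumption: the real system uses BERT and GPT-2, not a bigram table).
BIGRAM_LOGP = {
    ("over", "there"): math.log(0.2),
    ("over", "their"): math.log(0.001),
    ("their", "house"): math.log(0.1),
    ("there", "house"): math.log(0.0001),
}
DEFAULT_LOGP = math.log(0.01)  # flat score for unseen bigrams


def candidates(observed):
    """Confusion set: every intended word that has been seen written as `observed`."""
    found = [w for w, outs in EDIT_COUNTS.items() if observed in outs]
    return found or [observed]


def channel_log_prob(observed, intended):
    """log P(observed | intended), a relative frequency over the edit counts."""
    counts = EDIT_COUNTS.get(intended)
    if counts is None:
        return 0.0  # words outside the confusion sets are assumed unchanged
    count = counts.get(observed, 0)
    return math.log(count / sum(counts.values())) if count else float("-inf")


def lm_log_prob(prev, word):
    """Toy language-model score for `word` given the previous word."""
    return BIGRAM_LOGP.get((prev, word), DEFAULT_LOGP)


def correct(tokens, beam_size=3):
    """Left-to-right beam search over combinations of candidate corrections."""
    beams = [(0.0, ["<s>"])]
    for observed in tokens:
        expanded = []
        for score, hyp in beams:
            for cand in candidates(observed):
                new_score = (score
                             + channel_log_prob(observed, cand)
                             + lm_log_prob(hyp[-1], cand))
                expanded.append((new_score, hyp + [cand]))
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_size]
    return beams[0][1][1:]  # drop the start symbol
```

With these toy numbers, `correct(["there", "house"])` returns `["their", "house"]`, while `beam_size=1` keeps the locally better `"there"` and misses the correction. This shows why searching over combinations of corrections can matter.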
Anthology ID:
W19-4420
Volume:
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Helen Yannakoudakis, Ekaterina Kochmar, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Torsten Zesch
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
191–196
URL:
https://aclanthology.org/W19-4420
DOI:
10.18653/v1/W19-4420
Cite (ACL):
Simon Flachs, Ophélie Lacroix, and Anders Søgaard. 2019. Noisy Channel for Low Resource Grammatical Error Correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 191–196, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Noisy Channel for Low Resource Grammatical Error Correction (Flachs et al., BEA 2019)
PDF:
https://aclanthology.org/W19-4420.pdf