Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages

Mathieu Dehouck; Carlos Gómez-Rodríguez

doi:10.18653/v1/2020.coling-main.339

Abstract

The lack of annotated data is a major obstacle to building reliable NLP systems for most of the world’s languages, but it can be alleviated by automatic data generation. In this paper, we present a new data augmentation method for artificially creating new dependency-annotated sentences. The main idea is to swap subtrees between annotated sentences while enforcing strong constraints on those trees to ensure maximal grammaticality of the new sentences. We also propose a method for running low-resource experiments on resource-rich languages: we mimic a low-resource language by sampling sentences under a low-resource distribution. In a series of experiments, we show that our newly proposed data augmentation method outperforms previous proposals using the same basic inputs.
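
The subtree-swapping idea from the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not spell out the constraints, so the `compatible` check here (same part of speech and same dependency relation on the two subtree roots, plus contiguous subtrees) is an assumption chosen for illustration, and the names `Token`, `subtree_ids`, and `swap_in` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    form: str
    upos: str
    head: int    # 1-based index of the head token, 0 for the sentence root
    deprel: str


def subtree_ids(sent: List[Token], root: int) -> List[int]:
    """Sorted 1-based ids of `root` and all of its descendants."""
    ids = {root}
    changed = True
    while changed:
        changed = False
        for i, tok in enumerate(sent, start=1):
            if tok.head in ids and i not in ids:
                ids.add(i)
                changed = True
    return sorted(ids)


def compatible(a: Token, b: Token) -> bool:
    # ASSUMPTION: the paper's actual constraints are stricter and are not
    # given in the abstract; this check is only a stand-in.
    return a.upos == b.upos and a.deprel == b.deprel


def swap_in(target: List[Token], t_root: int,
            donor: List[Token], d_root: int) -> Optional[List[Token]]:
    """Replace the subtree rooted at t_root in `target` with the subtree
    rooted at d_root in `donor`. Returns None if the swap is rejected."""
    if not compatible(target[t_root - 1], donor[d_root - 1]):
        return None
    t_ids, d_ids = subtree_ids(target, t_root), subtree_ids(donor, d_root)
    # Only handle subtrees that cover a contiguous span of words.
    if any(ids[-1] - ids[0] + 1 != len(ids) for ids in (t_ids, d_ids)):
        return None
    t_start, t_end, d_start = t_ids[0], t_ids[-1], d_ids[0]
    shift = len(d_ids) - len(t_ids)

    def remap(h: int) -> int:
        # Heads of tokens outside the replaced span never point inside it.
        return h + shift if h > t_end else h

    new: List[Token] = []
    for i, tok in enumerate(target, start=1):
        if i == t_start:
            for j in d_ids:
                d = donor[j - 1]
                # The donor root attaches where the old root did; other
                # donor tokens keep their heads, shifted into position.
                head = (remap(target[t_root - 1].head) if j == d_root
                        else d.head - d_start + t_start)
                new.append(Token(d.form, d.upos, head, d.deprel))
        if t_start <= i <= t_end:
            continue
        new.append(Token(tok.form, tok.upos, remap(tok.head), tok.deprel))
    return new
```

Under these assumptions, swapping the subject subtree of "The cat sleeps" for that of "A big dog barks loudly" would yield "A big dog sleeps" with a consistent head index for every token, while a swap between roots with different parts of speech is rejected.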

Anthology ID: 2020.coling-main.339
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 3818–3830
URL: https://aclanthology.org/2020.coling-main.339
DOI: 10.18653/v1/2020.coling-main.339
Cite (ACL): Mathieu Dehouck and Carlos Gómez-Rodríguez. 2020. Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3818–3830, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages (Dehouck & Gómez-Rodríguez, COLING 2020)
PDF: https://aclanthology.org/2020.coling-main.339.pdf
