Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data

Michael A. Hedderich, Dietrich Klakow


Abstract
Manually labeled corpora are expensive to create and often not available for low-resource languages or domains. Automatic labeling approaches are an alternative way to obtain labeled data more quickly and cheaply. However, these labels often contain more errors, which can degrade a classifier’s performance when it is trained on this data. We propose a noise layer that is added to a neural network architecture. This allows modeling the noise and training on a combination of clean and noisy data. We show that in a low-resource NER task we can improve performance by up to 35% by using additional, noisy data and handling the noise.
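The noise layer described in the abstract can be read as a learned confusion matrix applied on top of the classifier’s clean-label distribution. The sketch below is a minimal, hypothetical PyTorch illustration of that idea, not the authors’ implementation: the names (NoisyChannel), the feature size, and the 9-class tag set are illustrative assumptions. Clean examples are trained against the base classifier’s output, while automatically annotated (noisy) examples are trained against the output after the noise layer.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NoisyChannel(nn.Module):
        # Learned confusion matrix mapping clean-label probabilities to
        # noisy-label probabilities: one row per clean class.
        def __init__(self, num_classes):
            super().__init__()
            # Initialize near the identity so the layer starts as "no noise".
            self.theta = nn.Parameter(torch.eye(num_classes) * 5.0)

        def forward(self, clean_log_probs):
            noise_matrix = F.softmax(self.theta, dim=1)  # row-stochastic p(noisy | clean)
            noisy_probs = clean_log_probs.exp() @ noise_matrix
            return torch.log(noisy_probs + 1e-8)

    num_classes = 9  # e.g. a CoNLL-style NER tag set size (assumption)
    base = nn.Sequential(nn.Linear(100, num_classes), nn.LogSoftmax(dim=1))
    noise_layer = NoisyChannel(num_classes)
    nll = nn.NLLLoss()

    # Toy batches standing in for clean (manually labeled) and noisy
    # (automatically annotated) training data.
    x_clean, y_clean = torch.randn(4, 100), torch.randint(0, num_classes, (4,))
    x_noisy, y_noisy = torch.randn(4, 100), torch.randint(0, num_classes, (4,))

    # Clean data is scored directly by the base model; noisy data is scored
    # after the noise layer, so systematic label errors can be absorbed there.
    loss = nll(base(x_clean), y_clean) + nll(noise_layer(base(x_noisy)), y_noisy)
    loss.backward()

At prediction time the noise layer would be dropped and only the base classifier used, since the goal is to predict clean labels.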
Anthology ID:
W18-3402
Volume:
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Month:
July
Year:
2018
Address:
Melbourne
Editors:
Reza Haffari, Colin Cherry, George Foster, Shahram Khadivi, Bahar Salehi
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
12–18
URL:
https://aclanthology.org/W18-3402
DOI:
10.18653/v1/W18-3402
Cite (ACL):
Michael A. Hedderich and Dietrich Klakow. 2018. Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data. In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, pages 12–18, Melbourne. Association for Computational Linguistics.
Cite (Informal):
Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data (Hedderich & Klakow, ACL 2018)
PDF:
https://aclanthology.org/W18-3402.pdf
Data
CoNLL 2003