Improving Sentence Classification by Multilingual Data Augmentation and Consensus Learning

Yanfei Wang, Yangdong Chen, Yuejie Zhang


Abstract
Neural network-based models have achieved impressive results on the sentence classification task. However, most previous work focuses on designing more sophisticated networks or more effective learning paradigms on monolingual data, which often provides insufficient discriminative knowledge for classification. In this paper, we investigate improving sentence classification through multilingual data augmentation and consensus learning. Compared with previous methods, our model can make use of multilingual data generated by machine translation and mine its language-shared and language-specific knowledge for better representation and classification. We evaluate our model using English (i.e., the source language) and Chinese (i.e., the target language) data on several sentence classification tasks. The proposed model achieves strong classification performance on these tasks.
Anthology ID:
2020.ccl-1.78
Volume:
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Month:
October
Year:
2020
Address:
Haikou, China
Editors:
Maosong Sun (孙茂松), Sujian Li (李素建), Yue Zhang (张岳), Yang Liu (刘洋)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
842–852
Language:
English
URL:
https://aclanthology.org/2020.ccl-1.78
Cite (ACL):
Yanfei Wang, Yangdong Chen, and Yuejie Zhang. 2020. Improving Sentence Classification by Multilingual Data Augmentation and Consensus Learning. In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pages 842–852, Haikou, China. Chinese Information Processing Society of China.
Cite (Informal):
Improving Sentence Classification by Multilingual Data Augmentation and Consensus Learning (Wang et al., CCL 2020)
PDF:
https://aclanthology.org/2020.ccl-1.78.pdf
Data:
SST