Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent Neural Networks

Othman Zennaki, Nasredine Semmar, Laurent Besacier


Abstract
This work focuses on the development of linguistic analysis tools for resource-poor languages. We use a parallel corpus to produce a multilingual word representation based only on sentence level alignment. This representation is combined with the annotated source side (resource-rich language) of the parallel corpus to train text analysis tools for resource-poor languages. Our approach is based on Recurrent Neural Networks (RNN) and has the following advantages: (a) it does not use word alignment information, (b) it does not assume any knowledge about foreign languages, which makes it applicable to a wide range of resource-poor languages, (c) it provides truly multilingual taggers. In a previous study, we proposed a method based on Simple RNN to automatically induce a Part-Of-Speech (POS) tagger. In this paper, we propose an improvement of our neural model. We investigate the Bidirectional RNN and the inclusion of external information (for instance low level information from Part-Of-Speech tags) in the RNN to train a more complex tagger (for instance, a multilingual super sense tagger). We demonstrate the validity and genericity of our method by using parallel corpora (obtained by manual or automatic translation). Our experiments are conducted to induce cross-lingual POS and super sense taggers.
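As a rough illustration of the architecture the abstract describes, the following Python sketch runs the forward pass of a bidirectional simple (Elman) RNN tagger whose input concatenates a word representation with external features, such as a one-hot POS tag. It is not the authors' implementation: the function names, dimensions, random weights, and placeholder input vectors are all assumptions made for illustration, and training is omitted.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Hypothetical helper: small random weight matrix as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_pass(inputs, w_in, w_rec):
    """Run a simple RNN over the inputs and return one hidden state per step."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def birnn_tag(word_vecs, extra_vecs, params):
    """Return a tag distribution for each token of one sentence."""
    # Assumption: external information is simply concatenated to the word vector.
    inputs = [w + e for w, e in zip(word_vecs, extra_vecs)]
    forward = rnn_pass(inputs, params["w_in_f"], params["w_rec_f"])
    # The backward RNN reads the sentence right to left; re-reverse to align steps.
    backward = rnn_pass(inputs[::-1], params["w_in_b"], params["w_rec_b"])[::-1]
    return [softmax(matvec(params["w_out"], f + b))
            for f, b in zip(forward, backward)]


# Toy usage with placeholder dimensions and untrained random weights.
rng = random.Random(0)
word_dim, extra_dim, hidden_dim, n_tags = 4, 2, 5, 3
params = {
    "w_in_f": make_matrix(hidden_dim, word_dim + extra_dim, rng),
    "w_rec_f": make_matrix(hidden_dim, hidden_dim, rng),
    "w_in_b": make_matrix(hidden_dim, word_dim + extra_dim, rng),
    "w_rec_b": make_matrix(hidden_dim, hidden_dim, rng),
    "w_out": make_matrix(n_tags, 2 * hidden_dim, rng),
}
word_vecs = [[rng.random() for _ in range(word_dim)] for _ in range(3)]
extra_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # placeholder one-hot POS tags
probs = birnn_tag(word_vecs, extra_vecs, params)
tags = [max(range(n_tags), key=p.__getitem__) for p in probs]
```

In this sketch each token's prediction depends on both its left context (forward states) and its right context (backward states), which is the property that distinguishes a bidirectional RNN from a simple one.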
Anthology ID:
C16-1044
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
450–460
URL:
https://aclanthology.org/C16-1044
Cite (ACL):
Othman Zennaki, Nasredine Semmar, and Laurent Besacier. 2016. Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent Neural Networks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 450–460, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent Neural Networks (Zennaki et al., COLING 2016)
PDF:
https://aclanthology.org/C16-1044.pdf