Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian

Chenchen Ding, Masao Utiyama, Eiichiro Sumita


Abstract
This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation of raw parallel data from the Asian Language Treebank. The cross-lingual similarity is examined and demonstrated using metrics of token correspondence and token order, computed with several standard statistical machine translation techniques. The similarity shown in this study suggests the possibility of harmonized annotation and processing of these language pairs in future development.
Anthology ID:
W16-4614
Volume:
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Toshiaki Nakazawa, Hideya Mino, Chenchen Ding, Isao Goto, Graham Neubig, Sadao Kurohashi, Ir. Hammam Riza, Pushpak Bhattacharyya
Venue:
WAT
Publisher:
The COLING 2016 Organizing Committee
Pages:
149–156
URL:
https://aclanthology.org/W16-4614
Cite (ACL):
Chenchen Ding, Masao Utiyama, and Eiichiro Sumita. 2016. Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian. In Proceedings of the 3rd Workshop on Asian Translation (WAT2016), pages 149–156, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian (Ding et al., WAT 2016)
PDF:
https://aclanthology.org/W16-4614.pdf