Identifying bilingual Multi-Word Expressions for Statistical Machine Translation

Dhouha Bouamor, Nasredine Semmar, Pierre Zweigenbaum


Abstract
Multi-Word Expressions (MWEs) represent a key issue for numerous applications in Natural Language Processing (NLP), especially Machine Translation (MT). In this paper, we describe a strategy for detecting translation pairs of MWEs in a French-English parallel corpus. In addition, we introduce three methods for integrating the extracted bilingual MWEs into MOSES, a phrase-based Statistical Machine Translation (SMT) system. We show experimentally that these textual units can improve translation quality.
Anthology ID:
L12-1527
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
674–679
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/886_Paper.pdf
Cite (ACL):
Dhouha Bouamor, Nasredine Semmar, and Pierre Zweigenbaum. 2012. Identifying bilingual Multi-Word Expressions for Statistical Machine Translation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 674–679, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Identifying bilingual Multi-Word Expressions for Statistical Machine Translation (Bouamor et al., LREC 2012)