Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension

Takanori Kusumoto, Tomoyosi Akiba


Abstract
Statistical machine translation (SMT) requires a parallel corpus between the source and target languages. A pivot-translation approach can be applied to a language pair that has no direct parallel corpus, but it still requires both source-pivot and pivot-target parallel corpora. We propose a novel approach that applies SMT to a resource-limited source language that has no parallel corpus at all, only a word dictionary into the pivot language. Dictionary-based translation suffers from two problems: ambiguity and incompleteness. The proposed method deals with the ambiguity by representing the pivot-language candidates as a word lattice and applying word lattice decoding; it compensates for the incompleteness by expanding the lattice using a pivot-target phrase translation table. Our experimental evaluation showed that this approach is promising for applying SMT even when a source-side parallel corpus is lacking.
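The two steps named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the expansion rule, so the rule used here (add a phrase's words when all but one of them already appear at the aligned lattice positions) is an assumption made only for illustration, as are the function names `build_lattice` and `expand_with_phrases`, the placeholder tokens, and the one-to-one word positions.

```python
from typing import Dict, List, Set, Tuple

# An edge is (start_node, end_node, pivot_word); node i sits before source word i.
Edge = Tuple[int, int, str]


def build_lattice(source: List[str], dictionary: Dict[str, List[str]]) -> Set[Edge]:
    """Add one edge per dictionary candidate, so ambiguity becomes parallel edges.

    Words missing from the dictionary pass through unchanged (an assumption).
    """
    edges: Set[Edge] = set()
    for i, word in enumerate(source):
        for candidate in dictionary.get(word, [word]):
            edges.add((i, i + 1, candidate))
    return edges


def expand_with_phrases(edges: Set[Edge], n_words: int,
                        pivot_phrases: List[List[str]]) -> Set[Edge]:
    """Add pivot words suggested by the pivot side of a phrase table.

    Assumed rule (hypothetical): if all but one word of a pivot phrase already
    match lattice edges at consecutive positions, add the whole phrase.
    """
    expanded = set(edges)
    for phrase in pivot_phrases:
        for start in range(n_words - len(phrase) + 1):
            hits = sum((start + j, start + j + 1, w) in edges
                       for j, w in enumerate(phrase))
            if hits > 0 and hits >= len(phrase) - 1:
                for j, w in enumerate(phrase):
                    expanded.add((start + j, start + j + 1, w))
    return expanded


# Placeholder source tokens "a b c"; "b" has no dictionary entry.
source = ["a", "b", "c"]
dictionary = {"a": ["the"], "c": ["house", "home"]}
# Pivot sides of phrase-table entries (target sides omitted in this sketch).
pivot_phrases = [["the", "big", "house"]]

lattice = build_lattice(source, dictionary)
expanded = expand_with_phrases(lattice, len(source), pivot_phrases)
print(sorted(expanded))
```

In a real system the expanded lattice would be passed to a decoder that accepts lattice input and scored with the pivot-target translation model; the sketch stops at lattice construction.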
Anthology ID: L12-1393
Volume: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month: May
Year: 2012
Address: Istanbul, Turkey
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 3929–3932
URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/677_Paper.pdf
Cite (ACL): Takanori Kusumoto and Tomoyosi Akiba. 2012. Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3929–3932, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal): Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension (Kusumoto & Akiba, LREC 2012)