Context-Based Machine Translation

Jaime Carbonell, Steve Klein, David Miller, Mike Steinbaum, Tomer Grassiany, Jochen Frei


Abstract
Context-Based Machine Translation™ (CBMT) is a new paradigm for corpus-based translation that requires no parallel text. Instead, CBMT relies on a lightweight translation model utilizing a full-form bilingual dictionary and a sophisticated decoder using long-range context via long n-grams and cascaded overlapping. The translation process is enhanced via in-language substitution of tokens and phrases, both for source and target, when top candidates cannot be confirmed or resolved in decoding. Substitution utilizes a synonym and near-synonym generator implemented as a corpus-based unsupervised learning process. Decoding requires a very large target-language-only corpus, and while substitution in target can be performed using that same corpus, substitution in source requires a separate (and smaller) source monolingual corpus. Spanish-to-English CBMT was tested on Spanish newswire text, achieving a BLEU score of 0.6462 in June 2006, the highest BLEU score reported for any language pair. Further testing also shows that quality increases above the reported score as the target corpus size increases and as dictionary coverage of source words and phrases becomes more complete.
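As a rough illustration of the idea in the abstract (dictionary lookup followed by confirmation against target-language n-grams, with overlapping n-grams chained together), the toy sketch below may help. It is not the authors' decoder: the dictionary entries, the corpus sentence, and the helper names `corpus_ngrams`, `confirm_candidates`, and `overlaps` are all hypothetical, and the exhaustive permutation search is only workable for inputs this small.

```python
from itertools import permutations, product

# Hypothetical toy data for illustration only; none of it comes from the paper.
DICTIONARY = {
    "el": ["the"],
    "banco": ["bank", "bench"],
    "central": ["central", "main"],
}
TARGET_CORPUS = "the central bank raised rates . she sat on the bench ."


def corpus_ngrams(text, n):
    """Return the set of word n-grams in a whitespace-tokenized text."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def confirm_candidates(source_tokens, dictionary, target_ngrams):
    """Return corpus-attested n-grams built from one translation per source token.

    Every combination of dictionary translations is tried in every word
    order, and only orderings that occur in the target corpus are kept.
    Unknown source tokens are passed through unchanged (an assumption).
    """
    options = [dictionary.get(tok, [tok]) for tok in source_tokens]
    confirmed = set()
    for choice in product(*options):
        for order in permutations(choice):
            if order in target_ngrams:
                confirmed.add(order)
    return sorted(confirmed)


def overlaps(left, right, k):
    """True if the last k words of `left` equal the first k words of `right`."""
    return k > 0 and left[-k:] == right[:k]


trigrams = corpus_ngrams(TARGET_CORPUS, 3)
candidates = confirm_candidates(["el", "banco", "central"], DICTIONARY, trigrams)
print(candidates)
```

Here the monolingual corpus alone selects "bank" over "bench" and fixes the English word order, with no parallel text involved. The `overlaps` check hints at how adjacent confirmed n-grams could be chained, under the assumption that neighbouring candidates must agree on their shared words.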
Anthology ID:
2006.amta-papers.3
Volume:
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
August 8-12
Year:
2006
Address:
Cambridge, Massachusetts, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
19–28
URL:
https://aclanthology.org/2006.amta-papers.3
Cite (ACL):
Jaime Carbonell, Steve Klein, David Miller, Mike Steinbaum, Tomer Grassiany, and Jochen Frei. 2006. Context-Based Machine Translation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 19–28, Cambridge, Massachusetts, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Context-Based Machine Translation (Carbonell et al., AMTA 2006)
PDF:
https://aclanthology.org/2006.amta-papers.3.pdf