Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models

Mauro Cettolo, Marcello Federico, Daniele Pighin, Nicola Bertoldi


Abstract
This work extends phrase-based statistical machine translation (SMT) with shallow-syntax dependencies. Two string-to-chunk translation models are proposed: a factored model, which augments phrase-based SMT with layered dependencies, and a joint model, which extends the phrase translation table with micro-tags, i.e. per-word projections of chunk labels. Both rely on n-gram models of target sequences at different granularities: single words, micro-tags, and chunks. In particular, n-grams defined over syntactic chunks should capture syntactic constraints that cope with word-group movements. Experimental analysis and evaluation conducted on two popular Chinese-English tasks suggest that the shallow-syntax joint translation model has the potential to outperform state-of-the-art phrase-based translation, with a reasonable computational overhead.
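To make the three target-side granularities concrete, the following is a minimal Python sketch of how chunk labels might be projected onto individual words. The positional suffixes (`B`, `I`, `E`, `S`), the function names, and the example sentence are illustrative assumptions; the abstract does not specify the micro-tag inventory the authors actually use.

```python
from typing import List, Tuple

# A chunk is a (label, words) pair, e.g. ("NP", ["the", "cat"]).
Chunk = Tuple[str, List[str]]


def project_microtags(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Project each chunk label onto its words as a per-word micro-tag.

    Assumed scheme (hypothetical, not taken from the paper): the chunk
    label plus a position marker, B(egin), I(nside), E(nd), or S(ingle).
    """
    tagged = []
    for label, words in chunks:
        last = len(words) - 1
        for i, word in enumerate(words):
            if last == 0:
                position = "S"
            elif i == 0:
                position = "B"
            elif i == last:
                position = "E"
            else:
                position = "I"
            tagged.append((word, f"{label}-{position}"))
    return tagged


def granularities(chunks: List[Chunk]) -> Tuple[List[str], List[str], List[str]]:
    """Return the word, micro-tag, and chunk-label sequences of a target side."""
    pairs = project_microtags(chunks)
    words = [word for word, _ in pairs]
    microtags = [tag for _, tag in pairs]
    labels = [label for label, _ in chunks]
    return words, microtags, labels


# Hypothetical chunked target sentence.
example = [("NP", ["the", "black", "cat"]), ("VP", ["sat"]), ("PP", ["on"]), ("NP", ["the", "mat"])]
words, microtags, labels = granularities(example)
```

Under these assumptions, each of the three sequences could be scored by its own n-gram model: `words` and `microtags` have one symbol per target word, while `labels` has one symbol per chunk, which is the level at which word-group movements would be visible.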
Anthology ID: 2008.amta-papers.3
Volume: Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Month: October 21-25
Year: 2008
Address: Waikiki, USA
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
Pages: 56–64
URL: https://aclanthology.org/2008.amta-papers.3
Cite (ACL): Mauro Cettolo, Marcello Federico, Daniele Pighin, and Nicola Bertoldi. 2008. Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 56–64, Waikiki, USA. Association for Machine Translation in the Americas.
Cite (Informal): Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models (Cettolo et al., AMTA 2008)
PDF: https://aclanthology.org/2008.amta-papers.3.pdf