Large aligned treebanks for syntax-based machine translation

Gideon Kotzé, Vincent Vandeghinste, Scott Martens, Jörg Tiedemann


Abstract
We present a collection of parallel treebanks that have been automatically aligned at both the terminal and nonterminal constituent levels for use in syntax-based machine translation. We describe how they were constructed and applied to a syntax- and example-based machine translation system called Parse and Corpus-Based Machine Translation (PaCo-MT). For the language pair Dutch to English, we present evaluation scores of both the nonterminal constituent alignments and the MT system itself and, in the latter case, compare them with those of Moses, a current state-of-the-art statistical MT system, when trained on the same data.
Anthology ID:
L12-1553
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
467–473
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/924_Paper.pdf
Cite (ACL):
Gideon Kotzé, Vincent Vandeghinste, Scott Martens, and Jörg Tiedemann. 2012. Large aligned treebanks for syntax-based machine translation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 467–473, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Large aligned treebanks for syntax-based machine translation (Kotzé et al., LREC 2012)