Comparing Constituency and Dependency Representations for SMT Phrase-Extraction

Mary Hearne, Sylwia Ozdowska, John Tinsley


Abstract
We consider the value of replacing and/or combining string-based methods with syntax-based methods for phrase-based statistical machine translation (PB-SMT), and we also consider the relative merits of using constituency-annotated vs. dependency-annotated training data. We automatically derive two subtree-aligned treebanks, dependency-based and constituency-based, from a parallel English–French corpus and extract syntactically motivated word- and phrase-pairs. We automatically measure PB-SMT quality. The results show that combining string-based and syntax-based word- and phrase-pairs can improve translation quality irrespective of the type of syntactic annotation. Furthermore, using dependency annotation yields greater translation quality than constituency annotation for PB-SMT.
Anthology ID:
2008.jeptalnrecital-court.14
Volume:
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts
Month:
June
Year:
2008
Address:
Avignon, France
Editors:
Frédéric Béchet, Jean-Francois Bonastre
Venue:
JEP/TALN/RECITAL
Publisher:
ATALA
Pages:
131–140
URL:
https://aclanthology.org/2008.jeptalnrecital-court.14
Cite (ACL):
Mary Hearne, Sylwia Ozdowska, and John Tinsley. 2008. Comparing Constituency and Dependency Representations for SMT Phrase-Extraction. In Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, pages 131–140, Avignon, France. ATALA.
Cite (Informal):
Comparing Constituency and Dependency Representations for SMT Phrase-Extraction (Hearne et al., JEP/TALN/RECITAL 2008)
PDF:
https://aclanthology.org/2008.jeptalnrecital-court.14.pdf