Incorporating Alternate Translations into English Translation Treebank

Ann Bies, Justin Mott, Seth Kulick, Jennifer Garland, Colin Warner


Abstract
New annotation guidelines and new processing methods were developed to accommodate English treebank annotation of a parallel English/Chinese corpus of web data that includes alternate English translations (one fluent, one literal) of expressions that are idiomatic in the Chinese source. In previous machine translation programs, alternate translations of idiomatic expressions had been present in untreebanked data only, but due to the high frequency of such expressions in informal genres such as discussion forums, machine translation system developers requested that alternatives be added to the treebanked data as well. In consultation with machine translation researchers, we chose a pragmatic approach of syntactically annotating only the fluent translation, while retaining the alternate literal translation as a segregated node in the tree. Since the literal translation alternates are often incompatible with English syntax, this approach allows us to create fluent trees without losing information. This resource is expected to support machine translation efforts, and the flexibility provided by the alternate translations is an enhancement to the treebank for this purpose.
Anthology ID:
L14-1113
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1863–1868
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1159_Paper.pdf
Cite (ACL):
Ann Bies, Justin Mott, Seth Kulick, Jennifer Garland, and Colin Warner. 2014. Incorporating Alternate Translations into English Translation Treebank. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1863–1868, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Incorporating Alternate Translations into English Translation Treebank (Bies et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1159_Paper.pdf