Re-ordering Source Sentences for SMT

Amit Sangodkar, Om Damani


Abstract
We propose a pre-processing stage for Statistical Machine Translation (SMT) systems in which the words of the source sentence are re-ordered according to the syntax of the target language before alignment, so that the statistical system finds better alignments. We take a dependency parse of the source sentence and linearize it according to the syntax of the target language before it is used in either the training or the decoding phase. During this linearization, ordering decisions among dependency nodes that share a parent are based on two aspects: parent-child positioning and relation priority. To make the linearization process rule-driven, we assume that the relative word order of a dependency relation's relata depends neither on the semantic properties of the relata nor on the rest of the expression. We also assume that the relative word order of various relations sharing a relatum does not depend on the rest of the expression. We experiment with a publicly available English-Hindi parallel corpus and show that our scheme improves the BLEU score.
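As a rough illustration of the linearization idea described in the abstract, the sketch below re-orders a small English dependency tree toward a verb-final, postpositional order such as Hindi's. It is not the authors' implementation. The relation labels, the `CHILD_SIDE` and `PRIORITY` rule tables, the default for unlisted relations, and the `Node` and `linearize` names are all assumptions made for this example. In line with the abstract's assumptions, every decision depends only on the relation label, never on the words themselves or on the rest of the sentence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One word in a dependency tree, with its relation to its parent."""
    word: str
    relation: str = "root"
    children: List["Node"] = field(default_factory=list)


# Hypothetical parent-child positioning rules: does the child's subtree
# go before or after its parent in the target-language order?
CHILD_SIDE = {"nsubj": "before", "dobj": "before", "prep": "before",
              "pobj": "before", "det": "before"}

# Hypothetical relation priorities: among siblings on the same side of
# the parent, a lower value is placed further to the left.
PRIORITY = {"nsubj": 0, "prep": 1, "dobj": 2}


def linearize(node: Node) -> List[str]:
    """Return the words of the subtree rooted at `node` in target order."""
    def side(child: Node) -> str:
        # Assumption: relations without a rule are placed after the parent.
        return CHILD_SIDE.get(child.relation, "after")

    def priority(child: Node) -> int:
        # Assumption: relations without a priority sort last.
        return PRIORITY.get(child.relation, len(PRIORITY))

    before = sorted((c for c in node.children if side(c) == "before"), key=priority)
    after = sorted((c for c in node.children if side(c) == "after"), key=priority)

    words: List[str] = []
    for child in before:
        words.extend(linearize(child))
    words.append(node.word)
    for child in after:
        words.extend(linearize(child))
    return words


# "John eats apples in the garden"
sentence = Node("eats", "root", [
    Node("John", "nsubj"),
    Node("apples", "dobj"),
    Node("in", "prep", [Node("garden", "pobj", [Node("the", "det")])]),
])

print(" ".join(linearize(sentence)))  # prints "John the garden in apples eats"
```

With these illustrative rules the verb moves to the end of the clause and the preposition follows its object. In a pipeline of the kind the abstract describes, the same re-ordering would be applied to source sentences in both training and decoding.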
Anthology ID:
L12-1163
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2164–2171
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/340_Paper.pdf
Cite (ACL):
Amit Sangodkar and Om Damani. 2012. Re-ordering Source Sentences for SMT. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2164–2171, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Re-ordering Source Sentences for SMT (Sangodkar & Damani, LREC 2012)