CombAlign: a Tool for Obtaining High-Quality Word Alignments

Steinþór Steingrímsson, Hrafn Loftsson, Andy Way


Abstract
Being able to generate accurate word alignments is useful for a variety of tasks. While statistical word aligners can work well, especially when parallel training data are plentiful, multilingual embedding models have recently been shown to give good results in unsupervised scenarios. We evaluate an ensemble method for word alignment on four language pairs and demonstrate that by combining multiple tools, taking advantage of their different approaches, substantial gains can be made. This holds for settings ranging from very low-resource to high-resource. Furthermore, we introduce a new gold alignment test set for Icelandic and a new easy-to-use tool for creating manual word alignments.
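As a rough illustration of what combining the output of several word aligners can look like, the following minimal sketch keeps an alignment link only when enough aligners agree on it. This is not taken from the paper: the voting rule, the function names parse_pharaoh and combine_by_vote, the min_votes parameter, and the toy alignments are all assumptions made for illustration, and the ensemble the authors actually evaluate may work differently. Input is assumed to be in the common "i-j" (Pharaoh-style) format, where i is a source token index and j a target token index.

```python
from collections import Counter


def parse_pharaoh(line):
    """Parse 'i-j' pairs such as '0-0 1-2' into a set of (src, tgt) index tuples."""
    return {tuple(int(x) for x in pair.split("-")) for pair in line.split()}


def combine_by_vote(alignments, min_votes):
    """Keep every link proposed by at least `min_votes` of the aligners.

    `alignments` is a list of sets, one per aligner, so each aligner
    contributes at most one vote per link. (Hypothetical helper, not the
    CombAlign API.)
    """
    votes = Counter(link for links in alignments for link in links)
    return {link for link, n in votes.items() if n >= min_votes}


# Toy output of three hypothetical aligners for one sentence pair.
aligner_outputs = [
    parse_pharaoh("0-0 1-1 2-2"),
    parse_pharaoh("0-0 1-2 2-2"),
    parse_pharaoh("0-0 1-1 2-3"),
]

combined = combine_by_vote(aligner_outputs, min_votes=2)
print(sorted(combined))  # [(0, 0), (1, 1), (2, 2)]
```

With min_votes set to 1 the result is the union of all proposed links (higher recall), and with min_votes equal to the number of aligners it is their intersection (higher precision); intermediate thresholds trade one off against the other.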
Anthology ID:
2021.nodalida-main.7
Volume:
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
31 May–2 June
Year:
2021
Address:
Reykjavik, Iceland (Online)
Editors:
Simon Dobnik, Lilja Øvrelid
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press, Sweden
Pages:
64–73
URL:
https://aclanthology.org/2021.nodalida-main.7
Cite (ACL):
Steinþór Steingrímsson, Hrafn Loftsson, and Andy Way. 2021. CombAlign: a Tool for Obtaining High-Quality Word Alignments. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 64–73, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal):
CombAlign: a Tool for Obtaining High-Quality Word Alignments (Steingrímsson et al., NoDaLiDa 2021)
PDF:
https://aclanthology.org/2021.nodalida-main.7.pdf
Code:
steinst/alignman