Grapheme-Based Cross-Language Forced Alignment: Results with Uralic Languages

Juho Leinonen, Sami Virpioja, Mikko Kurimo


Abstract
Forced alignment is an effective way to speed up linguistic research. However, most forced aligners are language-dependent, and under-resourced languages rarely have enough data to train an acoustic model for an aligner. We present a new Finnish grapheme-based forced aligner and demonstrate its performance by aligning several Uralic languages, as well as English as an unrelated language. We show that even a simple grapheme-to-phoneme mapping created by a non-expert can result in useful word alignments.
Anthology ID:
2021.nodalida-main.36
Volume:
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
31 May–2 June
Year:
2021
Address:
Reykjavik, Iceland (Online)
Editors:
Simon Dobnik, Lilja Øvrelid
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press, Sweden
Pages:
345–350
URL:
https://aclanthology.org/2021.nodalida-main.36
Cite (ACL):
Juho Leinonen, Sami Virpioja, and Mikko Kurimo. 2021. Grapheme-Based Cross-Language Forced Alignment: Results with Uralic Languages. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 345–350, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal):
Grapheme-Based Cross-Language Forced Alignment: Results with Uralic Languages (Leinonen et al., NoDaLiDa 2021)
PDF:
https://aclanthology.org/2021.nodalida-main.36.pdf
Code:
aalto-speech/finnish-forced-alignment
Data:
LibriSpeech