Latin script keyboards for South Asian languages with finite-state normalization

Lawrence Wolf-Sonkin, Vlad Schogol, Brian Roark, Michael Riley


Abstract
The use of the Latin script for text entry of South Asian languages is common, even though there is no standard orthography for these languages in the script. We explore several compact finite-state architectures that permit variable spellings of words during mobile text entry. We find that approaches making use of transliteration transducers provide large accuracy improvements over baselines, but that simpler approaches involving a compact representation of many attested alternatives yield much of the accuracy gain. This is particularly important when operating under constraints on model size (e.g., on inexpensive mobile devices with limited storage and memory for keyboard models), and on speed of inference, since people typing on mobile keyboards expect no perceptible delay in keyboard responsiveness.
Anthology ID:
W19-3114
Volume:
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing
Month:
September
Year:
2019
Address:
Dresden, Germany
Editors:
Heiko Vogler, Andreas Maletti
Venue:
FSMNLP
SIG:
SIGFSM
Publisher:
Association for Computational Linguistics
Pages:
108–117
URL:
https://aclanthology.org/W19-3114
DOI:
10.18653/v1/W19-3114
Cite (ACL):
Lawrence Wolf-Sonkin, Vlad Schogol, Brian Roark, and Michael Riley. 2019. Latin script keyboards for South Asian languages with finite-state normalization. In Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing, pages 108–117, Dresden, Germany. Association for Computational Linguistics.
Cite (Informal):
Latin script keyboards for South Asian languages with finite-state normalization (Wolf-Sonkin et al., FSMNLP 2019)
PDF:
https://aclanthology.org/W19-3114.pdf