Is Word Segmentation Child’s Play in All Languages?

Georgia R. Loukatou, Steven Moran, Damian Blasi, Sabine Stoll, Alejandrina Cristia


Abstract
When learning language, infants need to break down the flow of input speech into minimal word-like units, a process best described as unsupervised bottom-up segmentation. Several segmentation algorithms have been proposed as strategies, but only cross-linguistically robust algorithms are plausible candidates for human word learning, since infants have no initial knowledge of the ambient language. We report on the stability in performance of 11 conceptually diverse algorithms on a selection of 8 typologically distinct languages. The results provide evidence that some segmentation algorithms are cross-linguistically valid and thus could be considered potential strategies employed by all infants.
Anthology ID:
P19-1383
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3931–3937
URL:
https://aclanthology.org/P19-1383
DOI:
10.18653/v1/P19-1383
Cite (ACL):
Georgia R. Loukatou, Steven Moran, Damian Blasi, Sabine Stoll, and Alejandrina Cristia. 2019. Is Word Segmentation Child’s Play in All Languages? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3931–3937, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Is Word Segmentation Child’s Play in All Languages? (Loukatou et al., ACL 2019)
PDF:
https://aclanthology.org/P19-1383.pdf