Towards Large-Scale Data Mining for Data-Driven Analysis of Sign Languages

Boris Mocialov, Graham Turner, Helen Hastie


Abstract
Access to sign language data is far from adequate. We show that it is possible to collect such data from social networking services such as TikTok, Instagram, and YouTube by applying data filtering to enforce quality standards and by discovering patterns in the filtered data, which makes it easier to analyse and model. Using our data collection pipeline, we collect and examine the interpretation of songs in both American Sign Language (ASL) and Brazilian Sign Language (Libras). We explore their differences and similarities by looking at the co-dependence of the orientation and location phonological parameters.
Anthology ID:
2020.signlang-1.24
Volume:
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Eleni Efthimiou, Stavroula-Evita Fotinea, Thomas Hanke, Julie A. Hochgesang, Jette Kristoffersen, Johanna Mesch
Venue:
SignLang
Publisher:
European Language Resources Association (ELRA)
Pages:
145–150
Language:
English
URL:
https://aclanthology.org/2020.signlang-1.24
Cite (ACL):
Boris Mocialov, Graham Turner, and Helen Hastie. 2020. Towards Large-Scale Data Mining for Data-Driven Analysis of Sign Languages. In Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives, pages 145–150, Marseille, France. European Language Resources Association (ELRA).
Cite (Informal):
Towards Large-Scale Data Mining for Data-Driven Analysis of Sign Languages (Mocialov et al., SignLang 2020)
PDF:
https://aclanthology.org/2020.signlang-1.24.pdf