DaCToR: A Data Collection Tool for the RELATER Project

Juan Hussain, Oussama Zenkri, Sebastian Stüker, Alex Waibel


Abstract
Collecting domain-specific data for under-resourced languages, e.g., dialects of languages, can be very expensive, potentially financially prohibitive, and time-consuming. Moreover, in the case of rarely written languages, normalizing non-canonical transcriptions may be another time-consuming but necessary task. To collect domain-specific data under such circumstances in a time- and cost-efficient way, collecting recordings of read, pre-prepared texts is often a viable option. To collect data in the domain of psychiatric diagnosis in Arabic dialects for the RELATER project, we have prepared the data collection tool DaCToR, which collects texts read by speakers in the countries and districts where the respective dialects are spoken. In this paper, we describe our tool, its purpose within the RELATER project, and the dialects we have started to collect with it.
Anthology ID:
2020.lrec-1.817
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6627–6632
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.817
Cite (ACL):
Juan Hussain, Oussama Zenkri, Sebastian Stüker, and Alex Waibel. 2020. DaCToR: A Data Collection Tool for the RELATER Project. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6627–6632, Marseille, France. European Language Resources Association.
Cite (Informal):
DaCToR: A Data Collection Tool for the RELATER Project (Hussain et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.817.pdf