From the attic to the cloud: mobilization of endangered language resources with linked data

Sebastian Nordhoff


Abstract
This paper describes a collection of 20k ELAN annotation files harvested from five different endangered language archives. The ELAN files form a very heterogeneous set, but the hierarchical configuration of their tiers, in conjunction with the tier content, makes it possible to identify transcriptions, translations, and glosses. These transcriptions, translations, and glosses are queryable across archives. Small analyses of graphemes (transcription tier), grammatical and lexical glosses (gloss tier), and semantic concepts (translation tier) show the viability of the approach. The use of identifiers from OLAC, Wikidata and Glottolog allows for a better integration of the data from these archives into the Linguistic Linked Open Data Cloud.
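The abstract's notion of a "hierarchical configuration of tiers" can be illustrated with a minimal sketch. It relies only on the `TIER` element and its `TIER_ID` and `PARENT_REF` attributes from the ELAN annotation format; the sample document, its tier names (`tx`, `ft`, `mb`, `ge`), and the helper functions are hypothetical and are not taken from the paper. The sketch recovers only the tier tree; how the paper combines that tree with tier content to label tiers is not shown here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily reduced ELAN document: only the tier declarations,
# with invented tier IDs. Real .eaf files also carry annotations, time slots,
# and linguistic type definitions.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="tx" LINGUISTIC_TYPE_REF="transcription"/>
  <TIER TIER_ID="ft" PARENT_REF="tx" LINGUISTIC_TYPE_REF="translation"/>
  <TIER TIER_ID="mb" PARENT_REF="tx" LINGUISTIC_TYPE_REF="morphemes"/>
  <TIER TIER_ID="ge" PARENT_REF="mb" LINGUISTIC_TYPE_REF="gloss"/>
</ANNOTATION_DOCUMENT>"""


def tier_children(eaf_xml):
    """Map each parent tier ID (None for top-level tiers) to its child tier IDs."""
    root = ET.fromstring(eaf_xml)
    children = defaultdict(list)
    for tier in root.iter("TIER"):
        # PARENT_REF is absent on top-level tiers, so .get() returns None.
        children[tier.get("PARENT_REF")].append(tier.get("TIER_ID"))
    return dict(children)


def tier_depths(eaf_xml):
    """Return the depth of every tier in the hierarchy (top-level tiers are 0)."""
    children = tier_children(eaf_xml)
    depths = {}
    stack = [(tier_id, 0) for tier_id in children.get(None, [])]
    while stack:
        tier_id, depth = stack.pop()
        depths[tier_id] = depth
        stack.extend((child, depth + 1) for child in children.get(tier_id, []))
    return depths


if __name__ == "__main__":
    print(tier_children(SAMPLE_EAF))
    print(tier_depths(SAMPLE_EAF))
```

A depth map of this kind is one possible structural feature; deciding which tier is a transcription, translation, or gloss would still require inspecting the tier content, as the abstract states.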
Anthology ID:
2020.lr4sshoc-1.3
Volume:
Proceedings of the Workshop about Language Resources for the SSH Cloud
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Daan Broeder, Maria Eskevich, Monica Monachini
Venue:
LR4SSHOC
Publisher:
European Language Resources Association
Pages:
10–18
Language:
English
URL:
https://aclanthology.org/2020.lr4sshoc-1.3
Cite (ACL):
Sebastian Nordhoff. 2020. From the attic to the cloud: mobilization of endangered language resources with linked data. In Proceedings of the Workshop about Language Resources for the SSH Cloud, pages 10–18, Marseille, France. European Language Resources Association.
Cite (Informal):
From the attic to the cloud: mobilization of endangered language resources with linked data (Nordhoff, LR4SSHOC 2020)
PDF:
https://aclanthology.org/2020.lr4sshoc-1.3.pdf