Identification of Indigenous Knowledge Concepts through Semantic Networks, Spelling Tools and Word Embeddings

Renato Rocha Souza, Amelie Dorn, Barbara Piringer, Eveline Wandl-Vogt


Abstract
To access indigenous, regional knowledge contained in language corpora, semantic tools and network methods are typically employed. In this paper we present an approach for identifying dialectal variants of words, or words that do not belong to High German, using the questionnaires of the non-standard language legacy collection of the Bavarian Dialects in Austria (DBÖ) as an example. Starting from selected cultural categories relevant to the wider project context, we identified common words in each category and obtained their lemmas with GermaLemma. We then explored the semantic vicinity of each word through word embedding models and checked the results against the German wordnet (GermaNet) and the Hunspell tool. Although none of these tools covers standard German words comprehensively, they serve as an indication of dialects in specific semantic hierarchies. The methods and tools applied in this study may serve as an example for similar projects that deal with non-standard or endangered language collections and aim to access, analyze and ultimately preserve native regional language heritage.
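The workflow outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the callables standing in for GermaLemma, the embedding model, GermaNet and Hunspell, and the toy word lists are all hypothetical. The rule that a word becomes a dialect candidate only when neither lexical resource recognises it is likewise an assumption made for this sketch.

```python
from typing import Callable, Dict, Iterable, List


def flag_dialect_candidates(
    seeds: Iterable[str],
    lemmatize: Callable[[str], str],         # stand-in for a lemmatizer such as GermaLemma
    neighbours: Callable[[str], List[str]],  # stand-in for an embedding nearest-neighbour query
    in_wordnet: Callable[[str], bool],       # stand-in for a GermaNet lookup
    spelled_ok: Callable[[str], bool],       # stand-in for a Hunspell spell check
) -> Dict[str, List[str]]:
    """Map each seed lemma to neighbours that neither lexical resource recognises."""
    result: Dict[str, List[str]] = {}
    for seed in seeds:
        lemma = lemmatize(seed)
        candidates: List[str] = []
        for word in neighbours(lemma):
            # Assumption: a word unknown to both resources is a dialect candidate.
            if not in_wordnet(word) and not spelled_ok(word):
                if word not in candidates:
                    candidates.append(word)
        result[lemma] = candidates
    return result


# Toy data, invented for illustration only (not taken from the paper or the DBÖ).
TOY_NEIGHBOURS = {"Brot": ["Semmel", "Laib", "Weckerl"]}
TOY_WORDNET = {"Brot", "Laib", "Semmel"}
TOY_SPELLER = {"Brot", "Laib", "Semmel"}

flagged = flag_dialect_candidates(
    ["Brot"],
    lemmatize=lambda w: w,  # identity: the toy seed is already a lemma
    neighbours=lambda w: TOY_NEIGHBOURS.get(w, []),
    in_wordnet=lambda w: w in TOY_WORDNET,
    spelled_ok=lambda w: w in TOY_SPELLER,
)
print(flagged)
```

Passing the four resources in as callables keeps the sketch independent of any particular library. Because none of the tools covers standard German comprehensively, the flagged words are only indications and would still need manual review.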
Anthology ID:
2020.lrec-1.118
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
943–947
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.118
Cite (ACL):
Renato Rocha Souza, Amelie Dorn, Barbara Piringer, and Eveline Wandl-Vogt. 2020. Identification of Indigenous Knowledge Concepts through Semantic Networks, Spelling Tools and Word Embeddings. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 943–947, Marseille, France. European Language Resources Association.
Cite (Informal):
Identification of Indigenous Knowledge Concepts through Semantic Networks, Spelling Tools and Word Embeddings (Rocha Souza et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.118.pdf