Etymological Wordnet: Tracing The History of Words

Gerard de Melo


Abstract
Research on the history of words has led to remarkable insights about language and also about the history of human civilization more generally. This paper presents the Etymological Wordnet, the first database that aims at making word origin information available as a large, machine-readable network of words in many languages. The information in this resource is obtained from Wiktionary. Extracting a network of etymological information from Wiktionary requires significant effort, as much of the etymological information is only given in prose. We rely on custom pattern matching techniques and mine a large network with over 500,000 word origin links as well as over 2 million derivational/compositional links.
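To make the idea of mining word origin links from prose concrete, here is a minimal sketch. It is not the paper's actual extraction code: the language list, the regular expression, the function name `extract_origin_links`, and the sample sentence are all hypothetical and chosen only for illustration. It also makes the simplifying assumption that each successive "from ..." phrase names the origin of the word mentioned just before it.

```python
import re

# Hypothetical, tiny language list; a real extractor would need many more names.
LANGUAGES = ["Middle English", "Old English", "Old French", "Latin", "Ancient Greek"]

# Matches phrases such as "from Old French doute" ("from" is case-insensitive).
ORIGIN_PATTERN = re.compile(
    r"\bfrom\s+("
    + "|".join(re.escape(name) for name in LANGUAGES)
    + r")\s+([^\s,.;()]+)",
    re.IGNORECASE,
)


def extract_origin_links(word, language, etymology_text):
    """Return a chain of (source, origin) links; each node is (language, word).

    Assumption: every matched phrase gives the origin of the previous node.
    """
    links = []
    current = (language, word)
    for match in ORIGIN_PATTERN.finditer(etymology_text):
        origin = (match.group(1), match.group(2))
        links.append((current, origin))
        current = origin
    return links


# Illustrative sentence written in the style of a dictionary etymology section.
sample = "From Middle English doute, from Old French doute, from Latin dubitare."
links = extract_origin_links("doubt", "English", sample)
for source, origin in links:
    print(source, "->", origin)
```

Real etymology sections are far less regular than this sample (templates, parenthetical glosses, alternative hypotheses), which is presumably why the abstract describes the extraction as requiring significant effort.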
Anthology ID:
L14-1063
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1148–1154
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1083_Paper.pdf
Cite (ACL):
Gerard de Melo. 2014. Etymological Wordnet: Tracing The History of Words. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1148–1154, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Etymological Wordnet: Tracing The History of Words (de Melo, LREC 2014)