Acquiring a Taxonomy from the German Wikipedia

Laura Kassner, Vivi Nastase, Michael Strube


Abstract
This paper presents the process of acquiring a large, domain-independent taxonomy from the German Wikipedia. We build upon a previously implemented platform that extracts a semantic network and taxonomy from the English version of Wikipedia. We describe two accomplishments of our work: a semantic network for the German language in which isa links are identified and annotated, and an extension of the platform that makes it easy to adapt to a new language. We identify the platform’s strengths and shortcomings, which stem from the scarcity of free processing resources for languages other than English. We show that the taxonomy induction process is highly reliable: evaluated against GermaNet, the German version of WordNet, the resulting resource shows an accuracy of 83.34%.
Anthology ID:
L08-1267
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/544_paper.pdf
Cite (ACL):
Laura Kassner, Vivi Nastase, and Michael Strube. 2008. Acquiring a Taxonomy from the German Wikipedia. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Acquiring a Taxonomy from the German Wikipedia (Kassner et al., LREC 2008)