Enriching ODIN

Fei Xia, William Lewis, Michael Wayne Goodman, Joshua Crowgey, Emily M. Bender


Abstract
In this paper, we describe the expansion of the ODIN resource, a database containing many thousands of instances of Interlinear Glossed Text (IGT) for over a thousand languages, harvested from scholarly linguistic papers posted to the Web. Because IGT instances are effectively richly annotated and heuristically aligned bitexts, such a database provides a unique resource for bootstrapping NLP tools for resource-poor languages. To make the data in ODIN more readily consumable by tool developers and NLP researchers, we propose a new XML format for IGT, called Xigt. We call the updated release ODIN-II.
Anthology ID:
L14-1055
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3151–3157
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1072_Paper.pdf
Cite (ACL):
Fei Xia, William Lewis, Michael Wayne Goodman, Joshua Crowgey, and Emily M. Bender. 2014. Enriching ODIN. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3151–3157, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Enriching ODIN (Xia et al., LREC 2014)