Active Learning for Building a Corpus of Questions for Parsing

Jordi Atserias, Giuseppe Attardi, Maria Simi, Hugo Zaragoza


Abstract
This paper describes how we built a dependency Treebank for questions. The questions were drawn from the TREC 10 QA task and from Yahoo! Answers. One use of the corpus is to train a dependency parser that achieves good accuracy on questions without hurting its overall accuracy. We also explore active learning techniques to determine a suitable size for a corpus of questions, one that achieves adequate accuracy while minimizing annotation effort.
Anthology ID:
L10-1447
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/656_Paper.pdf
Cite (ACL):
Jordi Atserias, Giuseppe Attardi, Maria Simi, and Hugo Zaragoza. 2010. Active Learning for Building a Corpus of Questions for Parsing. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Active Learning for Building a Corpus of Questions for Parsing (Atserias et al., LREC 2010)
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/656_Paper.pdf