Tools for Collocation Extraction: Preferences for Active vs. Passive

Ulrich Heid, Marion Weller


Abstract
We present and partially evaluate procedures for the extraction of noun+verb collocation candidates from German text corpora, along with their morphosyntactic preferences, especially for the active vs. passive voice. We start from tokenized, tagged, lemmatized and chunked text, and we use extraction patterns formulated in the CQP corpus query language. We discuss the results of a precision evaluation, on administrative texts from the European Union: we find a considerable amount of specialized collocations, as well as general ones and complex predicates; overall the precision is considerably higher than that of a statistical extractor used as a baseline.
Anthology ID:
L08-1076
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/323_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Ulrich Heid and Marion Weller. 2008. Tools for Collocation Extraction: Preferences for Active vs. Passive. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Tools for Collocation Extraction: Preferences for Active vs. Passive (Heid & Weller, LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/323_paper.pdf