Improved CCG Parsing with Semi-supervised Supertagging

Mike Lewis, Mark Steedman


Abstract
Current supervised parsers are limited by the size of their labelled training data, making improving them with unlabelled data an important goal. We show how a state-of-the-art CCG parser can be enhanced by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to better generalize from the labelled data, and allows us to accurately assign lexical categories without depending on a POS-tagger. Our approach leads to substantial improvements in dependency parsing results over the standard supervised CCG parser when evaluated on Wall Street Journal (0.8%), Wikipedia (1.8%) and biomedical (3.4%) text. We compare the performance of two recently proposed approaches for classification using a wide variety of word embeddings. We also give a detailed error analysis demonstrating where using embeddings outperforms traditional feature sets, and showing how including POS features can decrease accuracy.
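The approach described in the abstract, predicting each word's CCG lexical category (supertag) from unsupervised word embeddings of the word and its context rather than from POS-based features, can be illustrated with a minimal sketch. The snippet below is not the authors' model: the embeddings, tag inventory, toy data, and the plain softmax classifier are hypothetical stand-ins, intended only to show how concatenated context-window embeddings can serve as the sole input features for supertag classification.

import numpy as np

np.random.seed(0)

# Hypothetical pre-trained word embeddings (in practice these would come from
# an unsupervised model trained on large unlabelled corpora).
EMB_DIM = 50
vocab = ["<pad>", "the", "cat", "sat", "on", "mat", "."]
embeddings = {w: np.random.randn(EMB_DIM) * 0.1 for w in vocab}

# Hypothetical supertag inventory (a real CCG lexicon has hundreds of categories).
tags = ["NP/N", "N", "(S\\NP)/PP", "PP/NP", "."]
tag_index = {t: i for i, t in enumerate(tags)}

def features(sentence, i, window=2):
    """Concatenate embeddings of the word at position i and its +/- window neighbours."""
    padded = ["<pad>"] * window + sentence + ["<pad>"] * window
    span = padded[i:i + 2 * window + 1]
    return np.concatenate([embeddings.get(w, embeddings["<pad>"]) for w in span])

# Toy labelled data: one sentence with per-token supertags (hypothetical).
train = [
    (["the", "cat", "sat", "on", "the", "mat", "."],
     ["NP/N", "N", "(S\\NP)/PP", "PP/NP", "NP/N", "N", "."]),
]
X = np.array([features(s, i) for s, _ in train for i in range(len(s))])
y = np.array([tag_index[t] for _, ts in train for t in ts])

# A plain softmax classifier trained by gradient descent; the paper compares
# two more sophisticated embedding-based classification approaches.
W = np.zeros((X.shape[1], len(tags)))
b = np.zeros(len(tags))
for _ in range(500):
    scores = X @ W + b
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    probs[np.arange(len(y)), y] -= 1.0   # gradient of cross-entropy w.r.t. scores
    W -= 0.5 * (X.T @ probs) / len(y)
    b -= 0.5 * probs.mean(axis=0)

def supertag(sentence):
    """Return the highest-scoring supertag for each token, with no POS tags involved."""
    return [tags[int(np.argmax(features(sentence, i) @ W + b))]
            for i in range(len(sentence))]

print(supertag(["the", "cat", "sat", "on", "the", "mat", "."]))

In the paper itself, such embedding-based lexical category predictions feed the CCG parser's supertagging stage, which is where the reported dependency-parsing improvements on newswire, Wikipedia, and biomedical text come from.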
Anthology ID:
Q14-1026
Volume:
Transactions of the Association for Computational Linguistics, Volume 2
Year:
2014
Address:
Cambridge, MA
Editors:
Dekang Lin, Michael Collins, Lillian Lee
Venue:
TACL
Publisher:
MIT Press
Pages:
327–338
URL:
https://aclanthology.org/Q14-1026
DOI:
10.1162/tacl_a_00186
Cite (ACL):
Mike Lewis and Mark Steedman. 2014. Improved CCG Parsing with Semi-supervised Supertagging. Transactions of the Association for Computational Linguistics, 2:327–338.
Cite (Informal):
Improved CCG Parsing with Semi-supervised Supertagging (Lewis & Steedman, TACL 2014)
PDF:
https://aclanthology.org/Q14-1026.pdf
Data
Penn Treebank