Unsupervised Parsing via Constituency Tests

Steven Cao, Nikita Kitaev, Dan Klein


Abstract
We propose a method for unsupervised parsing based on the linguistic notion of a constituency test. One type of constituency test involves modifying the sentence via some transformation (e.g. replacing the span with a pronoun) and then judging the result (e.g. checking if it is grammatical). Motivated by this idea, we design an unsupervised parser by specifying a set of transformations and using an unsupervised neural acceptability model to make grammaticality decisions. To produce a tree given a sentence, we score each span by aggregating its constituency test judgments, and we choose the binary tree with the highest total score. While this approach already achieves performance in the range of current methods, we further improve accuracy by fine-tuning the grammaticality model through a refinement procedure, where we alternate between improving the estimated trees and improving the grammaticality model. The refined model achieves 62.8 F1 on the Penn Treebank test set, an absolute improvement of 7.6 points over the previously best published result.
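The tree-selection step described in the abstract (score each span, then choose the binary tree with the highest total score) can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `best_binary_tree`, the half-open span convention, and the toy scores in `toy_scores` are all hypothetical. In the paper the span scores come from aggregated constituency test judgments, which are stubbed out here with hand-picked numbers.

```python
from functools import lru_cache
from typing import Callable, Tuple

Span = Tuple[int, int]  # half-open interval [start, end) over token positions


def best_binary_tree(n: int, score: Callable[[int, int], float]):
    """Return (total_score, spans) for the highest-scoring binary tree over n tokens.

    `score(i, j)` is a hypothetical per-span score; the tree's total score is
    the sum of the scores of all spans it contains.
    """

    @lru_cache(maxsize=None)
    def solve(i: int, j: int):
        # A single token is a leaf: its only span is itself.
        if j - i == 1:
            return score(i, j), ((i, j),)
        best_total, best_spans = None, None
        # Try every split point and keep the best pair of subtrees.
        for k in range(i + 1, j):
            left_total, left_spans = solve(i, k)
            right_total, right_spans = solve(k, j)
            total = left_total + right_total
            if best_total is None or total > best_total:
                best_total, best_spans = total, left_spans + right_spans
        return best_total + score(i, j), ((i, j),) + best_spans

    return solve(0, n)


# Toy example (assumed scores, chosen by hand for illustration only).
tokens = ["the", "dog", "chased", "the", "cat"]
toy_scores = {(0, 2): 0.9, (2, 5): 0.8, (3, 5): 0.9}

total, spans = best_binary_tree(
    len(tokens), lambda i, j: toy_scores.get((i, j), 0.0)
)
for i, j in spans:
    print((i, j), " ".join(tokens[i:j]))
```

With these toy scores, the selected tree groups "the dog" and "the cat" as constituents and attaches "chased the cat" as a unit. Memoizing on `(i, j)` keeps the search cubic in sentence length rather than enumerating every binary tree.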
Anthology ID:
2020.emnlp-main.389
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4798–4808
URL:
https://aclanthology.org/2020.emnlp-main.389
DOI:
10.18653/v1/2020.emnlp-main.389
Cite (ACL):
Steven Cao, Nikita Kitaev, and Dan Klein. 2020. Unsupervised Parsing via Constituency Tests. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4798–4808, Online. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Parsing via Constituency Tests (Cao et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.389.pdf
Video:
https://slideslive.com/38938920
Data:
CoLA, Penn Treebank