Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels

Ted Briscoe, John Carroll


Abstract
We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.
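As a rough illustration of the input described in the abstract, the sketch below turns one tagged sentence into the label sequence a parser would consume, once with punctuation labels kept and once with them removed. It is only a minimal sketch: the tag names, the `PUNCT_LABELS` set, and the `label_sequence` helper are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical punctuation labels; the paper's actual tagset is not given here.
PUNCT_LABELS = {",", ".", ";", ":", "?", "!", "(", ")", "-"}


def label_sequence(tagged, keep_punctuation=True):
    """Return the label sequence for one sentence of (token, label) pairs.

    The words themselves are discarded: only part-of-speech and
    punctuation labels are passed on, as the abstract describes.
    """
    labels = []
    for _token, label in tagged:
        # Drop punctuation labels when simulating the "without punctuation" input.
        if label in PUNCT_LABELS and not keep_punctuation:
            continue
        labels.append(label)
    return labels


# Illustrative sentence with made-up part-of-speech labels.
tagged_sentence = [
    ("However", "ADV"), (",", ","), ("the", "DET"),
    ("parser", "NOUN"), ("failed", "VERB"), (".", "."),
]

with_punct = label_sequence(tagged_sentence)
without_punct = label_sequence(tagged_sentence, keep_punctuation=False)
```

Running a parser over both `with_punct` and `without_punct` for the same text is the kind of paired comparison the abstract refers to; the parser itself is outside the scope of this sketch.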
Anthology ID:
1995.iwpt-1.8
Volume:
Proceedings of the Fourth International Workshop on Parsing Technologies
Month:
September 20-24
Year:
1995
Address:
Prague and Karlovy Vary, Czech Republic
Editors:
Eva Hajicova, Bernard Lang, Robert Berwick, Harry Bunt, Bob Carpenter, Ken Church, Aravind Joshi, Ronald Kaplan, Martin Kay, Makoto Nagao, Anton Nijholt, Mark Steedman, Henry Thompson, Masaru Tomita, K. Vijay-Shanker, Yorick Wilks, Kent Wittenburg
Venues:
IWPT | WS
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
48–58
URL:
https://aclanthology.org/1995.iwpt-1.8
Cite (ACL):
Ted Briscoe and John Carroll. 1995. Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels. In Proceedings of the Fourth International Workshop on Parsing Technologies, pages 48–58, Prague and Karlovy Vary, Czech Republic. Association for Computational Linguistics.
Cite (Informal):
Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels (Briscoe & Carroll, IWPT-WS 1995)
PDF:
https://aclanthology.org/1995.iwpt-1.8.pdf