Graph Parsing (State of the Art)

From ACL Wiki
Revision as of 00:29, 30 May 2016 by MarcoKuhlmann

Background and Motivation

Graphs exceeding the formal complexity of rooted trees are of growing relevance to much NLP research. We interpret the term graph parsing broadly as mapping from surface strings to graph-structured target representations, which typically provide some level of syntactico-semantic analysis. Although graphs as such are formally well understood in graph theory, linguistic graphs vary substantially in type, and so does the interpretation of their various structural properties. To provide a common terminology and transparent statistics across different collections of graphs in NLP, we propose to establish a ‘catalogue’ of graph banks and associated parsing results.

We anticipate a bit of a cottage industry in linguistic graph banks and graph processing tasks over the next few years, which may make it difficult to keep track of contentful similarities and differences across frameworks and approaches. This page is intended to stimulate community work towards an up-to-date resource combining the following components: (a) formal definitions of (relevant) structural graph properties; (b) in-depth descriptions of how these apply to different graph banks; (c) constantly growing surveys of graph bank statistics; and (d) a continuously evolving record of state-of-the-art processing results. Of these, components (a) and (b) are provided by Kuhlmann & Oepen (2016; in press), while (c) and (d) are maintained below.

This page was initiated by Marco Kuhlmann and Stephan Oepen, and for the time being (mid-May 2016) is very much a work in progress. We intend to have a first complete draft available for community review by early June 2016.

Software: Graph Analysis Toolkit

AMR: Abstract Meaning Representation

CCD: Combinatory Categorial Grammar Dependencies

Hockenmaier and Steedman (2007) construct CCGbank by combining a careful interpretation of the syntactic annotations in the Penn Treebank (PTB) with additional, manually curated lexical and constructional knowledge. In CCGbank (LDC2005T13), the strings of the venerable PTB Wall Street Journal (WSJ) corpus are annotated with pairs of (a) CCG syntactic derivations and (b) sets of semantic bi-lexical dependency triples, which we term CCD. The latter “include most semantically relevant non-anaphoric local and long-range dependencies” and are suggested by the CCGbank creators as a proxy for predicate–argument structure. While CCD has mainly been used for contrastive parser evaluation (Clark and Curran [2007], Fowler and Penn [2010]; inter alios), more recent work views each set of triples as a directed graph and parses directly into these target representations (Du, Sun, and Wan 2015).
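To make the graph view of dependency triples concrete, here is a minimal Python sketch. It assumes that each triple has the form (head, label, dependent) over token positions; this layout, the label names, and the helper functions are all hypothetical and chosen for illustration only, and they are not the CCGbank file format. The sketch builds an adjacency map and lists the nodes with more than one incoming edge, which is one simple way in which such a graph can exceed a rooted tree.

```python
from collections import defaultdict

# Invented triples (head position, label, dependent position).
# They are NOT taken from CCGbank; labels and indices are illustrative.
triples = [
    (2, "arg1", 1),
    (2, "arg2", 4),
    (4, "arg1", 1),  # token 1 is an argument of both 2 and 4
]


def to_graph(triples):
    """Return an adjacency map: head -> list of (label, dependent)."""
    graph = defaultdict(list)
    for head, label, dep in triples:
        graph[head].append((label, dep))
    return dict(graph)


def in_degrees(triples):
    """Count incoming edges for every node that occurs as a dependent."""
    counts = defaultdict(int)
    for _, _, dep in triples:
        counts[dep] += 1
    return dict(counts)


def reentrant_nodes(triples):
    """Nodes with more than one incoming edge (impossible in a tree)."""
    return sorted(n for n, d in in_degrees(triples).items() if d > 1)


graph = to_graph(triples)
reentrancies = reentrant_nodes(triples)
print(graph)
print(reentrancies)
```

In this toy input, token 1 receives two incoming edges, so the structure is a directed graph rather than a tree. Real graph banks would call for further checks (for example for cycles or unconnected nodes) that this sketch does not attempt.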

EDS: Elementary Dependency Structures

SDP: Semantic Dependency Parsing