DynamicPower at SemEval-2016 Task 8: Processing syntactic parse trees with a Dynamic Semantics core

This is a system description paper for a submission to Task 8 of SemEval-2016: Meaning Representation Parsing. No use was made of the training data provided by the task. Instead, existing components were combined to form a pipeline that takes raw sentences as input and outputs meaning representations. The components are: a part-of-speech tagger and parser trained on the Penn Parsed Corpus of Modern British English to produce syntactic parse trees; a semantic role labeller and a named entity recogniser to supplement the obtained parse trees with word sense, functional and named entity information; an adapted Tarskian satisfaction relation for a Dynamic Semantics that transforms a syntactic parse into a predicate logic based meaning representation; and a final conversion to the penman/AMR notation required for the task evaluation.

1 Introduction

This is a system description paper for a submission to Task 8 of SemEval-2016: Meaning Representation Parsing. Syntactic structures are first obtained by parsing raw language input, from which meaning representations are derived by printing off information accumulated with an adapted Tarskian satisfaction relation for a Dynamic Semantics (Dekker, 2012). This is akin to compositional approaches of formal semantics that view the task of reaching a semantic value as being rooted in first obtaining a syntactic parse. Key advantages are modularity and domain independence.

This paper is structured as follows. Section 2 sketches the method used to obtain a syntactic parse. Section 3 covers reaching a semantic representation. Section 4 outlines conversion to penman/AMR notation. Section 5 reports experiment results. Section 6 concludes. An appendix details how to run the available implementation.

2 Obtaining a syntactic parse
The approach first needs a way to obtain syntactic parse trees. The major components used were the Stanford Log-linear Part-Of-Speech Tagger (Toutanova et al., 2003) and the Berkeley Parser (Petrov and Klein, 2007), both trained on data from the years 1840-1908 of the Penn Parsed Corpus of Modern British English (Kroch et al., 2010). The particular setup followed suggestions for pre-processing and post-processing made by Kulick et al. (2014) and used tools, notably create_stripped, from the system of Fang, Butler and Yoshimoto (2014). Although it dates from over one hundred years ago, the training data for the syntactic parser was chosen not for suitability to a potential task domain, but for the practical benefit that parse results would conform to the scheme proposed in the annotation manual for the Penn Historical Corpora and the Parsed Corpus of Early English Correspondence (PCEEC) (Santorini, 2010). This scheme is exceptionally consistent, especially with regard to facilitating identification of construction types (small clause, comparative, cleft, etc.) and in its handling of coordination, presenting the least obstacle for a robust conversion to the structures fed to the semantic component (seen in the next section).
As an example, consider:

(1) Upon turning 80, Mao Zedong felt that he would die soon.
To arrive at a more complete parse for the task, word sense and functional information was obtained with mateplus (https://github.com/microth/mateplus) (Roth and Woodsend, 2014), an extended version of the mate-tools semantic role labeller (Björkelund et al., 2009). In addition, named entity information was gathered with the Stanford Named Entity Recognizer (http://nlp.stanford.edu/software/CRF-NER.html) (Finkel et al., 2005) using the MUC model, which labels e.g. PERSON, ORGANIZATION and LOCATION. Furthermore, pronouns (e.g., he, she, they) are by default marked PERSON. From (1) as input, the combination of these tools collects four columns of information per word: the first column gives word lemmas; the second contains functional information to identify syntactic subjects (SBJ) and objects (OBJ) as well as adjunct roles such as LOC, MSR and TMP; the third provides word sense information related to PropBank (Bonial et al., 2010) semantic frames ('turn.02', 'feel.02', 'die.01'); and the fourth provides entity information. This column information is integrated with the parse. The resulting tree includes extended phrase labels marking function (e.g., NP-SBJ = subject, NP-OB1 = direct object, ADVP-TMP = temporal modifier). Terminal nodes are either word lemmas or, whenever available, PropBank word senses. Furthermore, entity information is integrated with a BIND tag.
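To illustrate how the column information could be folded into terminal labels of the parse, here is a minimal sketch in Python. The token tuples and the annotate helper are hypothetical simplifications for exposition, not the actual tool output format or integration code.

```python
# Hypothetical per-token tuples: (lemma, function, PropBank sense, entity).
tokens = [
    ("turn",       None,  "turn.02", None),
    ("Mao_Zedong", "SBJ", None,      "PERSON"),
    ("feel",       None,  "feel.02", None),
    ("he",         "SBJ", None,      "PERSON"),   # pronoun marked PERSON by default
    ("die",        None,  "die.01",  None),
]

def annotate(lemma, func, sense, entity):
    """Fold the column information into a single terminal label:
    prefer the PropBank sense over the bare lemma, and record
    function and entity (BIND) tags alongside it."""
    label = sense if sense else lemma
    if func:
        label = f"{label};{func}"
    if entity:
        label = f"{label};BIND={entity}"
    return label

annotated = [annotate(*t) for t in tokens]
```

With this encoding, a verb terminal becomes its frame label (die.01) while a named entity subject carries both its function and BIND tags.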

3 Obtaining semantic analysis
Having syntactic structures, the next step is to reach a level of semantic analysis. This is derived by printing off information accumulated with an adapted Tarskian satisfaction relation for a Dynamic Semantics (Dekker, 2012). Specifically, use is made of the Treebank Semantics implementation, with syntactic structures converted into expressions of a formal language (Scope Control Theory, or SCT) with a number of primitive operations, notably among others: Namely to make available fresh bindings, T to create bound arguments, At to allocate semantic roles, Close to bring about quantificational closures, Rel to establish predicate or connective relations, and If to conditionalise how calculation of a semantic value proceeds based on the assignment state. The full list of operations and details are given in Butler (2015). Operations access or possibly alter a sequence-based information state (Vermeulen, 2000) that retains binding information by assigning (possibly empty) sequences of values to binding names. This can be demonstrated with Rel creating an "and" relation with four arguments, each of which is processed against a different assignment state determined by instances of Namely embedded in occurrences of Someone:

  Someone_x smiles. He_x/*y laughs. Someone_y sees him_x/*y. The end.
Pronouns are able to link to " * " bindings, which are accessible bindings that have reached the discourse context because of prior indefinites, while indefinites take bindings from ".e", which is a source for fresh bindings. This approach gives a handle on discourse, and more generally governs the interaction of quantification to capture the empirical results of accessibility from Discourse Representation Theory (Kamp and Reyle, 1993), as well as intra-sentential binding conditions (Butler, 2010).
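The interaction of ".e" as a source for fresh bindings and "*" as the accessible discourse context can be sketched as follows. This is an assumed simplification for exposition (a dictionary of named sequences), not the actual SCT implementation; the names namely and resolve_pronoun are illustrative.

```python
# Sequence-based information state: binding names map to sequences of values.
# ".e" supplies fresh bindings; "*" holds accessible discourse bindings.
state = {".e": ["x1", "x2", "x3"], "*": []}

def namely(state):
    """An indefinite takes a fresh binding from '.e' and makes it
    accessible by pushing it onto the '*' context."""
    fresh = state[".e"].pop(0)
    state["*"].append(fresh)
    return fresh

def resolve_pronoun(state, exclude=None):
    """A pronoun links to an accessible '*' binding (most recent first),
    skipping a locally excluded co-argument, as with 'him'."""
    return next(b for b in reversed(state["*"]) if b != exclude)

x = namely(state)                        # Someone_x smiles.
he = resolve_pronoun(state)              # He laughs: resolves to x
y = namely(state)                        # Someone_y sees him.
him = resolve_pronoun(state, exclude=y)  # him cannot be local subject y
```

The exclusion of the local subject mimics the intra-sentential binding condition that rules out him = y in the running discourse.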
With conversion of (1) to an SCT expression ex1, part-of-speech tags transform to operations (some (indefinite), nn (noun predicate), verb (predicate with an event argument), pro (pronoun), gensym (trigger to create a fresh binding), free (ensures no quantificational closure), etc.). Also, constructions can bring about the inclusion of operations (e.g., someFact with the participial clause, and THAT with the that-complement). Conversion also adds (i) information about local binding names (e.g., "ARG0" (logical subject role) and "ARG1" (logical object role)) and (ii) information about sources for fresh bindings from quantificational closures ("@e", ".e" and ".event"). Once built, ex1 reduces to primitives of the SCT language, the start of which is as follows:

  Sct.Close ("∃", ["@e", ".e", ".event"],
    Sct.Clean (0, ["ARG0"], "*",
      Sct.Namely (Lang.C ("mao_zedong", "PERSON"), "@e",
        Sct.Lam ("@e", "ARG0", ...

Such an expression is given to the adapted Tarskian satisfaction procedure which, instead of returning a semantic value (e.g., true or false with respect to a model), is used to produce a meaning representation by printing accumulated information. This gives a Davidsonian representation (Davidson, 1967) in which verbs are encoded with minimally an event argument. All bindings are existentially quantified over at the highest level, which is a convenience for reaching penman/AMR notation and not a limitation of the approach. Also note the pronoun is resolved to the only accessible PERSON antecedent.
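The idea of "printing off" accumulated information as a Davidsonian representation can be sketched as follows. The data structure (predicate, event variable, role bindings) and the printing routine are hypothetical simplifications, not the actual satisfaction procedure.

```python
# Hypothetical accumulated information: each verb contributes an event
# variable plus role predications over bindings.
predications = [
    ("smile.01", "e1", {"ARG0": "x1"}),
]

def print_representation(preds):
    """Print a Davidsonian representation: every verb gets minimally an
    event argument, and all bindings are existentially quantified over
    at the highest level."""
    variables, conjuncts = [], []
    for pred, event, roles in preds:
        variables.append(event)
        conjuncts.append(f"{pred}({event})")
        for role, arg in sorted(roles.items()):
            if arg not in variables:
                variables.append(arg)
            conjuncts.append(f"{role}({event},{arg})")
    return "exists " + " ".join(variables) + " (" + " AND ".join(conjuncts) + ")"

rep = print_representation(predications)
```

For the single predication above this yields exists e1 x1 (smile.01(e1) AND ARG0(e1,x1)), with the top-level existential closure that eases the later conversion to penman/AMR notation.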
4 Conversion to penman/AMR notation

Conversion to penman/AMR notation (Matthiessen and Bateman, 1991; Banarescu et al., 2015) involves transforming the obtained semantic structures into trees with explicit argument role information. An argument of each predicate (e.g., '@EVENT' if present, or the sole argument of a one-place predicate) is made the parent of the predicate. Also, binding is made implicit with the removal of quantification levels. Content is further re-packaged: a daughter D of an AND level is moved inside a sister S when the argument name at the root of D is contained as an argument within S. Movement is to only one location. In addition, flatter structures are arrived at by excising redundant linking information (e.g., @TIME;1 FACT :THAT) and by folding tree material around inverse roles (signalled by ending the role name with '-of'). The latter is seen with the modal (would), but also serves to compact the long distance dependencies that arise with relative clauses, comparatives, clefts, etc. There is also expansion of name information and some reordering of role placement. The final step involves pretty printing the assembled tree into the penman/AMR format, as well as the removal of tense information and remapping of role names, e.g., :THAT changes to :ARG1, :TMP to :time, :ON to :prep-on, :CARDINAL to :quant, :ATTRIBUTE to :mod, and :POS to :poss.
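The final pretty-printing step, including the role name remapping listed above, can be sketched as follows. The tree encoding (variable, concept, edges) is a hypothetical simplification of the assembled structure, not the actual implementation; the role map itself follows the remappings stated in the text.

```python
# Role remapping applied during pretty printing (as listed in the text).
ROLE_MAP = {":THAT": ":ARG1", ":TMP": ":time", ":ON": ":prep-on",
            ":CARDINAL": ":quant", ":ATTRIBUTE": ":mod", ":POS": ":poss"}

def to_penman(node, depth=0):
    """Pretty print a (variable, concept, edges) tree in the penman/AMR
    bracketed format; a child is either another node or a constant."""
    var, concept, edges = node
    s = f"({var} / {concept}"
    for role, child in edges:
        s += "\n" + "    " * (depth + 1) + ROLE_MAP.get(role, role) + " "
        s += to_penman(child, depth + 1) if isinstance(child, tuple) else str(child)
    return s + ")"

# Hypothetical fragment of the running example after re-packaging.
tree = ("f", "feel-02",
        [(":ARG0", ("p", "person", [(":name", ("n", "name", []))])),
         (":THAT", ("d", "die-01", []))])
amr = to_penman(tree)
```

Printing the fragment shows the :THAT edge surfacing as :ARG1 in the final penman/AMR output.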

5 Experiments
The method of this paper is evaluated on the shared task evaluation data, which includes 1053 sentences. The smatch score on this evaluation data is 0.47. Table 1 also reports the smatch score on the LDC2015E86 dataset, which includes 1371 test sentences. Scores are calculated with Smatch v2.0.2 (Cai and Knight, 2013), which evaluates the precision, recall and F1 of the concepts and relations taken together. The score for Task 8 is higher than the performance on the LDC2015E86 test data. Reasons for the difference include parser performance being better on the Task 8 evaluation data, and there being fewer non-compositional aspects of representation in that data.
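The reported score combines precision, recall and F1 over concept and relation triples. A minimal sketch of that combination is given below; it uses exact triple overlap for exposition and omits the variable alignment search that the real Smatch tool performs, so it is not the evaluation code itself.

```python
def smatch_like_f1(gold, predicted):
    """F1 over the overlap of concept and relation triples
    (simplified: exact match, no variable alignment search)."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    p = matched / len(predicted) if predicted else 0.0
    r = matched / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical triples for a fragment of the running example.
gold = {("d", "instance", "die-01"), ("h", "instance", "he"), ("d", "ARG1", "h")}
pred = {("d", "instance", "die-01"), ("h", "instance", "person"), ("d", "ARG1", "h")}
score = smatch_like_f1(gold, pred)  # 2 of 3 triples match
```

Here one concept triple differs, so precision and recall are both 2/3, giving an F1 of 2/3.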

6 Conclusion
To sum up, this paper has described a modularised approach for building meaning representations, with a key role for an adapted Tarskian satisfaction relation for a Dynamic Semantics as the method to integrate and connect information sourced from a syntactic parser, semantic role labeller and named entity recogniser. Task performance was limited by not using the training data provided by the task, in particular: lacking information to allocate PropBank roles, neglecting wikification, and missing entity information to replicate non-compositional aspects of the Abstract Meaning Representation (AMR) specification (Banarescu et al., 2015). Nevertheless, this contribution indicates that AMRs are not far removed from what a compositional semantics can achieve, which is of interest for connecting to results from the formal semantics literature, such as gaining a treatment for quantification, as well as for relating to "Sembank" resources built with Discourse Representation Theory/Dynamic Semantics, such as the Groningen Meaning Bank (Basile et al., 2012) and the Treebank Semantics Corpus (Butler and Yoshimoto, 2012).

Appendix: Implementation
Assuming text is some original (multi-)sentence segmented data, text.psd contains the output from a parser trained on the PPCMBE, text.mate is output from mateplus, and text.ner is output from the Stanford Named Entity Recognizer, a first pipeline creates fully parsed data (fullparse.psd) as described in section 2. The following pipeline then achieves the semantic analysis of section 3, as well as the conversion to penman/AMR notation of section 4:

  cat fullparse.psd | prepare_PPCMBE | segment.sh | parse_normalize -propbank -free -bind | see_sct -free -reset | run_sct -penman | penman_like_amr | pretty_penman

Acknowledgments

This research is supported by the Japan Society for the Promotion of Science (JSPS), Research Project Number: 15K02469.