Broad-coverage CCG Semantic Parsing with AMR

We propose a grammar induction technique for AMR semantic parsing. While previous grammar induction techniques were designed to re-learn a new parser for each target application, the recently annotated AMR Bank provides a unique opportunity to induce a single model for understanding broad-coverage newswire text and support a wide range of applications. We present a new model that combines CCG parsing to recover compositional aspects of meaning and a factor graph to model non-compositional phenomena, such as anaphoric dependencies. Our approach achieves 66.2 Smatch F1 score on the AMR Bank, significantly outperforming the previous state of the art.


Introduction
Semantic parsers map sentences to formal representations of their meaning (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Liang et al., 2011). Existing learning algorithms have primarily focused on building actionable meaning representations which can, for example, directly query a database (Liang et al., 2011; Kwiatkowski et al., 2013) or instruct a robotic agent (Chen, 2012; Artzi and Zettlemoyer, 2013b). However, due to their end-to-end nature, such models must be relearned for each new target application and have only been used to parse restricted styles of text, such as questions and imperatives.
Recently, AMR (Banarescu et al., 2013) was proposed as a general-purpose meaning representation language for broad-coverage text, and work is ongoing to study its use for a variety of applications such as machine translation (Jones et al., 2012) and summarization (Liu et al., 2015). The AMR meaning bank provides a large new corpus that, for the first time, enables us to study the problem of grammar induction for broad-coverage semantic parsing. However, it also presents significant challenges for existing algorithms, including much longer sentences, more complex syntactic phenomena and increased use of non-compositional semantics, such as within-sentence coreference. In this paper, we introduce a new, scalable Combinatory Categorial Grammar (CCG; Steedman, 1996, 2000) induction approach that solves these challenges with a learned joint model of both compositional and non-compositional semantics, and achieves state-of-the-art performance on AMR Bank parsing.
We map sentences to AMR structures in a two-stage process (Section 5). First, we use CCG to construct lambda-calculus representations of the compositional aspects of AMR. CCG is designed to capture a wide range of linguistic phenomena, such as coordination and long-distance dependencies, and has been used extensively for semantic parsing. To use CCG for AMR parsing we define a simple encoding for AMRs in lambda calculus, for example, as seen with the logical form z and AMR a in Figure 1 for the sentence Pyongyang officials denied their involvement. However, using CCG to construct such logical forms requires a new mechanism for non-compositional reasoning, for example to model the long-range anaphoric dependency introduced by their in Figure 1.
* Work done at the University of Washington.
To represent such dependencies while maintaining a relatively compact grammar, we follow Steedman's (2011) use of generalized Skolem terms, a mechanism to allow global references in lambda calculus. We then allow the CCG derivation to mark when non-compositional reasoning is required with underspecified placeholders. For example, Figure 1 shows an underspecified logical form u that would be constructed by the grammar with the bolded placeholder ID indicating an unresolved anaphoric reference. These placeholders are resolved by a factor graph model that is defined over the output logical form and models which part of it they refer to, for example to find the referent for a pronoun. Although primarily motivated by non-compositional reasoning, we also use this mechanism to underspecify certain relations during parsing, allowing for more effective search.
Following most work in semantic parsing, we consider two learning challenges: grammar induction, which assigns meaning representations to words and phrases, and parameter estimation, where we learn a model for combining these pieces to analyze full sentences. We introduce a new CCG grammar induction algorithm which incorporates ideas from previous algorithms (Zettlemoyer and Collins, 2005; Kwiatkowski et al., 2010) in a way that scales to the longer sentences and more varied syntactic constructions observed in newswire text. During lexical generation (Section 6.1), the algorithm first attempts to use a set of templates to hypothesize new lexical entries. It then attempts to combine bottom-up parsing with top-down recursive splitting to select the best entries and learn new templates for complex syntactic and semantic phenomena, which are re-used in later sentences to hypothesize new entries.
Finally, while previous algorithms (e.g., Zettlemoyer and Collins, 2005) have assumed the existence of a grammar that can parse nearly every sentence to update its parameters, this does not hold for AMR Bank. Due to sentence complexity and search errors, our model cannot produce fully correct logical forms for a significant portion of the training data. To learn from as much of the data as possible and accelerate learning, we adopt an early update strategy to generate effective updates from partially correct analyses (Section 6.2).

Technical Overview
Task Let X be the set of all possible sentences and A the set of all AMR structures. Given a sentence x ∈ X, we aim to generate an AMR a ∈ A.
We define a simple, deterministic and invertible conversion process between AMRs and lambda-calculus logical forms; roughly speaking, each AMR variable gets its own lambda term, which is scoped as low as possible, and each AMR role becomes a binary predicate applied to these variables. Figure 1 shows an example, and the full details are provided in the supplementary materials. Therefore, henceforth we discuss the task of mapping a sentence x ∈ X to a logical form z ∈ Z, where Z is the set of all logical forms. For example, in Figure 1, we would map the sentence x to the logical form z. We evaluate system performance using SMATCH (Cai and Knight, 2013).
Model Given a sentence x and lexicon Λ, we generate the set of possible derivations GEN(x, Λ) using a two-stage process (Section 5). First, we use a weighted CCG to map x to an underspecified logical form u (Section 5.1), a logical form with placeholder constants for unresolved elements. For example, in the underspecified logical form u in Figure 1, the constants REL-of, REL and ID are placeholders. We then resolve these placeholders by defining a factor graph to find their optimal mapping and generate the final logical form z. In the figure, REL-of is mapped to ARG0-of, REL to ARG2 and ID to 2.
Learning We assume access to a training set of N examples {(x_i, z_i) : i = 1 ... N}, each containing a sentence x_i and a logical form z_i. Our goal is to learn a CCG, which constitutes learning the lexicon and estimating the parameters of both the grammar and the factor graph. We define a learning procedure (Section 6) that alternates between expanding the lexicon and updating the parameters. Learning new lexical entries relies on a two-pass process that combines learning the meaning of words and new syntactic structures, and supports learning with and without alignment heuristics (e.g., from Flanigan et al., 2014).

Related Work
The problem of learning semantic parsers has received significant attention. Algorithms have been developed for learning from different forms of supervision, including logical forms (Wong and Mooney, 2007; Muresan, 2011), question-answer pairs (Clarke et al., 2010; Liang et al., 2011; Cai and Yates, 2013; Kwiatkowski et al., 2013), sentences paired with demonstrations (Goldwasser and Roth, 2011; Chen and Mooney, 2011), conversational logs (Artzi and Zettlemoyer, 2011), distant supervision (Krishnamurthy and Mitchell, 2012, 2015; Reddy et al., 2014) and without explicit semantic supervision (Poon, 2013). Although we are the first to consider using CCG to build AMR representations, our work is closely related to existing methods for CCG semantic parsing. Previous CCG induction techniques have either used hand-engineered lexical templates (e.g., Zettlemoyer and Collins, 2005) or learned templates from the data directly (e.g., Kwiatkowski et al., 2010, 2012). Our two-pass reasoning for lexical generation combines ideas from both methods in a way that greatly improves scalability to long, newswire-style sentences. CCG has also been used for broad-coverage recovery of first-order logic representations (Bos, 2008; Lewis and Steedman, 2013). However, this work lacked corpora to evaluate the logical forms recovered.
AMR (Banarescu et al., 2013) is a general-purpose meaning representation and has been used in a number of applications (Pan et al., 2015; Liu et al., 2015). There is also work on recovering
[Figure 2: Example CCG tree for the sentence "Happy people dance", with three lexical entries, two forward applications (>) and type-shifting of a plural noun to a noun phrase.]

Background
Combinatory Categorial Grammar CCG is a categorial formalism that provides a transparent interface between syntax and semantics (Steedman, 1996, 2000). Section 7 details our instantiation of CCG. In CCG trees, each node is a category. Figure 2 shows a simple CCG tree.
For example, S\NP[pl] : λx.λd.dance-01(d) ∧ ARG0(d, x) is a category for an intransitive verb phrase. The syntactic type S\NP[pl] indicates that an argument of type NP[pl] is expected and the returned syntactic type will be S. The backward slash \ indicates the argument is expected on the left, while a forward slash / indicates it is expected on the right. The syntactic attribute pl specifies that the argument must be plural. Attribute variables enforce agreement between syntactic attributes. For example, as in Figure 2, adjectives are assigned the syntax N[x]/N[x], where x is used to indicate that the attribute of the argument will determine the attribute of the returned category. The simply-typed lambda calculus logical form in the category represents its semantic meaning. The typing system includes basic types (e.g., entity e, truth value t) and functional types (e.g., ⟨e, t⟩ is the type of a function from e to t). In the example category, λx.λd.dance-01(d) ∧ ARG0(d, x) is a ⟨e, ⟨e, t⟩⟩-typed function expecting an ARG0 argument, and the conjunction specifies the roles of the dance-01 frame.
A CCG is defined by a lexicon and a set of combinators. The lexicon pairs words and phrases with their categorial meaning. For example, dance ⊢ S\NP[pl] : λx.λd.dance-01(d) ∧ ARG0(d, x) pairs dance with the category above. We adopt a factored representation of the lexicon (Kwiatkowski et al., 2011), where entries are dynamically generated by combining lexemes and templates. For example, the above lexical entry can be generated by pairing the lexeme ⟨dance, {dance-01}⟩ with the template λv_1.[S\NP : λx.λa.v_1(a) ∧ ARG0(a, x)].
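The factored lexicon can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the class names `Lexeme` and `Template`, the function `instantiate`, and the use of plain strings for logical forms are all assumptions made for readability.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Lexeme:
    words: Tuple[str, ...]      # surface tokens, e.g. ("dance",)
    constants: Tuple[str, ...]  # logical constants, e.g. ("dance-01",)


@dataclass(frozen=True)
class Template:
    syntax: str                     # CCG syntactic type
    semantics: Callable[..., str]   # abstracts over the lexeme's constants


def instantiate(lexeme: Lexeme, template: Template) -> str:
    """Apply the template to the lexeme's constants to build a lexical entry."""
    logical_form = template.semantics(*lexeme.constants)
    return f"{' '.join(lexeme.words)} ⊢ {template.syntax} : {logical_form}"


# Hypothetical encoding of the template λv_1.[S\NP : λx.λa.v_1(a) ∧ ARG0(a, x)].
intransitive = Template(
    syntax="S\\NP",
    semantics=lambda v1: f"λx.λa.{v1}(a) ∧ ARG0(a, x)",
)

entry = instantiate(Lexeme(("dance",), ("dance-01",)), intransitive)
print(entry)
```

Because templates abstract over constants, the same `intransitive` template could be paired with any other single-constant lexeme, which is what lets a factored lexicon share syntactic structure across words.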
Skolem Terms and IDs Generalized Skolem terms (henceforth, Skolem terms) for CCG were introduced by Steedman (2011) to capture complex dependencies with relatively local quantification. We define here a simplified version of the theory to represent entities and allow distant references. Let A be a ⟨⟨e, t⟩, e⟩-typed predicate. Given a ⟨e, t⟩-typed logical expression C, the logical form A_n(C) is a Skolem term with the Skolem ID n. For example, A_2(λy.boy(y)) is a Skolem term that could represent the noun phrase the boy, which introduces a new entity. Skolem IDs are globally scoped, i.e., they can be referred to from anywhere in the logical form without scoping constraints. To refer to Skolem terms, we define the ⟨id, e⟩-typed predicate R. For example, the sentence the boy loves himself may be represented with A_1(λx.love-01(x) ∧ ARG0(x, A_2(λy.boy(y))) ∧ ARG1(x, R(2))), where R(2) references A_2(λy.boy(y)).
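To make the global scoping of Skolem IDs concrete, here is a minimal sketch under simplifying assumptions: each Skolem term is reduced to an ID, a concept name and a dictionary of roles, and the names `Skolem`, `Ref` and `index_terms` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Ref:
    skolem_id: int  # R(n): a reference to the Skolem term with ID n


@dataclass
class Skolem:
    skolem_id: int  # n in A_n(C)
    concept: str    # unary predicate in C, e.g. "boy"
    roles: Dict[str, Union["Skolem", Ref]] = field(default_factory=dict)


def index_terms(term: Skolem, table=None) -> Dict[int, Skolem]:
    """Collect every Skolem term by ID, ignoring where it is nested."""
    table = {} if table is None else table
    table[term.skolem_id] = term
    for arg in term.roles.values():
        if isinstance(arg, Skolem):
            index_terms(arg, table)
    return table


# A_1(λx.love-01(x) ∧ ARG0(x, A_2(λy.boy(y))) ∧ ARG1(x, R(2)))
boy = Skolem(2, "boy")
love = Skolem(1, "love-01", {"ARG0": boy, "ARG1": Ref(2)})

table = index_terms(love)
referent = table[love.roles["ARG1"].skolem_id]
print(referent.concept)
```

The lookup goes through a flat table rather than through variable scopes, which mirrors the point made in the text: a reference can target any term in the logical form regardless of nesting.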

Mapping Sentences to Logical Form
Given a sentence x and lexicon Λ, the function GEN(x, Λ) defines the set of possible derivations. Each derivation d is a tuple ⟨y, M⟩, where y is a CCG parse tree and M is a mapping of constants from u, the underspecified logical form at the root of y, to their fully specified form.

Underspecified Logical Forms
An underspecified logical form represents multiple logical forms via a mapping function that maps its constants to sets of constants and Skolem IDs. For example, consider the underspecified logical form u at the top of Figure 3b. If, for example, REL can be mapped to manner or ARG2, then the sub-expression REL(h, A_6(λo.official(o))) represents manner(h, A_6(λo.official(o))) or ARG2(h, A_6(λo.official(o))). During learning, we assume access to fully specified logical forms, which we convert to underspecified form as needed. In practice, all binary relations, except ARG0 and ARG1, and all Skolem ID references are underspecified.
Formally, let C be the set of all constants and I(u) the set of all Skolem IDs in the logical form u. Let S_u : C → 2^(C ∪ I(u)) be a specification function, such that its inverse is deterministic. We call a constant c a placeholder if |S_u(c)| > 1. Given an underspecified logical form u, applying S_u to all the constants that u contains generates a set of fully specified logical forms.
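The effect of the specification function can be sketched as a Cartesian product over constant occurrences. This is an illustrative simplification: a logical form is flattened to a list of constant occurrences, the candidate sets for REL and ID are made-up values, and the names `placeholders` and `specify` are hypothetical.

```python
from itertools import product


def placeholders(occurrences, spec):
    """Constants whose specification set has more than one element."""
    return [c for c in occurrences if len(spec.get(c, [c])) > 1]


def specify(occurrences, spec):
    """Enumerate every fully specified assignment of the occurrences.

    Constants missing from `spec` are treated as already fully specified,
    i.e. they map only to themselves.
    """
    choices = [spec.get(c, [c]) for c in occurrences]
    return [tuple(combo) for combo in product(*choices)]


# Assumed toy input: three constant occurrences, two of them placeholders.
occurrences = ["REL", "ARG1", "ID"]
spec = {"REL": ["manner", "ARG2"], "ID": [2, 3]}

forms = specify(occurrences, spec)
print(placeholders(occurrences, spec))
print(len(forms))
```

The number of fully specified forms grows multiplicatively with the number of placeholders, which is one reason for deferring these decisions to a separate model instead of enumerating them inside the parser.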

Derivations
The first part of a derivation d = ⟨y, M⟩ is a CCG parse tree y with an underspecified logical form u at its root. For example, Figure 3a shows such a CCG parse tree, where the logical form contains the placeholders REL, REL-of and ID.
The second part of the derivation is a function M : CONSTS(u) → C ∪ I(u), where CONSTS(u) is the set of all occurrences of constants in u. For example, in Figure 3b, CONSTS(u) contains, among others, three different occurrences of ARG1 and one of ID, and M maps REL to ARG2, REL-of to ARG0-of and ID to the Skolem ID 2. The set of potential assignments for each occurrence of constant c is S_u(c), and M, which returns a single element for each constant, is a disambiguation of S_u. Applying M to all constants in u results in the final logical form z.
Decomposing the derivation provides two advantages. First, we are able to defer decisions from the CCG parse to the factor graph, thereby considering fewer hypotheses during parsing and simplifying the computation. Second, we can represent distant references while avoiding the complex parse trees that would have been required to represent these dependencies with scoped variables instead of Skolem IDs.

Model
Given a sentence x, we use a weighted log-linear CCG (Lafferty et al., 2001; Clark and Curran, 2007) to rank the space of possible parses under the grammar Λ. At the root of each CCG derivation is the underspecified logical form u.
To resolve the placeholders in u, we define a factor graph G_u = ⟨V, F, E⟩, where V = CONSTS(u) is the set of variables, F is the set of factors and E is the set of edges. Each edge is of the form (v, f) where v ∈ V and f ∈ F. Figure 4 shows the factor graph used in generating the derivation in Figure 3, including all the variables and a subset of the factors. For each variable v_c ∈ V such that c ∈ CONSTS(u), the set of possible assignments is determined by S_u(c).
To generate the factors F and edges E we use the function Φ(V') that maps a set of variables V' ⊆ V to a factor f and a set of edges, each one of the form (v, f), where v ∈ V'. Factors express various features (Section 7), such as selectional preferences and control structures. In the figure, Factor A captures the selectional preference for the assignment of the relation REL between have-org-role-91 and official. Factor B captures a similar preference, this time to resolve REL-of. Factor C2 captures a selectional preference triplet involve-01/ARG1/person that will be created if ID is resolved to the Skolem ID 2. Finally, C3 captures a similar preference for resolving ID to 3. Since the assignment of many of the variables is fixed, i.e., they are fully specified constants, in practice our factor graph representation simply conditions on them.
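Derivations are scored using a log-linear model that includes both CCG parse features and those defined by the factor graph. Let D(z) be the subset of derivations with the final logical form z and θ ∈ R^l be an l-dimensional parameter vector. We define the probability of the logical form z as

$$p(z \mid x; \theta, \Lambda) = \sum_{d \in D(z)} p(d \mid x; \theta, \Lambda), \qquad p(d \mid x; \theta, \Lambda) = \frac{e^{\theta \cdot \phi(x, d)}}{\sum_{d' \in \mathrm{GEN}(x, \Lambda)} e^{\theta \cdot \phi(x, d')}} \qquad (1)$$

where φ(x, d) ∈ R^l is a feature vector (Section 7).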
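As a rough numerical illustration of scoring logical forms with a log-linear model, the sketch below assumes the standard form in which each derivation receives probability proportional to exp(θ·φ(x, d)) and the probability of z sums over the derivations that yield z. The function name `logical_form_probs` and the input layout are assumptions.

```python
import math
from collections import defaultdict


def logical_form_probs(derivations):
    """Marginalize derivation probabilities into logical-form probabilities.

    `derivations` is a list of (logical_form, score) pairs, where score is
    assumed to be the dot product θ·φ(x, d) for that derivation.
    """
    max_score = max(score for _, score in derivations)
    totals = defaultdict(float)
    for z, score in derivations:
        # Subtract the max score for numerical stability before exponentiating.
        totals[z] += math.exp(score - max_score)
    norm = sum(totals.values())
    return {z: mass / norm for z, mass in totals.items()}


# Two derivations lead to z1 and one to z2, all with the same score.
probs = logical_form_probs([("z1", 1.0), ("z1", 1.0), ("z2", 1.0)])
print(probs)
```

With equal scores, z1 receives two thirds of the probability mass simply because two derivations reach it, which shows why the sum over D(z) matters.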

Inference
To compute the set of derivations GEN(x, Λ) we define a two-stage process. We first run the CCG parser to generate underspecified logical forms. Following previous work (Zettlemoyer and Collins, 2005), we use CKY parsing to enumerate the top-K underspecified logical forms. During the CKY chart construction, we ignore Skolem IDs when comparing categories. This allows us to properly combine partial derivations and to fully benefit from the dynamic programming. We dynamically generate lexical entries for numbers and dates using regular expression patterns and for named-entities using a recognizer. For every underspecified logical form u, we construct a factor graph and use beam search to find the top-L configurations of the graph.
During learning, we use the function GENMAX(x, z, θ, Λ) to get all derivations that map the sentence x to the logical form z, given parameters θ and lexicon Λ. To compute GENMAX, we follow Zettlemoyer and Collins (2005) and collect constant co-occurrence counts from z to prune from the CKY chart any category that cannot participate in a derivation leading to z. Since only constant names are changed during the second stage, setting the factor graph to get z is trivial: if the underspecified logical form is identical to z except the placeholders, we replace the placeholders with the correct final assignment, otherwise the derivation cannot result in z.
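The beam search over factor graph configurations can be sketched as follows. This is a deliberately simplified stand-in: it assigns variables one at a time and scores partial assignments with a caller-supplied function, whereas the factors described in this paper can span several variables. The names `beam_search`, `unary_score` and the weights are hypothetical.

```python
import heapq


def beam_search(variables, domains, score, beam_size):
    """Return the `beam_size` best full assignments found by beam search.

    `score` maps a (possibly partial) assignment dict to a float.
    """
    beam = [({}, 0.0)]
    for var in variables:
        expanded = []
        for assignment, _ in beam:
            for value in domains[var]:
                candidate = {**assignment, var: value}
                expanded.append((candidate, score(candidate)))
        # Keep only the highest-scoring partial assignments.
        beam = heapq.nlargest(beam_size, expanded, key=lambda item: item[1])
    return beam


# Assumed toy weights standing in for factor scores.
weights = {("REL", "ARG2"): 1.5, ("REL", "manner"): 0.2,
           ("ID", 2): 0.9, ("ID", 3): 0.4}


def unary_score(assignment):
    return sum(weights.get(pair, 0.0) for pair in assignment.items())


domains = {"REL": ["ARG2", "manner"], "ID": [2, 3]}
best = beam_search(["REL", "ID"], domains, unary_score, beam_size=2)
print(best[0])
```

A small beam trades exactness for speed: configurations pruned early cannot be recovered, so the top-L list is approximate whenever factors couple variables that are assigned at different steps.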

Learning
Learning the two-stage model requires inducing the entries of the CCG lexicon Λ and estimating the parameters θ, which score both stages of the derivation. We assume access to a training set of N examples {(x_i, z_i) : i = 1 ... N}, each containing a sentence x_i and a logical form z_i. This data does not include information about the lexical entries and CCG parsing operations required to construct the correct derivations. We consider all these decisions as latent.
Algorithm 1: The main learning algorithm. COMPUTEGRAD(x, z, θ, Λ) computes the gradient for sentence x and logical form z, given the parameters θ and lexicon Λ, and is described in Section 6.2. ADAGRAD(∆) applies a per-feature learning rate to the gradient ∆ (Duchi et al., 2011).
Output: Lexicon Λ and model parameters θ.

Algorithm 2: GENENTRIES(x, z, θ, Λ).
 1: » Augment lexicon with sample-specific entries.
 2: Λ+ ← Λ ∪ GENLEX(x, z, Λ)
 3: » Get max-scoring correct derivations.
 4: D+ ← GENMAX(x, z, θ, Λ+)
 5: if |D+| > 0 then
 6:   » Return entries from max-scoring derivations.
 7:   return ∪_{d ∈ D+} LEX(d)
 8: else
 9:   » Top-down splitting to generate new entries.
10:   return RECSPLIT(x, z, θ, Λ+)

The main learning algorithm (Algorithm 1) starts by initializing the lexicon (line 1) and then processes the data T times (line 2), each time alternating between batch expansion of the lexicon and a sequence of mini-batch parameter updates. An iteration starts with a batch pass to expand the lexicon. The subroutine GENENTRIES, described in Section 6.1 and Algorithm 2, is called to generate a set of new entries for each sample (line 5). Next, we update the parameters θ with mini-batch updates. Given a mini-batch size of M, we use the procedure SUB(D, i, M) to get the i-th segment of the data D of size M. We process this segment (line 10) to accumulate the mini-batch gradient ∆ by calling the procedure COMPUTEGRAD(x, z, θ, Λ) (line 12), which computes the gradient for x and z given θ and Λ, as described in Section 6.2. We use AdaGrad (Duchi et al., 2011) parameter updates (line 13).
Each iteration concludes with removing all lexical entries not used in max-scoring correct derivations, to correct for overgeneration (lines 14-17).
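The overall control flow of the learning procedure, as described in the prose above, might be sketched as below. This is an outline under stated assumptions, not the paper's Algorithm 1: `gen_entries`, `compute_grad` and `used_entries` are hypothetical callbacks standing in for GENENTRIES, COMPUTEGRAD and the extraction of entries from max-scoring correct derivations, features are plain dictionary keys, and the update is written as gradient ascent with a standard AdaGrad scaling.

```python
import math
from collections import defaultdict


def adagrad_step(theta, grad, hist, rate=0.1, eps=1e-8):
    """Per-feature AdaGrad update (ascent direction) applied in place."""
    for feat, g in grad.items():
        hist[feat] += g * g
        theta[feat] += rate * g / (math.sqrt(hist[feat]) + eps)


def learn(data, seed_lexicon, T, M, gen_entries, compute_grad, used_entries):
    lexicon = set(seed_lexicon)
    theta, hist = defaultdict(float), defaultdict(float)
    for _ in range(T):
        # Batch pass: expand the lexicon with entries for every sample.
        new_entries = set()
        for x, z in data:
            new_entries |= gen_entries(x, z, theta, lexicon)
        lexicon |= new_entries
        # Mini-batch parameter updates over consecutive segments of size M.
        for start in range(0, len(data), M):
            delta = defaultdict(float)
            for x, z in data[start:start + M]:
                for feat, g in compute_grad(x, z, theta, lexicon).items():
                    delta[feat] += g
            adagrad_step(theta, delta, hist)
        # Prune entries that no max-scoring correct derivation uses.
        keep = set()
        for x, z in data:
            keep |= used_entries(x, z, theta, lexicon)
        lexicon &= keep
    return lexicon, dict(theta)
```

Passing the three procedures in as callbacks keeps the sketch independent of any particular parser, and makes the alternation between lexicon expansion, parameter updates and pruning easy to see.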

Lexicon Expansion: GENENTRIES
Given a sentence x, a logical form z, parameters θ and a lexicon Λ, GENENTRIES(x, z, θ, Λ) (Algorithm 2) computes a set of lexical entries, such that there exists at least one derivation d using these entries from x to z. We first use GENLEX(x, z, Λ) to generate a large set of potential lexical entries from u, the underspecified form of z, by generating lexemes (Section 4) and pairing them with all templates in Λ. We then use a two-pass process to select the entries to return. The set of generated lexemes is a union of: (a) the set G_gen that includes all pairings of subsets of constants from z with spans in x up to length k_gen and (b) the set that is constructed by matching named-entity constants in z with their corresponding mentions in the text to create new lexemes with potentially any other constant (for lexemes with multiple constants). Λ is augmented with the generated set of lexical entries to create Λ+ (line 2).
Following Artzi and Zettlemoyer (2013b), we constrain the set of derivations to include only those that use at most one lexeme from G_gen. If generating new lexemes is sufficient to derive z from x, D+ will contain these derivations and we return their lexical entries to be added to the lexicon Λ (lines 5-7). Otherwise, we proceed to do a second pass, where we try to generate new templates to parse the sentence.
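The construction of the set of candidate lexemes that pairs constant subsets with short spans can be sketched as follows. The function name `generate_lexemes` is hypothetical, and the `max_constants` cap and the exclusion of the empty subset are assumptions added to keep the example small; the text itself only fixes the span length limit k_gen.

```python
from itertools import combinations


def generate_lexemes(tokens, constants, k_gen=2, max_constants=2):
    """Pair every span of up to k_gen tokens with subsets of the constants."""
    spans = [tuple(tokens[i:i + n])
             for n in range(1, k_gen + 1)
             for i in range(len(tokens) - n + 1)]
    subsets = [frozenset(combo)
               for r in range(1, max_constants + 1)
               for combo in combinations(sorted(constants), r)]
    return {(span, subset) for span in spans for subset in subsets}


lexemes = generate_lexemes(["officials", "denied"], {"official", "deny-01"})
print(len(lexemes))
```

Even this two-token example yields nine candidates (three spans times three subsets), which illustrates why the derivations are constrained to use at most one such lexeme.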
Second Pass: RECSPLIT In this pass we try to generate max-scoring derivations in a top-down process. Starting from u, the underspecified form of z, we search for CCG parsing steps that will connect to existing partial derivations in the CKY chart to create a complete parse tree. Since the space of possible operations is extremely large, we use CCGBank (Hockenmaier and Steedman, 2007) categories to prune, as described below.
The second pass is executed by calling RECSPLIT(x, z, θ, Λ+), which returns a set of lexical entries to add to the model (line 10). We recursively apply the splitting operation introduced by Kwiatkowski et al. (2010). Given a CCG category, splitting outputs all possible category pairs that could have originally generated it. For example, given the category S\NP : λy.λd.deny-01(d) ∧ ARG0(d, y) ∧ ARG1(d, A_1(λi.involve-01(i) ∧ ARG1(i, R(ID)))), one of the possible splits will include the categories S\NP/NP : λx.λy.λd.deny-01(d) ∧ ARG0(d, y) ∧ ARG1(d, x) and NP : A_1(λi.involve-01(i) ∧ ARG1(i, R(ID))), which would combine with forward application (>). Kwiatkowski et al. (2010) present the full details. The process starts from u, the underspecified form of z, and recursively applies the splitting operation while ensuring that: (1) there is at most one entry from G_gen or one entry where both the template and lexemes are new in the derivation, (2) each parsing step must have at least one child that may be constructed from an existing partial derivation, and (3) for each new parsing step, the syntax of a newly generated child must match the syntax of a CCGBank category for the same span. To search the space of derivations we populate a CKY chart and do a top-down beam search, where in each step we split categories for smaller spans.

Gradient Computation: COMPUTEGRAD
Given a sentence x, its labeled logical form z, parameters θ and lexicon Λ, the procedure COMPUTEGRAD(x, z, θ, Λ) computes the gradient for the sample (x, z).
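Let D*(z) = GENMAX(x, z, θ, Λ), the set of max-scoring correct derivations. The hard gradient update is:

$$\frac{1}{|D^*(z)|} \sum_{d \in D^*(z)} \phi(x, d) \; - \; \mathbb{E}_{p(d \mid x; \theta, \Lambda)}\left[\phi(x, d)\right] \qquad (2)$$

where φ(x, d) ∈ R^l is an l-dimensional feature vector (Section 5.3), the expectation is taken under the model's log-linear distribution p(d | x; θ, Λ) over the derivations in GEN(x, Λ), and the positive portion of the gradient, rather than using expected features, averages over all max-scoring correct derivations.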
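A small sketch of the hard gradient may help. It follows the description in the text for the positive part (an average over max-scoring correct derivations) and assumes that the negative part is the expected feature vector under a log-linear distribution over all derivations. The function name `hard_gradient` and the sparse-dictionary feature layout are hypothetical.

```python
import math
from collections import defaultdict


def hard_gradient(correct, all_derivs, theta):
    """Average correct features minus expected features under the model.

    `correct` and `all_derivs` are lists of sparse feature dicts φ(x, d);
    `theta` is a sparse weight dict.
    """
    def score(phi):
        return sum(theta.get(f, 0.0) * v for f, v in phi.items())

    grad = defaultdict(float)
    # Positive part: plain average over max-scoring correct derivations.
    for phi in correct:
        for f, v in phi.items():
            grad[f] += v / len(correct)
    # Negative part: expectation under a softmax over derivation scores.
    scores = [score(phi) for phi in all_derivs]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    for phi, w in zip(all_derivs, weights):
        for f, v in phi.items():
            grad[f] -= (w / norm) * v
    return dict(grad)


grad = hard_gradient([{"a": 1.0}], [{"a": 1.0}, {"b": 1.0}], theta={})
print(grad)
```

With zero weights the model splits its mass evenly, so the feature of the correct derivation is pushed up by 0.5 and the competing feature is pushed down by the same amount.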
Early updates To generate an effective update when no correct derivation is observed, we follow Collins and Roark (2004) and do an early update if D*(z) is empty or if GEN(x, Λ), the set of derivations for x, does not contain a derivation with the correct final logical form z. Given the partial derivations, our gradient computation is identical to Equation 2. However, in contrast to Collins and Roark (2004) our data does not include gold derivations. Therefore, we attempt to identify max-scoring partial derivations that may lead to the correct derivation. We extract sub-expressions from u, the underspecified form of z, and search the CKY chart for the top-scoring non-overlapping spans that contain categories with these logical forms. We use the partial derivations leading to these cells to compute the gradient.
The benefit of early updates is two-fold. First, as expected, it leads to higher quality updates that are focused on the errors the model makes. Second, given the complexity of the data, it allows us to have updates for many examples that would be otherwise ignored. In our experiments, we observe this behavior with nearly 40% of the training set.

Experimental Setup
Data, Tools and Metric For evaluation, we use AMR Bank release 1.0 (LDC2014T12). We use the proxy report portion, which includes newswire articles from the English Gigaword corpus, and follow the official split for training, development and evaluation (6603/826/823 sentences). We use EasyCCG (Lewis and Steedman, 2014) trained with the re-banked CCGBank (Hockenmaier and Steedman, 2007; Honnibal et al., 2010) to generate CCGBank categories, the Illinois Named Entity Tagger (Ratinov and Roth, 2009) for NER, Stanford CoreNLP (Manning et al., 2014) for tokenization and part-of-speech tagging and UW SPF (Artzi and Zettlemoyer, 2013a) to develop our system. We use SMATCH (Cai and Knight, 2013) to evaluate logical forms converted back to AMRs.
CCG We use three syntactic attributes: singular sg, mass nouns nb and plural pl. When factoring lexical entries, we avoid extracting binary relations and references, and leave them in the template. We use backward and forward binary combinators for application, composition and crossing composition. We allow non-crossing composition up to the third order. We also add rules to handle punctuation and unary rules for type-shifting non-adjectives in adjectival positions and verb phrases in adverbial positions. We allow shifting of bare plurals, mass nouns and named entities to noun phrases. To avoid spurious ambiguity during parsing, we use normal-form constraints (Hockenmaier and Bisk, 2010). We use five basic lambda calculus types: entity e, truth value t, identifier id, quoted text txt and integer i.
Features During CCG parsing, we use indicator features for unary type shifting, crossing composition, lexemes, templates and dynamically generated lexical entries. We also use indicators for co-occurrence of part-of-speech tags and syntactic attributes, repetitions in logical conjunctions and attachments in the logical form. In the factor graph, we use indicator features for control structures, parent-relation-child selectional preferences and for mapping a relation to its final form. See the supplementary material for a detailed description.

Initialization and Parameters
We created the seed lexicon from the training data by sampling and annotating 50 sentences with lexical entries, adding entries for pronouns and adding lexemes for all alignments generated by JAMR (Flanigan et al., 2014). We initialize feature weights as follows: 10 for all lexeme features for seed entries and entries generated by named-entity matching (Section 6.1), IBM Model 1 scores for all other lexemes (Kwiatkowski et al., 2011), -3 for unary type shifting and crossing composition features, 3 for features that pair singular and plural part-of-speech tags with singular and plural attributes and 0 for all other features. We set the number of iterations T = 10 and select the best model based on development results. We set the max number of tokens for lexical generation k_gen = 2, learning rate µ = 0.1, CCG parsing beam K = 50, factor graph beam L = 100, mini-batch size M = 40 and use a beam of 100 for GENMAX.
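For reference, the reported settings can be gathered into a single configuration object. The values are taken from the paragraph above; the field names, the `Hyperparameters` class and the `INITIAL_WEIGHTS` keys are hypothetical labels chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    iterations: int = 10          # T
    max_lexeme_tokens: int = 2    # k_gen
    learning_rate: float = 0.1    # µ
    ccg_beam: int = 50            # K, top-K underspecified logical forms
    factor_graph_beam: int = 100  # L, top-L factor graph configurations
    mini_batch_size: int = 40     # M
    genmax_beam: int = 100        # beam used when computing GENMAX


# Initial feature weights by feature group; IBM Model 1 scores for the
# remaining lexemes are data-dependent and therefore not listed here.
INITIAL_WEIGHTS = {
    "seed_or_named_entity_lexeme": 10.0,
    "unary_type_shifting": -3.0,
    "crossing_composition": -3.0,
    "number_agreement": 3.0,
    "default": 0.0,
}

config = Hyperparameters()
print(config)
```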
Two-pass Inference During testing, we perform two passes of inference for every sentence. First, we run our inference procedure (Section 5.4). If no derivations are generated, we run inference again, allowing the parser to skip words at a fixed cost and use the entries for related words if a word is unknown. We find related words in the lexicon using case, plurality and inflection string transformations. Finally, if necessary, we heuristically transform the logical forms at the root of the CCG parse trees to valid AMR logical forms. We set the cost of logical form transformation and word skipping to 10 and the cost of using related entries to 5.

Results
Table 1 shows SMATCH test results. We compare our approach to the latest, fixed version of JAMR (Flanigan et al., 2014) available online (footnote 8), the only system to report test results on the official LDC release. Our approach outperforms JAMR by 3 SMATCH F1 points, with a significant gain in recall. Given consensus inter-annotator agreement of 83 SMATCH F1 (Flanigan et al., 2014), this improvement reduces the gap between automated methods and human performance by 15%.
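The reported 15% reduction in the gap to human performance on the test set can be checked with a few lines of arithmetic. The sketch assumes that the 66.2 F1 from the abstract is the test score and that the gain over JAMR is exactly 3 points, so the implied JAMR score is an approximation and not a number read from Table 1.

```python
ours = 66.2    # SMATCH F1 reported in the abstract (assumed to be the test score)
gain = 3.0     # reported improvement over JAMR, treated as exact here
human = 83.0   # consensus inter-annotator agreement

baseline = ours - gain        # implied JAMR score under these assumptions
gap_before = human - baseline
reduction = gain / gap_before
print(f"{reduction:.1%}")
```

Under these assumptions the gap shrinks from about 19.8 to 16.8 F1 points, which is consistent with the stated 15%.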
Although not strictly comparable, Table 1 also includes results on the pre-release AMR Bank corpus, including the published JAMR results, their fixed results and the results of Wang et al. (2015). Table 2 shows SMATCH scores for the development set, with ablations. The supplementary material includes example output derivations and qualitative comparison to JAMR outputs. We first remove underspecifying constants, which leaves the factor graph to resolve only references. While the expressivity of the model remains the same, more decisions are considered during parsing, modestly impacting performance.
We also study the different methods for lexical generation. Skipping the second recursive splitting pass in GENENTRIES creates an interesting tradeoff. As we are unable to learn templates without splitting, we induce a significantly smaller lexicon (500K vs. 1.6M entries). Although we are unable to recover many syntactic constructions, our search problem is in general much simpler. We therefore see a relatively mild drop in overall performance (1.1 F1). Removing G_gen during lexical generation (Section 6.1) creates a more significant drop in performance (3.4 F1), demonstrating how considering all possible lexemes allows the system to recover entries that are not covered by heuristic alignments. We are also able for the first time to report AMR parsing results without any surface-form similarity heuristics, by removing both JAMR alignments and named-entity matching lexical generation (Section 6.1). The significant drop in performance (20 points F1) demonstrates the need for better alignment algorithms.
Finally, Figure 5 plots development SMATCH F1 with and without early updates. As expected, early updates speed up learning significantly and have a large impact on overall performance. Without early updates we are unable to learn from almost half of the data, and performance drops by nearly 15 points.
Footnote 8: JAMR is available at http://tiny.cc/jamr.

Conclusion
We described an approach for broad-coverage CCG induction for semantic parsing, including a joint representation of compositional and non-compositional semantics, a new grammar induction technique and an early update procedure. We used AMR as the target representation and presented new state-of-the-art AMR parsing results.
While we focused on recovering noncompositional dependencies, other noncompositional phenomena remain to be studied. Although our technique is able to learn certain idioms as multi-word phrases, learning to recognize discontinuous idioms remains open. Similarly, resolving cross-sentence references, which are not annotated in AMR Bank, is important future work. Finally, we would like to reduce the dependency on surface-form heuristics, for example to better generalize to other languages.