Sentences with Gapping: Parsing and Reconstructing Elided Predicates

Sentences with gapping, such as Paul likes coffee and Mary tea, lack an overt predicate to indicate the relation between two or more arguments. Surface syntax representations of such sentences are often produced poorly by parsers, and even if correct, not well suited to downstream natural language understanding tasks such as relation extraction that are typically designed to extract information from sentences with canonical clause structure. In this paper, we present two methods for parsing to a Universal Dependencies graph representation that explicitly encodes the elided material with additional nodes and edges. We find that both methods can reconstruct elided material from dependency trees with high accuracy when the parser correctly predicts the existence of a gap. We further demonstrate that one of our methods can be applied to other languages based on a case study on Swedish.


Introduction
Sentences with gapping (Ross, 1970) such as Paul likes coffee and Mary tea are characterized by having one or more conjuncts that contain multiple arguments or modifiers of an elided predicate. In this example, the predicate likes is elided for the relation Mary likes tea. While these sentences appear relatively infrequently in most written texts, they are often used to convey a lot of factual information that is highly relevant for natural language understanding (NLU) tasks such as open information extraction and semantic parsing. For example, consider the following sentence from the WSJ portion of the Penn Treebank (Marcus et al., 1993).
Figure 1: Overview of our two approaches. Both methods first parse a sentence with gapping to one of two different dependency tree representations and then reconstruct the elided predicate from this tree.
To extract the information about unemployment rates in the various countries, an NLU system has to identify that the percentages indicate unemployment rates and the locational modifiers indicate the corresponding country. Given only this sentence, or this sentence and a strict surface syntax representation that does not indicate elided predicates, this is a challenging task. However, given a dependency graph that reconstructs the elided predicate for each conjunct, the problem becomes much easier, and methods developed to extract information from dependency trees of clauses with canonical structures are much more likely to extract the correct information from a gapped clause.
While gapping constructions receive a lot of attention in the theoretical syntax literature (e.g., Ross 1970; Jackendoff 1971; Steedman 1990; Coppock 2001; Osborne 2006; Johnson 2014; Toosarvandani 2016; Kubota and Levine 2016), they have been almost entirely neglected by the NLP community so far.
The Penn Treebank explicitly annotates gapping constructions by coindexing arguments in the clause with a predicate and the clause with the gap, but these co-indices are not included in the standard parsing metrics and almost all parsers ignore them. Despite the sophisticated analysis of gapping within CCG (Steedman, 1990), sentences with gapping were deemed too difficult to represent within the CCG-Bank (Hockenmaier and Steedman, 2007). Similarly, the treebanks for the Semantic Dependencies Shared Task (Oepen et al., 2015) exclude all sentences from the Wall Street Journal that contain gapping. Finally, while the tectogrammatical layer of the Prague Dependency Treebank (Bejček et al., 2013) as well as the enhanced Universal Dependencies (UD) representation (Nivre et al., 2016) provide an analysis with reconstructed nodes for gapping constructions, there exist no methods to automatically parse to these representations.
Here, we provide the first careful analysis of parsing of gapping constructions, and we present two methods for reconstructing elided predicates in sentences with gapping within the UD framework. As illustrated in Figure 1, we first parse to a dependency tree and then reconstruct the elided material. The methods differ in how much information is encoded in the dependency tree. The first method adapts an existing procedure for parsing sentences with elided function words (Seeker et al., 2012), which uses composite labels that can be deterministically turned into dependency graphs in most cases. The second method is a novel procedure that relies on the parser only to identify a gap, and then employs an unsupervised method to reconstruct the elided predicates and reattach the arguments to the reconstructed predicate. We find that both methods can reconstruct elided predicates with very high accuracy from gold standard dependency trees. When applied to the output of a parser, which often fails to identify gapping, our methods achieve a sentence-level accuracy of 32% and 34%, significantly outperforming the recently proposed constituent parser by Kummerfeld and Klein (2017).

Gapping constructions
Gapping constructions in English come in many forms that can be broadly classified as follows.
(2) Single predicate gaps: John bought books, and Mary flowers.
(3) Contiguous predicate-argument gap (including ACCs): Eve gave flowers to Al and Sue to Paul. Eve gave a CD to Al and roses to Sue.
(4) Non-contiguous predicate-argument gap: Arizona elected Goldwater Senator, and Pennsylvania Schwelker. (Jackendoff, 1971)
(5) Verb cluster gap: I want to try to begin to write a novel and ... Mary a play. ... Mary to write a play. ... Mary to begin to write a play. ... Mary to try to begin to write a play. (Ross, 1970)
The defining characteristic of gapping constructions is that there is a clause that lacks a predicate (the gap) but still contains two or more arguments or modifiers of the elided predicate (the remnants or orphans). In most cases, the remnants have a corresponding argument or modifier (the correspondent) in the clause with the overt predicate. These types of gapping also make up the majority of attested constructions in other languages. However, Wyngaerd (2007) notes that Dutch permits gaps in relative clauses, and Farudi (2013) notes that Farsi permits gaps in finite embedded clauses even if the overt predicate is not embedded.

Target representation
We work within the UD framework, which aims to provide cross-linguistically consistent dependency annotations that are useful for NLP tasks. UD defines two types of representation: the basic UD representation, which is a strict surface syntax dependency tree, and the enhanced UD representation (Schuster and Manning, 2016), which may be a graph instead of a tree and may contain additional nodes. The analysis of gapping in the enhanced representation makes use of copy nodes for elided predicates and additional edges for elided arguments, both of which we try to reconstruct automatically in this paper. In the simple case in which only one predicate was elided, there is exactly one copy node for the elided predicate, which leads to a structure that is identical to the structure of the same sentence without a gap.
[Dependency graph: John bought books and Mary bought flowers; relations: nsubj, obj, cc, nsubj, conj, obj]
If a clause contains a more complex gap, the enhanced representation contains copies for all content words that are required to attach the remnants. The motivation behind this analysis is that the semantically empty markers to are not needed for interpreting the sentence and minimizing the number of copy nodes leads to less complex graphs.
Finally, if a core argument was elided along with the predicate, we introduce additional dependencies between the copy nodes and the shared arguments, for example, the open clausal complement (xcomp) dependency between the copy node and Senator in the following example. The rationale for not copying all arguments is again to keep the graph simple, while still encoding all relations between content words. Arguments can be arbitrarily complex and it seems misguided to copy entire subtrees of arguments which, e.g., could contain multiple adverbial clauses. Note that linking to existing nodes would not work in the case of verb clusters because they do not satisfy the subtree constraint.

Composite relations
Our first method adapts one of the procedures by Seeker et al. (2012), which represents gaps in dependency trees by attaching dependents of an elided predicate with composite relations. These relations represent the dependency path that would have existed if nothing had been elided. For example, in the following sentence, the verb bought, which would have been attached to the head of the first conjunct with a conj relation, was elided from the second conjunct. Hence, all nodes that would have depended on the elided verb are attached to the first conjunct using a composite relation consisting of conj and the type of argument.
[Dependency graph: John bought books and Mary flowers; relations: nsubj, obj, conj>cc, conj>nsubj, conj>obj]
The major advantage of this approach is that the dependency tree contains information about the types of arguments and so it should be straightforward to turn dependency trees of this form into enhanced UD graphs. For most dependency trees, one can obtain the enhanced UD graph by splitting the composite relations into their atomic parts and inserting copy nodes at the splitting points. At the same time, this approach comes with the drawback of drastically increasing the label space. For sentences with more complex gaps as in (5), one has to use composite relations that consist of more than two atomic relations, and theoretically, the number of composite relations is unbounded:
[Dependency graph: ... and Mary a play; relations: det, conj>xcomp>xcomp>xcomp>obj, conj>nsubj, conj>cc]
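The splitting step described above can be illustrated with a minimal sketch. It is not the released implementation: the function name `expand_composite`, the triple-based edge format, and the copy-node ids `copy1`, `copy2`, ... are all assumptions made for illustration, and the sketch ignores the minority of trees for which the conversion is not deterministic.

```python
def expand_composite(edges):
    """Split composite labels such as 'conj>nsubj' into atomic relations.

    edges: list of (dependent, head, label) triples of a dependency tree.
    Returns the expanded edge list, with hypothetical copy-node ids
    'copy1', 'copy2', ... inserted at each splitting point.
    """
    copies = {}    # (original head, label prefix) -> copy node id
    expanded = []
    for dependent, head, label in edges:
        parts = label.split(">")
        parent = head
        # Every non-final part of the label corresponds to an elided node.
        for depth in range(1, len(parts)):
            key = (head, tuple(parts[:depth]))
            if key not in copies:
                copies[key] = f"copy{len(copies) + 1}"
                expanded.append((copies[key], parent, parts[depth - 1]))
            parent = copies[key]
        expanded.append((dependent, parent, parts[-1]))
    return expanded


tree = [
    ("John", "bought", "nsubj"),
    ("books", "bought", "obj"),
    ("and", "bought", "conj>cc"),
    ("Mary", "bought", "conj>nsubj"),
    ("flowers", "bought", "conj>obj"),
]
graph = expand_composite(tree)
```

Dependents that share a head and a label prefix share a copy node, so the three composite relations of the example yield a single copy of bought, while a label with four non-final parts yields a chain of four copy nodes.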

Orphan procedure
Our second method also uses a two-step approach to resolve gaps, but compared to the previous method, it puts less work on the parser. We first parse sentences to the basic UD v2 representation, which analyzes gapping constructions as follows. One remnant is promoted to be the head of the clause and all other remnants are attached to the promoted phrase. For example, in the sentence John bought books and Mary flowers, the subject of the second clause, Mary, is the head of the clause and the other remnant, flowers, is attached to Mary with the special orphan relation. This analysis can also be used for more complex gaps, as in the example with a gap that consists of a chain of non-finite embedded verbs in (5).
[Dependency graph: ... and Mary a play; relations: cc, conj, det, orphan]
When parsing to this representation, the parser only has to identify that there is a gap but does not have to recover the elided material or determine the type of remnants. As a second step, we use an unsupervised procedure to determine which nodes to copy and how and where to attach the remnants. In developing this procedure, we made use of the fact that in the vast majority of cases, all arguments and modifiers that are expressed in the gapped conjunct are also expressed in the full conjunct. The problem of determining which nodes to copy and which relations to use can thus be reduced to the problem of aligning arguments in the gapped conjunct to arguments in the full conjunct. We apply the following procedure to all sentences that contain at least one orphan relation.
1. Create a list F of arguments of the head of the full conjunct by considering all core argument dependents of the conjunct's head as well as clausal and nominal non-core dependents, and adverbial modifiers.
2. Create a list G of arguments in the gapped conjunct that contains the head of the gapped conjunct and all its orphan dependents.
3. Find the highest-scoring monotonic alignment of arguments in G to arguments in F.
4. Copy the head of the full conjunct and attach the copy node c to the head of the full conjunct with the original relation of the head of the gapped conjunct (usually conj).
5. For each argument g ∈ G that has been aligned to f ∈ F, attach g to c with the same relation as the parent relation of f, e.g., if f is attached to the head of the full conjunct with an nsubj relation, also attach g to c with an nsubj relation. Attach arguments g ∈ G that were not aligned to any token in F to c using the general dep relation.
6. For each copy node c, add dependencies to all core arguments of the original node which do not have a corresponding remnant in the gapped clause. For example, if the full conjunct contains a subject, an object, and an oblique modifier but the clause with the gap contains only a subject and an oblique modifier, add an object dependency between the copy node and the object in the full conjunct.
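Steps 4 to 6 of the procedure above can be sketched as follows, taking the alignment from step 3 as given. This is an illustrative sketch under stated assumptions rather than the released code: the function name `reattach`, the dictionary-based inputs, the prime-suffixed copy-node name, and the exact set of core relations in `CORE` are choices made here for the example.

```python
# Core argument relations in UD (assumed set for this sketch).
CORE = {"nsubj", "obj", "iobj", "csubj", "ccomp", "xcomp"}


def reattach(full_head, full_args, remnants, alignment, head_rel="conj"):
    """Build the edges of the reconstructed clause.

    full_args: dict mapping each argument of the full conjunct to its relation.
    remnants:  arguments of the gapped conjunct (list G).
    alignment: dict mapping each remnant to an argument in full_args or None.
    """
    copy = full_head + "'"                      # hypothetical copy-node name
    edges = [(copy, full_head, head_rel)]       # step 4: attach the copy node
    aligned = set()
    for g in remnants:                          # step 5: reattach remnants
        f = alignment.get(g)
        if f is None:
            edges.append((g, copy, "dep"))      # no correspondent found
        else:
            edges.append((g, copy, full_args[f]))
            aligned.add(f)
    for f, rel in full_args.items():            # step 6: shared core arguments
        if rel in CORE and f not in aligned:
            edges.append((f, copy, rel))
    return edges


# "Eve gave flowers to Al and Sue to Paul"
edges = reattach(
    "gave",
    {"Eve": "nsubj", "flowers": "obj", "Al": "obl"},
    ["Sue", "Paul"],
    {"Sue": "Eve", "Paul": "Al"},
)
```

In this example the object flowers has no corresponding remnant, so step 6 adds an obj edge from the copy node to the existing node instead of copying the argument.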
A crucial step is the third step, determining the highest-scoring alignment. This can be done straightforwardly with the sequence alignment algorithm by Needleman and Wunsch (1970) if one defines a similarity function sim(g, f) that returns a similarity score between the arguments g and f. We defined sim based on the intuitions that often, parallel arguments are of the same syntactic category, that they are introduced by the same function words (e.g., the same preposition), and that they are closely related in meaning. The first intuition can be captured by penalizing mismatching POS tags, and the other two by computing the distance between argument embeddings. We compute these embeddings by averaging over the 100-dimensional pretrained GloVe (Pennington et al., 2014) embeddings for each token in the argument. Given the POS tags t_g and t_f and the argument embeddings v_g and v_f, sim is defined as follows. We set pos_mismatch_penalty, a parameter that penalizes mismatching POS tags, to −2.
This procedure can be used for almost all sentences with gapping constructions. However, if parts of an argument were elided along with the main predicate, it can become necessary to copy multiple nodes. We therefore consider the alignment not only between complete arguments in the full clause and the gapped clause but also between partial arguments in the full clause and the complete arguments in the gapped clause. For example, for the sentence "Mary wants to write a play and Sue a book" the complete arguments of the full clause are {Mary, to write a play} and the arguments of the gapped clause are {Sue, a book}. In this case, we also consider the partial arguments {Mary, a play} and if the arguments of the gapped conjunct align better to the partial arguments, we use this alignment. However, now that the token write is part of the dependency path between want and play, we also have to make a copy of write to reconstruct the UD graph of the gapped clause.
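The alignment step can be sketched with a small Needleman-Wunsch implementation. Several parts of this sketch are assumptions rather than details taken from the paper: the similarity is taken to be the negated Euclidean distance between argument embeddings plus the POS mismatch penalty, `GAP_PENALTY` is a hypothetical value for leaving an argument unaligned, and the two-dimensional vectors are toy stand-ins for averaged GloVe embeddings.

```python
import math

POS_MISMATCH_PENALTY = -2.0  # value stated in the text
GAP_PENALTY = -3.0           # hypothetical; not taken from the paper


def sim(g, f):
    """Assumed similarity: negated embedding distance plus POS penalty."""
    score = -math.dist(g["vec"], f["vec"])
    if g["pos"] != f["pos"]:
        score += POS_MISMATCH_PENALTY
    return score


def align(G, F):
    """Return {index in G: index in F or None} for the best monotonic alignment."""
    n, m = len(G), len(F)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = i * GAP_PENALTY, "up"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = j * GAP_PENALTY, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + sim(G[i - 1], F[j - 1]), "diag"),
                (score[i - 1][j] + GAP_PENALTY, "up"),    # G[i-1] unaligned
                (score[i][j - 1] + GAP_PENALTY, "left"),  # F[j-1] skipped
            ]
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])
    alignment, i, j = {}, n, m
    while i > 0 or j > 0:  # follow back-pointers from the final cell
        if back[i][j] == "diag":
            alignment[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif back[i][j] == "up":
            alignment[i - 1] = None
            i -= 1
        else:
            j -= 1
    return alignment


# Toy arguments for "Eve gave flowers to Al and Sue to Paul".
F = [
    {"text": "Eve", "pos": "PROPN", "vec": (1.0, 0.0)},
    {"text": "flowers", "pos": "NOUN", "vec": (0.0, 1.0)},
    {"text": "to Al", "pos": "PROPN", "vec": (0.5, 0.5)},
]
G = [
    {"text": "Sue", "pos": "PROPN", "vec": (1.0, 0.1)},
    {"text": "to Paul", "pos": "PROPN", "vec": (0.6, 0.5)},
]
alignment = align(G, F)
```

With these toy inputs, Sue aligns to Eve and to Paul aligns to to Al, while flowers is skipped. Because the alignment is monotonic, remnants are never matched to correspondents in crossing order.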

Experiments
Both methods rely on a dependency parser followed by a post-processing step. We evaluated the individual steps and the end-to-end performance.

Data
We used the UD English Web Treebank v2.1 (henceforth EWT; Silveira et al., 2014; Nivre et al., 2017) for training and evaluating parsers. As the treebank is relatively small and therefore only contains very few sentences with gapping, we also extracted gapping constructions from the WSJ and Brown portions of the PTB (Marcus et al., 1993) and the GENIA corpus (Ohta et al., 2002). Further, we copied sentences from the Wikipedia page on gapping and from published papers on gapping. The sentences in the EWT already contain annotations with the orphan relation and copy nodes for the enhanced representation, and we manually added both of these annotations for the remaining examples. The composite relations can be automatically obtained from the enhanced representation by removing the copy nodes and concatenating the dependency labels, which we did to build the training and test corpus for the composite relation procedure. Table 1 shows properties of the data splits of the original treebank, the additional sentences with gapping, and their combination; Table 2 shows the number of sentences in our corpus for each of the gap types.

Parsing experiments
Parser We used the parser by Dozat and Manning (2017) for parsing to the two different intermediate dependency representations. This parser is a graph-based parser (McDonald et al., 2005) that uses a biLSTM to compute token representations and then uses a multi-layer perceptron with biaffine attention to compute arc and label scores.
Setup We trained the parser on the COMBINED training corpus with gold tokenization, and predicted fine-grained and universal part-of-speech tags, for which we used the tagger by . We trained the tagger on the COMBINED training corpus. As pre-trained embeddings, we used the word2vec (Mikolov et al., 2013) embeddings that were provided for the CoNLL 2017 Shared Task, and we used the same hyperparameters as .
Evaluation We evaluated the parseability of the two dependency representations using labeled and unlabeled attachment scores (LAS and UAS). Further, to specifically evaluate how well parsers are able to parse gapping constructions according to the two annotation schemes, we also computed the LAS and UAS just for the head tokens of remnants (LAS_g and UAS_g). For all our metrics, we excluded punctuation tokens. To determine statistical significance of pairwise comparisons, we performed two-tailed approximate randomization tests (Noreen, 1989; Yeh, 2000) with an adapted version of the sigf package (Padó, 2006).
Table 3 shows the overall parsing results on the development and test sets of the two treebanks. There was no significant difference between the parser that was trained on the UD representation (ORPHAN) and the parser trained on the composite representation (COMPOSITE) when tested on the EWT data sets, which is not surprising considering that there is just one sentence with gapping each in the development and the test split. When evaluated on the GAPPING datasets, the ORPHAN parser performs significantly better (p < 0.01) in terms of labeled attachment score, which suggests that the parser trained on the COMPOSITE representation is indeed struggling with the greatly increased label space. This is further confirmed by the attachment scores of the head tokens of remnants (Table 4). The labeled attachment score of remnants is significantly higher for the ORPHAN parser than for the COMPOSITE parser. Further, the unlabeled attachment score on the test set is also higher for the ORPHAN parser, which suggests that the COMPOSITE parser is sometimes struggling with finding the right attachment for the multiple long-distance composite dependencies.
Table 4: Labeled (LAS_g) and unlabeled attachment score (UAS_g) of head tokens of remnants for parsers trained and evaluated on the UD representation (ORPHAN) and the composite relations representation (COMPOSITE) on the development and test sets of the COMBINED treebank. Results that differ significantly are marked with * (p < 0.05) or *** (p < 0.001).
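The attachment scores and the significance test can be sketched as follows. This is a simplified illustration, not the adapted sigf package: the function names, the per-token tuple format, and the use of per-sentence (correct, total) counts as the unit that is swapped between systems are assumptions made for this sketch.

```python
import random


def attachment_scores(gold, pred):
    """Return (UAS, LAS) over non-punctuation tokens.

    gold, pred: parallel lists of (head, label, upos) tuples, one per token.
    """
    total = uas = las = 0
    for (g_head, g_label, upos), (p_head, p_label, _) in zip(gold, pred):
        if upos == "PUNCT":
            continue  # punctuation is excluded from all metrics
        total += 1
        uas += g_head == p_head
        las += g_head == p_head and g_label == p_label
    return uas / total, las / total


def approximate_randomization(stats_a, stats_b, trials=10000, seed=0):
    """Two-tailed approximate randomization test.

    stats_a, stats_b: per-sentence (correct, total) counts of two systems.
    Returns an estimate of the p-value for the observed score difference.
    """
    def score(stats):
        return sum(c for c, _ in stats) / sum(t for _, t in stats)

    rng = random.Random(seed)
    observed = abs(score(stats_a) - score(stats_b))
    hits = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(stats_a, stats_b):
            if rng.random() < 0.5:  # swap the two systems' outputs
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        hits += abs(score(shuffled_a) - score(shuffled_b)) >= observed
    return (hits + 1) / (trials + 1)


gold = [(2, "nsubj", "PROPN"), (0, "root", "VERB"), (2, "obj", "NOUN"), (2, "punct", "PUNCT")]
pred = [(2, "nsubj", "PROPN"), (0, "root", "VERB"), (2, "obl", "NOUN"), (1, "punct", "PUNCT")]
uas, las = attachment_scores(gold, pred)
```

Restricting the token lists to the head tokens of remnants before calling `attachment_scores` gives the remnant-specific variants of the two scores.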

Recovery experiments
Our second set of experiments concerns the recovery of the elided material and the reattachment of the orphans. We conducted two experiments: an oracle experiment that used gold standard dependency trees and an end-to-end experiment that used the output of the parser as input. For all experiments, we used the COMBINED treebank.
Evaluation Here, we evaluated dependency graphs and therefore used the labeled and unlabeled precision and recall metrics. However, as our two procedures are only changing the attachment of orphans, we only computed these metrics for copy nodes and their dependents. Further, we excluded punctuation and coordinating conjunctions as their attachment is usually trivial and including them would inflate scores. Lastly, we computed the sentence-level accuracy for all sentences with gapping. For this metric, we considered a sentence to be correct if all copy nodes and their dependents of a sentence were attached to the correct head with the correct label.
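The graph metrics described above can be sketched as set operations over edges. The sketch assumes that edges are (dependent, head, label) triples, that copy nodes are recognizable by a caller-supplied predicate, and that precision and recall are defined as 0 when the respective edge set is empty; none of these conventions is taken from the paper.

```python
EXCLUDED = {"punct", "cc"}  # punctuation and coordinating conjunctions


def relevant(edges, is_copy):
    """Keep edges of copy nodes and their dependents, minus excluded labels."""
    return [
        e for e in edges
        if (is_copy(e[0]) or is_copy(e[1])) and e[2] not in EXCLUDED
    ]


def precision_recall(gold_edges, pred_edges, labeled=True):
    """Labeled or unlabeled precision and recall over two edge lists."""
    key = (lambda e: e) if labeled else (lambda e: e[:2])
    gold = {key(e) for e in gold_edges}
    pred = {key(e) for e in pred_edges}
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def sentence_correct(gold_edges, pred_edges):
    """A sentence counts as correct if all relevant labeled edges match."""
    return set(gold_edges) == set(pred_edges)


def is_copy(node):
    return node.endswith("'")  # hypothetical naming convention for copy nodes


gold = relevant([
    ("bought'", "bought", "conj"), ("and", "bought'", "cc"),
    ("Mary", "bought'", "nsubj"), ("flowers", "bought'", "obj"),
    ("John", "bought", "nsubj"),
], is_copy)
pred = relevant([
    ("bought'", "bought", "conj"), ("and", "bought'", "cc"),
    ("Mary", "bought'", "nsubj"), ("flowers", "bought'", "dep"),
    ("John", "bought", "nsubj"),
], is_copy)
labeled_pr = precision_recall(gold, pred)
unlabeled_pr = precision_recall(gold, pred, labeled=False)
```

In this example a single wrong label lowers labeled precision and recall to 2/3 and makes the whole sentence count as incorrect, while the unlabeled scores remain perfect.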
Oracle results The top part of Table 5 shows the results for the oracle experiment. Both methods are able to reconstruct the elided material and the canonical clause structure from gold dependency trees with high accuracy. This was expected for the COMPOSITE procedure, which can make use of the composite relations in the dependency trees, but less so for the ORPHAN procedure which has to recover the structure and the types of relations. The two methods work equally well in terms of all metrics except for the sentence-level accuracy, which is significantly higher for the COMPOSITE procedure. This difference is caused by a difference in the types of mistakes. All errors of the COMPOSITE procedure are of a structural nature and stem from copying the wrong number of nodes while the dependency labels are always correct because they are part of the dependency tree. The majority of errors of the ORPHAN procedure stem from incorrect dependency labels, and these mistakes are scattered across more examples, which leads to the lower sentence-level accuracy.
End-to-end results The middle part of Table 5 shows the results for the end-to-end experiment. The performance of both methods is considerably lower than in the oracle experiment, which is primarily driven by the much lower recall. Both methods assume that the parser detects the existence of a gap and if the parser fails to do so, neither method attempts to reconstruct the elided material. In general, precision tends to be a bit higher for the ORPHAN procedure whereas recall tends to be a bit higher for the COMPOSITE method, but overall and in terms of sentence-level accuracy both methods seem to perform equally well.
Table 5: Labeled and unlabeled precision and recall as well as sentence-level accuracy of the two gapping reconstruction methods and the K&K parser on the development and test set of the COMBINED treebank. Results that differ significantly from the other result within the same section are marked with * (p < 0.05) or ** (p < 0.01).
Error analysis For both methods, the primary issue is low recall, which is a result of parsing errors. When the parser correctly predicts the orphan relation, the main sources of error for the ORPHAN procedure are missing correspondents for remnants (e.g., [for good] has no correspondent in They had left the company, many for good) or that the types of argument of the remnant and its correspondent differ (e.g., in She was convicted of selling unregistered securities in Florida and of unlawful phone calls in Ohio, [of selling unregistered securities] is an adverbial clause whereas [of unlawful phone calls] is an oblique modifier). Apart from the cases where the COMPOSITE procedure leads to an incorrect structure, the remaining errors are all caused by the parser predicting the wrong composite relation.

Comparison to Kummerfeld and Klein
Kummerfeld and Klein (2017; henceforth K&K) recently proposed a one-endpoint-crossing graph parser that is able to directly parse to PTB-style trees with traces. They also briefly discuss gapping constructions and their parser tries to output the co-indexing that is used for gapping constructions in the PTB. The EWT and all the sentences that we took from the WSJ, Brown, and GENIA treebanks already come with constituency tree annotations, and we manually annotated the remaining sentences according to the PTB guidelines (Bies et al., 1995). This allowed us to train the K&K parser with exactly the same set of sentences that we used in our previous experiments. As this parser outputs constituency trees, we could not compute dependency graph metrics for this method. For the sentence-level accuracy, we considered an example to be correct if a) each argument in the gapped conjunct was the child of a single constituent node, which in return was the sibling of the full clause/verb phrase, and b) the coindexing of each argument in the gapped conjunct was correct. For example, the following bracketing would be considered correct despite the incorrect internal structure of the first conjunct:
The last row of Table 5 shows the results of the K&K parser. The parser failed to output the correct constituency structure or co-indexing for every single example in the development and test sets. The parser struggled in particular with outputting the correct co-indices: For 32.5% of the test sentences with gapping, the bracketing of the gapped clause was correct but one or more of the co-indices were missing from the output.
Overall these results suggest that our dependency-based approach is much more reliable at identifying gapping constructions than the parser by K&K, which, in their defense, was optimized to output traces for other phenomena. Our method is also faster and took only seconds to parse the test set, while the K&K parser took several hours.

Resolving gaps in other languages
One of the appeals of the ORPHAN procedure is that it can be easily applied to other languages even if there exist no annotated enhanced dependency graphs. On the one hand, this is because our method does not make use of lexical information, and on the other hand, this is because we developed our method on top of the UD annotation scheme, which has already been applied to many languages and for which many treebanks exist.
Currently, all treebanks but the English one lack copy nodes for gapping constructions and many of them incorrectly use the orphan relation, and therefore we could not evaluate our method on a large variety of languages. In order to demonstrate that our method can be applied to other languages, we therefore did a case study on the Swedish UD treebank. The Swedish UD treebank is an automatic conversion from a section of the Talbanken (Einarsson, 1976) with extensive manual corrections. While the treebank is overall of high quality, we noticed conversion errors that led to incorrect uses of the orphan relation in 11 of the 29 sentences with orphan relations, which we excluded from our evaluation. We applied our gapping resolution procedure without any modifications to the remaining 18 sentences. We used the Swedish word2vec embeddings that were prepared for the CoNLL 2017 Shared Task. Our method correctly predicts the insertion of 29 copy nodes and is able to predict the correct structure of the enhanced representation in all cases, including complex ones with elided verb clusters such as the example in Figure 2. It also predicts the correct dependency label for 108/110 relations, leading to a labeled precision and labeled recall of 98.18%, which are both higher than the English numbers despite the fact that we optimized our procedure for English. The main reason for the higher performance seems to be that many of the Swedish examples come from informational texts from public organizations, which are more likely to be written to be clear and unambiguous. Further, the Swedish data does not contain challenging examples from the linguistic literature.
As Swedish is a Germanic language like English and thus shares many structural properties, we cannot conclude that our method is applicable to any language based on just this experiment. However, given that our method does not rely on language-specific structural patterns, we expect it to work well for a wide range of languages. Note that because UD treebanks are annotated with orphan relations, using the COMPOSITE procedure would require additional manual annotations in practice.

Related work
Gapping constructions have been little studied in NLP, but several approaches (e.g., Dukes and Habash 2011; Simkó and Vincze 2017) parse to dependency trees with empty nodes. Seeker et al. (2012) compared three ways of parsing with empty heads: adding a transition that inserts empty nodes, using composite relation labels for nodes that depend on an elided node, and pre-inserting empties before parsing. These papers all focus on recovering nodes for elided function words such as auxiliaries; none of them attempt to recover and resolve the content word elisions of gapping. Ficler and Goldberg (2016) modified PTB annotations of argument-cluster coordinations (ACCs), i.e., gapping constructions with two post-verbal orphan phrases, which make up a subset of the gapping constructions in the PTB. While the modified annotation style leads to higher parsing accuracy of ACCs, it is specific to ACCs and does not generalize to other gapping constructions. Moreover, they did not reconstruct gapped ACC clauses. Traditional grammar-based chart parsers (Kay, 1980; Klein and Manning, 2001) did handle empty nodes and so could in principle provide a parse of gapping sentences, though additional mechanisms would be needed for reconstruction. In practice, though, dealing with gapping in a grammar-based framework is not straightforward and can lead to a combinatorial explosion that slows down parsing in general, as has been noted for the English Resource Grammar (Flickinger, 2017, p.c.) and for an HPSG implementation for Norwegian (Haugereid, 2017). The grammar-based parser built with augmented transition networks (Woods, 1970) provided an extension in the form of the SYSCONJ operation (Woods, 1973) to parse some gapping constructions, but this approach also lacked explicit reconstruction mechanisms and provided only limited coverage.
There also exists a long line of work on post-processing surface-syntax constituency trees to recover traces in the PTB (Johnson, 2002; Levy and Manning, 2004; Campbell, 2004; Gabbard et al., 2006), pre-processing sentences such that they contain tokens for traces before parsing (Dienes and Dubey, 2003b), or directly parsing sentences to either PTB-style trees with empty elements or pre-processed trees that can be deterministically converted to PTB-style trees (Collins, 1997; Dienes and Dubey, 2003a; Schmid, 2006; Cai et al., 2011; Hayashi and Nagata, 2016; Kato and Matsubara, 2016; Kummerfeld and Klein, 2017). However, all of these works are primarily concerned with recovering traces for phenomena such as Wh-movement or control and raising constructions and, with the exception of Kummerfeld and Klein (2017), none of these works attempt to output the co-indexing that is used for analyzing gapping constructions. And again, none of these works try to reconstruct elided material. Lastly, several methods have been proposed for resolving other forms of ellipsis, including VP ellipsis (Hardt, 1997; Nielsen, 2004; Lappin, 2005; McShane and Babkin, 2016) and sluicing (Anand and Hardt, 2016), but none of these methods consider gapping constructions.
Figure 2: Dependency graph for part of the sentence sv-ud-train-1102 ('The Ullna area is expected to grow by 9,000 (new workplaces), the Märsta industrial area by 7,000, Jordbro by 4,000, ...') as output by the ORPHAN procedure. The system correctly predicts the copy nodes for the matrix and the embedded verb, and correctly attaches the arguments to the copy nodes.

Conclusion
We presented two methods to recover elided predicates in sentences with gapping. Our experiments suggest that both methods work equally well in a realistic end-to-end setting. While recall is in general still low, the oracle experiments show that both methods can recover elided predicates from correct dependency trees. We therefore expect gap recovery accuracy to increase as parsers become more accurate.
We also demonstrated that our method can be used to automatically add the enhanced UD representation to UD treebanks in languages other than English. Apart from being useful in a parsing pipeline, we therefore also expect our method to be useful for building enhanced UD treebanks.

Reproducibility
All data, pre-trained models, system outputs as well as a package for running the enhancement procedure are available from https://github.com/sebschu/naacl-gapping.