Peking: Building Semantic Dependency Graphs with a Hybrid Parser

This paper describes our system for SemEval-2015 Task 18: Broad-Coverage Semantic Dependency Parsing. We implement a hybrid parser that benefits from both transition-based and graph-based parsing approaches. In particular, we explore tree approximation methods to take advantage of well-studied tree parsing techniques. Evaluation on multilingual data sets demonstrates that good semantic analyses can be built automatically by applying state-of-the-art data-driven parsing techniques.


Introduction
Dependency grammar is a long-standing tradition that determines syntacto-semantic structures on the basis of word-to-word connections. It names a family of approaches to linguistic analysis that all share a commitment to typed relations between ordered pairs of words. Partially due to the powerful expressiveness of bi-lexical dependency structures, the corresponding parsing problem has been widely studied, especially in the last decade. The majority of these studies, however, focus only on tree-structured representations. Moving beyond tree-shaped structures, SemEval-2014 Task 8 (Oepen et al., 2014) sought to stimulate the dependency parsing community to move towards more general graph processing. Quite a number of teams from all over the world participated in that shared task, which suggests a growing community interest in parsing into graph-shaped dependency representations.
SemEval-2015 Task 18 is a successor to SemEval-2014 Task 8. Following several well-established syntactic theories, this task proposes using graphs to represent semantics and provides high-quality annotations for three typologically different languages. We developed a system, dubbed DZSW14 (Du et al., 2014), for last year's task. The system employed a hybrid architecture which benefits from both transition-based and graph-based parsing approaches. Evaluation on multiple English data sets provided by SemEval-2014 indicated that DZSW14 is able to obtain high-quality parsing results. Following the key idea of employing heterogeneous models to enhance hybrid parsing, we extend DZSW14 by developing additional tree approximation models, namely weighted tree approximation models. Evaluation on the multilingual data sets provided by this year's task confirms the effectiveness of the techniques we have studied.
In this paper, we first give an introduction to the architecture of the baseline system DZSW14. We then present the weighted tree approximation models. Finally, we show the experimental results on SemEval-2015 Task 18. The tree approximation system can be downloaded at http://www.icst.pku.edu.cn/lcwm/grass.

Baseline System: DZSW14
Our system is based on the system we constructed for SemEval-2014 Task 8. In this section we present a brief overview of its architecture; refer to Du et al. (2014) for more information.
Inspired by research on discriminative dependency tree parsing, DZSW14 employed a hybrid parsing architecture. DZSW14 explored two kinds of heterogeneous approaches: transition-based and tree approximation approaches. The transition-based models apply transitions to configurations to obtain graph parses, while the tree approximation models transform graphs into trees for training and testing. To further combine their complementary prediction power, DZSW14 applied a voting-based ensemble method.

Transition-Based Models
Transition-based models consist of configurations and transitions that manipulate them. A configuration encodes the information of the current parsing state, in particular the partial parsing results, and applying a transition to a configuration turns it into a new one. When the system reaches an acceptable configuration, a coherent semantic graph has been successfully built. The key to building successful transition-based parsers is to train good classifiers that approximate transition oracles. DZSW14 implements five different transition systems for graph parsing. Experiments from last year's evaluation suggest that this method can be applied to build good parsers for more general linguistic graphs.
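The following sketch illustrates the general configuration-and-transition idea described above; it is a hypothetical minimal system, not one of the five systems in DZSW14. The key departure from tree parsing is that arc transitions do not remove words, so a word may take part in several arcs and the result is a graph:

```python
def parse(words, oracle):
    """words: list of tokens; oracle: maps a configuration to a transition.
    Configurations are (stack, buffer, arcs); arcs are (head, dep) pairs."""
    stack, buf, arcs = [], list(range(len(words))), set()
    while buf:
        action = oracle(stack, buf, arcs)
        if action == "LEFT-ARC" and stack:    # buffer front -> stack top
            arcs.add((buf[0], stack[-1]))
        elif action == "RIGHT-ARC" and stack:  # stack top -> buffer front
            arcs.add((stack[-1], buf[0]))
        else:                                  # SHIFT
            stack.append(buf.pop(0))
    return arcs
```

With a scripted oracle, `SHIFT, RIGHT-ARC, LEFT-ARC, SHIFT, SHIFT` on three words yields the reentrant arc set `{(0, 1), (1, 0)}`, something no tree parser could produce.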

Tree Approximation Models
The core of tree approximation is the transformation between graphs and trees. At training time, we convert the dependency graphs in the training data into dependency trees and train second-order arc-factored models (Bohnet, 2010) using the mate parser (code.google.com/p/mate-tools/). At test time, we parse sentences with this tree parser and convert the output trees back into semantic graphs. In DZSW14, we developed several different methods to convert a semantic graph into a tree. The main idea is to apply graph traversal algorithms to convert a directed graph into a directed tree. During the traversal, we may lose or modify some dependency relations in order to obtain a tree.
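One possible traversal-based conversion can be sketched as follows; this is an illustrative BFS variant under our own assumptions, not necessarily one of the exact conversions in DZSW14. Each node keeps the first edge by which it is discovered (possibly with its direction reversed), and all other edges are lost:

```python
from collections import deque

def graph_to_tree(nodes, edges, root):
    """edges: set of directed (head, dep) pairs. Returns tree edges as
    (head, dep, reversed_flag) triples; edges never traversed are lost."""
    undirected = {}
    for h, d in edges:
        undirected.setdefault(h, []).append((d, False))  # original direction
        undirected.setdefault(d, []).append((h, True))   # reversed if used
    seen, tree, queue = {root}, [], deque([root])
    while queue:
        u = queue.popleft()
        for v, rev in undirected.get(u, []):
            if v not in seen:                 # first discovery wins
                seen.add(v)
                tree.append((u, v, rev))
                queue.append(v)
    return tree
```

For a graph with edges 0 → 1 and 2 → 1, starting from root 0, the second edge must be reversed (1 → 2 with a reversed flag) to keep the result a tree.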

Experience from DZSW14
From extensive experiments with DZSW14, we learned the following lessons.
• Overall, the tree approximation models perform better than the transition-based models.
• The outputs of the different models exhibit significant diversity.
• The model ensemble is quite effective, resulting in a boost in performance.
This motivates us to explore more heterogeneous tree approximation models for this year's evaluation.

Weighted Tree Approximation Models
In our system for SemEval-2015, we develop additional tree approximation models for the model ensemble. We call the graph-to-tree conversions in DZSW14 unweighted conversions, since every edge in the graph is treated equally. In this section, we present weighted conversions, which assign different weights to different edges.

Weighted Conversion
Given a graph G = (V, E), edge selection in the unweighted conversion is decided locally by the current traversal state. In the weighted conversion, we take the importance of different edges into account and try to globally limit the damage caused by lost edges. For example, at the top of Figure 1, the undirected edges (Ward, was), (Ward, relieved), and (was, relieved) form a cycle, so only two of them can be kept in the converted tree T. This lets us decide which edges to keep according to the sum of their weights. Let x →_t y denote an edge in the tree and x →_g y an edge in the graph. We assign each possible tree edge x →_t y a heuristic weight ω(x, y) and seek the tree with maximum total weight. More formally, the result T_max = (V, E_t^max) maximizes the sum of edge weights:

E_t^max = argmax_{E_t} Σ_{x →_t y ∈ E_t} ω(x, y), subject to (V, E_t) being a tree.

Weight
We define the weight ω(x, y) as follows, where I is the indicator function:
• ω(x, y) = A(x, y) + B(x, y) + C(x, y): the weight is the sum of three parts.
• A(x, y) = I[x →_g y ∨ y →_g x] × a: a is the weight for an edge that exists in the graph, ignoring direction.
• B(x, y) = I[x →_g y] × b: b is the weight for an edge that exists in the graph with the same direction.
• C(x, y) = n − |x − y|: this term values edges by distance, where n is the sentence length. We consider edges linking closer words more important because they are generally easier to predict.
• a ≫ b ≫ n, i.e. a > b × n > n × n: the transformed tree should first contain as many of the original edges of G as possible; among those, edges with the correct direction in G are preferred; the distance between nodes comes last.
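The weight definition above can be written directly as a function. The concrete values of a and b are our own illustrative choices, picked so that a > b × n > n × n holds for ordinary sentence lengths:

```python
def omega(x, y, edges, n, a=10**6, b=10**3):
    """Weight for candidate tree edge x -> y. `edges` is the set of directed
    graph edges, n the sentence length. a = 10^6 and b = 10^3 are example
    values satisfying a > b * n > n * n for sentences shorter than b words."""
    A = a if ((x, y) in edges or (y, x) in edges) else 0  # undirected match
    B = b if (x, y) in edges else 0                       # directed match
    C = n - abs(x - y)                                    # prefer short arcs
    return A + B + C
```

With this ordering, a correctly directed graph edge always outweighs a reversed one, which in turn outweighs any edge absent from the graph, exactly as the three priorities require.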

Decoding
After the edges are weighted, the core decoding task for the graph-to-tree transformation can be solved by maximum spanning tree (MST) algorithms, where the search space consists of all projective or non-projective dependency trees. To transform a graph into a projective tree, we use Eisner's algorithm; for non-projective trees, we use the Chu-Liu/Edmonds algorithm.
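To make the decoding objective concrete, the sketch below finds the maximum-weight spanning arborescence by brute force, enumerating every parent assignment and discarding cyclic ones. This is for illustration only; it is exponential in the sentence length, which is why Eisner's and Chu-Liu/Edmonds algorithms are used in practice:

```python
from itertools import product

def max_spanning_tree(nodes, omega, root):
    """Brute-force maximum spanning arborescence (illustration only).
    omega(p, v) is the weight of candidate edge p -> v."""
    others = [v for v in nodes if v != root]
    best, best_score = None, float("-inf")
    for parents in product(nodes, repeat=len(others)):
        assign = dict(zip(others, parents))
        # reject assignments where some node cannot reach the root (a cycle)
        ok = True
        for v in others:
            seen = set()
            while v != root:
                if v in seen:
                    ok = False
                    break
                seen.add(v)
                v = assign[v]
            if not ok:
                break
        if not ok:
            continue
        score = sum(omega(p, v) for v, p in assign.items())
        if score > best_score:
            best = [(p, v) for v, p in assign.items()]
            best_score = score
    return best, best_score
```

On a three-node example with weights favoring the chain root → 2 → 1, the brute force recovers exactly that chain, matching what an MST decoder would return.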

Adding Labels
Now we have the MST T_max = (V, E_t^max). For each (x, y) ∈ E_t^max, we assign a new label to the edge x →_t y as follows.
Case 1: x →_g y: keep the original label from G(V, E).
Case 2: y →_g x: keep the original label, marked with the symbol R.
Case 3: x →_g y ∧ y →_g x: label as in Case 1.
Case 4: neither x →_g y nor y →_g x: use the label None.
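The four cases reduce to a short lookup. In this sketch we assume, for illustration, that the reversal mark R is appended as a suffix to the label string; the exact encoding in our system may differ:

```python
def relabel(tree_edge, graph):
    """Label a converted tree edge (x, y). `graph` maps directed
    (head, dep) pairs to their original labels."""
    x, y = tree_edge
    if (x, y) in graph:            # cases 1 and 3: edge (also) exists as-is
        return graph[(x, y)]
    if (y, x) in graph:            # case 2: only the reverse exists, mark R
        return graph[(y, x)] + "R"
    return "None"                  # case 4: no corresponding graph edge
```

Note that Case 3 needs no special handling: when both directions exist in the graph, the first branch already returns the label of x →_g y.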
Figure 1: One dependency graph and two possible dependency trees after converting.

To improve the coverage of original edges, we develop a variant model whose modified tree labels help encode more of the graph's edges. Suppose that x →_g y is a lost edge: it appears in the original dependency graph but not in the converted dependency tree. Statistics show that the majority of lost edges fall into one of three structural types:
1. The nodes are siblings.
2. One is the grandparent of the other.
3. One is the great-grandparent of the other.
The conversions can be enhanced by adding further symbols to the labels to indicate lost edges of the three types above. The method appends a semicolon (;) or an exclamation mark (!) to the label and then directly adds new label material carrying information about the lost edge. If x →_g y is not in the converted tree and its structure is one of the aforementioned types, we change the label connecting node y and its parent, under the assumption that y is not higher than x in the dependency tree.
• x is the great-grandparent of y: the new label is the label of x →_g y followed by the symbol 'g'.
• x is the grandparent of y: the new label is the label of x →_g y followed by the symbol 'f'.
• x and y are siblings: let z be their common parent. We sort all of z's children by their position in the sentence and use an integer P to indicate position. If the two siblings are on the same side of z, P is the distance between the two siblings' positions in the sorted children sequence and the extra symbol is 'y'. If the two siblings are on different sides of z, the extra symbol is 'n' and P is x's rank among the nodes on its own side in the sorted children sequence. The new label is the symbol 'b', followed by the extra symbol and the label of x →_g y.
If node y is higher than x in the dependency tree, we add the symbol R to indicate that the additional edge is reversed. Figure 1 shows an example of a converted tree.

Model Ensemble
We select 9 tree approximation models from DZSW14 and propose 4 new weighted models ({projective, non-projective} × {original, label-modification variant}). Together with the transition-based models, their outputs must be combined into a single graph. We use a simple voter, as in DZSW14: for each pair of words in a sentence, we count the number of models that predict an arc; if the count exceeds a threshold, we add the arc to the final graph and label it with the most common label among the models' predictions. Furthermore, since we find that the tree approximation models perform better than the transition-based models, we also assign weights to individual models: instead of simply counting, we sum the weights of the models that predict the arc, with higher weights assigned to the tree approximation models.
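The weighted voting scheme can be sketched as follows; the weight and threshold values in the usage example are illustrative, not the tuned values of our system:

```python
from collections import Counter, defaultdict

def ensemble(outputs, weights, threshold):
    """outputs: one {(head, dep): label} dict per model; weights: one float
    per model. An arc enters the final graph if the summed weight of models
    predicting it exceeds `threshold`; it takes the most common label
    among those models."""
    score, labels = defaultdict(float), defaultdict(Counter)
    for graph, w in zip(outputs, weights):
        for arc, label in graph.items():
            score[arc] += w
            labels[arc][label] += 1
    return {arc: labels[arc].most_common(1)[0][0]
            for arc, s in score.items() if s > threshold}
```

Setting all weights to 1 recovers the plain counting voter of DZSW14; raising the weights of the tree approximation models biases the ensemble toward their predictions.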

Sense Labeling
In this task, two of the English representations, DM and PSD, require words to be annotated with an additional sense label. We develop a sequence labeler for this requirement, based on a second-order linear-chain global linear model, with the perceptron algorithm for parameter estimation. To accelerate processing, we apply a Viterbi decoder constrained by beam search: the number of cells in the dynamic programming table for each word is bounded by a fixed beam size. This decoder can also be viewed as a beam decoder with state merging.
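The beam-constrained Viterbi idea can be sketched as below. For brevity this sketch is first-order (the actual labeler is second-order) and uses hypothetical log-score tables `emit` and `trans`; only the `beam` best cells per word survive into the next column of the table:

```python
import math

def beam_viterbi(obs, tags, emit, trans, beam=4):
    """Beam-constrained Viterbi for a linear-chain model.
    emit[(word, tag)] and trans[(prev_tag, tag)] are log-scores."""
    cells = {None: (0.0, [])}                  # tag -> (score, best path)
    for word in obs:
        nxt = {}
        for prev, (s, path) in cells.items():
            for tag in tags:
                score = s + emit.get((word, tag), -math.inf) \
                          + trans.get((prev, tag), 0.0)
                if tag not in nxt or score > nxt[tag][0]:
                    nxt[tag] = (score, path + [tag])  # state merging per tag
        # beam pruning: keep only the `beam` highest-scoring cells
        cells = dict(sorted(nxt.items(), key=lambda kv: -kv[1][0])[:beam])
    return max(cells.values())[1]
```

With the full tag set kept (beam ≥ number of tags), this reduces to exact Viterbi; smaller beams trade accuracy for speed.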
The DM representation can be labeled directly. However, due to the large number of distinct senses in the PSD representation, it is difficult to label senses without preprocessing. We therefore filter out rare senses with a frequency lower than 10, substituting "unknown" for them.
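The frequency-based filtering amounts to a one-pass count over the training senses; the placeholder string "unknown" and the cutoff of 10 are as described above:

```python
from collections import Counter

def filter_senses(train_senses, min_freq=10):
    """Replace senses seen fewer than `min_freq` times with "unknown"."""
    freq = Counter(train_senses)
    return ["unknown" if freq[s] < min_freq else s for s in train_senses]
```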

Experiments
We participated in the closed track. The tree approximation algorithms may lose some edges; statistics on edge loss for the weighted conversions are shown in Table 1. All the algorithms lose edges, but the loss of the label-modification variants is much lower. In addition, non-projective tree conversions lose fewer edges than projective ones. Edge loss may result in lower recall and higher precision, but we can tune the final results during the model ensemble. The official results provided by the organizers are shown in Table 2; we report only labeled scores.

Conclusion
Based on our previous system DZSW14, we developed a hybrid system for SemEval-2015 Task 18. Our new system extends DZSW14 with several additional tree approximation models. The final results show that both our system and the new models are effective.