DANGNT@UIT.VNU-HCM at SemEval 2019 Task 1: Graph Transformation System from Stanford Basic Dependencies to Universal Conceptual Cognitive Annotation (UCCA)

This paper describes the graph transformation system (GT System) for SemEval 2019 Task 1: Cross-lingual Semantic Parsing with Universal Conceptual Cognitive Annotation (UCCA). The input of GT System is a text paired with its unannotated XML, which is the layer 0 part of the UCCA format. The output of GT System is the corresponding full UCCA XML. Based on the idea of graph illustration and transformation, we perform four main tasks when building GT System. In the first task, we represent the Stanford dependencies of the input text in graph form. In the second task, we transform this graph into an intermediate graph. In the third task, we transform the intermediate graph into the output graph form. Finally, we create the output UCCA XML. The evaluation results show that our method generates good-quality UCCA XML and makes a meaningful contribution to the semantic representation sub-field of Natural Language Processing.


Introduction
In the past few years, semantic representation has received growing attention in NLP, and researchers have proposed several semantic schemes. Examples include Abstract Meaning Representation (Banarescu et al. 2013), Broad-coverage Semantic Dependencies (Oepen et al. 2014), Universal Decompositional Semantics (White et al. 2016), Parallel Meaning Bank (Abzianidze et al. 2016), and Universal Conceptual Cognitive Annotation (Abend and Rappoport 2013). These advances in semantic representation, along with corresponding advances in semantic parsing, benefit tasks such as text understanding, summarization, paraphrase detection, and semantic evaluation.

The Graph Transformation System
In this section, we describe our GT System for creating the UCCA XML of an input text. The general architecture is shown in Figure 2. Building GT System involves two processes: training and testing. In the training process, we build the intermediate graph from the UCCA and the Stanford (SD) basic dependencies of the training data. In the testing process, which can be seen as the inverse of training, we build the output UCCA from the intermediate graph of the testing data.

Intermediate Graph
In

Training Process
First, in the training process, we take the training data and perform four main tasks. The first and second tasks are, in turn, viewing the graph form of the SD basic dependencies and the graph form of the UCCA of the input text. In the third task, we propose a Left-First-Search-like algorithm with a bottom-up idea to reduce the graph form of UCCA to an intermediate graph. In the final task, we propose rules and heuristics for matching the graph form of the SD basic dependencies with the intermediate graph.
The main steps of the Left-First-Search (LFS) algorithm are as follows.
Step 1. Descend to the leftmost terminal.
Step 2. Go back to the parent node of this terminal. Check whether the parent has any other child.
Step 2.1. If yes, repeat Step 1 with this child node as the root.
Step 3. Swap the position of the root of the sub-tree with the position of the child whose annotation is most important.
Step 4. Go back to the parent node of this root. Repeat Step 2 with this parent.
To perform the LFS algorithm, we determine the priority of SD and UCCA annotations according to two factors: first, the meaning of each annotation, which represents the dependency relations and grammatical roles of lexical items; second, the position of each node in the graph.
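The reduction described above can be sketched as a bottom-up, left-to-right head promotion. This is only a minimal illustration under stated assumptions, not the system's actual implementation: the `Node` class, the `reduce_lfs` function, and the numeric `PRIORITY` table over UCCA categories are hypothetical, and the priority here depends on the annotation alone, not on the node's position in the graph.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical priority of UCCA categories (lower value = more important).
# The real ordering used by the system is not given in the text.
PRIORITY = {"P": 0, "S": 0, "C": 1, "H": 2, "A": 3, "D": 4, "E": 5, "F": 6}


@dataclass
class Node:
    label: str                    # category of the edge from the parent
    token: Optional[str] = None   # set for terminals only
    children: List["Node"] = field(default_factory=list)


def reduce_lfs(node: Node) -> Node:
    """Collapse a unit tree into a head-dependent tree, bottom-up."""
    if node.token is not None:
        return node  # Step 1: a terminal is already reduced
    # Steps 1-2: reduce every child sub-tree, leftmost child first.
    reduced = [reduce_lfs(child) for child in node.children]
    # Step 3: the child with the most important annotation replaces the root.
    # min() returns the first minimum, so ties go to the leftmost child.
    head = min(reduced, key=lambda c: PRIORITY.get(c.label, len(PRIORITY)))
    dependents = [c for c in reduced if c is not head]
    # The promoted head keeps the root's label; siblings become its dependents.
    return Node(node.label, head.token, head.children + dependents)


# Toy unit tree for "John kicked the ball" (labels chosen for illustration).
tree = Node("ROOT", children=[
    Node("A", "John"),
    Node("P", "kicked"),
    Node("A", children=[Node("F", "the"), Node("C", "ball")]),
])
intermediate = reduce_lfs(tree)
print(intermediate.token, [c.token for c in intermediate.children])
```

In this sketch the promoted head loses its own category; a reversible variant would need to record it, which is one reason the intermediate graph has to carry combined annotations.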
Applying the LFS algorithm to the graph in Figure 3, we obtain three levels of reduction in turn, shown in Figures 5, 6, and 4, respectively.
After obtaining the final reduction of the graph form of UCCA, which is the intermediate graph, we compare it with the graph form of the SD basic dependencies. We consider the similarities between the two graphs and propose rules and heuristics to (i) determine the level of a node, and (ii) determine the group of UCCA annotations for each level. The general idea of the mechanism is:
· Collect all SD-type relations in the UCCA and SD basic dependencies of the training data. Below is the collection:
· Determine the priority order of SD-type relations. Example 2: dobj -> amod -> dep -> nmod -> case.
· Determine the compound (UCCA and SD) relation at each node level. Example 3: type conj at level 7:
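As a small illustration of the priority order in Example 2 only, the sketch below ranks a node's outgoing SD relations. It assumes that the leftmost relation in the chain has the highest priority, which the text does not state explicitly; the names `SD_PRIORITY`, `RANK`, and `order_by_priority` are hypothetical.

```python
# Priority chain from Example 2; assumed to run from highest to lowest.
SD_PRIORITY = ["dobj", "amod", "dep", "nmod", "case"]
RANK = {rel: i for i, rel in enumerate(SD_PRIORITY)}


def order_by_priority(edges):
    """Sort (relation, dependent) pairs by relation priority.

    Relations missing from the chain are placed last. sorted() is stable,
    so edges with equal priority keep their original left-to-right order.
    """
    return sorted(edges, key=lambda edge: RANK.get(edge[0], len(RANK)))


edges = [("case", "of"), ("dobj", "ball"), ("amod", "red")]
print(order_by_priority(edges))
```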

Testing Process
In the testing process, which can be seen as the inverse of training, we take the development and test data and perform three main tasks. The first task is viewing the graph form of the SD basic dependencies of the input text. In the second task, we apply the proposed rules and heuristics to transform this graph into an intermediate graph. In the final task, we propose a Breadth-First-Search-like algorithm with a top-down idea to re-create the graph form of UCCA from the intermediate graph. This BFS algorithm is, in fact, the inverse mechanism of the LFS algorithm in Section 3.2.
The main steps of the Breadth-First-Search (BFS) algorithm are as follows.
Step 1. Reduce the first level of the node.
Step 2. Determine the integratedChild that adheres to this node.
Step 3. If there is no integratedChild:
Step 3.1. Repeat Step 1 until the node comes down to the terminal position.
Step 3.2. Repeat Steps 1 to 4 with each child of this node.
Step 4. If there is an integratedChild:
Step 4.1. Repeat Steps 1 to 4 with each child of this node other than the integratedChild.
Step 4.2. Repeat Steps 1 to 4 with this node.
Step 4.3. Repeat Steps 1 to 4 with the integratedChild.
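The top-down direction of these steps can be sketched as a breadth-first expansion of a head-dependent tree into nested units. This is a simplified illustration under stated assumptions, not the system's algorithm: it does not model the integrated-child handling of Step 4, and the `DepNode` and `Unit` classes, the `head_label` field, and `expand_bfs` are hypothetical names introduced for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DepNode:
    token: str
    label: str            # category of the whole unit headed by this token
    head_label: str = ""  # assumed: category of the token inside its own unit
    dependents: List["DepNode"] = field(default_factory=list)


@dataclass
class Unit:
    label: str
    token: Optional[str] = None
    children: List["Unit"] = field(default_factory=list)


def expand_bfs(root: DepNode) -> Unit:
    """Re-create a nested unit tree level by level, from the top down."""
    top = Unit(root.label)
    queue = deque([(root, top)])
    while queue:
        dep, unit = queue.popleft()
        if not dep.dependents:
            unit.token = dep.token  # the node has reached terminal position
            continue
        # The head token becomes a terminal child of the new unit.
        unit.children.append(Unit(dep.head_label, dep.token))
        # Each dependent opens a child unit that is expanded later.
        for d in dep.dependents:
            child = Unit(d.label)
            unit.children.append(child)
            queue.append((d, child))
    return top


# Toy intermediate graph for "John kicked the ball".
ball = DepNode("ball", "A", "C", [DepNode("the", "F")])
kicked = DepNode("kicked", "ROOT", "P", [DepNode("John", "A"), ball])
ucca_like = expand_bfs(kicked)
print([(c.label, c.token) for c in ucca_like.children])
```

Note that this sketch places the head first within each unit, so restoring the surface word order would need an extra sorting step by token position.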

Experiment and Evaluation
In the evaluation phase, we focus on the English in-domain setting, using the Wiki corpus. The test data for this setting consists of 515 short texts with corresponding unannotated UCCA XML files.
We test our method on both the open and closed tracks in the English setting: (i) a closed-track submission may only use the gold-standard UCCA annotation distributed for the task in the target language and is limited in its use of additional resources; (ii) an open-track submission may use any additional resource. The testing results show that our GT System creates good-quality UCCA semantic representations on the English Wiki test data.

Conclusion
We have presented a graph transformation method for creating UCCA semantic representations in the English in-domain setting, using the Wiki corpus. Our method performs four main tasks: (i) represent the Stanford dependencies of the input text in graph form; (ii) transform this graph into an intermediate graph; (iii) transform the intermediate graph into the output graph form; (iv) create the output UCCA XML. The experimental results show that our method meets the requirements of SemEval 2019 Task 1.
In future work, we intend to improve the transformation algorithms and propose more accurate rules for selecting the best nodes and dependency tags. We also plan to extend our method and test it on other datasets for a broader comparison.