GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks

This paper describes the system submitted to the SemEval 2019 shared task 1 ‘Cross-lingual Semantic Parsing with UCCA’. We rely on the semantic dependency parse trees provided in the shared task which are converted from the original UCCA files and model the task as tagging. The aim is to predict the graph structure of the output along with the types of relations among the nodes. Our proposed neural architecture is composed of Graph Convolution and BiLSTM components. The layers of the system share their weights while predicting dependency links and semantic labels. The system is applied to the CONLLU format of the input data and is best suited for semantic dependency parsing.


Introduction
Universal Conceptual Cognitive Annotation (UCCA) (Abend and Rappoport, 2013) is a semantically motivated approach to grammatical representation inspired by typological theories of grammar (Dixon, 2012) and the Cognitive Linguistics literature (Croft and Cruse, 2004). In parsing, bi-lexical dependencies, which are based on binary head-argument relations between lexical units, are commonly employed in the representation of syntax (Nivre et al., 2007; Chen and Manning, 2014) and semantics (Hajič et al., 2012; Oepen et al., 2014; Dozat and Manning, 2018).
UCCA differs significantly from traditional dependency approaches in that it attempts to abstract away traditional syntactic structures and relations in favour of employing purely semantic distinctions to analyse sentence structure. The shared task, 'cross-lingual semantic parsing with UCCA' (Hershcovich et al., 2019) consists in parsing English, German, and French datasets using the UCCA semantic tagset. In order to enable multi-task learning, the UCCA-annotated data is automatically converted to other parsing formats, e.g. Abstract Meaning Representation (AMR) and Semantic Dependency Parsing (SDP), inter alia (Hershcovich et al., 2018).
Although the schemes are formally different, they have shared semantic content. To perform our experiments, we target the converted CONLLU format, which corresponds to traditional bi-lexical dependencies, and rely on the conversion methodology provided in the shared task (Hershcovich et al., 2019) to obtain UCCA graphs.
UCCA graphs contain both explicit and implicit units.1 In bi-lexical dependencies, by contrast, nodes are text tokens and semantic relations are direct bi-lexical relations between the tokens. The conversion between the two formats results in a partial loss of information. Nonetheless, we believe it is worth modelling the task using one of the available formats (i.e. semantic dependency parsing), which is widely used among NLP researchers.
Typically, transition-based methods are used in syntactic (Chen and Manning, 2014) and semantic (Hershcovich et al., 2017) dependency parsing. By contrast, our proposed system shares several similarities with sequence-to-sequence neural architectures, as it does not specifically deal with parsing transitions. Our model uses word, POS and syntactic dependency tree representations as input and directly produces an edge-labeled graph representation for each sentence (i.e. edges and their labels as two separate outputs). This multilabel neural architecture, which consists of a BiLSTM and a Graph Convolutional Network (GCN), is described in Section 3.
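As a rough illustration of this multi-output design, the following sketch shows a shared token representation feeding two separate softmax heads, one for dependency links and one for semantic labels. All dimensions, weights, and the number of categories are invented for illustration; this is not the submitted system.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 16                      # 5 tokens, 16-dim shared features (toy values)
H = rng.normal(size=(n, d))       # stands in for the shared BiLSTM/GCN output
W_edge = rng.normal(size=(d, n))  # edge head: a distribution over tokens
W_label = rng.normal(size=(d, 7)) # label head: 7 hypothetical semantic categories

edge_probs = softmax(H @ W_edge)    # (n, n): P(head | dependent)
label_probs = softmax(H @ W_label)  # (n, 7): P(category | token)
print(edge_probs.shape, label_probs.shape)
```

The two heads share all layers below them, which is the weight-sharing idea described above.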

Related Work
A recent trend in parsing research is sequence-to-sequence learning (Vinyals et al., 2015b; Kitaev and Klein, 2018), which is inspired by Neural Machine Translation. These methods ignore explicit structural information in favour of relying on long-term memory, attention mechanisms (content-based or position-based) (Kitaev and Klein, 2018), or pointer networks (Vinyals et al., 2015a). By doing so, high-order features are implicitly captured, which results in competitive parsing performance (Jia and Liang, 2016).
Sequence-to-sequence learning has been particularly effective in Semantic Role Labeling (SRL) (Zhou and Xu, 2015). By augmenting these models with syntactic information, researchers have been able to develop state-of-the-art systems for SRL (Marcheggiani and Titov, 2017;Strubell et al., 2018).
As information derived from dependency parse trees can significantly contribute towards understanding the semantics of a sentence, we use a Graph Convolutional Network (GCN) (Kipf and Welling, 2017) to help our system attend to structural syntactic information while performing semantic parsing. The architecture is similar to the GCN component employed in Rohanian et al. (2019) for detecting gappy multiword expressions.

Methodology
For this task, we employ a neural architecture utilising structural features to predict semantic parsing tags for each sentence. The system maps a sentence from the source language to a probability distribution over the tags for all the words in the sentence. Our architecture consists of a GCN layer (Kipf and Welling, 2017), a bidirectional LSTM, and a final dense layer on top.
The inputs to our system are sequences of words, alongside their corresponding POS and named-entity tags.2 Word tokens are represented by contextualised ELMo embeddings (Peters et al., 2018), and POS and named-entity tags are one-hot encoded. We also use sentence-level syntactic dependency parse information as input to the system. In the GCN layer, the convolution filters operate based on the structure of the dependency tree (rather than the sequential order of words).
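A toy sketch of this input representation follows. The tag inventories are invented and a small embedding dimension stands in for the 1024-dimensional ELMo vectors; only the concatenation scheme reflects the description above.

```python
import numpy as np

# Illustrative tag inventories (not those of the actual datasets)
POS_TAGS = ["NOUN", "VERB", "DET", "ADP"]
NER_TAGS = ["O", "PER", "LOC"]

def one_hot(tag, inventory):
    vec = np.zeros(len(inventory))
    vec[inventory.index(tag)] = 1.0
    return vec

def token_features(word_emb, pos, ner):
    # concatenate contextual embedding with one-hot POS and NER tags
    return np.concatenate([word_emb, one_hot(pos, POS_TAGS), one_hot(ner, NER_TAGS)])

emb_dim = 8  # the paper uses 1024-dimensional ELMo embeddings
feats = token_features(np.zeros(emb_dim), "NOUN", "O")
print(feats.shape)  # emb_dim + |POS| + |NER| dimensions
```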
Graph Convolution. Convolutional Neural Networks (CNNs), as originally conceived, are sequential in nature, acting as detectors of N-grams (Kim, 2014), and are often used as feature-generating front-ends in deep neural networks. The Graph Convolutional Network (GCN) has been introduced as a way to integrate rich structural relations, such as syntactic graphs, into the convolution process.
In the context of a syntax tree, a GCN can be understood as a non-linear activation function f and a filter W with a bias term b:

c = f( Σ_{u ∈ r(v)} W x_u + b )

where r(v) denotes all the words in relation with a given word v in a sentence, and c represents the output of the convolution. Using adjacency matrices, we define graph relations as mask filters for the inputs (Kipf and Welling, 2017; Schlichtkrull et al., 2017).
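The per-word convolution with an adjacency mask can be sketched as follows. This is a minimal numpy illustration with toy shapes; ReLU stands in for the unspecified activation f, and the weights are placeholders.

```python
import numpy as np

def gcn_word(X, A, W, b):
    # X: (n, d) word vectors; A: (n, n) 0/1 adjacency; W: (d, o); b: (o,)
    agg = A @ X                          # sum the neighbour vectors r(v) per word
    return np.maximum(agg @ W + b, 0.0)  # affine filter + ReLU

n, d, o = 3, 4, 2
X = np.ones((n, d))
A = np.eye(n)        # self-loops only, so each word "sees" itself
W = np.ones((d, o))
b = np.zeros(o)
C = gcn_word(X, A, W, b)
print(C)  # every entry is 4.0: one neighbour, summing d = 4 unit weights
```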
In the present task, the information from each graph corresponds to a sentence-level dependency parse tree. Given the filter W_s and bias b_s, we can therefore define the sentence-level GCN as follows:

C = f( W_s X^T A + b_s )

where X_{n×v}, A_{n×n}, and C_{o×n} are the tensor representations of the words, the adjacency matrix, and the convolution output respectively.3 In Kipf and Welling (2017), a separate adjacency matrix is constructed for each relation; to avoid over-parametrising the model, we limit ours to the following three types of relations, similar to Marcheggiani and Titov (2017): 1) the head to the dependents, 2) the dependents to the head, and 3) each word to itself (self-loops). The final output is the element-wise maximum of the outputs computed from the three individual adjacency matrices. The model architecture is depicted in Figure 1.
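The three relation types and the element-wise maximum can be sketched as follows. Shapes, weights, and the toy head list are invented; the per-relation filters and the max-combination reflect the description above, but this is an illustrative sketch rather than the submitted implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def multi_relation_gcn(X, heads, W, b):
    # X: (n, d) token features; heads[i] is the head index of token i (-1 = root)
    # W: list of three (d, o) filters; b: list of three (o,) biases
    n = X.shape[0]
    A_head2dep = np.zeros((n, n))
    for dep, head in enumerate(heads):
        if head >= 0:
            A_head2dep[dep, head] = 1.0   # dependent attends to its head
    A_dep2head = A_head2dep.T             # head attends to its dependents
    A_self = np.eye(n)                    # self-loops
    outs = [relu(A @ X @ W[i] + b[i])
            for i, A in enumerate([A_head2dep, A_dep2head, A_self])]
    return np.maximum.reduce(outs)        # element-wise max over the relations

X = np.ones((3, 4))
heads = [1, -1, 1]                        # tokens 0 and 2 depend on token 1
W = [np.full((4, 2), w) for w in (0.5, 1.0, 2.0)]
b = [np.zeros(2)] * 3
out = multi_relation_gcn(X, heads, W, b)
print(out.shape)  # one o-dimensional vector per token
```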

Experiments
Our system participated in the closed track for English and German and the open track for French. We exclusively used the data provided in the shared task. The system is trained on the training data only, and the parameters are optimised using the development set. The results are reported on blind-test data in both in-domain and out-of-domain settings. We focus on predicting the primary edges of UCCA semantic relations and their labels.

Data
The datasets of the shared task are devised for four settings: 1) English in-domain, using the Wiki corpus; 2) English out-of-domain, using the Wiki corpus as training and development data, and 20K Leagues as test data; 3) German in-domain, using the 20K Leagues corpus; 4) French setting with no training data (except trial data), using the 20K Leagues corpus as development and test data.
Whilst the annotated files used by the shared task organisers are in the XML format, several other formats are also available. We decided to use CONLLU, as it is more interpretable. However, according to the shared task description,4 the conversion between XML and CONLLU, which is a necessary step before evaluation, is lossy. Hershcovich et al. (2017) used the same procedure of performing dependency parsing methods on CONLLU files and converting the predictions back to UCCA.
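As an illustration of the bi-lexical representation we work with, the following toy snippet extracts head-dependent links from a simplified CONLLU-style block (tab-separated columns with the token ID first, the head index in the seventh column, and the relation label in the eighth). The sentence and labels are invented, and real CONLLU files also contain comment lines and multiword tokens, which this sketch ignores.

```python
# Toy CONLLU-style block: ID, FORM, four unused columns, HEAD, DEPREL
conllu = """\
1\tPierre\t_\t_\t_\t_\t2\tA
2\tsleeps\t_\t_\t_\t_\t0\troot
3\tsoundly\t_\t_\t_\t_\t2\tD"""

edges = []
for line in conllu.splitlines():
    cols = line.split("\t")
    dep, head, label = int(cols[0]), int(cols[6]), cols[7]
    if head != 0:                 # 0 marks the artificial root
        edges.append((head, dep, label))

print(edges)  # [(2, 1, 'A'), (2, 3, 'D')]
```

Such (head, dependent, label) triples are exactly the bi-lexical edges our model is trained to predict.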

Settings
We trained ELMo on each of the shared task datasets using the system implemented by Che et al. (2018). The embedding dimension is set to 1024. The number of hidden units is 256 for the GCN and 300 for the BiLSTM, and we applied a dropout of 0.5 after each layer. We used the Adam optimiser to train the model. We tested our model in the four settings explained in Section 4.

Official Evaluation
Our model predicts two outputs for each dataset: primary edges and their labels (UCCA semantic categories).5 Table 1 shows the performance (in terms of precision, recall, and F1-score) for predicting primary edges in both labeled (i.e. with semantic tags) and unlabeled (i.e. ignoring semantic tags) settings. Table 2 shows F1-scores for each semantic category separately. Although the overall performance of the system, as shown in the official evaluation in Table 1, is not particularly impressive, a few results listed in Table 2 are worth reporting.
Our system ranks second in predicting four relations, i.e. L (linker), N (connector), R (relator), and G (ground), across all settings (shown in bold). A plausible explanation is that these relations are somewhat less affected by the loss of information incurred by the conversions between formats.

Discussion
Our neural model is applied to UCCA corpora, which are converted to bi-lexical semantic dependency graphs and represented in the CONLLU format. The conversion from UCCA annotations to CONLLU tags appears to have a distinctly negative impact on the system's overall performance. As reported in the shared task description, converting the English Wiki corpus to the CONLLU format and back to the standard format results in an F1-score of only 89.7 for primary labeled edges. This means that our system cannot go beyond this upper limit.
Since our system is trained on CONLLU files and the evaluation involves converting the CONLLU format back to the standard UCCA format, the reported results for our system can be misleading. To investigate this issue further, we performed an evaluation on the English Wiki development data, comparing the predicted labels against the gold standard of the development set in the CONLLU format. The average F1-score for labelled edges was 0.71, compared to the 0.685 our system achieved on the development set using the official evaluation script. This demonstrates that our system fares significantly better when it receives its input in the form of bi-lexical dependency graphs. The system is therefore best suited for semantic dependency parsing, although we believe promising results could also be achieved for UCCA annotation if the conversion between the CONLLU and UCCA formats were improved to preserve information more accurately.
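For concreteness, the labeled and unlabeled edge comparison used in our internal evaluation can be sketched as follows. Edges are (head, dependent, label) triples; the unlabeled score simply drops the label before comparing. The example edges are invented, not shared task data.

```python
def edge_f1(gold, pred, labeled=True):
    # Edge-level F1 over sets of (head, dependent, label) triples;
    # the unlabeled variant compares only (head, dependent) pairs.
    strip = (lambda e: e) if labeled else (lambda e: e[:2])
    g, p = {strip(e) for e in gold}, {strip(e) for e in pred}
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(2, 1, "A"), (2, 3, "D")]
pred = [(2, 1, "A"), (2, 3, "E")]       # one label is wrong
print(edge_f1(gold, pred))              # labeled: 0.5
print(edge_f1(gold, pred, labeled=False))  # unlabeled: 1.0
```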

Conclusion and Future Work
In this paper, we described the system we submitted to SemEval-2019 Task 1, 'Cross-lingual Semantic Parsing with UCCA'. The model performs semantic parsing using information derived from the syntactic dependencies between the words in each sentence. We developed the model using a combination of GCN and BiLSTM components. Due to the penalisation resulting from the use of lossy CONLLU files, we argue that the results cannot be directly compared with those of the other task participants. 6 In the future, we would like to build on the work