Semantic Role Labeling as Syntactic Dependency Parsing

We reduce the task of (span-based) PropBank-style semantic role labeling (SRL) to syntactic dependency parsing. Our approach is motivated by our empirical analysis that shows three common syntactic patterns account for over 98% of the SRL annotations for both English and Chinese data. Based on this observation, we present a conversion scheme that packs SRL annotations into dependency tree representations through joint labels that permit highly accurate recovery back to the original format. This representation allows us to train statistical dependency parsers to tackle SRL and achieve competitive performance with the current state of the art. Our findings show the promise of syntactic dependency trees in encoding semantic role relations within their syntactic domain of locality, and point to potential further integration of syntactic methods into semantic role labeling in the future.


Introduction
Semantic role labeling (SRL; Palmer et al., 2010) analyzes texts with respect to predicate-argument structures such as "who did what to whom, and how, when and where". These generic surface semantic representations provide richer linguistic analysis than syntactic parsing alone and are useful in a wide range of downstream applications including question answering (Shen and Lapata, 2007; Khashabi et al., 2018), open-domain information extraction (Christensen et al., 2010), clinical narrative understanding (Albright et al., 2013), automatic summarization (Khan et al., 2015) and machine translation (Liu and Gildea, 2010; Xiong et al., 2012; Bazrafshan and Gildea, 2013), among others.
It is commonly acknowledged that syntax and semantics are tightly coupled with each other (Levin and Rappaport Hovav, 2005). In some forms of linguistic theories (Baker, 1996, 1997), semantic arguments are even hypothesized to be assigned under consistent and specific syntactic configurations. As a matter of practice, annotations of semantic roles are typically based on existing syntactic treebanks as an additional annotation layer. Annotators are instructed (Babko-Malaya et al., 2006; Bonial et al., 2015) to identify semantic arguments within the predicates' domain of locality, respecting the strong connection between syntax and semantics.

[Figure 1: An example sentence, "She wanted to design the bridge.", with SRL annotations (below) and our joint syntacto-semantic dependency relations (above; described in §3). The two representations can be converted from one to the other. A0 and A1 are short for SRL relations ARG0 and ARG1.]

* Work done during an internship at Bloomberg L.P.
Empirically, syntax has indeed been shown to be helpful to SRL in a variety of ways. Earlier SRL systems successfully incorporated syntactic parse trees as features and pruning signals (Punyakanok et al., 2008). Recently, neural models with shared representations trained to predict both syntactic trees and predicate-argument structures in a multi-task learning setting have achieved superior performance to syntax-agnostic models (Strubell et al., 2018; Swayamdipta et al., 2018), reinforcing the utility of syntax in SRL.
However, researchers are yet to fully leverage the theoretical linguistic assumptions and the dataset annotation conventions surrounding the tight connections between syntax and SRL. To do so, ideally, one must perform deep syntactic processing to capture long-distance dependencies and argument sharing. One solution is to introduce traces into phrase-structure trees, which, unfortunately, is beyond the scope of most statistical constituency parsers, partially due to the increased complexity involved (Kummerfeld and Klein, 2017). Another solution is to use richer grammar formalisms with feature structures, such as combinatory categorial grammar (CCG; Steedman, 2000) and tree adjoining grammar (TAG; Joshi et al., 1975), that directly build syntactic relations within the predicates' extended domain of locality. It is then possible to restrict the semantic argument candidates to only those "local" dependencies (Gildea and Hockenmaier, 2003; Liu, 2009; Liu and Sarkar, 2009; Konstas et al., 2014; Lewis et al., 2015). However, such treebank data are harder to obtain, and their parsing algorithms tend to be less efficient than parsing probabilistic context-free grammars (Kallmeyer, 2010).
On the other hand, syntactic dependency trees directly encode bilexical governor-dependent relations among the surface tokens, which implicitly extend the domain of locality (Schneider, 2008). Dependency parsing (Kübler et al., 2008) is empirically attractive for its simplicity, data availability, efficient and accurate parsing algorithms, and its tight connection to semantic analysis (Reddy et al., 2017). Despite ample research community interest in joint models for dependency parsing and SRL (Surdeanu et al., 2008; Hajič et al., 2009; Henderson et al., 2013), a precise characterization of the mapping between semantic arguments and syntactic configurations has been lacking.
In this paper, we provide a detailed empirical account of PropBank-style SRL annotations on both English and Chinese data. We show that a vast majority (over 98%) of the semantic relations are characterized by one of three basic dependency-based syntactic configurations: the semantic predicate 1) directly dominates, 2) is directly dominated by, or 3) shares a common syntactic governor with the semantic argument. The latter two cases are mostly represented by syntactic constructions including relativization, control, raising, and coordination.
Based on our observations, we design a back-and-forth conversion algorithm that embeds SRL relations into dependency trees. The SRL relations are appended to the syntactic labels to form joint labels, while the syntactic governor for each token remains unaltered. The algorithm reaches over 99% F1 score on English and over 97% on Chinese data in oracle back-and-forth conversion experiments. Further, we train statistical dependency parsing models that simultaneously predict SRL and dependency relations through these joint labels. Experiments show that our fused syntacto-semantic models achieve competitive performance with the state of the art.
Our findings show the promise of dependency trees in encoding PropBank-style semantic role relations: they have great potential in reducing the task of SRL to dependency parsing with an expanded label space. Such a task reduction facilitates future research into finding an empirically adequate granularity for representing SRL relations. It also opens up future possibilities for further integration of syntactic methods into SRL as well as adaptations of extensively-studied dependency parsing techniques to SRL, including linear-time decoding, efficiency-performance tradeoffs, multilingual knowledge transfer, and more. We hope our work can inspire future research into syntactic treatment of other shallow semantic representations such as FrameNet-style SRL (Baker et al., 1998;Fillmore et al., 2003). Our code is available at https://www.github.com/bloomberg/emnlp20 depsrl.
Contribution Our work (1) provides a detailed empirical analysis of the syntactic structures of semantic roles, (2) characterizes the tight connections between syntax and SRL with three recurring structural configurations, (3) proposes a back-and-forth conversion method that supports a fully-syntactic approach to SRL, and (4) shows through experiments that dependency parsers can reach competitive performance with the state of the art on span-based SRL. Additionally, (5) all our analysis, methods and results apply to two languages from distinct language families, English and Chinese.

Syntactic Structures of Semantic Roles
It has been widely assumed in linguistic theories that the semantic representations of arguments are closely related to their syntactic positions with respect to the predicates (Gruber, 1965; Jackendoff, 1972, 1992; Fillmore, 1976; Baker, 1985; Levin, 1993). This notion is articulated as linguistic hypotheses underlying many syntactic theories:

(1) Universal Alignment Hypothesis: There exist principles of Universal Grammar which predict the initial [grammatical] relation borne by each nominal in a given clause from the meaning of the clause. (Perlmutter and Postal, 1984, p. 97)

(2) The Uniformity of Theta Assignment Hypothesis: Identical thematic relationships between items are represented by identical structural relationships between those items at the level of D[eep]-structure. (Baker, 1985, p. 57)

For theories that posit a one-to-one correspondence between semantic roles and syntactic structures (Baker, 1996, 1997), SRL can be treated purely as a syntactic task. However, doing so would require deep structural analysis (Bowers, 2010) that hypothesizes more functional categories than what current syntactic annotations cover. Nonetheless, the Proposition Bank (PropBank; Kingsbury and Palmer, 2002) annotations do capture the domain of locality that is implicitly assumed by these linguistic theories. PropBank defines the domain of locality for verbal predicates to be indicated by "clausal boundary markers", and the annotators are instructed to limit their semantic role annotations to "the sisters of the verb relation (for example, the direct object) and the sisters of the verb phrase (for example, the subject)" (Bonial et al., 2017, p. 746). In cases of syntactically-displaced arguments, the annotators are asked to pick the empty elements that are within the domain of locality, and then syntactic coindexation chains are used to reconstruct the surface semantic role relations.
Recognizing displaced arguments is crucial to SRL, so taking full advantage of locality constraints would also require modeling empty elements and movement, for which current NLP systems still lack accurate, efficient, and high-coverage solutions (Gabbard et al., 2006;Kummerfeld and Klein, 2017).
From an empirical perspective, most syntactic realizations of semantic arguments follow certain common patterns even when the arguments are displaced. Indeed, this is partially why syntax-based features and candidate pruning heuristics have been successful in SRL (Gildea and Palmer, 2002; Gildea and Jurafsky, 2002; Sun et al., 2008). Full parsing might not be necessary to account for the majority of cases in the annotations. Thus, knowing the empirical distributions of the arguments' syntactic positions would be highly useful for deciding how detailed the syntactic analysis needs to be for the purpose of SRL. In this section, we provide such a characterization.
Our analysis is based on dependency syntax and complements prior constituent-based characterizations. One advantage of syntactic dependencies over phrase-structure trees for the purposes of this paper is that dependents are often more directly connected to their syntactic governors, without intervening intermediate constituents. For example, when a verb has multiple adjunct modifiers, each adjunct creates an additional intermediate VP constituent in a phrase-structure analysis, leading to further separation between the verb and the external argument (the subject). In contrast, in a dependency representation, the subject is always directly dominated by the verbal predicate.

Material
We use the training splits of the CoNLL 2012 shared task data (Pradhan et al., 2012) for both English and Chinese; the sentences originally come from OntoNotes 5.0 (Hovy et al., 2006). The SRL annotations are based on the English and Chinese PropBanks (Kingsbury and Palmer, 2002; Xue and Palmer, 2003; Xue, 2008), which are extensively used in SRL research. We choose not to use the SRL-targeted CoNLL 2005 shared task data (Carreras and Màrquez, 2005) since earlier versions of PropBank (Babko-Malaya, 2005) contain many resolvable mismatches between syntactic and semantic annotations (Babko-Malaya et al., 2006); updated annotation guidelines (Bonial et al., 2015) have fixed most of the identified issues. We convert the Penn Treebank (PTB; Marcus et al., 1993) and the Penn Chinese Treebank (CTB; Xue et al., 2005) phrase-structure trees into Stanford Dependencies (SD; de Marneffe et al., 2006) for English (de Marneffe and Manning, 2008; Silveira et al., 2014) and for Chinese (Chang et al., 2009).

Observations
We categorize the syntactic configurations between predicates and arguments and present the results in Table 1. For both English and Chinese, the vast majority, more than 98%, of the predicate-argument relations fall into one of three major categories: the semantic argument is a syntactic child, sibling, or parent of the semantic predicate. Next, we give a brief account of our linguistic observations on the English data associated with each category. See Appendix §C and §D for more examples from both English and Chinese.
pred → arg (D) The predicate directly (D) dominates the semantic argument in the syntactic tree. Not surprisingly, this straightforward type of relation is the most prevalent in the PropBank data, accounting for more than 87% (82%) of all English (Chinese) predicate-argument relations.
arg ← → pred (C) The predicate and the argument share a common (C) syntactic parent. There are two major types of constructions resulting in this configuration: 1) the common parent is a control or raising predicate, creating an open clausal complement (xcomp) relation, and 2) there is a coordination structure between the predicate and the common parent, and both predicates share the same argument in the semantic structure. Both cases are so common that they are converted to direct dependencies in the enhanced Stanford Dependencies (Schuster and Manning, 2016).
arg → pred (R) The dominance relation between the predicate and the argument is reversed (R). This type of relation is frequently realized through relative clauses (rcmod) and verb participles (e.g., broken glass).
Other constructions Many other constructions can be analyzed as combinations of the previously mentioned patterns. For example, a combination of (C)+(C) through control and coordination derives the structural configuration of the fourth most frequent case in Table 1.
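To make the three configurations concrete, the check for each pattern amounts to a comparison of governor indices. The following helper is an illustrative sketch (not the paper's released code); `heads[i]` is assumed to hold the 0-indexed governor of token i, with -1 for the root.

```python
def classify_pattern(heads, pred, arg):
    """Classify the syntactic configuration between a predicate token and
    the head token of its argument. Returns "D", "R", "C", or "other".
    Hypothetical helper illustrating the patterns described above."""
    if heads[arg] == pred:
        return "D"   # predicate directly dominates the argument head
    if heads[pred] == arg:
        return "R"   # reversed dominance (e.g., relative clauses, participles)
    if heads[pred] == heads[arg] and heads[pred] != -1:
        return "C"   # common syntactic parent (control/raising, coordination)
    return "other"
```

On the Figure 1 sentence "She wanted to design the bridge" (heads: She→wanted, wanted→root, to→design, design→wanted, the→bridge, bridge→design), she is a (D)-type argument of wanted but a (C)-type argument of design.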
Reducing SRL to Dependency Parsing

Joint Labels
Building on the insights obtained from our analysis, we design a joint label space to encode both syntactic and SRL relations. The joint labels have four components: one syntactic relation and three semantic labels, each corresponding to one of the three most common structural patterns in Table 1.

(Combinations of (D), (C), and (R) can theoretically account for all possible predicate-argument configurations. However, for a lossless back-and-forth conversion with our proposed joint labels (§3), there are constraints on the argument structures of all the intermediate predicates along the shortest dependency path between the predicate and the argument. See Table 2 for an estimate of how many semantic relations can empirically be decomposed as combinations of the three common structural patterns given our conversion method.)
Formally, for a length-n input sentence w = w_1, ..., w_n, we denote the head of token w_i in the syntactic dependency tree t as w_{h_i}, or h_i for short. The dependency tree also specifies a dependency relation labeled r_i between each (h_i, w_i) pair. To encode both syntactic and SRL information, we define a dependency tree t', keeping all the h_i's the same as in t, but modifying each relation r_i into r'_i, a concatenation of four labels: r^SYN_i = r_i is the syntactic relation; r^(D)_i describes the SRL relation holding directly between the predicate h_i and the argument headed by w_i; r^(R)_i specifies the reverse situation, where w_i is the predicate and h_i the head of the argument; and r^(C)_i encodes the parent-sharing pattern connecting two predicates and takes the form of a tuple (a, b), corresponding to the case where the SRL argument with label a for predicate h_i is also an SRL argument labeled b with respect to predicate w_i. If no such semantic relation exists, the corresponding component label is left unspecified, denoted as "_".
In the example of Fig. 1, the joint label between wanted and design is xcomp-ARG1-(ARG0,ARG0)-_. We can break this joint label into four parts: "xcomp" describes the syntactic relation between the two tokens; "ARG1" indicates that the subtree to design the bridge is an argument labeled ARG1 of the predicate wanted; (ARG0,ARG0) establishes the argument-sharing pattern whereby the ARG0 she of wanted is also an ARG0 of the predicate design; finally, "_" indicates that there is no argument headed by wanted for the predicate design.
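Decoding such a joint label back into its four components can be sketched as follows. This is illustrative code under our own assumptions (not the released implementation): we assume "-" separates components only outside parentheses and that "_" marks an unspecified slot; real SRL labels such as ARGM-TMP contain internal hyphens, so a production version would need an unambiguous delimiter or escaping.

```python
def parse_joint_label(label):
    """Split a joint label like 'xcomp-ARG1-(ARG0,ARG0)-_' into its four
    components: syntactic relation, (D), (C), and (R)."""
    parts, buf, depth = [], [], 0
    for ch in label:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "-" and depth == 0:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    syn, d, c, r = parts
    return {"syn": syn,
            "D": None if d == "_" else d,
            "C": tuple(c.strip("()").split(",")) if c != "_" else None,
            "R": None if r == "_" else r}
```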

Back-and-Forth Conversion
The joint labels encode both syntactic and semantic relations, and it is straightforward to convert the separate dependency and SRL annotations into the joint representation and to recover them from it.
In the forward conversion (separate → joint), we first extract the syntactic heads of all SRL arguments. Then we enumerate all predicate-argument pairs, and for each pair falling into one of the three most common patterns as listed in Table 1, we insert the SRL argument label in the corresponding slot in the joint label. For predicates sharing more than one argument, we observe that most cases are due to the two predicates sharing all their ARGM relations, so we augment the (C) label with a binary indicator of whether or not to propagate all ARGM arguments. When the two predicates share more than one core argument, which occurs for around 2% of the argument-sharing predicates, we randomly select and record one of the shared arguments in r (C) i . A more systematic assignment in such cases in future work may lead to further improvement.
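The forward conversion can be sketched in a few lines. The code below is a simplified illustration under our own assumptions (it omits the ARGM-propagation indicator and the tie-breaking for multiple shared core arguments described above): spans are half-open token intervals, and a span's syntactic head is the token whose governor falls outside the span.

```python
def span_head(heads, start, end):
    """Syntactic head of span [start, end): the token whose governor lies
    outside the span (assumes the span forms a single subtree)."""
    for i in range(start, end):
        if not (start <= heads[i] < end):
            return i
    return start

def forward_convert(heads, frames):
    """Pack SRL frames {pred_index: [(label, start, end), ...]} into
    per-token (D)/(R)/(C) label slots, following the three patterns."""
    d, r, c = {}, {}, {}
    for pred, args in frames.items():
        for label, s, e in args:
            a = span_head(heads, s, e)
            if heads[a] == pred:
                d[a] = label                       # (D): pred directly dominates arg
            elif heads[pred] == a:
                r[pred] = label                    # (R): reversed dominance
            elif heads[pred] == heads[a]:
                # (C): record how the governor's argument is shared
                shared = next((l for l, s2, e2 in frames.get(heads[pred], [])
                               if span_head(heads, s2, e2) == a), None)
                if shared is not None:
                    c[pred] = (shared, label)
    return d, r, c
```

On the Figure 1 sentence, this reproduces the joint label components shown there: design receives the (C) tuple (ARG0, ARG0) for the shared subject she.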
As for the backward conversion (joint → separate), the syntactic dependencies can be directly decoupled from the joint labels, and we build the SRL relations in three steps: we first identify all the (D) and (R) dependency relations; then, with a top-down traversal of the tree, we identify the shared argument relations through (C) labels; finally, we rebuild the span boundaries using a rule-based approach. The top-down traversal is necessary to allow further propagation of arguments; it lets us cover some of the less common cases through multiple argument sharings, e.g., the fourth example in Table 1. When a (C) label (a, b) is invalid in that the syntactic governor does not have an argument with label a, we simply ignore this (C) label. (Invalid labels should not occur in the oracle conversion but may occur in model predictions.)

In reconstructing the span boundaries, we distinguish among the different types of arguments. For (D)-type arguments, we directly take the entire subtrees dominated by the head words of the arguments. For (R)-type arguments, we adopt language-specific heuristics: in English, when the argument (the syntactic head) is to the left of the predicate (the syntactic child), as commonly happens in relative clause structures, we include all of the argument's child subtrees to the left of the predicate; when the argument is to the right, which usually happens when the predicate is in participle form, we define the right subtree of the argument as its span. (The simple subtree approach does not apply to (R)-type arguments since, by definition, the subtree of an (R)-type argument contains its predicate, which contradicts the data annotations. Our heuristics are designed to support span-based evaluation; span reconstruction can be omitted if one focuses on dependency-based evaluation.) For (C)-type arguments, we reuse the span boundaries of the shared arguments.

Table 2 shows the oracle results of our back-and-forth conversion strategies on the training data. We take gold-standard syntactic and SRL annotations and convert them into joint-label representations. Then, we reconstruct the SRL relations through our backward conversion and measure span-based exact match metrics. Our procedures faithfully reconstruct most of the SRL relations for both English and Chinese data. English sees a higher oracle score than Chinese; we attribute this to the synchronization effort between the syntactic and SRL annotations during the evolution of the English PropBank (Babko-Malaya et al., 2006; Bonial et al., 2017).
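The span reconstruction rules for (D)- and (R)-type arguments can be sketched as below. This is an illustrative rendering of the English heuristics, not the authors' code; it assumes projective trees so that each reconstructed token set is contiguous.

```python
def subtree(heads, head):
    """All tokens dominated by `head` (inclusive), via repeated closure."""
    nodes = {head}
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h in nodes and i not in nodes:
                nodes.add(i)
                changed = True
    return nodes

def argument_span(heads, pred, arg_head, kind):
    """Rule-based English span reconstruction. Returns (start, end) over
    token indices, end exclusive."""
    if kind == "D":
        nodes = subtree(heads, arg_head)           # entire dominated subtree
    else:  # "R": the argument's full subtree would contain the predicate
        if arg_head < pred:
            # relative-clause case: argument plus its children left of pred
            nodes = {arg_head}
            for i, h in enumerate(heads):
                if h == arg_head and i < pred:
                    nodes |= subtree(heads, i)
        else:
            # participle case: the right subtree of the argument
            nodes = {i for i in subtree(heads, arg_head) if i >= arg_head}
    return min(nodes), max(nodes) + 1
```

For instance, in "the bridge that she designed", the (R)-type argument of designed is reconstructed as the bridge, excluding the relative clause itself; in "broken glass", the argument of broken is glass.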

Models
Given that SRL can be reduced to a dependency parsing task with an extended label space, our model adapts the architecture of a dependency parser. We follow the basic design of Dozat and Manning (2017), but instead of using LSTMs as input feature extractors, we opt for Transformer encoders (Vaswani et al., 2017), which have previously been shown to be successful in constituency parsing (Kitaev and Klein, 2018; Kitaev et al., 2019), dependency parsing (Kondratyuk and Straka, 2019), and SRL (Tan et al., 2018; Strubell et al., 2018). We then score all potential attachment pairs, as well as the dependency and SRL relations, over the token-level representations through deep biaffine transformations (Dozat and Manning, 2017). After the dependency parsing decoding process, we retrieve the syntactic parse trees and SRL structures via our backward conversion algorithm.
Formally, we associate each token position with a context-sensitive representation x_i = Transformer(w_0, w_1, ..., w_n)_i, where w_0 denotes the root symbol for the dependency parse tree and the inputs to the Transformer network are pretrained GloVe embeddings (Pennington et al., 2014). Alternatively, we can fine-tune a pretrained contextualized feature extractor such as BERT.

(Note on the oracle results in Table 2: the English oracle F1 score is higher than the 98% combined coverage of the (D)+(C)+(R) patterns. This is because (1) our method is precision-focused to minimize error propagation in prediction, and the recall loss of 1.7% is a direct reflection of the unaccounted-for, less frequent structures; and (2) many arguments, e.g., the fourth most frequent case in Table 1, can be reconstructed through the propagation of (C)-type labels.)
Next, the same representations x_i serve as inputs to five different scoring modules: one for dependency attachment, one for syntactic labeling, and three for the newly-introduced SRL-related labels. All of the scoring modules use the deep biaffine (DBA) scoring function introduced by Dozat and Manning (2017), which is widely used in syntactic parsing (Dozat et al., 2017; Shi et al., 2017; Shi and Lee, 2018), semantic dependency parsing (Dozat and Manning, 2018) and SRL (Strubell et al., 2018). For an ordered pair of input vectors x_i and x_j, an r-dimensional DBA transforms each vector into a d-dimensional vector with multi-layer perceptrons and then outputs an r-dimensional score vector

DBA(x_i, x_j) = [MLP_I(x_i); 1]^T U [MLP_J(x_j); 1],

where U is a learned (d+1) × r × (d+1) tensor, [·; 1] appends an element of 1 to the end of a vector, and MLP_I and MLP_J are two separate multi-layer perceptrons with nonlinear activation functions. Following Dozat and Manning (2017), we model dependency attachment probabilities with a 1-dimensional DBA function, normalizing over candidate heads: P(h_i = j | w) ∝ exp(DBA_attach(x_j, x_i)). For syntactic labels from vocabulary V_SYN, we use a |V_SYN|-dimensional DBA function: P(r^SYN_i | w, h_i) ∝ exp(DBA_SYN(x_{h_i}, x_i)[r^SYN_i]). The three semantic label components r^(D), r^(C), and r^(R) are modeled similarly to r^SYN.
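The deep biaffine transformation can be written out in a few lines of NumPy. This is a toy sketch with single-layer perceptrons and random parameters, illustrating the tensor shapes rather than reproducing the paper's trained modules.

```python
import numpy as np

def mlp(x, W, b):
    """One-layer perceptron with ReLU, mapping x to d dimensions."""
    return np.maximum(W @ x + b, 0.0)

def deep_biaffine(x_i, x_j, params):
    """r-dimensional deep biaffine score for an ordered pair (x_i, x_j):
    both vectors pass through separate MLPs, a bias element of 1 is
    appended, and a learned (d+1) x (d+1) bilinear form is applied per
    output dimension. `params` = (W_I, b_I, W_J, b_J, U), U of shape
    (r, d+1, d+1)."""
    W_I, b_I, W_J, b_J, U = params
    hi = np.append(mlp(x_i, W_I, b_I), 1.0)    # shape (d+1,)
    hj = np.append(mlp(x_j, W_J, b_J), 1.0)    # shape (d+1,)
    return np.einsum("i,rij,j->r", hi, U, hj)  # shape (r,)
```

With r = 1 this yields the attachment scorer; with r = |V_SYN| it yields the label scorer.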
All of the above components are separately parameterized, but they share the same feature extractor (Transformer or BERT). When fine-tuning BERT, we use the final-layer vector of the last subword unit of each word as its representation and the vector of the prepended [CLS] token for the root symbol. We train the components with locally-normalized log-likelihood objectives. During inference, we use a projective maximum spanning tree algorithm (Eisner, 1996; Eisner and Satta, 1999) for unlabeled dependency parsing and then select the highest-scoring label for each predicted attachment and each component. (The choice of a projective decoder is motivated by the empirical fact that both English and Chinese dependency trees are highly projective; one may consider a non-projective decoder when adapting to other languages. Structured and global inference that considers the interactions among all relation labels is a promising future direction.)

To isolate the effects of predicate identification, and following most existing work on SRL, we provide our models with pre-identified predicates. We report median performance across 5 runs with different random initializations for our models and our replicated reference models. Implementation details are provided in Appendix §A.

Our models slightly underperform the BIO-CRF baseline models on English, and the gap is larger on Chinese; this can be attributed to the higher back-and-forth conversion loss on the Chinese data. We observe no significant difference in dependency parsing accuracy when training Dozat and Manning's (2017) parser alone versus jointly training with our SRL labels.
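The projective decoding step can be illustrated with a textbook implementation of Eisner's O(n^3) algorithm. This sketch is not the authors' implementation; the chart indexing conventions are our own, and token 0 plays the artificial root.

```python
def eisner(score):
    """Projective maximum spanning tree over an n x n arc-score matrix,
    score[h][m] = score of arc head h -> modifier m. Returns heads[0..n-1]
    with heads[0] = -1 for the root."""
    n = len(score)
    NEG = float("-inf")
    # chart[s][t][d][c]: d=1 head at s (right arrow), d=0 head at t;
    # c=0 incomplete span, c=1 complete span
    chart = [[[[NEG, NEG], [NEG, NEG]] for _ in range(n)] for _ in range(n)]
    back = [[[[None, None], [None, None]] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        for d in range(2):
            chart[s][s][d][1] = 0.0
    for w in range(1, n):
        for s in range(n - w):
            t = s + w
            for q in range(s, t):  # incomplete: add arc s->t or t->s
                val = chart[s][q][1][1] + chart[q + 1][t][0][1]
                if val + score[s][t] > chart[s][t][1][0]:
                    chart[s][t][1][0] = val + score[s][t]; back[s][t][1][0] = q
                if val + score[t][s] > chart[s][t][0][0]:
                    chart[s][t][0][0] = val + score[t][s]; back[s][t][0][0] = q
            for q in range(s, t):  # complete, head on the right
                val = chart[s][q][0][1] + chart[q][t][0][0]
                if val > chart[s][t][0][1]:
                    chart[s][t][0][1] = val; back[s][t][0][1] = q
            for q in range(s + 1, t + 1):  # complete, head on the left
                val = chart[s][q][1][0] + chart[q][t][1][1]
                if val > chart[s][t][1][1]:
                    chart[s][t][1][1] = val; back[s][t][1][1] = q
    heads = [0] * n

    def backtrack(s, t, d, c):
        if s == t:
            return
        q = back[s][t][d][c]
        if c == 0:  # incomplete: emit the arc, split into complete halves
            heads[t if d == 1 else s] = s if d == 1 else t
            backtrack(s, q, 1, 1); backtrack(q + 1, t, 0, 1)
        elif d == 1:
            backtrack(s, q, 1, 0); backtrack(q, t, 1, 1)
        else:
            backtrack(s, q, 0, 1); backtrack(q, t, 0, 0)

    backtrack(0, n - 1, 1, 1)
    heads[0] = -1
    return heads
```

Per-token labels (syntactic and the three SRL components) are then read off by a simple argmax for each predicted arc.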
Additionally, our models make predictions for all predicates in a given sentence at the same time through O(n) joint syntacto-semantic labels computed over shared features, while most other competitive methods either extract different features for different predicates (Tan et al., 2018; Ouchi et al., 2018; Swayamdipta et al., 2018), effectively requiring multiple rounds of feature extraction, or require scoring all O(n^2) or O(n^3) possible predicate-argument pairs (Strubell et al., 2018; Li et al., 2019). In our experiments, our models are 40% faster than the BIO-CRF baseline on average.

Table 4 presents per-label F1 scores comparing our baseline model with our proposed method. Our method exhibits overall performance similar to the baseline BIO-CRF model. Most of the difference materializes on ARG2 and ARGM-ADV; previous work finds that these labels are highly predicate-specific and hard to predict (He et al., 2017). We further observe that pretrained feature extractors (BERT) tend to improve the most on these two labels.

Table 5 summarizes the results when one or more components of our models are replaced by gold-standard labels. As expected, it is crucial to predict the syntactic trees correctly: failure to do so accounts for 35% or 29% of the errors with or without pretrained feature extractors. The accuracy of (D)-type SRL relations has an even larger impact on the overall performance: it is responsible for half of the errors. This indicates that argument labeling is a harder sub-task than syntactic parsing. Further, we observe that the benefits of pretrained feature extractors mostly stem from improved accuracy of the syntactic component. Even with pretrained BERT features, the semantic components remain challenging.

Prior work has also modeled syntax and semantics jointly (…, 2008; Li et al., 2010; Henderson et al., 2013; Swayamdipta et al., 2016). In contrast, our work unifies the two representations into common structures.

Related Work
Joint labels The idea of using joint labels for performing both syntactic and semantic tasks is similar to that of function parsing (Merlo and Musillo, 2005; Gabbard et al., 2006; Musillo and Merlo, 2006). Ge and Mooney (2005) […] They annotated a Chinese SRL corpus from scratch with a similar label scheme; in this paper, we show that it is possible to extract such joint labels from existing data annotations.
Tree approximation In the task of semantic dependency parsing (Oepen et al., 2014), dependency structures are used to model more aspects of semantic phenomena than predicate-argument structures, and the representations are more general directed acyclic graphs. These graphs can be approximated by trees (Du et al., 2014;Schluter et al., 2014;Schluter, 2015) such that tree-based parsing algorithms become applicable. Unlike this line of research, we limit ourselves to the given syntactic trees, as opposed to finding the optimal approximating trees, and we focus on the close relations between syntax and SRL.
Dependency-based SRL Although predicateargument structures are traditionally defined in constituency terms, dependency-based predicateargument analysis (Hacioglu, 2004;Fundel et al., 2007) has been popularized through the CoNLL 2008 and 2009 shared tasks (Surdeanu et al., 2008;Hajič et al., 2009) and has been adopted by recent proposals of decompositional semantics (White et al., 2017). Choi and Palmer (2010) consider reconstructing constituency-based representations from dependency-based analysis. We confirm their findings that through a few heuristics, the reconstruction can be done faithfully.
Neural SRL The application of neural models to SRL motivates the question of whether modeling syntax is still necessary for the task (…). Our results contribute to the ongoing debate by adding further evidence that the two tasks are deeply coupled. Future work may further explore how much syntactic knowledge is implicitly obtained in the apparently syntax-agnostic models.
Multi-task learning Our models share neural representations across the syntactic and the SRL labelers. This is an instance of multi-task learning (MTL; Caruana, 1993, 1997).

Conclusion
Linguistic theories assume a close relationship between the realization of semantic arguments and syntactic configurations. This work provides a detailed analysis of the syntactic structures of PropBank-style SRL and reveals that three common syntactic patterns account for over 98% of the annotated SRL relations for both English and Chinese data. Accordingly, we propose to reduce the task of SRL to syntactic dependency parsing through back-and-forth conversion to and from a joint label space. Experiments show that dependency parsers achieve results on PropBank-style SRL competitive with the state of the art. This work shows the promise of a syntactic treatment of SRL and opens up possibilities of applying existing dependency parsing techniques to SRL. We invite future research into further integration of syntactic methods into shallow semantic analysis in other languages and other formulations, such as frame-semantic parsing, and other semantically-oriented tasks.

A Implementation Details

… Yang et al. (2018). The contextualized representation at each token's position is passed through a multi-layer perceptron with one hidden layer consisting of 256 hidden units and a PReLU (He et al., 2015) activation function to obtain the scores for each tag.
Each minibatch groups 64 training instances (16 when using BERT), and gradients are clipped (Pascanu et al., 2013) at 5.0. We use the Adam optimizer (Kingma and Ba, 2015) with β1 = 0.9, β2 = 0.999 and ε = 1 × 10^-8. When using GloVe embeddings and Transformers, we set the learning rate to 1 × 10^-4; when fine-tuning BERT, the learning rate is lowered to 1 × 10^-5. Learning rates are multiplied by 0.1 once the development performance stops improving for 5 epochs. All models are trained until the learning rates have been lowered three times and the performance plateaus on the development sets. Our implementation is based on PyTorch (Paszke et al., 2017).
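The learning-rate schedule described above can be expressed as a small stateful helper. This is a hypothetical sketch mirroring the stated schedule (decay by 0.1 after 5 epochs without development improvement, at most three decays), not the paper's training code.

```python
class PlateauDecay:
    """Multiply the learning rate by `factor` once the dev score stops
    improving for `patience` epochs; allow at most `max_decays` decays."""

    def __init__(self, lr, patience=5, factor=0.1, max_decays=3):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.max_decays = max_decays
        self.best = float("-inf")
        self.bad = 0
        self.decays = 0

    def step(self, dev_score):
        """Call once per epoch with the dev score; returns the current lr."""
        if dev_score > self.best:
            self.best, self.bad = dev_score, 0
        else:
            self.bad += 1
            if self.bad >= self.patience and self.decays < self.max_decays:
                self.lr *= self.factor
                self.bad = 0
                self.decays += 1
        return self.lr
```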
On a single V100 GPU, the baseline BIO-CRF model parses 96.4 sentences/sec and our proposed model processes at 159.1 sentences/sec on average.
Throughout our experiments, all the hyperparameters are taken directly from relevant suggestions

B.1 Training without Gold Syntactic Trees
Our method leverages the gold-standard dependency trees in the training data to design high-fidelity back-and-forth conversion algorithms. Table 6 considers a scenario where we do not have access to such gold trees during training: we jackknife the data into 8 folds, training parsers on 7 folds and predicting trees on the remaining fold. Our models show similar F1 scores under this condition as with gold trees, though recall is traded for precision since our conversion method is precision-focused. This is not a realistic scenario, given that existing PropBank-style SRL annotations are all based on syntax, so as a matter of practice we always have access to gold trees during training. Nonetheless, these experiments point to the viability of using predicted trees in practice without incurring a significant loss in F1 scores.
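The jackknifing setup can be sketched as follows; this illustrative helper (our own, with a simple strided split) yields, for each fold, the training indices and the held-out indices on which a parser trained on the other folds would predict trees.

```python
def jackknife_folds(n_items, k=8):
    """Split item indices into k folds; for each fold, yield
    (train_indices, predict_indices) so that every item receives a
    prediction from a model that never saw it during training."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        train = [i for f in range(k) if f != held_out for i in folds[f]]
        yield sorted(train), folds[held_out]
```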

B.2 Accuracies by SRL Relation Types
In Table 7, we break down the accuracies by the syntactic patterns of the SRL relations. Compared with our baseline, a replication of Tan et al. (2018), our models achieve higher or competitive results on (D)-type and (R)-type SRL relations. These two types establish a direct or reverse semantic relation with respect to the syntactic structure. In contrast, the (C)-type relations require accurate predictions of sibling relations as well as at least two SRL-related labels, and are thus more prone to error propagation. We hypothesize that global scoring of the dependency structures can alleviate this issue, and we leave that to future work.

B.3 Learning Curve
In Table 8, we train the models with varying amounts of training data. With GloVe embeddings, our models exhibit higher performance than the corresponding baselines when training data is limited. When the pre-trained BERT feature extractor is used, both the baseline and our model require far less data to reach similar levels of performance. Our model shows significant improvement when the amount of training data is extremely limited (1%), while the baseline edges it out in the other two settings (3% and 10%).
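A standard way to produce such a learning curve is with nested subsamples, so that each smaller training set is contained in the larger ones. The sketch below is illustrative of that setup, not necessarily our exact sampling protocol:

```python
import random

def subsample(train_data, fractions=(0.01, 0.03, 0.10, 1.0), seed=0):
    """Yield nested training subsets of increasing size (each smaller set
    is a prefix of one fixed shuffle, hence contained in the larger ones).
    The fractions mirror the 1%, 3%, and 10% settings discussed above."""
    rng = random.Random(seed)
    shuffled = train_data[:]
    rng.shuffle(shuffled)
    for frac in fractions:
        k = max(1, int(len(shuffled) * frac))
        yield frac, shuffled[:k]
```

Nesting the subsets removes one source of variance across the curve, since larger settings differ from smaller ones only by added data.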

C Additional English Data Analysis
Among the three common patterns, (D)-type SRL relations are the most frequent and easiest to understand. In this section, we provide additional examples to shed light on (C)-type and (R)-type relations. We also show some sentences with more complex syntactic phenomena than what can be handled by our joint-label scheme. In all the examples, we boldface the predicates, underline the head words of the arguments, and highlight only the shortest dependency paths connecting them.
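To make the three patterns concrete: a predicate-argument pair is (D)-type when the argument's head word is a direct syntactic dependent of the predicate, (R)-type when the dependency points the other way, and (C)-type (roughly) when the two attach to a common parent, as in control/raising and coordination. The following schematic classifier illustrates the taxonomy; it is a simplification for exposition, not our actual conversion algorithm:

```python
def pattern_type(pred, arg_head, head_of):
    """Classify the syntactic configuration of a predicate-argument pair.
    `head_of` maps each token to its syntactic head (the root maps to None);
    word strings stand in for token indices here for readability."""
    if head_of.get(arg_head) == pred:
        return "D"   # direct: argument head depends on the predicate
    if head_of.get(pred) == arg_head:
        return "R"   # reverse: predicate depends on the argument head
    if head_of.get(pred) is not None and head_of.get(pred) == head_of.get(arg_head):
        return "C"   # common parent: e.g. control verb governs both
    return "other"
```

For the sentence "She wanted to design the bridge" (Figure 1), this yields (D) for the pair (design, bridge) and (C) for (design, She), since both attach under the control verb "wanted".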

C.1 (C)-Type Relations
The (C)-type relations are most frequently used for ARG0 (55%) and ARG1 (19%), in contrast to (D)-type relations, where the percentages are much lower (34% and 17%, respectively). This can be explained by the fact that many (C)-type relations arise in control and raising verb constructions. A second major construction associated with (C)-type relations is conjunction, which shares either core or peripheral arguments among the conjuncts. The most common dependency relation labels connecting the common parents and the predicates are "xcomp" (39%), "conj" (37%), "vmod" (9%), and "dep" (6%). "xcomp" signifies control/raising structures. Popular common parent words (the control/raising verbs) include "want", "expect", "continue", and "begin".
"conj" represents a coordination structure. Since the first conjunct is the syntactic head of the other conjuncts in Stanford Dependencies, any shared argument results in a (C)-type relation.
"vmod" denotes non-finite verbal modifiers whose missing subjects can often be found in the main clauses. For example: (3) We use all wisdom to counsel every person.
[Dependency arcs for (3): nsubj(use, We), vmod(use, counsel).]
Many problematic instances of "dep" can be attributed to failures of constituency-to-dependency conversion, where the relation should have been recognized as one corresponding to another construction. For example: (4) He calls . . . and pops in every once in a while.

C.2 (R)-Type Relations
The second most common construction involves "vmod" (28%). Different from the "vmod" relations involved in (C)-type relations, these non-finite clauses usually modify noun phrases in (R)-type relations. The third most common case involves participial adjectives, using the "amod" syntactic relation (17%). Since the verb modifies the noun as an adjective, the syntactic dependency and the semantic relation are reversed. For example: (9) . . . a fact finding American led committee . . .
[Dependency arc for (9): amod(committee, led).]

C.3 Others
The other constructions besides the three most common patterns are a mixture of data annotation errors, constituency-to-dependency conversion failures, and combinations of the frequent patterns. If an argument is shared with other predicates along the dependency path, our conversion algorithm can recover the SRL relation through multiple (C)-type labels. For example, in the following sentence, the argument "I" is shared across the three predicates "trying", "help", and "fix" as their ARG0. Annotation inconsistencies can result in rare patterns beyond the scope of the current design of our joint labels. For example, in the following sentence, the SRL annotation treats "the Museum of Modern Art" as the ARGM-LOC of "listed", making the predicate a grandparent of the argument. A simple fix that includes the preposition "in" as part of the argument span (as is annotated in most other examples) would change this case into a (D)-type relation.
(11) Now your name is listed in the Museum of Modern Art.
[Dependency arcs for (11): prep(listed, in), pobj(in, Museum).]
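The recovery of an argument shared across a chain of predicates, as in the "trying"/"help"/"fix" example above, can be sketched as repeatedly following (C)-type sharing links until a predicate with an overtly attached argument is reached. The data structures `direct_arg_of` and `shares_with` are illustrative stand-ins, not our actual joint-label representation:

```python
def recover_shared_argument(pred, direct_arg_of, shares_with):
    """Follow (C)-type sharing links from `pred` until reaching a predicate
    whose argument is directly attached. `direct_arg_of` maps a predicate to
    its overtly attached argument (if any); `shares_with` maps a predicate to
    the predicate it inherits the argument from. The `seen` set guards
    against cycles in malformed inputs."""
    seen = set()
    while pred is not None and pred not in seen:
        seen.add(pred)
        if pred in direct_arg_of:
            return direct_arg_of[pred]
        pred = shares_with.get(pred)
    return None
```

In the example above, "fix" inherits from "help", which inherits from "trying", where "I" is directly attached as ARG0.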

D Chinese Data Analysis
Although Chinese and English are very different languages from two distinct language families, they exhibit similar distributions of syntactic patterns for SRL relations. The three most common types, (D)-, (C)-, and (R)-type relations, account for over 98% of all annotated predicate-argument relations. In the following examples, BA denotes a ba construction, DE refers to a de particle, and CLASSIFIER represents Chinese measure words for quantity expressions (Huang et al., 2008).