Transforming Complex Sentences into a Semantic Hierarchy

We present an approach for recursively splitting and rephrasing complex English sentences into a novel semantic hierarchy of simplified sentences, with each of them presenting a more regular structure that may facilitate a wide variety of artificial intelligence tasks, such as machine translation (MT) or information extraction (IE). Using a set of hand-crafted transformation rules, input sentences are recursively transformed into a two-layered hierarchical representation in the form of core sentences and accompanying contexts that are linked via rhetorical relations. In this way, the semantic relationship of the decomposed constituents is preserved in the output, maintaining its interpretability for downstream applications. Both a thorough manual analysis and automatic evaluation across three datasets from two different domains demonstrate that the proposed syntactic simplification approach outperforms the state of the art in structural text simplification. Moreover, an extrinsic evaluation shows that when applying our framework as a preprocessing step the performance of state-of-the-art Open IE systems can be improved by up to 346% in precision and 52% in recall. To enable reproducible research, all code is provided online.


Introduction
Text Simplification (TS) is defined as the process of reducing the linguistic complexity of natural language (NL) text by utilizing a more readily accessible vocabulary and sentence structure. Its goal is to improve the readability of a text, making information easier to comprehend for people with reduced literacy, such as non-native speakers (Paetzold and Specia, 2016), aphasics (Carroll et al., 1998), dyslexics (Rello et al., 2013) or deaf persons (Inui et al., 2003). However, not only human readers may benefit from TS. Previous work has established that applying TS as a preprocessing step can improve the performance of a variety of natural language processing (NLP) tasks, such as Open IE (Saha and Mausam, 2018; Cetto et al., 2018), MT (Štajner and Popovic, 2016, 2018), Relation Extraction (Miwa et al., 2010), Semantic Role Labeling (Vickrey and Koller, 2008), Text Summarization (Siddharthan et al., 2004; Bouayad-Agha et al., 2009), Question Generation (Heilman and Smith, 2010; Bernhard et al., 2012), or Parsing (Chandrasekar et al., 1996; Jonnalagadda et al., 2009).
Linguistic complexity stems from the use of either a difficult vocabulary or sentence structure. Therefore, TS is classified into two categories: lexical simplification and syntactic simplification. Through substituting a difficult word or phrase with a more comprehensible synonym, the former primarily addresses a human audience. Most NLP systems, on the contrary, derive greater benefit from syntactic simplification, which focuses on identifying grammatical complexities in a sentence and converting these structures into simpler ones, using a set of text-to-text rewriting operations. Sentence splitting plays a major role here: it divides a sentence into several shorter components, with each of them presenting a simpler and more regular structure that is easier to process for downstream applications.
Many different methods for addressing the task of TS have been presented so far. As noted in Štajner and Glavaš (2017), data-driven approaches outperform rule-based systems in the area of lexical simplification (Glavaš and Štajner, 2015; Paetzold and Specia, 2016; Nisioi et al., 2017; Zhang and Lapata, 2017). In contrast, the state-of-the-art syntactic simplification approaches are rule-based (Siddharthan and Mandya, 2014; Ferrés et al., 2016; Saggion et al., 2015), providing more grammatical output and covering a wider range of syntactic transformation operations, however, at the cost of being very conservative, often to the extent of not making any changes at all. Acknowledging that existing TS corpora (Zhu et al., 2010; Coster and Kauchak, 2011; Xu et al., 2015) are inappropriate for learning to decompose sentences into shorter, syntactically simplified components, as they contain only a small number of split examples, Narayan et al. (2017) lately compiled the first TS dataset that explicitly addresses the task of sentence splitting. Using this corpus, several encoder-decoder models (Bahdanau et al., 2014) are proposed for breaking down a complex source into a set of sentences with a simplified structure. Aharoni and Goldberg (2018) further explore this idea, augmenting the presented neural models with a copy mechanism (Gu et al., 2016; See et al., 2017).
Figure 1: Example of the output that is generated by our proposed TS approach. A complex input sentence is transformed into a semantic hierarchy of simplified sentences in the form of minimal, self-contained propositions that are linked via rhetorical relations.
In contrast to the above-mentioned end-to-end neural approaches, we followed a more systematic approach. First, we performed an in-depth study of the literature on syntactic sentence simplification, followed by a thorough linguistic analysis of the syntactic phenomena that need to be tackled in the sentence splitting task. Next, we materialized our findings into a small set of 35 hand-crafted transformation rules that decompose sentences with a complex linguistic structure into shorter constituents that present a simpler and grammatically sound structure, benefiting downstream semantic applications whose predictive quality deteriorates with sentence length and complexity.
One of our major goals was to overcome the conservatism exhibited by state-of-the-art syntactic TS approaches, i.e. their tendency to retain the input sentence rather than transforming it. For this purpose, we decompose each source sentence into minimal semantic units and turn them into self-contained propositions. In that way, we provide a fine-grained output that is easy to process for subsequently applied NLP tools. Another major drawback of the structural TS approaches described so far is that they do not preserve the semantic links between the individual split components, resulting in a set of incoherent utterances. Consequently, important contextual information is lost, impeding the interpretability of the output for downstream semantic tasks. To prevent this, we establish a contextual hierarchy between the split components and identify the semantic relationship that holds between them. An example of the resulting output is displayed in Figure 1.

Related Work
To date, three main classes of techniques for syntactic TS with a focus on the task of sentence splitting have been proposed. The first uses a set of syntax-based hand-crafted transformation rules to perform structural simplification operations, while the second exploits machine learning (ML) techniques where the model learns simplification rewrites automatically from examples of aligned complex source and simplified target sentences. In addition, approaches that decompose a sentence into its main semantic constituents using a semantic parser have been described.

Syntax-driven Rule-based Approaches
The line of work on structural TS starts with Chandrasekar et al. (1996), who manually define a set of rules to detect points where sentences may be split, such as relative pronouns or conjunctions, based on chunking and dependency parse representations. Siddharthan (2002) presents a pipelined architecture for a simplification framework that extracts a variety of clausal and phrasal components from a source sentence and transforms them into stand-alone sentences using a set of hand-written grammar rules based on shallow syntactic features. More recently, Siddharthan and Mandya (2014) propose RegenT, a hybrid TS approach that combines an extensive set of 136 hand-written grammar rules defined over dependency tree structures for tackling 7 types of linguistic constructs with a much larger set of automatically acquired rules for lexical simplification. Taking a similar approach, Ferrés et al. (2016) describe a linguistically-motivated rule-based TS approach called YATS, which relies on part-of-speech tags and syntactic dependency information to simplify a similar set of linguistic constructs, using a set of only 76 hand-crafted transformation patterns in total. These two state-of-the-art rule-based structural TS approaches primarily target reader populations with reading difficulties, such as people suffering from dyslexia, aphasia or deafness. According to Siddharthan (2014), those groups most notably benefit from splitting long sentences that contain clausal constructions. Consequently, simplifying clausal components is the main focus of the proposed TS systems of this category.
Finally, Štajner and Glavaš (2017) present LEXEV and EVLEX, which combine a syntactic simplification approach that uses an even smaller set of 11 hand-written rules to perform sentence splitting and deletion of irrelevant sentences or sentence parts with an unsupervised lexical simplifier based on word embeddings (Glavaš and Štajner, 2015).

Approaches based on Semantic Parsing
While the TS approaches described above are based on syntactic information, there are a variety of methods that use semantic structures for sentence splitting. These include the work of Narayan and Gardent (2014) and Narayan and Gardent (2016), who propose a framework that takes semantically-shared elements as the basis for splitting and rephrasing a sentence. It first generates a semantic representation of the input to identify splitting points in the sentence. In a second step, the split components are then rephrased by completing them with missing elements in order to reconstruct grammatically sound sentences. Lately, with DSS, Sulem et al. (2018c) describe another semantic-based structural simplification framework that follows a similar approach.

Data-driven Approaches
More recently, data-driven approaches for the task of sentence splitting emerged. Narayan et al. (2017) propose a set of sequence-to-sequence models trained on the WebSplit corpus, a dataset of over one million tuples that map a single complex sentence to a sequence of structurally simplified sentences. Aharoni and Goldberg (2018) further explore this idea, augmenting the presented neural models with a copy mechanism. Though outperforming the models used in Narayan et al. (2017), they still perform poorly compared to previous state-of-the-art rule-based syntactic simplification approaches. In addition, Botha et al. (2018) observed that the sentences from the WebSplit corpus contain fairly unnatural linguistic expressions using only a small vocabulary. To overcome this limitation, they present a scalable, language-agnostic method for mining training data from Wikipedia edit histories, providing a rich and varied vocabulary over naturally expressed sentences and their extracted splits. When training the best-performing model of Aharoni and Goldberg (2018) on this new split-and-rephrase dataset, they achieve a strong improvement over prior best results from Aharoni and Goldberg (2018). However, due to the uniform use of a single split per source sentence in the training set, each input sentence is broken down into two output sentences only. Consequently, the resulting simplified sentences are still comparatively long and complex.

Recursive Sentence Splitting
We present DISSIM, a recursive sentence splitting approach that creates a semantic hierarchy of simplified sentences. The goal of our approach is to generate an intermediate representation that presents a simple and more regular structure which is easier to process for downstream semantic applications and may support a faster generalization in ML tasks. For this purpose, we cover a wider range of syntactic constructs (10 in total) than state-of-the-art rule-based syntactic frameworks. In particular, our approach is not limited to breaking up clausal components, but also splits and rephrases a variety of phrasal elements, resulting in a much more fine-grained output where each proposition represents a minimal semantic unit that is typically composed of a simple subject-predicate-object structure. Though tackling a larger set of linguistic constructs, our framework operates on a much smaller set of only 35 manually defined rules as compared to existing syntax-driven rule-based approaches.
With the help of the transformation patterns that we specified, source sentences that present a complex linguistic form are transformed into clean, compact structures by disembedding clausal and phrasal components that contain only supplementary information. These elements are then transformed into independent sentences. In that way, the source sentence is reduced to its key information ("core sentence") and augmented with a number of associated contextual sentences that disclose additional information about it, resulting in a novel hierarchical representation in the form of core sentences and accompanying contexts. Moreover, we identify the rhetorical relations by which core sentences and their associated contexts are connected in order to preserve their semantic relationship. The resulting representation of the source text, which we will call a "discourse tree" in the following, can then be used to facilitate a variety of artificial intelligence tasks, such as text summarization, MT, IE or opinion mining, among others.

Transformation Stage
The structural TS framework that we propose takes a sentence as input and performs a recursive transformation stage that is based upon 35 handcrafted grammar rules. Each rule defines how to split up and rephrase the input into structurally simplified sentences (subtask 1), establish a contextual hierarchy between the split components (subtask 2) and identify the semantic relationship that holds between those elements (subtask 3).
The transformation patterns are based on syntactic and lexical features that can be derived from a sentence's phrase structure. They were heuristically determined in a rule engineering process whose main goal was to provide a best-effort set of patterns, targeting the challenge of being applied in a recursive fashion and to overcome biased or incorrectly structured parse trees. We empirically determined a fixed execution order of the rules by examining which sequence achieved the best simplification results in a manual qualitative analysis conducted on a development test set of 100 randomly sampled Wikipedia sentences. The grammar rules are applied recursively in a top-down fashion on the source sentence, until no more simplification pattern matches. In that way, the input is turned into a discourse tree, consisting of a set of hierarchically ordered and semantically interconnected sentences that present a simplified syntax.
Table 1: Linguistic constructs addressed by our TS approach and the number of transformation patterns specified for each.
      Construct                             #Rules
      [...]
  3a  Relative clauses (non-defining)       8
  3b  Relative clauses (defining)           5
  4   Reported speech                       4
      Phrasal disembedding
  5   Coordinate verb phrases (VPs)         1
  6   Coordinate noun phrases (NPs)         2
  7a  Appositions (non-restrictive)         1
  7b  Appositions (restrictive)             1
  8   Prepositional phrases (PPs)           3
  9   Adjectival and adverbial phrases      2
  10  Lead NPs                              1
      Total                                 35
Subtask 1: Sentence Splitting and Rephrasing. Each transformation rule takes a sentence's phrasal parse tree as input and encodes a pattern that, in case of a match, will extract textual parts from the tree. The decomposed text spans, as well as the remaining text span are then transformed into new stand-alone sentences. In order to ensure that the resulting simplified output is grammatically sound, some of the extracted text spans are combined with their corresponding referents from the main sentence or appended to a simple phrase (e.g. "This is"). In that way, the simplification rules encode both the splitting points and rephrasing procedure for reconstructing proper sentences.
Both coordinate and subordinate clauses, as well as various types of phrasal elements are addressed by our TS approach. Table 1 provides an overview of the linguistic constructs that are tackled, including the number of transformation patterns that were specified for the respective syntactic phenomenon. For a better understanding of the splitting and rephrasing procedure, Figure 2 visualizes the application of the first grammar rule that matches the given input sentence. The upper part of the box represents the complex input, which is matched against the simplification pattern. The lower part then shows the result of the transformation.
RULE: SharedNPPostCoordinationExtractor (for coordinate verb phrases)
TREGEX PATTERN: ROOT <<: (S < (NP $.. (VP < +(VP) (VP > VP $.. VP ))))
EXTRACTED SENTENCE: NP + VP .
RULE: SubordinationPreExtractor (for adverbial clauses with pre-posed subordinative clauses)
[Figure 2 content: (1) The Treasury will announce details of the November refunding on Monday. (1) The funding will be delayed if Congress and President Bush fail to increase the Treasury's borrowing capacity. (2) context (2) core (3) "although" → Contrast]
Figure 2: (Subtask 1) The source sentence is split up and rephrased into a set of syntactically simplified sentences. (Subtask 2) Then, the split sentences are connected with information about their constituency type to establish a contextual hierarchy between them. (Subtask 3) Finally, by identifying and classifying the rhetorical relations that hold between the simplified sentences, their semantic relationship is restored which can be used to inform downstream applications.
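To make the splitting and rephrasing step more concrete, the following minimal sketch imitates the idea behind the SharedNPPostCoordinationExtractor pattern (a subject NP shared by coordinated VPs, rephrased as "NP + VP ."). It is not the Tregex-based implementation: the nested-tuple tree encoding, the function names and the example sentence (loosely adapted from the input of Table 6) are assumptions made purely for illustration.

```python
from typing import List, Tuple, Union

# Assumed toy encoding: a node is (label, children); a leaf is a token string.
Tree = Union[str, Tuple[str, list]]


def leaves(tree: Tree) -> List[str]:
    """Return the tokens covered by a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [tok for child in children for tok in leaves(child)]


def split_coordinate_vps(tree: Tree) -> List[str]:
    """Hypothetical stand-in for a coordinate-VP rule.

    If an S node has a subject NP and a VP that directly contains at least
    two VP conjuncts, emit one 'NP + VP .' sentence per conjunct.
    Returns an empty list when the pattern does not match.
    """
    if isinstance(tree, str):
        return []
    label, children = tree
    if label != "S":
        return []
    nps = [c for c in children if not isinstance(c, str) and c[0] == "NP"]
    vps = [c for c in children if not isinstance(c, str) and c[0] == "VP"]
    if not nps or not vps:
        return []
    conjuncts = [c for c in vps[0][1] if not isinstance(c, str) and c[0] == "VP"]
    if len(conjuncts) < 2:
        return []  # no coordination, so the rule does not fire
    subject = " ".join(leaves(nps[0]))
    return [f"{subject} {' '.join(leaves(vp))} ." for vp in conjuncts]


# Hypothetical example tree for "The slave escaped to Canada and wrote the story of his life".
example = ("S", [
    ("NP", ["The", "slave"]),
    ("VP", [
        ("VP", ["escaped", "to", "Canada"]),
        ("CC", ["and"]),
        ("VP", ["wrote", "the", "story", "of", "his", "life"]),
    ]),
])
simplified = split_coordinate_vps(example)
```

In this sketch the shared subject is copied into every extracted sentence, which corresponds to the rephrasing step described above, where extracted spans are combined with their referents from the main sentence.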
Subtask 2: Constituency Type Classification. Each split will create two or more sentences with a simplified syntax. In order to establish a contextual hierarchy between them, we connect them with information about their constituency type. According to Fay (1990), clauses can be related to one another in two ways: First, there are parallel clauses that are linked by coordinating conjunctions, and second, clauses may be embedded inside another, introduced by subordinating conjunctions. The same applies to phrasal elements.
Since the latter commonly express minor information, we denote them context sentences. In contrast, the former are of equal status and typically depict the key information contained in the input. Therefore, they are called core sentences in our approach. To differentiate between those two types of constituents, the transformation patterns encode a simple syntax-based approach where subordinate clauses and phrasal elements are classified as context sentences, while coordinate clauses/phrases are labelled as core.
Subtask 3: Rhetorical Relation Identification. Finally, we aim to determine intra-sentential semantic relationships in order to restore semantic relations between the disembedded components. For this purpose, we identify and classify the rhetorical relations that hold between the simplified sentences, making use of both syntactic and lexical features which are encoded in the transformation patterns. While syntactic features are manifested in the phrasal composition of a sentence's parse tree, lexical features are extracted from the parse tree in the form of cue phrases. The determination of potential cue words and their positions in specific syntactic environments is based on the work of Knott and Dale (1994). The extracted cue phrases are then used to infer the type of rhetorical relation. For this task we utilize a predefined list of rhetorical cue words adapted from the work of Taboada and Das (2013), which assigns them to the relation that they most likely trigger. For example, the transformation rule in Figure 2 specifies that "although" is the cue word here, which is mapped to a "Contrast" relationship.
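Subtasks 2 and 3 can be pictured as two simple lookups, sketched below under stated assumptions. Only the mapping from "although" to "Contrast" is taken from the text; the remaining lexicon entries, the function names and the boolean flag are hypothetical and merely illustrate the mechanism of a cue-word list.

```python
from typing import Optional

# Illustrative cue-word lexicon. Only "although" -> "Contrast" comes from the
# text; the other entries are assumed for the sake of the example.
CUE_TO_RELATION = {
    "although": "Contrast",
    "because": "Cause",      # assumed entry
    "if": "Condition",       # assumed entry
}


def classify_constituency(is_subordinate_or_phrasal: bool) -> str:
    """Syntax-based labelling: subordinate clauses and phrasal elements become
    context sentences, coordinate clauses/phrases become core sentences."""
    return "context" if is_subordinate_or_phrasal else "core"


def identify_relation(cue_phrase: str) -> Optional[str]:
    """Look up the rhetorical relation that a cue phrase most likely triggers.

    Returns None when the cue phrase is not in the lexicon.
    """
    return CUE_TO_RELATION.get(cue_phrase.strip().lower())
```

A rule that matches a pre-posed "Although" clause would thus label the extracted clause as context and attach the relation returned by identify_relation("Although").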

Final Discourse Tree
The leaf nodes resulting from the first simplification pass are recursively simplified in a top-down approach. When no transformation rule matches any more, the algorithm stops. The final discourse tree for the example sentence of Figure 2 is shown in Figure 3.
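The recursive procedure can be sketched as follows. This is a toy illustration, not the DISSIM implementation: the two string-based rules stand in for the parse-tree patterns, the "Condition" label for "if" is an assumption, and the input sentence is assembled from the sentence fragments shown in Figure 2.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A rule returns (relation, [(sentence, constituency_type), ...]) or None.
Split = Tuple[str, List[Tuple[str, str]]]
Rule = Callable[[str], Optional[Split]]


@dataclass
class Node:
    sentence: str
    constituency: str = "core"
    relation: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def _capitalize_first(text: str) -> str:
    return text[:1].upper() + text[1:]


def although_rule(sentence: str) -> Optional[Split]:
    """Toy rule: 'Although X, Y.' -> context X, core Y, relation Contrast."""
    prefix = "Although "
    if not sentence.startswith(prefix) or ", " not in sentence:
        return None
    sub, main = sentence[len(prefix):].rstrip(".").split(", ", 1)
    return "Contrast", [(_capitalize_first(sub) + ".", "context"),
                        (_capitalize_first(main) + ".", "core")]


def if_rule(sentence: str) -> Optional[Split]:
    """Toy rule: 'X if Y.' -> core X, context Y (relation label assumed)."""
    if " if " not in sentence:
        return None
    main, cond = sentence.rstrip(".").split(" if ", 1)
    return "Condition", [(main + ".", "core"),
                         (_capitalize_first(cond) + ".", "context")]


def build_tree(sentence: str, rules: List[Rule], constituency: str = "core") -> Node:
    """Apply the first matching rule (fixed order), then recurse on the parts.

    Recursion ends at sentences that no rule matches; these become leaves.
    """
    node = Node(sentence, constituency)
    for rule in rules:
        result = rule(sentence)
        if result is not None:
            node.relation, parts = result
            node.children = [build_tree(s, rules, c) for s, c in parts]
            break
    return node


SOURCE = ("Although the Treasury will announce details of the November refunding "
          "on Monday, the funding will be delayed if Congress and President Bush "
          "fail to increase the Treasury's borrowing capacity.")
tree = build_tree(SOURCE, [although_rule, if_rule])
```

The rule order matters in this sketch: if if_rule were tried first, it would split the full sentence at " if " before the pre-posed "Although" clause is disembedded, which mirrors why a fixed execution order had to be determined empirically.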

Experimental Setup
To compare the performance of our TS approach with state-of-the-art syntactic simplification systems, we evaluated DISSIM with respect to the sentence splitting task (subtask 1). The evaluation of the rhetorical structures (subtasks 2 and 3) will be the subject of future work.
Corpora. We conducted experiments on three commonly used simplification corpora from two different domains. The first dataset we used was Wikilarge, which consists of 359 sentences from the PWKP corpus (Xu et al., 2016). Moreover, to demonstrate domain independence, we compared the output generated by our TS approach with that of the various baseline systems on the Newsela corpus (Xu et al., 2015), which is composed of 1077 sentences from newswire articles. In addition, we assessed the performance of our simplification system using the 5000 test sentences from the WikiSplit benchmark (Botha et al., 2018), which was mined from Wikipedia edit histories.
Baselines. We compared our DISSIM approach against several state-of-the-art baseline systems that have a strong focus on syntactic transformations through explicitly modeling splitting operations. For Wikilarge, these include (i) DSS; (ii) SENTS (Sulem et al., 2018c), which is an extension of DSS that runs the split sentences through the NTS system (Nisioi et al., 2017); (iii) HYBRID (Narayan and Gardent, 2014); (iv) YATS; and (v) RegenT. In addition, we report evaluation scores for the complex input sentences, which allows for a better judgment of system conservatism, and the corresponding simple reference sentences. With respect to the Newsela dataset, we considered the same baseline systems, with the exceptions of DSS and SENTS, whose outputs were not available. Finally, regarding the WikiSplit corpus, we restricted the comparison to the best-performing system in Botha et al. (2018), Copy512, which is a sequence-to-sequence neural model augmented with a copy mechanism and trained over the WikiSplit dataset.
Automatic Evaluation. The automatic metrics that were calculated in the evaluation procedure comprise a number of basic statistics, including (i) the average sentence length of the simplified sentences in terms of the average number of tokens per output sentence (#T/S); (ii) the average number of simplified output sentences per complex input (#S/C); (iii) the percentage of sentences that are copied from the source without performing any simplification operation (%SAME), serving as an indicator for system conservatism; and (iv) the averaged Levenshtein distance from the input (LD_SC), which provides further evidence for a system's conservatism. Furthermore, in accordance with prior work on TS, we report average BLEU (Papineni et al., 2002) and SARI (Xu et al., 2016) scores for the rephrasings of each system. Finally, we computed the SAMSA and SAMSA_abl scores of each system, which are the first metrics that explicitly target syntactic aspects of TS (Sulem et al., 2018b).
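The four basic statistics can be computed along the following lines. This is a sketch under assumptions that the text does not fix: tokens are obtained by whitespace splitting, an input counts towards %SAME when the output consists of exactly the unchanged source sentence, and the word-based Levenshtein distance is taken between the source and the concatenation of its output sentences. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def word_levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance over tokens (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def basic_stats(pairs: List[Tuple[str, List[str]]]) -> Dict[str, float]:
    """Compute #T/S, #S/C, %SAME and LD_SC for (source, outputs) pairs.

    Assumes a non-empty list in which every source has at least one output.
    """
    n_inputs = len(pairs)
    n_outputs = sum(len(outs) for _, outs in pairs)
    n_tokens = sum(len(sent.split()) for _, outs in pairs for sent in outs)
    n_same = sum(1 for src, outs in pairs if outs == [src])
    total_ld = sum(word_levenshtein(src.split(), " ".join(outs).split())
                   for src, outs in pairs)
    return {
        "#T/S": n_tokens / n_outputs,
        "#S/C": n_outputs / n_inputs,
        "%SAME": 100.0 * n_same / n_inputs,
        "LD_SC": total_ld / n_inputs,
    }


# Hypothetical toy data: one copied sentence and one sentence split in two.
toy_pairs = [
    ("A b c .", ["A b c ."]),
    ("A b and c d .", ["A b .", "A c d ."]),
]
toy_stats = basic_stats(toy_pairs)
```

A conservative system shows up in these numbers as a high %SAME together with a low LD_SC and an #S/C close to 1.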

Manual Analysis. Human evaluation is carried out on a subset of 50 randomly sampled sentences per corpus by two non-native, but fluent English speakers who rated each input-output pair according to three parameters: grammaticality (G), meaning preservation (M) and structural simplicity (S) (see Section A of the appendix).
In order to get further insights into the quality of our implemented simplification patterns, we performed an extensive qualitative analysis of the 35 hand-crafted transformation rules, comprising a manual recall-based analysis of the simplification patterns, and a detailed error analysis.
Usefulness. Since the DISSIM framework that we propose is aimed at serving downstream semantic applications, we measure if an improvement in the performance of NLP tools is achieved when using our TS approach as a preprocessing step. For this purpose, we chose the task of Open IE (Banko et al., 2007) and determine whether such systems benefit from the sentence splitting approach presented in this work.

Results and Discussion
Automatic Evaluation. The upper part of Table 3 reports the results that were achieved on the 359 sentences from the Wikilarge corpus, using a set of automatic metrics. Transforming each sentence of the dataset, our DISSIM approach reaches the highest splitting rate among the TS systems under consideration, together with HYBRID, DSS and SENTS. With 2.82 split sentences per input on average, our framework outputs by a large margin the highest number of structurally simplified sentences per source. Moreover, consisting of 11.01 tokens on average, the DISSIM approach returns the shortest sentences of all systems. The relatively high word-based Levenshtein distance of 11.90 confirms previous findings.
With regard to SARI, our DISSIM framework (35.05) again outperforms the baseline systems. However, it is among the systems with the lowest BLEU score (63.03). That said, Sulem et al. (2018a) recently demonstrated that BLEU is inappropriate for the evaluation of TS approaches when sentence splitting is involved, since it negatively correlates with structural simplicity, thus penalizing sentences that present a simplified syntax, and presents no correlation with the grammaticality and meaning preservation dimensions. For this reason, we only report these scores for the sake of completeness and to match past work. According to Sulem et al. (2018b), the recently proposed SAMSA and SAMSA_abl scores are better suited for the evaluation of the sentence splitting task. With a score of 0.67, the DISSIM framework shows the best performance for SAMSA, while its score of 0.84 for SAMSA_abl is just below the one obtained by the RegenT system (0.85).
The results on the Newsela dataset, depicted in the middle part of Table 3, support our findings on the Wikilarge corpus, indicating that our TS approach can be applied in a domain independent manner. The lower part of Table 3 illustrates the numbers achieved on the WikiSplit dataset. Though the Copy512 system beats our approach in terms of BLEU and SARI, the remaining scores are clearly in favour of the DISSIM system.
Manual Analysis. The results of the human evaluation are displayed in Table 4. The inter-annotator agreement was calculated using Cohen's κ, resulting in rates of 0.72 (G), 0.74 (M) and 0.60 (S). The assigned scores demonstrate that our DISSIM approach outperforms all other TS systems in the S dimension. With a score of 1.30 on the Wikilarge sample sentences, it is far ahead of the baseline approaches, with HYBRID (0.86) coming closest. However, this system receives the lowest scores for G and M. RegenT obtains the highest score for G (4.64), while YATS is the best-performing approach in terms of M (4.60). However, with a rate of only 0.22, it achieves a low score for S, indicating that the high score in the M dimension is due to the conservative approach taken by YATS, resulting in only a small number of simplification operations. This explanation also holds true for RegenT's high mark for G. Still, our DISSIM approach follows closely, with a score of 4.50 for M and 4.36 for G, suggesting that it attains its goal of returning fine-grained simplified sentences that achieve a high level of grammaticality and preserve the meaning of the input. Considering the average scores of all systems under consideration, our approach is the best-performing system (3.39), followed by RegenT (3.16). The human evaluation ratings on the Newsela and WikiSplit sentences show similar results, again supporting the domain independence of our proposed approach.
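For reference, Cohen's κ relates the observed agreement p_o between two annotators to the agreement p_e expected by chance: κ = (p_o - p_e) / (1 - p_e). The sketch below computes the unweighted variant; whether the reported values were obtained with an unweighted or a weighted κ is not stated in the text, so this choice, like the function name and the toy ratings, is an assumption.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label]
              for label in counts_a.keys() | counts_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings of four items by two annotators.
kappa_example = cohens_kappa([1, 1, 2, 2], [1, 1, 2, 1])
```

In the toy example the raters agree on three of four items (p_o = 0.75) while chance agreement is p_e = 0.5, so κ is considerably lower than the raw agreement rate.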
The results of the recall-based qualitative analysis of the transformation patterns, together with the findings of the error analysis are illustrated in Section B of the appendix in Tables 9 and 10. Concerning the quality of the implemented simplification rules, the percentage of sentences that were correctly split was approaching 100% for coordinate and adverbial clauses, and exceeded 80% on average.
Table 5: Improvements when using DISSIM as a preprocessing step.
Usefulness. To investigate whether our proposed structural TS approach is able to improve the performance of downstream NLP tasks, we compare the performance of a number of state-of-the-art Open IE systems, including ClausIE (Del Corro and Gemulla, 2013), OpenIE-4 (Mausam, 2016), REVERB (Fader et al., 2011), OLLIE (Mausam et al., 2012) and Stanford Open IE (Angeli et al., 2015), when directly operating on the raw input data with their performance when our DISSIM framework is applied as a preprocessing step. For this purpose, we made use of the Open IE benchmark framework proposed in Stanovsky and Dagan (2016). The results are displayed in Figure 4. The resulting improvements in overall precision, recall and area under the curve (AUC) are listed in Table 5. The numbers show that when using our DISSIM framework, all systems under consideration gain in AUC. The highest improvement in AUC was achieved by Stanford Open IE, yielding a 597% increase over the output produced when acting as a stand-alone system. AUC scores of REVERB and OLLIE improve by 57% and 20%, respectively. While REVERB primarily profits from a boost in recall (+40%), ClausIE, OLLIE and OpenIE-4 mainly improve in precision (+50%, +38% and +20%, respectively).

Comparative Analysis
In the following, we compare our TS framework with state-of-the-art rule-based syntactic TS approaches and discuss the strengths and weaknesses of each system.
Sentence Splitting. Table 6 compares the output generated by the TS systems RegenT and YATS on a sample sentence. As can be seen, RegenT and YATS break down the input into a sequence of sentences that present its message in a way that is easy to digest for human readers. However, the sentences are still rather long and present an irregular structure that mixes multiple semantically unrelated propositions, potentially causing problems for downstream tasks. On the contrary, our fairly aggressive simplification strategy that splits a source sentence into a large set of very short sentences (see footnote 9) is rather inapt for a human audience and may in fact even hinder reading comprehension. Nevertheless, we were able to demonstrate that the transformation process we propose can improve the performance of downstream NLP applications.

Input: The house was once part of a plantation and it was the home of Josiah Henson, a slave who escaped to Canada in 1830 and wrote the story of his life.
RegenT: The [...]

9 In the output generated by DISSIM, contextual sentences are linked to their referring sentences and semantically classified by rhetorical relations. The number indicates the sentences' context layer cl. Sentences with cl = 0 carry the core information of the source, whereas sentences with cl ≥ 1 provide contextual information about a sentence with a context layer of cl-1.
Text Coherence. The vast majority of syntactic simplification approaches do not take into account discourse-level aspects, producing a disconnected sequence of simplified sentences which results in a loss of cohesion that makes the text harder to interpret (Siddharthan, 2014). However, two notable exceptions have to be mentioned. Siddharthan (2006) was the first to use discourse-aware cues in one of RegenT's predecessor systems, with the goal of generating a coherent output, e.g. by choosing appropriate determiners ("This slave" in Table 6). However, as opposed to our approach, where a semantic relationship is established for each output sentence, only a comparatively low number of sentences is linked by such cue words in Siddharthan (2006)'s framework (and its successors). EVLEX and LEXEV also operate on the discourse level. They are semantically motivated, eliminating irrelevant information from the input by maintaining only those parts of the input that belong to factual event mentions. Our approach, on the contrary, aims to preserve the full informational content of a source sentence, as illustrated in Table 7. By distinguishing core from contextual information, we are still able to extract only the key information given in the input.
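How the distinction between core and contextual information allows the key information to be extracted can be sketched as follows. The derivation of the context layer used here (a context child lies one layer below its parent, a core child stays on the parent's layer) is an assumption made for illustration, as are the class and function names; the leaf sentences are taken from Figure 2, with a further hypothetical split at "if".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    sentence: str
    constituency: str = "core"  # "core" or "context"
    children: List["Node"] = field(default_factory=list)


def context_layers(node: Node, layer: int = 0) -> List[Tuple[str, int]]:
    """Flatten a discourse tree into (leaf sentence, context layer) pairs.

    Assumed convention: context children are placed one layer deeper than
    their parent, core children inherit the parent's layer.
    """
    if not node.children:
        return [(node.sentence, layer)]
    result: List[Tuple[str, int]] = []
    for child in node.children:
        child_layer = layer + 1 if child.constituency == "context" else layer
        result.extend(context_layers(child, child_layer))
    return result


def core_only(node: Node) -> List[str]:
    """Keep only the sentences that carry core information (layer 0)."""
    return [sentence for sentence, layer in context_layers(node) if layer == 0]


# Hypothetical discourse tree; the root stands for the complex source sentence.
example_tree = Node("<complex source sentence>", children=[
    Node("The Treasury will announce details of the November refunding on Monday.",
         "context"),
    Node("The funding will be delayed if Congress and President Bush fail to "
         "increase the Treasury's borrowing capacity.", "core", children=[
             Node("The funding will be delayed.", "core"),
             Node("Congress and President Bush fail to increase the Treasury's "
                  "borrowing capacity.", "context"),
         ]),
])
layers = context_layers(example_tree)
key_information = core_only(example_tree)
```

No sentence is discarded in this representation: a downstream application can either consume all layers or restrict itself to layer 0.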

Conclusion
We presented a recursive sentence splitting approach that transforms structurally complex sentences into a novel hierarchical representation in the form of core sentences and accompanying contexts that are semantically linked by rhetorical relations. In a comparative analysis, we demonstrated that our TS approach achieves the highest scores on all three simplification corpora with regard to SAMSA (0.67, 0.57, 0.54), and is never worse than a close second in terms of SAMSA_abl (0.84, 0.84, 0.84), two recently proposed metrics targeted at automatically measuring the syntactic complexity of sentences. These findings are supported by the other scores of the automatic evaluation, as well as the manual analysis. In addition, the extrinsic evaluation that was carried out based on the task of Open IE verified that downstream semantic applications profit from making use of our proposed structural TS approach as a preprocessing step. In the future, we plan to investigate the constituency type classification and rhetorical relation identification steps and port this approach to languages other than English.

Table 8 lists the questions for the human annotation. Since the focus of our work is on structural rather than lexical simplification, we follow the approach taken in Sulem et al. (2018c) in terms of SIMPLICITY and restrict our analysis to the syntactic complexity of the resulting sentences, which is measured on a scale that ranges from -2 to 2 in accordance with Nisioi et al. (2017), while neglecting the lexical simplicity of the output sentences. Regarding the GRAMMATICALITY and MEANING PRESERVATION dimensions, we adopted the guidelines from Štajner and Glavaš (2017), with some minor deviations to better reflect our goal of simplifying the structure of the input sentences, while retaining their full informational content.

B Qualitative Analysis of the Transformation Patterns and Error Analysis
Tables 9 and 10 show the results of the recall-based qualitative analysis of the transformation patterns, together with the findings of the error analysis. These analyses were carried out on a dataset which we compiled (see footnote 10). It consists of 100 Wikipedia sentences per syntactic phenomenon tackled by our TS approach. In the construction of this corpus we ensured that the collected sentences exhibit a great syntactic variability to allow for a reliable statement about the coverage and accuracy of the specified simplification rules.
Note that we do not consider the rules for disembedding adjectival/adverbial phrases and lead NPs, since an examination of the frequency distribution of the syntactic constructs tackled by our approach over the Wikilarge, Newsela and WikiSplit test sentences has shown that these types of constructs occur relatively rarely.
Table 9: Recall-based qualitative analysis of the transformation rule patterns. This table presents the results of a manual analysis of the performance of the hand-crafted simplification patterns. The first column lists the syntactic phenomena under consideration, the second column indicates their frequency in the dataset, the third column displays the percentage of sentences for which a grammar rule fired, and the fourth column reveals the percentage of sentences where the transformation operation results in a correct split.
10 The dataset is available under https://github.com/Lambda-3/DiscourseSimplification/tree/master/supplemental_material.