Fine-Grained Discourse Structures in Continuation Semantics

In this work, we are interested in the computation of logical representations of discourse. We argue that all discourse connectives are anaphors obeying different sets of constraints and show how this view allows one to account for the semantically parenthetical use of attitude verbs and verbs of report (e.g., think, say) and for sequences of conjunctions (A CONJ_1 B CONJ_2 C). We implement this proposal in event semantics using de Groote (2006)’s dynamic framework.


Introduction
The aim of a theory of discourse such as Rhetorical Structure Theory (RST, Mann and Thompson 1988) or Segmented Discourse Representation Theory (SDRT, Asher and Lascarides 2003) is to explain the structure of text beyond the sentence level, usually through a set of discourse relations (DRs; e.g., Explanation, Elaboration, Contrast). This structure is not only of theoretical interest but has also proved valuable for several Natural Language Processing (NLP) and Computational Linguistics (CL) applications, such as Question Answering (Narasimhan and Barzilay, 2015; Jansen et al., 2014) or Machine Translation (Guzmán et al., 2014; Tu et al., 2014).
The vast majority of the NLP and CL world relies on statistical rather than symbolic methods. Yet logic-based systems, which are closer to linguistic theories, can be a viable alternative, especially for inference-related problems (Bos and Markert, 2005; Bjerva et al., 2014; Abzianidze, 2015). That is the direction we advocate: grounded in formal grammar and formal semantics, we are interested in the computation of logical representations of discourse. Following Asher and Pogodalla (2011) and Qian and Amblard (2011), we argue that it is not necessary to extend syntax beyond the sentence level, as a dynamic framework such as the one presented by de Groote (2006), based on continuation semantics, allows one to handle discourse relations with a traditional lexicalized grammar.
In particular, this paper shows how a system of anaphora resolution, independently required for the interpretation of pronouns (she, it) and discourse adverbials (then, otherwise), along with an appropriate representation of propositional attitude verbs and verbs of report (AVs, e.g., think, say), can be used to account for the non-alignment between syntactic and discourse arguments (Dinesh et al., 2005; Hunter and Danlos, 2014) observed for instance in (1).[2] In these discourses, although the AV with its subject (Jane said) is part of the syntactic arguments of the connectives, it is not considered part of the corresponding discourse arguments and is said to be evidential. Evidential status affects, among other things, the inferences that can be drawn, in particular about the beliefs of the author (Danlos and Rambow, 2011; Hunter, 2016; Hunter and Asher, forthcoming).
(1) (from Hunter and Danlos 2014)
    a. John didn't come to the party although Jane said he was back in town.
    b. Lots of people are coming to my party. Jane said, for example, that Fred is coming with his whole family.
[2] Following the notation convention of the Penn Discourse Treebank (PDTB, Prasad et al. 2007), the two arguments of relevant discourse relations, named "Arg1" and "Arg2", are shown in italic and bold, while the connectives lexicalizing them, if any, are underlined.
This article is organized as follows. In Section 2, we present the anaphoric character of adverbial connectives. In Section 3, we start by reviewing the notion of (semantically) parenthetical report, a category that subsumes evidential reports, and we highlight its relation with discourse connectives. Next, we sketch our main contribution, namely that parenthetical reports can be modeled by assuming that all connectives behave anaphorically, even though different classes of connectives obey different sets of constraints. These ideas are implemented formally using continuation semantics in Section 4. In Section 5, we discuss related work and Section 6 concludes the article.

Adverbial connectives as anaphors
In English, using a discourse connective (a word that lexicalizes a DR, such as although and for example in (1) above) is the most direct and reliable way to express a DR. The three main categories of discourse connectives are COORDINATE CONJUNCTIONS (e.g., and, or), SUBORDINATE CONJUNCTIONS (e.g., because, although) and ADVERBIALS (e.g., for example, otherwise). Webber et al. (2003) argue that, in contrast with the first two types, jointly called "structural connectives", adverbials are interpreted anaphorically. In other words, the arguments of adverbials cannot be determined by syntax alone (nor by an extension of syntax using similar notions of dependency or constituency) and are found in or derived from the context, in a similar fashion to the antecedents of nominal anaphoric expressions (e.g., she).
While Webber et al. (2003) and Webber (2004) outline D-LTAG, a discourse grammar incorporating anaphoric elements for adverbial connectives, nothing is said about the resolution of the anaphors. In contrast, our approach considers a traditional lexicalized sentence-level grammar such as Combinatory Categorial Grammar (CCG, Steedman and Baldridge 2011), a formalism for which parsing is an active research topic (Lewis et al., 2016; Ambati et al., 2016), and we focus here on the semantic part of the lexicon, explicitly embedding the anaphoric process in the computation of the semantics of the discourse. In addition, we will see in the next section that assuming that structural connectives sometimes behave anaphorically too accounts for (non-)parenthetical reports in a simple way.

Parenthetical reports

Intensionality and evidentiality
It was observed in Urmson (1952) that some verbs, called parentheticals, can have a special meaning when used in the first person of the present simple. In these cases, the verb is not used primarily to describe an event or a state, but rather to indicate "the emotional significance, the logical relevance or the reliability" of a statement. As an illustration, Urmson (1952) provides the sentences in (2), in which I suppose is used to signal a certain degree of reliability (low or moderate) of the speaker's opinion.
(2) a. I suppose that your house is very old.
    b. Your house, I suppose, is very old.
    c. Your house is very old, I suppose.
It appears that this behaviour is not limited to the first person present. Indeed, Simons (2007) cites dialogue (3) as an example, where Henry thinks that is described as an evidential, indicating the source of its complement (she's left town), which is the main point of the sentence. This evidential use is opposed to the traditional non-parenthetical (or intensional) use, for which the AV carries the main point of the sentence, as in (4) (also from Simons 2007). Only when Henry thinks that is interpreted as evidential can (3b) be accepted as a valid answer to (3a). Things are similar in monologue; in (5) (from Hunter and Danlos 2014), the evidential use of Jane said allows he is out of town to be an argument of an implicit Explanation relation.
(3) a. Why isn't Louise coming to our meetings these days?
    b. Henry thinks that she's left town.
(4) a. Why didn't Henry invite Louise to the party?
    b. He thinks that she's left town.
(5) Fred didn't come to my party. Jane said he is out of town.
The ability to account for both uses of AVs is of theoretical and practical interest. First, one might expect an efficient NLP system to be able to distinguish between, for instance, cases where a report is given as an explanation (as in (4)) and cases where the explanation is only the object of the report (as in (3) or (5)). Also, propositions reported by an evidential are interpreted as, if not true, at least possibly true, information that is valuable for reasoning systems. According to Hunter (2016) and Hunter and Asher (forthcoming), parenthetical reports are related to modal (or hedged) DRs: the Explanation in (5) is modalized (◇Explanation) and entails (at least) the possibility of both of its arguments. While they focus on implicit DRs, they seem to extend their claim to explicit ones, such as (6) (or (1) above). According to Danlos and Rambow (2011), however, the relation in (6) is not hedged and a strong revision of propositional attitude occurs: one infers that the speaker agrees with Jill's report.

(6) John didn't come to the party. Instead, Jill said that he went to dinner with his brother. (from Hunter and Asher forthcoming)
This last question seems hard to settle without conducting a proper experiment on native speakers and is beyond the scope of the present article, which aims at modelling, through anaphor-like properties of connectives, how DRs receive their arguments and how this process gives rise to (non-)parenthetical interpretations of AVs. Therefore, we do not take a stance on the matter here but instead explain how both views can be accommodated within our proposal.

Two classes of explicit connectives
Hunter and Danlos (2014) argue that some connectives, such as because, restrict the reports in their scope to the intensional interpretation, while others, such as for example or although, behave like the implicit connective in (5). In (5), an implicit because is perfectly fine and leads to an evidential interpretation of the report, whereas the use of an explicit because is not compatible with the evidential interpretation (7a).[6] Only an intensional interpretation could be accepted; however, in this particular case (7b), it corresponds to a very unnatural reading. The connective for example, on the contrary, does not suffer from the same limitation (8): the explicit connective is compatible with the evidential interpretation (8b).
(7) a. *Fred didn't come to my party because Jane said he is out of town.
    b. #Fred didn't come to my party because Jane said he is out of town.
(8) Lots of people are coming to my party.
    a. Jane said that Fred is coming with his whole family.
    b. For example, Jane said that Fred is coming with his whole family.
[6] A "*" marks an unavailable/ungrammatical analysis, while a "#" indicates a semantically rejected one.
Independently, Haegeman (2004) argues that adverbial clauses (i.e., subordinate clauses that function as adverbs) fall into two classes: central adverbial clauses and peripheral ones. Several syntactic and semantic phenomena distinguish between them; in particular, negation and modal operators present in a matrix clause can also scope over a central clause, as in (9), which can mean either that the rain makes Fred happy or that Fred is sad for a reason other than the rain. On the other hand, such elements cannot scope over a peripheral clause, as illustrated by (10), which unambiguously expresses a contrast between Fred's happiness and the rain. It appears that all the subordinate conjunctions allowing parenthetical reports mentioned by Hunter and Danlos (2014) introduce peripheral clauses, while the ones that do not allow them all introduce central clauses. We think that this is no coincidence and will thus call "peripheral" the connectives that allow parenthetical reports and "central" the ones that do not.[7]
(9) Fred is not sad because it is raining.
(10) Fred is not sad although it is raining.
[7] Some ambiguous connectives can introduce both types of adverbial clauses. For instance, while is central when used temporally and peripheral when used contrastively.
The non-alignment between syntactic and discourse arguments resulting from the parenthetical use of AVs is in no way exceptional.[8] Using the PDTB Browser,[9] we have calculated that in the PDTB, 12.7% of all the explicit relations attributed to the writer have at least one of their arguments attributed to another agent, principally due to the use of an evidential. This proportion is even higher (26.9%) for implicit relations, which most of the time (98.0%) can be accounted for via an implicit (i.e., morphologically empty) adverbial connective at the beginning of a clause or sentence (Prasad et al., 2008).
[8] The term "non-alignment", or sometimes "mismatch" (Prasad et al., 2008), is used to describe a DR Rel lexicalized by a connective CONN such that the (discourse) arguments of the former do not directly correspond to the (syntactic) arguments of the latter.

Evidentiality and anaphora
Consider a sentence of the form A CONN Jane says X and label e_A the propositional content of A, e_B the content of Jane says X, e' the content of the report X and e the content of the full sentence. We propose that no connective is fully structural: all behave anaphorically, in the sense that their discourse arguments are not determined by syntax alone. Consequently, these discourse arguments are not necessarily the propositional contents of their syntactic arguments (in this case e_A and e_B respectively). However, these anaphors are constrained by a few rules. The first applies to all connectives: a discourse argument must have been introduced by the corresponding syntactic argument (in this case, e_A is the only candidate for Arg1, but both e_B and e' are candidates for Arg2). The second applies only to central connectives: these cannot "decompose" a clause headed by an AV to access the report (here, e') but have to stop at the AV itself (here, e_B). A third rule is introduced at the end of the section.
This explains why the two sentences in (11) are acceptable: although, a peripheral connective, has access to both e_B and e', either of which can be selected as Arg2 depending on their semantics.[10] In contrast, because is central and so, in the present configuration, necessarily uses e_B for Arg2; consequently, the AV is always interpreted intensionally, which predicts the acceptability of (12a) and the incoherence (in most contexts) of (12b).
(11) a. Fred came_{e_A} although_{e} Sabine said_{e_B} she hated_{e'} him.
     b. Fred came_{e_A} although_{e} Sabine says_{e_B} he was sick_{e'}.
(12) a. Fred came_{e_A} because_{e} Sabine said_{e_B} she liked_{e'} him.
     b. # Fred came_{e_A} because_{e} Sabine says_{e_B} he had recovered_{e'}.
[9] http://bit.ly/2zfrTNr
[10] It has been argued that there is no mismatch between syntax and discourse in (11b) and that the two sentences in (11) have the same structure (Hardt, 2013). The argument is based on the idea that if there is a contrast between A and B and if agent X speaks truthfully, then there is a contrast between A and X SAYS B. One of the issues with this view is that it fails to account for the differences between (non-)parenthetical uses of AVs; in particular, if the sentences in (11) have the same structure, how does one infer that the speaker/writer can reject the truth of the complement of the AV in (11a) but not in (11b)? In addition, while the given argument might be intuitively appealing for Concession and Contrast, extending it to other DRs such as the one lexicalized by for example would require drastically weakening the meaning of those DRs.
We propose that a third constraint applies to all connectives: when its syntactic argument contains a conjunction, a connective is able to decompose it to access the matrix clause, as in (13b), but not the embedded one. This constraint disambiguates between the two possible bracketings of A CONJ_1 B CONJ_2 C structures: when the Arg1 of the relation lexicalized by CONJ_2 is the content of either A or the whole A CONJ_1 B, the bracketing is as in (13); when instead this Arg1 is the content of B, the bracketing is as in (14).
This idea of handling connectives as restricted anaphors can probably be put into practice in various ways; in the remainder of this article, we implement it in a logical system based on λ-calculus.
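The three constraints can be read as a procedure that enumerates the markers a connective may select from one of its syntactic arguments. The following Python sketch is only an illustration under simplifying assumptions: the Clause class, its field names and the candidates function are hypothetical and are not part of the formal system developed below, and the choice among the candidates (by semantics or world knowledge) is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clause:
    """Hypothetical, simplified view of a syntactic argument."""
    marker: str                          # propositional marker of the whole clause
    report: Optional["Clause"] = None    # complement of an AV heading the clause, if any
    matrix: Optional["Clause"] = None    # matrix clause if the clause is "A CONJ B"


def candidates(arg: Clause, connective_class: str) -> List[str]:
    """Markers available as a discourse argument, given the connective class
    ("central" or "peripheral")."""
    # Rule 1: the argument must be introduced by the syntactic argument itself.
    found = [arg.marker]
    # Rule 2: only peripheral connectives may decompose a report.
    if arg.report is not None and connective_class == "peripheral":
        found += candidates(arg.report, connective_class)
    # Rule 3: a conjunction gives access to its matrix clause only.
    if arg.matrix is not None:
        found += candidates(arg.matrix, connective_class)
    return found


# "Jane says X": e_B for the report as a whole, e' for its complement X.
jane_says_x = Clause("e_B", report=Clause("e'"))
print(candidates(jane_says_x, "peripheral"))  # ['e_B', "e'"]
print(candidates(jane_says_x, "central"))     # ['e_B']
```

With a peripheral connective both the intensional (e_B) and the evidential (e') readings remain available, whereas a central connective is left with e_B only.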

Continuation semantics as a dynamic framework
The notion of continuation emerged in the theory of computer programming in relation to the idea of order of evaluation (see Reynolds 1993 on the history of continuations). It has proved very useful in the understanding of natural language too (Barker and Shan, 2014); in particular, it forms the basis of de Groote (2006)'s framework for dynamic semantics, i.e., a system accounting for the context-change nature of sentences and, notably, the possibility for a sentence to refer to entities introduced previously in the discourse (Asher, 2016). A continuized function takes a continuation, which is a representation of some further computation, as an additional argument. This function is then free to decide whether to execute its continuation and (if the continuation is itself a function taking an argument) with what argument. According to a similar principle, in the continuation semantics of de Groote (2006), a sentence is a function that takes as argument not only its left context, but also its continuation, i.e., the remaining portion of the discourse, whose argument is meant to be the context updated with the information expressed by the sentence. Such a framework, based on Church (1940)'s simply typed λ-calculus, is able to handle complex dynamic phenomena (Lebedeva, 2012; Qian, 2014). In particular, an anaphor is modelled using a selection function, a term representing the algorithmic process of determining (from the context) the reference of the anaphoric expression. For instance, the pronoun she uses a selection function sel_she that, given a context c, returns a feminine individual mentioned in c.[11]
One of the advantages of de Groote (2006)'s framework over other dynamic systems, such as Kamp and Reyle (1993)'s DRT or Groenendijk and Stokhof (1991)'s DPL, is that it relies entirely on usual mathematical notions; in particular, variables behave standardly and variable renaming, a critical operation to avoid clashes and loss of information (the destructive assignment problem), is handled by the classical operation of α-conversion.[12] We add to the continuation semantics of Lebedeva (2012) and Qian (2014) a basic type for propositional referential markers. Mathematically, those propositional markers are similar to the event variables of event semantics (Davidson, 1967), according to which Marie walks is translated as ∃e. walk(e, Marie), i.e., "there exists an event that is a walking by Marie"; the main difference is that those markers denote propositions and are thus suitable to represent the complements of AVs. This move allows us to reuse the anaphora system of continuation semantics for propositional anaphora at no cost. We consider here that any sentence describes such a propositional marker, which is provided to the semantic translation of the sentence as an argument, and can additionally introduce other markers in the context when, for instance, it contains a report or a discourse connective.
[11] Describing the implementation of the selection functions is beyond the scope of this work; however, we make sure that their arguments are informative enough for them to be mathematically defined.
[12] See Hindley and Seldin (1986) for more about λ-calculus.
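To make the shape of a dynamic proposition more concrete, here is a minimal Python sketch, assuming a deliberately naive encoding: formulas are plain strings, a context is a list of facts with the most recent first, and sel_she simply returns the most recent individual recorded as feminine. The names Fact, Context, walks and stop are illustrative choices and do not come from the framework itself.

```python
from typing import Callable, List, Tuple

Fact = Tuple[str, ...]                 # e.g. ("walk", "e1", "Marie")
Context = List[Fact]                   # most recent fact first
Continuation = Callable[[Context], str]
# A dynamic proposition: marker -> left context -> continuation -> formula.
DynProp = Callable[[str, Context, Continuation], str]


def sel_she(c: Context) -> str:
    """Toy selection function: the most recent individual recorded as feminine."""
    for fact in c:
        if fact[0] == "feminine":
            return fact[1]
    raise LookupError("no feminine antecedent in the context")


def walks(subject: Callable[[Context], str]) -> DynProp:
    """'<subject> walks'; the subject is resolved against the left context."""
    def prop(e: str, c: Context, phi: Continuation) -> str:
        who = subject(c)
        p = ("walk", e, who)
        # State the description of e, then run the rest of the discourse
        # on the updated context p :: c.
        return f"walk({e}, {who}) ∧ {phi([p] + c)}"
    return prop


def stop(c: Context) -> str:
    """Empty continuation: nothing more to say."""
    return "⊤"


left_context = [("feminine", "Marie")]
she_walks = walks(sel_she)
formula = she_walks("e1", left_context, stop)
# formula == "walk(e1, Marie) ∧ ⊤"
```

The sentence decides what its continuation receives: here the continuation is called on the left context extended with the new fact about e1, which is what makes later anaphoric reference to e1 possible.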

Sentence-level analysis
The meaning of a single sentence is computed as usual, according to a syntactic parse and the semantic entries of the lexicon; Table 1 shows the parts of the lexicon that are relevant to the current discussion. For the sentence Fred came, the result is given by term a in Table 2. This term has three arguments (as do all dynamic propositions): a propositional marker e, a context c and a continuation φ (the variable representing the subsequent sentences). It states that e is about Fred coming, and passes the context updated with this description of e (i.e., p :: c) to its continuation.

AVs
Because a verb such as think has a propositional complement, it corresponds here to a three-place predicate, relating the proposition being constructed (about the thinking), the thinker, and the proposition describing what is thought. Crucially, because the two propositions are represented by objects of the same logical type, they can both be referred to anaphorically in the same way. Note how think in Table 1 introduces the marker e', described by the complement P (the proposition embedded under think). The meaning of Eva thinks he recovered is given in b_2 of Table 2: this term states that e is about Eva thinking e', which is about "he" (note the selection function that has to find a reference in the context) having recovered.
It is important to note that the object of a thought (or of any report that is not factive; Karttunen 1971) is not necessarily a true proposition. Therefore, merely stating the existence of a propositional marker, as the entry for think does, does not imply that the corresponding proposition is true. This means that at some point, we will have to indicate when propositions are true; this will be achieved through a predicate true and an entailment relation over markers: a ⊃ b is defined as true(a) → true(b).
[Table 1, surviving fragment] Explanation(e, sel_C(e_A, c''), sel_C(e_B, c''))

Conjunctions
Like AVs, conjunctions introduce propositional markers; in this case, one for each syntactic argument. We said earlier that all connectives behave, at least to some extent, anaphorically. In our proposal, this corresponds to the fact that the two propositional variables e_A and e_B transmitted to the two syntactic arguments (A and B, respectively) are not hard-wired as the discourse arguments of the relation lexicalized by the connective; instead, two types of selection functions are used: sel_C and sel_P, for central and peripheral connectives respectively. These functions have two arguments: the first is the marker representing the whole corresponding syntactic argument (e_A or e_B) and the second is a context. If the context has been judiciously updated, the selection function has all the information needed to respect the constraints it is subject to and to retrieve the correct discourse argument. All central conjunctions have a lexical entry similar to that of because given in Table 1. This term can be understood sequentially: for A and B, e (the marker of the whole A because B proposition), the left context c and a continuation φ: i) e_A, a marker whose truth is entailed by the truth of e, is described by executing A; ii) similarly, e_B is described by executing B; iii) the relation Explanation between two anaphorically determined propositions (one from e_A, the other from e_B) is stated (this is the description of e); iv) the remainder φ of the discourse is executed. This order of evaluation is expressed through intermediate continuations, which are written so that the context is appropriately updated from beginning to end: the input context of the connective is c, (p_1 :: c) is given to A, which gives back c', then (p_2 :: c') is given to B, which gives back c'', and finally the connective transmits (p_3 :: c'') to its continuation.
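The sequential reading i) to iv) can be mimicked with nested continuations. The Python sketch below is a reconstruction from the prose description only, not a transcription of the entry in Table 1: formulas are strings, fresh markers are named by suffixing the parent marker, pronoun resolution is omitted, and sel_C is a toy version that always returns the marker of the whole argument (it never looks inside a report and ignores the third constraint). The helper names atom and think are assumptions made for the illustration.

```python
from typing import Callable, List, Tuple

Fact = Tuple[str, ...]
Context = List[Fact]                   # most recent fact first
Continuation = Callable[[Context], str]
DynProp = Callable[[str, Context, Continuation], str]


def atom(pred: str, *args: str) -> DynProp:
    """Sentence without embedded proposition, e.g. atom("come", "Fred")."""
    def prop(e: str, c: Context, phi: Continuation) -> str:
        p = (pred, e) + args
        return f"{pred}({', '.join((e,) + args)}) ∧ {phi([p] + c)}"
    return prop


def think(subject: str, P: DynProp) -> DynProp:
    """AV as a three-place predicate introducing a marker for its complement."""
    def prop(e: str, c: Context, phi: Continuation) -> str:
        e_rep = e + "'"
        p = ("think", e, subject, e_rep)
        return f"∃{e_rep}. think({e}, {subject}, {e_rep}) ∧ {P(e_rep, [p] + c, phi)}"
    return prop


def sel_C(marker: str, c: Context) -> str:
    """Toy central selection: stop at the marker of the whole argument."""
    return marker


def because(A: DynProp, B: DynProp) -> DynProp:
    def prop(e: str, c: Context, phi: Continuation) -> str:
        e_A, e_B = e + "_A", e + "_B"
        p1 = ("entails", e, e_A)

        def after_A(c1: Context) -> str:          # c1 plays the role of c'
            p2 = ("entails", e, e_B)

            def after_B(c2: Context) -> str:      # c2 plays the role of c''
                arg1, arg2 = sel_C(e_A, c2), sel_C(e_B, c2)
                p3 = ("Explanation", e, arg1, arg2)
                return f"Explanation({e}, {arg1}, {arg2}) ∧ {phi([p3] + c2)}"

            return f"∃{e_B}. ({e} ⊃ {e_B}) ∧ {B(e_B, [p2] + c1, after_B)}"

        return f"∃{e_A}. ({e} ⊃ {e_A}) ∧ {A(e_A, [p1] + c, after_A)}"
    return prop


sentence = because(atom("come", "Fred"), think("Eva", atom("recover", "Fred")))
formula = sentence("e", [], lambda c: "⊤")
```

In this sketch the resulting formula states Explanation(e, e_A, e_B): since sel_C cannot reach the marker introduced by think, only the intensional reading is produced.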
The (unnatural) sentence Fred came because Eva thinks he had recovered therefore leads to the term c in Table 2: because of the three constraints applying to sel_C (in particular the impossibility of accessing the content of a report), there is no ambiguity in the discourse arguments of the explanation, which are e_A (about the coming) and e_B (about the thinking). This corresponds to an intensional interpretation of the AV, which can be judged inappropriate based on world knowledge.
The entries for peripheral conjunctions (e.g., although in Table 1) only differ in the use of the sel_P selection function instead of sel_C. The sentence Fred came although Eva thinks he was sick is translated into term c' of Table 2: while sel_P(e_A, c'') is necessarily resolved as e_A itself (because of the first rule), sel_P(e_B, c'') could potentially be either e_B (intensional interpretation) or e' (evidential one), the latter being indicated by world knowledge.

Discourse analysis

Discourse update
To actually compute full discourses, two additional elements are needed (see Lebedeva 2012, who expresses discourse dynamics through continuations and an exception raising/handling mechanism but does not account for DRs); they are shown in Table 1. The first one, D_i, is the initial (content-empty) discourse, which simply contains some initial context c_i that is passed to its continuation. The second is the dupd operator, which updates a discourse D with a sentence S by transferring the context from the former to the latter and introducing a new true propositional marker.
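As an illustration of these two elements, the following Python sketch reconstructs D_i and dupd from the description above rather than from Table 1, which is not reproduced here. It assumes an empty initial context, string formulas, and a shortcut for naming fresh markers (counting the markers already asserted true in the context); atom is a hypothetical helper for simple sentences.

```python
from typing import Callable, List, Tuple

Fact = Tuple[str, ...]
Context = List[Fact]                   # most recent fact first
Continuation = Callable[[Context], str]
DynProp = Callable[[str, Context, Continuation], str]
Discourse = Callable[[Continuation], str]


def atom(pred: str, *args: str) -> DynProp:
    """Sentence without embedded proposition, e.g. atom("come", "Fred")."""
    def prop(e: str, c: Context, phi: Continuation) -> str:
        p = (pred, e) + args
        return f"{pred}({', '.join((e,) + args)}) ∧ {phi([p] + c)}"
    return prop


def D_i(phi: Continuation) -> str:
    """Initial discourse: hand the (here empty) initial context to the continuation."""
    return phi([])


def dupd(D: Discourse, S: DynProp) -> Discourse:
    """Update discourse D with sentence S."""
    def updated(phi: Continuation) -> str:
        def run_sentence(c: Context) -> str:
            # Shortcut for freshness: number markers by how many are already true in c.
            n = sum(1 for fact in c if fact[0] == "true")
            e = f"e{n + 1}"
            # Introduce a new true marker, then let S describe it in the context of D.
            return f"∃{e}. true({e}) ∧ {S(e, [('true', e)] + c, phi)}"
        return D(run_sentence)
    return updated


discourse = dupd(dupd(D_i, atom("come", "Fred")), atom("leave", "Eva"))
formula = discourse(lambda c: "⊤")
# formula == "∃e1. true(e1) ∧ come(e1, Fred) ∧ ∃e2. true(e2) ∧ leave(e2, Eva) ∧ ⊤"
```

Each call to dupd wraps the sentence in the continuation of the discourse built so far, so the second sentence is interpreted in a context that already contains the facts contributed by the first.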

Adverbials
The adverbial connectives, an example of which is given as however in Table 1, are very similar to the conjunctions of the previous section. The only difference is that, as they lack one syntactic argument, only one propositional marker (e_B) is introduced, while the other argument has to be determined anaphorically from the left context c with an unconstrained selection function. The discourse Fred came. However, Sabine thinks he is sick is translated into term d of Table 2; it is very similar to c', only the selection of Arg1 is different.
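A possible rendering of such an adverbial entry is sketched below in Python, under the same naive assumptions as before (string formulas, suffix-based fresh markers, no pronoun resolution). It is a reconstruction from the prose, not the entry of Table 1. The unconstrained function sel is approximated by "most recent marker asserted true in the left context", and make_sel_P is a hypothetical factory whose boolean flag stands in for the world knowledge that would choose between the intensional and the evidential reading.

```python
from typing import Callable, List, Tuple

Fact = Tuple[str, ...]
Context = List[Fact]                   # most recent fact first
Continuation = Callable[[Context], str]
DynProp = Callable[[str, Context, Continuation], str]
Selector = Callable[[str, Context], str]


def atom(pred: str, *args: str) -> DynProp:
    def prop(e: str, c: Context, phi: Continuation) -> str:
        p = (pred, e) + args
        return f"{pred}({', '.join((e,) + args)}) ∧ {phi([p] + c)}"
    return prop


def think(subject: str, P: DynProp) -> DynProp:
    def prop(e: str, c: Context, phi: Continuation) -> str:
        e_rep = e + "'"
        p = ("think", e, subject, e_rep)
        return f"∃{e_rep}. think({e}, {subject}, {e_rep}) ∧ {P(e_rep, [p] + c, phi)}"
    return prop


def sel(c: Context) -> str:
    """Toy unconstrained selection: most recent marker asserted true."""
    for fact in c:
        if fact[0] == "true":
            return fact[1]
    raise LookupError("no propositional antecedent in the context")


def make_sel_P(evidential: bool) -> Selector:
    """Toy peripheral selection; `evidential` stands in for world knowledge."""
    def sel_P(marker: str, c: Context) -> str:
        if evidential:
            for fact in c:
                if fact[0] == "think" and fact[1] == marker:
                    return fact[3]         # marker of the reported proposition
        return marker
    return sel_P


def however(B: DynProp, sel_P: Selector) -> DynProp:
    def prop(e: str, c: Context, phi: Continuation) -> str:
        e_B = e + "_B"
        arg1 = sel(c)                      # Arg1 comes from the left context
        p1 = ("entails", e, e_B)

        def after_B(c1: Context) -> str:
            arg2 = sel_P(e_B, c1)
            p2 = ("Contrast", e, arg1, arg2)
            return f"Contrast({e}, {arg1}, {arg2}) ∧ {phi([p2] + c1)}"

        return f"∃{e_B}. ({e} ⊃ {e_B}) ∧ {B(e_B, [p1] + c, after_B)}"
    return prop


left = [("come", "e1", "Fred"), ("true", "e1")]      # context after "Fred came."
report = think("Sabine", atom("sick", "Fred"))
evidential = however(report, make_sel_P(True))("e2", left, lambda c: "⊤")
intensional = however(report, make_sel_P(False))("e2", left, lambda c: "⊤")
```

With the evidential setting the relation is stated between e1 and the marker of the report's complement, and with the intensional setting between e1 and the marker of the report itself.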

Hedging DRs?
So far, we have assumed that explicit connectives always introduce "plain" (unmodalized) DRs. By simply adding as axioms that veridical DRs such as Explanation or Concession (Asher and Lascarides, 2003) entail the truth of their arguments (R(e, e_A, e_B) ⇒ e ⊃ e_A ∧ e ⊃ e_B), we obtain the strong revision of propositional attitude proposed by Danlos and Rambow (2011). However, to get the "hedged DR" interpretation advocated by Hunter (2016), one can modify the terms of the connectives along the following lines: use a conditional statement to introduce a modalized propositional marker for the DR if one of the selected arguments has been introduced by an AV (this piece of information is present in the context), and directly use the provided (unmodalized) marker otherwise.
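For the first option, the step from the axiom to the revision of propositional attitude can be spelled out as follows. This is a short sketch that uses only the veridicality axiom and the definition of the entailment relation over markers given earlier; here e_A and e_B stand for the selected discourse arguments, so e_B may be the marker of a report's complement under an evidential reading.

```latex
\begin{align*}
&\text{Axiom (veridical } R\text{):} && R(e, e_A, e_B) \Rightarrow (e \supset e_A) \land (e \supset e_B)\\
&\text{Definition of } \supset\text{:} && a \supset b \ \text{ stands for } \ \mathit{true}(a) \rightarrow \mathit{true}(b)\\
&\text{Hence:} && \mathit{true}(e) \land R(e, e_A, e_B) \Rightarrow \mathit{true}(e_A) \land \mathit{true}(e_B)
\end{align*}
```

Since the marker of each sentence of the discourse is introduced as true, a veridical relation whose second argument is the complement of an AV commits the speaker to the truth of that complement, which is the strong revision described above.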

Related work
The idea of using de Groote (2006)'s continuation semantics framework for computing discourse structure was first discussed by Asher and Pogodalla (2011), who were interested in integrating SDRT more tightly with syntax. They outlined a system that does so, explicitly giving a lexical entry for adverbial connectives that uses a selection function to recover its Arg1. Qian and Amblard (2011) defend a very similar proposal, but focus on implicit DRs and use an event-based semantics instead of SDRT, in which the discourse arguments are events rather than discourse speech acts (DSA). Their account, like ours, is expressed in a logical language that is simpler than that of SDRT, which uses labels that name DSA (Asher and Lascarides, 2003); consequently, all the discourses that they and we treat are directly and entirely (including the DRs) translated into first-order logic, ready to be used by theorem provers and model builders. However, considering that discourse arguments are propositions allows us to handle DRs that take as arguments the complements of propositional attitude verbs (which arguably are propositions and neither events nor DSA).
These two previous works both focused on the general principles of introducing DRs in continuation semantics and on how to ensure the accessibility constraint (for the selection functions) known as the Right Frontier Constraint (Asher and Lascarides, 2003). This constraint is not only of linguistic interest; it also naturally lowers the ambiguity of anaphors and thus reduces the computation required by the selection algorithms. The solutions proposed in these two articles can easily be implemented in our proposal, as ensuring this constraint is orthogonal to the issues mainly discussed here, namely the variation in anaphoric properties of discourse connectives and the interpretation of AVs.
The distinction between central and peripheral conjunctions and their interaction with AVs has been formally modeled by Bernard and Danlos (2016). In particular, they account for the scope phenomena distinguishing the two classes of subordinating conjunctions discussed in Haegeman (2004), which we do not. However, their proposal is heavily dependent on the syntactic aspect of the formalism they use, namely STAG (Shieber and Schabes, 1990), while we are more agnostic about this part of the grammar. Furthermore, they model the difference between (non-)parenthetical uses of AVs as a lexical ambiguity (the idea being that the parenthetical versions of AVs are only compatible with peripheral connectives), whereas, in line with Simons (2007)'s analysis, we see it as a pragmatic ambiguity concerning the arguments of discourse connectives. We achieve this through the use of selection functions, a mechanism independently motivated by pronominal anaphora and adverbial connectives. This allows us to process whole discourses with a limited set of tools, while they only account for subordinating conjunctions (i.e., intra-sentential DRs).
Table 2: Some examples of terms discussed in Section 4. Term b_2' (used in c') is obtained by replacing recover with sick in b_2.
[Table 2, partially recovered]
c' (end of term): … ∧ Concession(e, sel_P(e_A, p_5 :: … :: p_1 :: c), sel_P(e_B, p_5 :: … :: p_1 :: c)) [p_6] ∧ φ(p_6 :: … :: p_1 :: c)
d: dupd(D_i(a))(however(b_2')) = λφ. ∃e_A. true(e_A) ∧ … ∧ sick(e', sel_he(p_5 :: … :: p_1 :: c_i)) [p_6] ∧ Contrast(e, sel(p_3 :: … :: p_1 :: c_i), sel_P(e_B, p_6 :: … :: p_1 :: c_i)) [p_7] ∧ φ(p_7 :: … :: p_1 :: c_i)
Building on Hunter (2016)'s analysis, Hunter and Asher (forthcoming) present a coercion mechanism to compositionally derive in SDRT the correct discourse structure of instances involving evidential reports with implicit connectives. However, their solution does not account for examples involving an evidential with an explicit DR, such as (8b), which remain problematic for them. Note that the present account extends smoothly to implicit DRs under the assumption that they are introduced by implicit adverbial connectives (similar to however in Table 1).

Conclusion
We have argued that all discourse connectives, not only the adverbials, should be treated as anaphors, with different classes of connectives obeying different anaphoric constraints. We have shown that this view allows one to account for semantically parenthetical reports without postulating any ad hoc lexical ambiguity concerning the status of AVs. Instead, the parenthetical interpretation is viewed here as a product of the discourse structure itself. The same mechanism also handles sequences of conjunctions (A CONJ_1 B CONJ_2 C). We have shown how to implement this proposal in de Groote (2006)'s dynamic framework. Such a framework makes it possible to handle discourse semantics without the need for a syntactic parse above the sentence level, and in a strictly compositional way using continuations.