Integrated sentence generation using charts

Integrating surface realization and the generation of referring expressions into a single algorithm can improve the quality of the generated sentences. Existing algorithms for doing this, such as SPUD and CRISP, are search-based and can be slow or incomplete. We offer a chart-based algorithm for integrated sentence generation and demonstrate its runtime efficiency.


Introduction
It has long been argued (Stone et al., 2003) that the strict distinction between surface realization and sentence planning in the classical NLG pipeline (Reiter and Dale, 2000) can cause difficulties for an NLG system. Generation decisions that look good to the sentence planner may be hard or impossible for the realizer to express in natural language. Furthermore, a standalone sentence planner must compute each referring expression (RE) separately, thus missing out on opportunities for succinct REs that are ambiguous in isolation but correct in context (Stone and Webber, 1998).
Algorithms such as SPUD (Stone et al., 2003) and CRISP (Koller and Stone, 2007) perform surface realization and parts of sentence planning, including RE generation, in an integrated fashion. Such integrated algorithms for sentence generation can balance the needs of the realizer and the sentence planner and take advantage of opportunities for succinct realizations. However, integrated sentence planning multiplies the complexities of two hard combinatorial problems, and thus existing, search-based algorithms can be inefficient or fail to find a good solution; SPUD's greedy search strategy may fail to find any solution, even when one exists.
By contrast, chart-based algorithms have been shown in parsing to remain efficient and accurate even for large inputs because they support structure sharing and very effective pruning techniques. Chart algorithms have been successfully applied to surface realization (White, 2004; Gardent and Kow, 2005; Carroll and Oepen, 2005; Schwenger et al., 2016), but in RE generation, most algorithms are not chart-based; see e.g. Dale and Reiter (1995) and Krahmer et al. (2003). One exception is the chart-based RE generation of Engonopoulos and Koller (2014).
In this paper, we present a chart-based algorithm for integrated surface realization and RE generation. This makes it possible, to our knowledge for the first time, to apply chart-based pruning techniques to integrated sentence generation. Our algorithm extends the chart-based RE generation algorithm of Engonopoulos and Koller (2014) by keeping track of the semantic content that has been expressed by each chart item. Because it is modular on the string side, the same algorithm can be used to generate with context-free grammars or TAGs, with or without feature structures, at no expense in runtime efficiency. An open-source implementation of our algorithm, based on the Alto system (Gontrum et al., 2017), can be found at bitbucket.org/tclup/alto.

Chart-based integrated generation
We first describe the grammar formalism we use. Then we explain the sentence generation algorithm and discuss its runtime performance.

Semantically interpreted grammars
We describe the integrated sentence generation problem in terms of semantically interpreted grammars (SIGs) (Engonopoulos and Koller, 2014), a special case of Interpreted Regular Tree Grammars (Koller and Kuhlmann, 2011). We introduce SIGs by example, and refer to Engonopoulos and Koller (2014) for detailed definitions.
An example SIG grammar is shown in Fig. 2. At the core of each grammar rule is a rule of the form $A_a \to f(B_b, \ldots, Z_z)$. The symbols $A, \ldots, Z$ are nonterminals such as S, NP, VP, and $a, \ldots, z$ are semantic indices, i.e. constants for individuals indicating to which object in the model a natural-language expression is intended to refer. These core rules allow us to recursively derive a derivation tree, such as the one shown in Fig. 1b, representing the abstract syntactic structure of a natural-language expression.
Each derivation tree $t$ is mapped to a string through a function $I_S$. This function is defined recursively for each rule of the grammar. In the example grammar of Fig. 2, we have $I_S(\mathit{white}_{r_2}) = \text{white}$, i.e. the word "white". Given a string $w_1$, we have $I_S(\mathit{rabbit}_{r_2})(w_1) = w_1 \bullet \text{rabbit}$, where "$\bullet$" is string concatenation. This means that given a subtree $t'$ which evaluates to the string $w_1$, a string for the tree $\mathit{rabbit}_{r_2}(t')$ is constructed by appending the word "rabbit" after $w_1$. For the complete derivation tree $t$ in Fig. 1, we obtain $I_S(t)$ = "the white rabbit sleeps".
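The recursive evaluation of $I_S$ can be illustrated with a minimal Python sketch. The tree encoding, the function name `evaluate_string`, and the string rules for "def" and "sleep" are assumptions made for illustration; only the rules for "white", "nop" and "rabbit" follow the description above, and the shape of the derivation tree is inferred from the output string rather than copied from Fig. 1.

```python
# A derivation tree is a (label, children) pair; semantic indices are omitted
# from the labels because they do not affect the string interpretation.
I_S = {
    "white": lambda: ["white"],                # I_S(white) = "white"
    "nop": lambda: [],                         # contributes no word
    "rabbit": lambda w1: w1 + ["rabbit"],      # w1 . rabbit
    "def": lambda w1: ["the"] + w1,            # assumed rule: the . w1
    "sleep": lambda w1: w1 + ["sleeps"],       # assumed rule: w1 . sleeps
}

def evaluate_string(tree):
    """Evaluate a derivation tree bottom-up to a list of words."""
    label, children = tree
    return I_S[label](*[evaluate_string(child) for child in children])

# Assumed shape of the derivation tree for "the white rabbit sleeps".
t = ("sleep", [("def", [("rabbit", [("white", [])])])])
sentence = " ".join(evaluate_string(t))
print(sentence)
```

Replacing the "white" subtree by `("nop", [])` in this sketch yields "the rabbit sleeps", which shows how the choice of modifier is made purely at the level of the derivation tree.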
At the same time and in the same way, the derivation tree is also evaluated to a set of referents through a function $I_R$. Constants such as rabbit and sleep are interpreted as subsets of, and relations between, the individuals in a given model. For instance, in the model of Fig. 3, rabbit denotes the set $\{r_1, r_2\}$, whereas in denotes a binary relation between individuals. These relations are then combined using intersection $R_1 \cap_i R_2$ (yielding the subset of elements of $R_1$ whose $i$-th component is an element of the set $R_2$), projection $[R]_i$ (yielding the set of $i$-th components of the tuples in $R$), and the uniqueness checker $\mathit{uniq}_a(R)$ (yielding $R$ if $R = \{a\}$ and $\emptyset$ otherwise). Under this interpretation, the derivation tree in Fig. 1 maps to the set $\{e\}$, given the model in Fig. 3.
Observe, finally, that each rule is annotated with a "for all" clause, which creates instances of the rule for each tuple of individuals that satisfies the condition. We also mention that although we only use unary attributes such as "rabbit" and "white" in this example grammar, SIGs deal easily with relational attributes such as "in" or "next to"; see Engonopoulos and Koller (2014).
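The three operations on referent sets are simple enough to sketch directly. The following Python fragment is an illustration under stated assumptions, not the Alto implementation: all relations are encoded as sets of tuples (unary sets as 1-tuples), the function names are hypothetical, and the toy model (rabbits $r_1$ and $r_2$, of which only $r_2$ is white and sleeping) is assumed rather than copied from Fig. 3. The way the "sleep" rule combines the operators is likewise an assumption.

```python
def intersect(r1, i, r2):
    """R1 ∩_i R2: tuples of R1 whose i-th component (1-based) is in the set R2."""
    return {t for t in r1 if (t[i - 1],) in r2}

def project(r, i):
    """[R]_i: the set of i-th components (1-based) of the tuples in R."""
    return {(t[i - 1],) for t in r}

def uniq(a, r):
    """uniq_a(R): R if R = {a}, and the empty set otherwise."""
    return r if r == {(a,)} else set()

# Assumed toy model.
rabbit = {("r1",), ("r2",)}
white = {("r2",)}
sleep = {("e", "r2")}

# "white rabbit": the rabbits that are also white.
white_rabbits = intersect(rabbit, 1, white)

# Assumed form of the "sleep" rule: the events whose second argument is the
# uniquely described individual r2.
events = project(intersect(sleep, 2, uniq("r2", white_rabbits)), 1)
print(white_rabbits, events)
```

With the bare noun "rabbit" the uniqueness check fails, so `uniq("r2", rabbit)` is empty and the sentence as a whole refers to nothing.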

Integrated sentence generation with SIGs
Engonopoulos and Koller (2014) describe an algorithm which, given a SIG grammar $G$ and a set $R$ of target referents, will compute a chart describing all derivations $t$ of $G$ with $I_R(t) = R$, that is, all semantically valid REs for $R$.
Here we extend both SIGs and this algorithm to include surface realization. We assume that the generation algorithm is given a set $N$ of semantic atoms in addition to the grammar $G$ and referent set $R$, and should return only derivations that refer to $R$ while expressing at least all the atoms in $N$. We achieve this by adding an interpretation $I_N$ to SIGs, such that $I_N(t)$ will return a set of semantic atoms expressed by the derivation tree $t$. We have added such $I_N$ clauses to the grammar in Fig. 2. For example, the rule for $\mathit{white}_{r_2}$ expresses the set $\{\mathit{white}(r_2)\}$ of semantic atoms. The rule for $\mathit{rabbit}_{r_2}$ evaluates to the disjoint union of $\{\mathit{rabbit}(r_2)\}$ with whatever its "Adj" subtree expressed. Thus, the $I_N$ interpretation keeps track of the semantic atoms expressed by each subtree of a derivation tree; in the example of Fig. 1, we see that the derivation tree as a whole expresses the semantic atoms $\{\mathit{sleep}(e, r_2), \mathit{rabbit}(r_2), \mathit{white}(r_2)\}$.
Given a grammar $G$, target referent set $R$, and target semantic content $N$, we can now compute a chart that describes all derivation trees $t$ of $G$ such that $I_R(t) = R$ and $I_N(t) \supseteq N$; thus this algorithm performs surface realization and RE generation at the same time. The algorithm, shown in Fig. 4 in the form of a parsing schema (Shieber et al., 1995), computes chart items $[A, R', N']$ in a bottom-up fashion. Such an item states that there is a tree $t$ such that $t$ can be derived from $A$, and we have $I_R(t) = R'$ and $I_N(t) = N'$. Given $k$ items for subtrees derived from the nonterminals $B_1, \ldots, B_k$ as premises and a rule $r$ that can combine these into a nonterminal $A$, the algorithm creates a new item for $A$ in which the $I_R$ and $I_N$ functions for that rule were applied to the referent sets and semantic contents of the subtrees. Given the inputs $R$ and $N$, we define a goal item to be an item $[S, R, N']$ where $S$ is the start symbol of the grammar and $N' \supseteq N$. Each goal item the algorithm discovers thus represents a sentence that achieves the given communicative goals.
An example run of the algorithm, for the inputs $R = \{e\}$ and $N = \{\mathit{rabbit}(r_2)\}$, is shown in Fig. 5. Each row in the chart corresponds to one application of the rule in Fig. 4. The grammar rule that was used, along with any premises, is given in brackets to the right. Observe that the only goal item, (9), corresponds to the derivation in Fig. 1, and hence the output string "the white rabbit sleeps"; the derivation can be reconstructed by following the backpointers to the premise items recursively. Observe also that (10), for "the white thing sleeps", is not a goal item because its semantic content is not a superset of $N$. The item (11), for "the rabbit sleeps", is not a goal item either: its referent set is empty because (8) is not a unique RE for $r_2$, and thus the term $\mathit{uniq}_{r_2}(R_1)$ in the "sleep" rule evaluates to the empty set. Thus, the algorithm performs both surface realization and RE generation.
Figure 5: Excerpt from the chart for "The white rabbit sleeps."
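The bottom-up chart construction can be sketched as a naive fixpoint computation over items (nonterminal, referent set, semantic content). This is a minimal illustration, not the Alto implementation: the rule encoding, the function names, the toy model, and the handful of rule instances (only for $r_2$ and $e$) are all assumptions, and no agenda or pruning is used.

```python
from itertools import product

# Assumed toy model: two rabbits, only r2 is white and sleeps.
U = frozenset({"e", "r1", "r2"})
RABBIT = frozenset({"r1", "r2"})
WHITE = frozenset({"r2"})
SLEEP = {("e", "r2")}

def uniq(a, r):
    """uniq_a(R): R if R = {a}, else the empty set."""
    return r if r == frozenset({a}) else frozenset()

def sleepers(r1):
    """Assumed I_R of the sleep rule: events whose sleeper is uniquely r2."""
    return frozenset(ev for ev, who in SLEEP if who in uniq("r2", r1))

# Rule instance: (lhs, label, rhs nonterminals, I_R function, I_N function).
RULES = [
    ("Adj_r2", "white_r2", (), lambda: WHITE, lambda: frozenset({"white(r2)"})),
    ("Adj_r2", "nop_r2", (), lambda: U, lambda: frozenset()),
    ("N_r2", "rabbit_r2", ("Adj_r2",),
     lambda r1: RABBIT & r1, lambda n1: n1 | {"rabbit(r2)"}),
    ("NP_r2", "def_r2", ("N_r2",), lambda r1: r1, lambda n1: n1),
    ("S_e", "sleep_e", ("NP_r2",), sleepers, lambda n1: n1 | {"sleep(e,r2)"}),
]

def build_chart(rules):
    """Apply rules to existing items until no new item can be derived."""
    chart = {}  # item -> (rule label, premise items), i.e. a backpointer
    changed = True
    while changed:
        changed = False
        for lhs, label, rhs, i_r, i_n in rules:
            candidates = [[it for it in chart if it[0] == b] for b in rhs]
            for premises in product(*candidates):
                item = (lhs,
                        i_r(*(p[1] for p in premises)),
                        i_n(*(p[2] for p in premises)))
                if item not in chart:
                    chart[item] = (label, premises)
                    changed = True
    return chart

def derivation(chart, item):
    """Reconstruct a derivation tree by following backpointers."""
    label, premises = chart[item]
    if not premises:
        return label
    return f"{label}({', '.join(derivation(chart, p) for p in premises)})"

chart = build_chart(RULES)
target_r, target_n = frozenset({"e"}), frozenset({"rabbit(r2)"})
goals = [it for it in chart
         if it[0] == "S_e" and it[1] == target_r and it[2] >= target_n]
print([derivation(chart, g) for g in goals])
```

In this sketch the item for "the rabbit sleeps" is also built, but its referent set is empty, so it is not a goal item; only the derivation containing "white" survives the goal test.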

Generating succinct REs in context
One advantage of integrating surface realization with RE generation is that REs can be more succinct in the context of a larger grammatical construction than in isolation. The shortest standalone RE for $r_1$ in Fig. 3 is "the brown rabbit", but it is perfectly felicitous to say "take the rabbit from the hat". Stone and Webber (1998) explain this in terms of the presuppositions of the verb "take X from Y", which say that X must be in Y, and thus the REs X and Y can mutually constrain each other. They also show how the SPUD algorithm can generate such succinct REs in the context of the verb, by global reasoning over the referent sets of all REs in the sentence.
Our algorithm can generate such REs as well, and can do it in an efficient, chart-based way. Assume $R = \{e_2\}$ and $N = \{\mathit{takefrom}(e_2, r_1, h_1)\}$ and the grammar in Fig. 2. The chart algorithm will construct items for the sub-derivation-trees $t_1 = \mathit{def}_{r_1}(\mathit{rabbit}_{r_1}(\mathit{nop}_{r_1}))$ ("the rabbit") and $t_2 = \mathit{def}_{h_1}(\mathit{hat}_{h_1}(\mathit{nop}_{h_1}))$ ("the hat"), with $R_1 = I_R(t_1) = \{r_1, r_2\}$ and $R_2 = I_R(t_2) = \{h_1, h_2\}$; thus, these REs are not by themselves unique. These trees are then combined with the rule $\mathit{takefrom}_{e_2,r_1,h_1}$. This rule intersects $R_1$ with the set of things that are "in" an element of $R_2$, encoding the presupposition of "take X from Y". Thus $R_1 \cap_1 [\mathit{in} \cap_2 R_2]_1$ evaluates to $\{r_1\}$, satisfying the uniqueness condition. Thus, the algorithm returns $t = \mathit{takefrom}_{e_2,r_1,h_1}(t_1, t_2)$ as a valid realization.
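The evaluation of the rabbit-side term can be traced step by step. The derivation below assumes, purely for illustration, that the relation "in" of the model consists of the single pair $(r_1, h_1)$; the actual extension is given by Fig. 3.

```latex
\begin{align*}
\mathit{in} \cap_2 R_2
  &= \{(r_1, h_1)\} \cap_2 \{h_1, h_2\} = \{(r_1, h_1)\} \\
[\mathit{in} \cap_2 R_2]_1
  &= \{r_1\} \\
R_1 \cap_1 [\mathit{in} \cap_2 R_2]_1
  &= \{r_1, r_2\} \cap_1 \{r_1\} = \{r_1\} \\
\mathit{uniq}_{r_1}(\{r_1\})
  &= \{r_1\}
\end{align*}
```

Under this assumption, neither "the rabbit" nor "the hat" is unique by itself, yet the combined term passes the uniqueness check because only one rabbit is in any of the candidate hats.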
Note that we achieved the ability to let REs mutually constrain each other by moving the requirement for semantic uniqueness to the verb that subcategorizes for the RE.This is in contrast to the standard assumption that it is the definite article that requires uniqueness, but permits us a purely grammar-based treatment of mutually constraining REs which requires no further reasoning capabilities.

Chart generation with heuristics
Our algorithm can enumerate all subsets $N'$ of the true semantic atoms in the model, and thus has worst-case exponential runtime. This is probably unavoidable, given that surface realization and the generation of shortest REs are both NP-complete (Koller and Striegnitz, 2002; Dale and Reiter, 1995).
However, because it is a chart-based algorithm, we can use heuristics to avoid computing the whole chart, and thus speed up the search for the best solution. To get an impression of this, assume that we are looking for a short sentence; other optimization criteria are also possible. We first compute the full chart $C_R$ for the $I_R$ part of the input alone, using essentially the same algorithm as Engonopoulos and Koller (2014). From $C_R$ we compute the distance of each chart item to a goal item, i.e. the minimal number of rules that must be applied to the item to produce a goal item. We then refine the items of $C_R$ by adding the $I_N$ parts to each chart item. Unprocessed chart items $[A, R', N']$ are organized on an agenda which is ordered lexicographically, first by the number of atoms in the target semantic content that are not yet realized in $N'$ and then by the distance of $[A, R']$ to a goal item in $C_R$. We stop the chart computation once the first goal item has been found.
Using this pruning strategy, we measured runtimes with problems from the GIVE Challenge (Koller et al., 2010) on a 2.9 GHz Intel Core i5 CPU. We compared the performance of our system against CRISP, which uses the FF planner (Hoffmann and Nebel, 2001) to perform the search. For CRISP, we only measured the time spent in running the planner. On the most complex scene from GIVE that we tried, our system took 13 ms to generate the sentence "Push the button to the left of the flower", whereas CRISP generated the same sentence in 50 ms. Note that it is possible to construct (not entirely realistic) inputs for the generator on which FF's much more sophisticated search strategy outperforms the heuristic described above. By incorporating such a heuristic into chart generation, e.g. as in Schwenger et al. (2016), our system could be accelerated further.
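The lexicographic agenda ordering can be sketched with a binary heap. The fragment below is a minimal illustration under stated assumptions: the item encoding, the names `priority` and `Agenda`, and the distance values in the usage example are hypothetical, and the distances would in practice be read off the chart $C_R$.

```python
import heapq
from itertools import count

def priority(item, target_atoms, distance):
    """Key: (number of unrealized target atoms, distance of [A, R'] to a goal)."""
    nonterminal, referents, atoms = item
    missing = len(target_atoms - atoms)
    return (missing, distance[(nonterminal, referents)])

class Agenda:
    """Priority queue of chart items; ties are broken by insertion order."""
    def __init__(self, target_atoms, distance):
        self.target_atoms, self.distance = target_atoms, distance
        self.heap, self.ticket = [], count()

    def push(self, item):
        key = priority(item, self.target_atoms, self.distance)
        heapq.heappush(self.heap, (key, next(self.ticket), item))

    def pop(self):
        return heapq.heappop(self.heap)[2]

# Hypothetical items and distances for the target N = {rabbit(r2)}.
r2, both = frozenset({"r2"}), frozenset({"r1", "r2"})
adj = ("Adj_r2", r2, frozenset({"white(r2)"}))
white_rabbit = ("N_r2", r2, frozenset({"white(r2)", "rabbit(r2)"}))
bare_rabbit = ("N_r2", both, frozenset({"rabbit(r2)"}))
distance = {("Adj_r2", r2): 3, ("N_r2", r2): 2, ("N_r2", both): float("inf")}

agenda = Agenda(frozenset({"rabbit(r2)"}), distance)
for item in (adj, bare_rabbit, white_rabbit):
    agenda.push(item)
order = [agenda.pop() for _ in range(3)]
print([item[0] for item in order])
```

In this example the ambiguous noun item gets an infinite distance because it cannot reach a goal item in $C_R$; an implementation could equally drop such items instead of queuing them.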

Conclusion
We have presented a chart-based algorithm for integrated surface realization and RE generation. Compared to earlier approaches to integrated sentence generation, our algorithm can exploit the capabilities of charts for structure-sharing and pruning to achieve higher runtime performance in practice. We have only presented a simple pruning strategy here, but it would be a straightforward extension to incorporate pruning strategies from surface realization (White, 2004; Schwenger et al., 2016).
One advantage of our algorithm is that it is agnostic of the grammar formalism that is used on the string side. We have used context-free rules for reading off string representations from the generated derivation trees, but because SIGs are a special case of IRTGs, we could instead use a tree-adjoining grammar to construct strings (Koller and Kuhlmann, 2012). In fact, the runtime experiments in Section 2.4 were based on a TAG grammar to allow direct comparison with CRISP.
With the algorithm presented here, it may become feasible for the first time to perform integrated sentence generation in the context of practical applications. So far, grammars that support this lag far behind grammars for surface realization in size and complexity. It would thus be interesting to either convert existing surface realization grammars to SIGs, or to learn such grammars from data.
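for all $a \in \mathit{white}$:
  $\mathit{Adj}_a \to \mathit{white}_a$
  $I_S(\mathit{white}_a) = \text{white}$
  $I_R(\mathit{white}_a) = \mathit{white}$
  $I_N(\mathit{white}_a) = \{\mathit{white}(a)\}$
for all $a \in U$:
  $\mathit{Adj}_a \to \mathit{nop}_a$
  $I_S(\mathit{nop}_a) = \varepsilon$
  $I_R(\mathit{nop}_a) = U$
  $I_N(\mathit{nop}_a) = \emptyset$
for all $e, a, b \in \mathit{takefrom}(e, a, b)$:
  $S_e \to \mathit{takefrom}_{e,a,b}(\mathit{NP}_a, \mathit{NP}_b)$
  $I_S(\mathit{takefrom}_{e,a,b}$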