Learning to Map Context-Dependent Sentences to Executable Formal Queries

We propose a context-dependent model to map utterances within an interaction to executable formal queries. To incorporate interaction history, the model maintains an interaction-level encoder that updates after each turn, and can copy sub-sequences of previously predicted queries during generation. Our approach combines implicit and explicit modeling of references between utterances. We evaluate our model on the ATIS flight planning interactions, and demonstrate the benefits of modeling context and explicit references.


Introduction
The meaning of conversational utterances depends strongly on the history of the interaction. Consider a user querying a flight database using natural language (Figure 1). Given a user utterance, the system must generate a query, execute it, and display results to the user, who then provides the next request. Key to correctly mapping utterances to executable queries is resolving references. For example, the second utterance implicitly depends on the first, and the reference ones in the third utterance explicitly refers to the response to the second utterance. Within an interactive system, this information needs to be composed with mentions of database entries (e.g., Seattle, next Monday) to generate a formal executable representation. In this paper, we propose encoder-decoder models that directly map user utterances to executable queries, while considering the history of the interaction, including both previous utterances and their generated queries.
Reasoning about how the meaning of an utterance depends on the history of the interaction is critical to responding correctly to user requests. As interactions progress, users may omit previously mentioned constraints and entities, and an increasing portion of the utterance meaning must be derived from the interaction history. Figure 2 shows SQL queries for the utterances in Figure 1. As the interaction progresses, the majority of the generated query is derived from the interaction history (underlined), rather than from the current utterance. A key challenge is resolving what past information is incorporated and how. For example, in the figure, the second utterance depends on the set of flights defined by the first, while adding a new constraint. The third utterance further refines this set by adding a constraint to the constraints from both previous utterances. In contrast, the fourth utterance refers only to the first one, and skips the two utterances in between. Correctly generating the fourth query requires understanding that the time constraint (at 7pm) can be ignored because it follows an airline constraint that has been replaced.
[Figure 1: An interaction from the ATIS corpus (Hemphill et al., 1990; Dahl et al., 1994), beginning with "show me flights from seattle to boston next monday". Each request is followed by a description of the system response.]
We study complementary methods to enable this type of reasoning. The first set of methods implicitly reasons about references by modifying the encoder-decoder architecture to encode information from previous utterances for generation decisions. We experiment with attending over previous utterances and using an interaction-level recurrent encoder. We also study explicitly maintaining a set of referents using segments from previous queries. At each step, the decoder chooses whether to output a token or select a segment from the set, which is appended to the output in a single decoding step. In addition to enabling references to previously mentioned entities, sets, and constraints, this method also reduces the number of generation steps required, as illustrated by the underlined segments in Figure 2. For example, the query $\bar{y}_2$ requires 17 steps instead of 94. We evaluate our approach using the ATIS (Hemphill et al., 1990; Dahl et al., 1994) task, where a user interacts with a SQL flight database using natural language requests, and almost all queries require joins across multiple tables. In addition to reasoning about contextual phenomena, we design our system to effectively resolve database values, including resolution of time expressions (e.g., next monday in Figure 1) using an existing semantic parser.
[Figure 2: SQL queries for the utterances in Figure 1, starting with $\bar{x}_1$: show me flights from seattle to boston next monday. Segments derived from the interaction history are underlined.]
Our evaluation shows that reasoning about the history of the interaction is necessary, increasing performance by 28.6% relative to a baseline with no access to this information, and that combining the implicit and explicit methods provides the best performance. Furthermore, our analysis shows that our full approach maintains its performance as interaction length increases, while the performance of systems without explicit modeling deteriorates. Our code is available at https://github.com/clic-lab/atis.

Technical Overview
Our goal is to map utterances in interactions to formal executable queries. We evaluate our approach with the ATIS corpus (Hemphill et al., 1990;Dahl et al., 1994), where users query a realistic flight planning system using natural language. The system responds by displaying tables and database entries. User utterances are mapped to SQL to query a complex database with 27 tables and 162K entries. 96.6% of the queries require joins of different tables. Section 7 describes ATIS.
Task Notation Let $I$ be the set of all interactions, $X$ the set of all utterances, and $Y$ the set of all formal queries. A user utterance $\bar{x} \in X$ of length $|\bar{x}|$ is a sequence $\langle x_1, \ldots, x_{|\bar{x}|} \rangle$, where each $x_i$ is a natural language token. A formal query $\bar{y} \in Y$ of length $|\bar{y}|$ is a sequence $\langle y_1, \ldots, y_{|\bar{y}|} \rangle$, where each $y_i$ is a formal query token. An interaction $\bar{I} \in I$ is a sequence of $n$ utterance-query pairs $\langle (\bar{x}_1, \bar{y}_1), \ldots, (\bar{x}_n, \bar{y}_n) \rangle$ representing an interaction with $n$ turns. To refer to indexed interactions and their content, we mark $\bar{I}^{(l)}$ as an interaction with index $l$, the $i$-th utterance and query in $\bar{I}^{(l)}$ as $\bar{x}^{(l)}_i$ and $\bar{y}^{(l)}_i$, and the $j$-th tokens in $\bar{x}^{(l)}_i$ and $\bar{y}^{(l)}_i$ as $x^{(l)}_{i,j}$ and $y^{(l)}_{i,j}$. At turn $i$, we denote the interaction history of length $i - 1$ as $\bar{I}[:i-1] = \langle (\bar{x}_1, \bar{y}_1), \ldots, (\bar{x}_{i-1}, \bar{y}_{i-1}) \rangle$. Given $\bar{I}[:i-1]$ and utterance $\bar{x}_i$, our goal is to generate $\bar{y}_i$ while considering both $\bar{x}_i$ and $\bar{I}[:i-1]$. Following the execution of $\bar{y}_i$, the interaction history at turn $i + 1$ becomes $\bar{I}[:i] = \langle (\bar{x}_1, \bar{y}_1), \ldots, (\bar{x}_i, \bar{y}_i) \rangle$.

Model Our model is based on the recurrent neural network (RNN; Elman, 1990) encoder-decoder framework with attention (Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2015; Luong et al., 2015). We modify the model in three ways to reason about context from the interaction history: attending over previous utterances (Section 4.2), adding a turn-level recurrent encoder that updates after each turn (Section 4.3), and adding a mechanism to copy segments of queries from previous utterances (Section 4.4). We also design a scoring function to score values that are abstracted during pre-processing, including entities and times (Section 6). The full model selects between generating query tokens and copying complete segments from previous queries.

Learning We assume access to a training set that contains $N$ interactions $\{\bar{I}^{(l)}\}_{l=1}^{N}$. We train using a token-level cross-entropy objective (Section 5).
For models that use the turn-level encoder, we construct computational graphs for the entire interaction and back-propagate the loss for all queries together. Without the turn-level encoder, each utterance is processed separately.

Evaluation We evaluate using a test set of interactions. We measure the accuracy of each utterance in each test interaction against the annotated query and its execution result. For models that copy segments from previous queries, we evaluate using both predicted and gold previous queries.
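To make the notation concrete, the following Python sketch stores an interaction as a list of (utterance, query) token-list pairs and slices out the history $\bar{I}[:i-1]$. The class and method names (`Interaction`, `history`, `add_turn`) are hypothetical and only illustrate the indexing convention, with turns counted from 1.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# One turn: (utterance tokens x_i, query tokens y_i).
Turn = Tuple[List[str], List[str]]


@dataclass
class Interaction:
    """Hypothetical container mirroring I = <(x_1, y_1), ..., (x_n, y_n)>."""
    turns: List[Turn] = field(default_factory=list)

    def history(self, i: int) -> List[Turn]:
        """Return I[: i - 1], the turns before turn i (turns are 1-indexed)."""
        if i < 1:
            raise ValueError("turn indices start at 1")
        return self.turns[: i - 1]

    def add_turn(self, utterance: List[str], query: List[str]) -> None:
        """Record (x_i, y_i) once y_i has been executed; the history grows by one turn."""
        self.turns.append((utterance, query))
```

At turn 1 the history is empty, and after the second turn is recorded, `history(3)` returns both earlier turns.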

Related Work
Recovering context-independent executable representations has been receiving increasing attention.
Mapping sentences in isolation to SQL queries has been studied with ATIS using statistical parsing (Popescu et al., 2004; Poon, 2013) and sequence-to-sequence models (Iyer et al., 2017). Generating executable programs was studied with other domains and formal languages (Giordani and Moschitti, 2012; Ling et al., 2016; Zhong et al., 2017; Xu et al., 2017). Recently, various approaches were proposed to use the formal language syntax to constrain the search space (Yin and Neubig, 2017; Rabinovich et al., 2017; Cheng et al., 2017), making all outputs valid programs. These contributions are orthogonal to ours, and can be directly integrated into our decoder.
Generating context-dependent formal representations has received less attention. Miller et al. (1996) used ATIS and mapped utterances to semantic frames, which were then mapped to SQL queries. For learning, they required full supervision, including annotated parse trees and contextual dependencies. Zettlemoyer and Collins (2009) addressed the problem with lambda calculus, using a semantic parser trained separately with context-independent data. In contrast, we generate executable formal queries and require only interaction query annotations for training.
Recovering context-dependent meaning was also studied with the SCONE (Long et al., 2016) and SequentialQA (Iyyer et al., 2017) corpora. We compare ATIS to these corpora in Section 7. Resolving explicit references, a part of our problem, has been studied as co-reference resolution (Ng, 2010). Context-dependent language understanding was also studied for dialogue systems, including with ATIS, as surveyed by Tür et al. (2010). More recently, encoder-decoder methods were applied to dialogue systems (Peng et al., 2017, among others), including using hierarchical RNNs (Serban et al., 2016, 2017), an architecture related to our turn-level encoder. These approaches use slot-filling frames with limited expressivity, while we focus on the original representation of unconstrained SQL queries.

Context-dependent Model
We base our model on an encoder-decoder architecture with attention (Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2015; Luong et al., 2015). At each interaction turn $i$, given the current utterance $\bar{x}_i$ and the interaction history $\bar{I}[:i-1]$, the model generates the formal query $\bar{y}_i$. Figure 3 illustrates our architecture. We describe the base architecture, and gradually add components.

Base Encoder-Decoder Architecture
Our base architecture uses an encoder to process the user utterance $\bar{x}_i = \langle x_{i,1}, \ldots, x_{i,|\bar{x}_i|} \rangle$ and a decoder to generate the output query $\bar{y}_i$ token-by-token. This architecture does not observe the interaction history $\bar{I}[:i-1]$.
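The encoder computes a hidden state $h^E_j = [\overrightarrow{h}^E_j; \overleftarrow{h}^E_j]$ for each token $x_{i,j}$ using a bi-directional RNN. The forward RNN is defined by:

$$\overrightarrow{h}^E_j = \overrightarrow{\mathrm{LSTM}}^E\left(\phi^x(x_{i,j}); \overrightarrow{h}^E_{j-1}\right) \qquad (1)$$

where $\overrightarrow{\mathrm{LSTM}}^E$ is a long short-term memory recurrence (LSTM; Hochreiter and Schmidhuber, 1997) and $\phi^x$ is a learned embedding function for input tokens. The backward RNN recurs in the opposite direction with separate parameters.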
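We generate the query with an RNN decoder. The decoder state at step $k$ is:

$$h^D_k = \mathrm{LSTM}^D\left([\phi^y(y_{i,k-1}); c_{k-1}]; h^D_{k-1}\right)$$

where $\mathrm{LSTM}^D$ is a two-layer LSTM recurrence, $\phi^y$ is a learned embedding function for query tokens, and $c_k$ is an attention vector computed from the encoder states. $y_{i,0}$ is a special start token, and $c_0$ is a zero-vector. The initial hidden state and cell memory of each layer are initialized as the final encoder states $h^E_{|\bar{x}_i|}$ and $c^E_{|\bar{x}_i|}$. The attention vector $c_k$ is a weighted sum of the encoder hidden states:

$$s_k(j) = h^E_j W^A h^D_k \qquad (2)$$
$$\alpha_k = \mathrm{softmax}(s_k) \qquad (3)$$
$$c_k = \sum_{j=1}^{|\bar{x}_i|} \alpha_k(j)\, h^E_j \qquad (4)$$

where $W^A$ is a learned matrix. The probabilities of output query tokens are computed as:

$$m_k = \tanh\left([h^D_k; c_k] W^m\right) \qquad (5)$$
$$P(y_{i,k} = w \mid \bar{x}_i, \bar{y}_{i,1:k-1}) \propto \exp\left(m_k W^o_w + b^o_w\right) \qquad (6)$$

where $W^m$, $W^o$, and $b^o$ are learned. We omit the memory cell (often denoted $c_j$) from all LSTM descriptions; unless explicitly noted, we use only the LSTM hidden state $h_j$ in other parts of the architecture.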
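As an illustration of Equations 2 to 6, the following pure-Python sketch computes bilinear attention scores, the softmax weights, the context vector, and the output token distribution for a single decoding step. It assumes plain Python lists for vectors and lists of rows for matrices; the helper names are hypothetical, and a real implementation would use a tensor library with learned parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # list of rows


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product M v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vecmat(v: Vector, m: Matrix) -> Vector:
    """Row-vector-matrix product v M."""
    return [sum(v[r] * m[r][c] for r in range(len(v))) for c in range(len(m[0]))]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(enc_states: List[Vector], w_a: Matrix, h_dec: Vector):
    """Equations 2-4: bilinear scores s_k(j), weights alpha_k, context c_k."""
    projected = matvec(w_a, h_dec)  # W^A h^D_k
    scores = [sum(a * b for a, b in zip(h, projected)) for h in enc_states]
    alpha = softmax(scores)
    context = [sum(alpha[j] * enc_states[j][d] for j in range(len(enc_states)))
               for d in range(len(enc_states[0]))]
    return alpha, context


def output_distribution(h_dec: Vector, context: Vector, w_m: Matrix,
                        w_o: Matrix, b_o: Vector) -> Vector:
    """Equations 5-6: m_k = tanh([h^D_k; c_k] W^m), P(w) proportional to exp(m_k W^o_w + b^o_w)."""
    m_k = [math.tanh(z) for z in vecmat(h_dec + context, w_m)]
    logits = [l + b for l, b in zip(vecmat(m_k, w_o), b_o)]
    return softmax(logits)
```

With an identity $W^A$ and a decoder state aligned with the first encoder state, most of the attention mass goes to that first position.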

Incorporating Recent History
We provide the model with the most recent interaction history by concatenating the previous $h$ utterances $\bar{x}_{i-h}, \ldots, \bar{x}_{i-1}$ with the current utterance in order, adding a special delimiter token between each utterance. The concatenated input provides the model access to previous utterances, but not to previously generated queries or to utterances that are more than $h$ turns in the past. The architecture remains the same, except that the encoder and attention are computed over the concatenated sequence of tokens. The probability of an output query token is computed the same way, but is now conditioned on the interaction history:

$$P(y_{i,k} = w \mid \bar{x}_i, \bar{y}_{i,1:k-1}, \bar{I}[:i-1]) \propto \exp\left(m_k W^o_w + b^o_w\right) \qquad (7)$$
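A minimal sketch of building the concatenated input, assuming utterances are token lists indexed from turn 1 and using a hypothetical delimiter token `<s>`. At the start of an interaction, fewer than $h$ previous utterances are available, so only the existing ones are used.

```python
from typing import List

DELIM = "<s>"  # hypothetical delimiter token placed between utterances


def concat_recent_history(utterances: List[List[str]], i: int, h: int,
                          delim: str = DELIM) -> List[str]:
    """Concatenate x_{i-h}, ..., x_{i-1}, x_i (1-indexed turns) with delimiters."""
    start = max(1, i - h)  # cannot look further back than the first turn
    tokens: List[str] = []
    for t in range(start, i + 1):
        if tokens:
            tokens.append(delim)
        tokens.extend(utterances[t - 1])
    return tokens
```

Because the encoding of each utterance depends on where it lands in this string, the whole window must be re-encoded at every turn, which is the cost the turn-level encoder avoids.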

Turn-level Encoder
Concatenating recent utterances to provide access to recent history has computational drawbacks. The encoding of the utterance depends on its location in the concatenated string. This requires encoding all recent history for each new utterance, and does not allow re-use of computation between utterances during encoding. It also introduces a tradeoff between computation cost and expressivity: attending over the $h$ previous utterances gives the decoder access to the information in these utterances when generating a query, but becomes computationally more expensive as $h$ increases. We address this by encoding each utterance once. To account for the influence of the interaction history on utterance encoding, we maintain a discourse state encoding $h^I_i$ computed with a turn-level recurrence, and use it during utterance encoding. The state is maintained and updated over the entire interaction. At turn $i$, this model has access to the complete prefix of the interaction $\bar{I}[:i-1]$ and the current request $\bar{x}_i$. In contrast, the concatenation-based encoder (Section 4.2) has access only to information from the previous $h$ utterances. We also use positional encoding in the attention computation to account for the position of each utterance relative to the current utterance.
Formally, we modify Equation 1 to encode $\bar{x}_i$:

$$\overrightarrow{h}^E_{i,j} = \overrightarrow{\mathrm{LSTM}}^E\left([\phi^x(x_{i,j}); h^I_{i-1}]; \overrightarrow{h}^E_{i,j-1}\right)$$

where $h^I_{i-1}$ is the discourse state following utterance $\bar{x}_{i-1}$. $\overleftarrow{\mathrm{LSTM}}^E$ is modified analogously. In contrast to the concatenation-based model, the recurrence processes a single utterance. The discourse state $h^I_i$ is computed as:

$$h^I_i = \mathrm{LSTM}^I\left(h^E_{i,|\bar{x}_i|}; h^I_{i-1}\right)$$

Similar to the concatenation-based model, we attend over the current utterance and the $h$ previous utterances. We add relative position embeddings $\phi^I$ to each hidden state. These embeddings are learned for each possible distance $0, \ldots, h$ from the current utterance. We modify Equation 2 to index over both utterances and tokens:

$$s_k(t, j) = [h^E_{t,j}; \phi^I(i - t)]\, W^A h^D_k \qquad (8)$$

In contrast to the concatenation model, without position embeddings the attention computation has no indication of the utterance position, as our ablation shows in Section 8. The attention distribution is computed as in Equation 3, and normalized across all utterances. The position embedding is also used to compute the context vector $c_k$:

$$c_k = \sum_{t=i-h}^{i} \sum_{j=1}^{|\bar{x}_t|} \alpha_k(t, j)\, [h^E_{t,j}; \phi^I(i - t)]$$

[Figure 3: Illustration of the model architecture during the third decoding step while processing the instruction which ones arrive at 7pm from the interaction in Figure 2. The current discourse state $h^I_2$ is used to encode the current utterance $\bar{x}_3$ (Section 4.3). Query segments from previous queries are encoded into vector representations (Section 4.4). In each generation step, the decoder attends over the previous and current utterances, and a probability distribution is computed over SQL tokens and query segments. Here, segment $\bar{s}_1$ is selected.]
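The sketch below illustrates Equation 8 and the position-aware context vector under simplifying assumptions: hidden states are given as Python lists ordered from the oldest attended utterance to the current one, `pos_emb[d]` is a (hypothetical) learned embedding for distance $d = i - t$, and a single softmax is taken over all tokens of all attended utterances.

```python
import math
from typing import List

Vector = List[float]


def position_aware_attention(utterance_states: List[List[Vector]],
                             pos_emb: List[Vector],
                             w_a: List[Vector],
                             h_dec: Vector):
    """utterance_states: hidden states per utterance, ordered oldest -> current."""
    n = len(utterance_states)
    keys: List[Vector] = []
    for idx, states in enumerate(utterance_states):
        distance = n - 1 - idx  # i - t, which is 0 for the current utterance
        for h in states:
            keys.append(h + pos_emb[distance])  # [h^E_{t,j}; phi^I(i - t)]
    projected = [sum(a * b for a, b in zip(row, h_dec)) for row in w_a]  # W^A h^D_k
    scores = [sum(a * b for a, b in zip(key, projected)) for key in keys]  # s_k(t, j)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]  # normalized across all utterances jointly
    context = [sum(alpha[r] * keys[r][d] for r in range(len(keys)))
               for d in range(len(keys[0]))]  # c_k also carries the position part
    return alpha, context
```

In a toy case where a previous and the current utterance have identical hidden states, only the position embedding lets the attention prefer one turn over the other.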

Copying Query Segments
The discourse state and attention over previous utterances allow the model to consider the interaction history when generating queries. However, we observe that context-dependent reasoning often requires generating sequences that were generated in previous turns. Figure 2 shows how segments (underlined) extracted from previous queries are predominant in later queries. To take advantage of what was previously generated, we add copying of complete segments from previous queries by expanding the set of outputs at each generation step. This mechanism explicitly models references, reduces the number of steps required to generate queries, and provides an interpretable view of what parts of a query originate in context. Figure 3 illustrates this architecture.

Extracting Segments Given the interaction history $\bar{I}[:i-1]$, we construct the set of segments $S_{i-1}$ by deterministically extracting subtrees from previously generated queries. In our data, we extract 13 ± 5.9 (µ ± σ) segments for each annotated query. Each segment $\bar{s} \in S_{i-1}$ is a tuple $\langle a, b, l, r \rangle$, where $a$ and $b$ are the indices of the first and most recent queries, $\bar{y}_a$ and $\bar{y}_b$, in the interaction that contain the segment, and $l$ and $r$ are the start and end indices of the segment in $\bar{y}_b$.

Encoding Segments We represent a segment $\bar{s} = \langle a, b, l, r \rangle$ using the hidden states of an RNN encoding of the query $\bar{y}_b$. The hidden states $h^Q_1, \ldots, h^Q_{|\bar{y}_b|}$ are computed using a bi-directional LSTM RNN similar to the utterance encoder (Equation 1), except using separate LSTM parameters and $\phi^y$ to embed the query tokens. The embedded representation of a segment is a concatenation of the hidden states at the segment endpoints and an embedding of the relative position of the utterance where it first appears:

$$h^S = [h^Q_l; h^Q_r; \phi^g(\min(g, i - a))]$$

where $\phi^g$ is a learned embedding function of the position of the initial query $\bar{y}_a$ relative to the current turn index $i$.
We learn an embedding for each relative position that is smaller than $g$, and use the same embedding for all other positions.

Generation with Segments At each generation step, the decoder selects either a single query token or a segment. When a segment is selected, it is appended to the generated query, an embedded segment representation for the next step is computed, and generation continues. The probability of a segment $\bar{s} = \langle a, b, l, r \rangle$ at decoding step $k$ is:

$$P(y_{i,k} = \bar{s} \mid \bar{x}_i, \bar{y}_{i,1:k-1}, \bar{I}[:i-1], S_{i-1}) \propto \exp\left(m_k W^S h^S\right) \qquad (9)$$

where $m_k$ is computed in Equation 5 and $W^S$ is a learned matrix. To simplify the notation, we assign the segment to a single output token. The output probabilities (Equations 7 and 9) are normalized together to a single probability distribution. When a segment is selected, the embedding used as input for the next generation step is a bag-of-words encoding of the segment. We extend the output token function $\phi^y$ to take segments:

$$\phi^y(\bar{s} = \langle a, b, l, r \rangle) = \frac{1}{r - l + 1} \sum_{k=l}^{r} \phi^y(y_{b,k})$$

The recursion in $\phi^y$ is limited to depth one because segments do not contain other segments.
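The following sketch walks through the segment steps described above under explicit assumptions. The extraction rule (every balanced parenthesized span counts as a subtree) is a hypothetical stand-in, since the exact rule is not given here; the joint softmax mirrors normalizing Equations 7 and 9 together; and the bag-of-words input embedding is implemented as a mean of token embeddings, which is one possible reading of "bag-of-words". All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int, int, int]  # (a, b, l, r)


def paren_spans(tokens: Sequence[str]) -> List[Tuple[int, int]]:
    """Inclusive (l, r) indices of every balanced '(' ... ')' span."""
    stack: List[int] = []
    spans: List[Tuple[int, int]] = []
    for idx, tok in enumerate(tokens):
        if tok == "(":
            stack.append(idx)
        elif tok == ")" and stack:
            spans.append((stack.pop(), idx))
    return spans


def extract_segments(queries: List[List[str]]) -> List[Segment]:
    """Build <a, b, l, r> tuples from previous queries y_1..y_{i-1} (1-indexed turns)."""
    first_seen: Dict[Tuple[str, ...], int] = {}
    latest: Dict[Tuple[str, ...], Tuple[int, int, int]] = {}
    for turn, tokens in enumerate(queries, start=1):
        for l, r in paren_spans(tokens):
            key = tuple(tokens[l:r + 1])
            first_seen.setdefault(key, turn)  # a: first query containing the segment
            latest[key] = (turn, l, r)        # b, l, r: most recent occurrence
    return sorted((first_seen[k], b, l, r) for k, (b, l, r) in latest.items())


def joint_distribution(token_logits: Dict[str, float],
                       segment_logits: Dict[int, float]) -> Dict[Tuple[str, object], float]:
    """Normalize token scores and segment scores into one distribution."""
    items = [(("token", w), v) for w, v in token_logits.items()]
    items += [(("segment", s), v) for s, v in segment_logits.items()]
    top = max(v for _, v in items)
    exps = {key: math.exp(v - top) for key, v in items}
    total = sum(exps.values())
    return {key: e / total for key, e in exps.items()}


def greedy_step(dist: Dict[Tuple[str, object], float],
                segment_tokens: Dict[int, List[str]], output: List[str]):
    """Append the argmax choice; a segment contributes all its tokens in one step."""
    (kind, choice), _ = max(dist.items(), key=lambda kv: kv[1])
    if kind == "segment":
        output.extend(segment_tokens[choice])
    else:
        output.append(choice)
    return kind, choice


def segment_input_embedding(tokens: List[str],
                            embed: Dict[str, List[float]]) -> List[float]:
    """Input embedding for the next step: mean of the segment's token embeddings."""
    vecs = [embed[t] for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```

Copying a whole segment in one step is what shortens decoding: a segment of length $r - l + 1$ costs one step instead of $r - l + 1$.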

Inference with Full Model
Given an utterance $\bar{x}_i$ and the history of interaction $\bar{I}[:i-1]$, we generate the query $\bar{y}_i$. An interaction starts with the user providing the first utterance $\bar{x}_1$. The utterance is encoded using the initial discourse state $h^I_0$, the discourse state $h^I_1$ is computed, the query $\bar{y}_1$ is generated, and the set of segments $S_1$ is created. The initial discourse state $h^I_0$ is learned, and the set of segments $S_0$ used when generating $\bar{y}_1$ is the empty set. The attention is computed only over the first utterance because no previous utterances exist. The user then provides the next utterance or concludes the interaction. At turn $i$, the utterance $\bar{x}_i$ is encoded using the discourse state $h^I_{i-1}$, the discourse state $h^I_i$ is computed, and the query $\bar{y}_i$ is generated using the set of segments $S_{i-1}$. The model has no access to future utterances. We use greedy inference for generation. Figure 3 illustrates a single decoding step.
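The turn-by-turn procedure above can be sketched as a loop in which callables stand in for the neural components. The parameters `encode`, `update_state`, `decode`, and `extract_segments` are hypothetical placeholders, not the released code; the loop only shows the order of operations: $\bar{y}_i$ is decoded with $S_{i-1}$, and $h^I_0$ together with $S_0 = \emptyset$ initialize the interaction.

```python
from typing import Any, Callable, List, Sequence


def run_interaction(
    utterances: Sequence[Any],
    encode: Callable[[Any, Any], Any],              # hypothetical: encode x_i given h^I_{i-1}
    update_state: Callable[[Any, Any], Any],        # hypothetical: compute h^I_i
    decode: Callable[[Any, List[Any]], List[str]],  # hypothetical: greedy decoding with S_{i-1}
    extract_segments: Callable[[List[List[str]]], List[Any]],
    initial_state: Any,                             # stands in for the learned h^I_0
) -> List[List[str]]:
    state = initial_state
    queries: List[List[str]] = []
    segments: List[Any] = []  # S_0 is empty
    for x in utterances:      # utterances arrive one at a time; no look-ahead
        encoded = encode(x, state)
        state = update_state(encoded, state)
        y = decode(encoded, segments)
        queries.append(y)
        segments = extract_segments(queries)  # S_i, used at the next turn
    return queries
```

In the experiments, copying is restricted to segments from the most recent query, which an `extract_segments` implementation could reflect by looking only at `queries[-1]`.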

Learning
We assume access to a training set of $N$ interactions $\{\bar{I}^{(l)}\}_{l=1}^{N}$. Given an interaction $\bar{I}^{(l)}$, each utterance $\bar{x}^{(l)}_i$ is paired with an annotated query $\bar{y}^{(l)}_i$. The set of segments from previous utterances is deterministically extracted from the annotated queries during learning. However, the data does not indicate what parts of each query originate in segments copied from previous utterances. We adopt a simple approach and heuristically identify context-dependent segments based on entities that appear in the utterance and the query. Once we identify a segment in the annotated query, we replace it with a unique placeholder token, and it appears to the learning algorithm as a single generation decision. Treating this decision as latent is an important direction for future work. Given the segment copy decisions, we minimize the token cross-entropy loss:

$$\mathcal{L}(i, l) = -\frac{1}{|\bar{y}^{(l)}_i|} \sum_{k=1}^{|\bar{y}^{(l)}_i|} \log P\left(y^{(l)}_{i,k} \mid \bar{x}^{(l)}_i, \bar{y}^{(l)}_{i,1:k-1}, \bar{I}^{(l)}[:i-1]\right)$$

where $k$ is the index of the output token. The base and recent-history encoders (Sections 4.1 and 4.2) can be trained by processing each utterance separately. For these models, given a mini-batch $B$ of utterances, each identified by an interaction-utterance index pair, the loss is the mean token loss:

$$\mathcal{L}_B = \frac{1}{|B|} \sum_{(i,l) \in B} \mathcal{L}(i, l)$$

The turn-level encoder (Section 4.3) requires building a computation graph for the entire interaction. We update the model parameters for each interaction. The interaction loss is:

$$\mathcal{L}^{(l)} = \frac{n}{B} \sum_{i=1}^{n} \mathcal{L}(i, l)$$

where $n$ is the number of utterances in $\bar{I}^{(l)}$, $B$ is the batch size, and $\frac{n}{B}$ re-normalizes the loss so the gradient magnitude is not dependent on the number of utterances in the interaction. Our ablations (−batch re-weight in Table 2) show the importance of this term. For both cases, we use teacher forcing (Williams and Zipser, 1989).
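A minimal sketch of two learning steps, assuming token lists and gold-token probabilities are plain Python values: replacing an identified segment in an annotated query with a single placeholder token, and computing the per-utterance token cross-entropy and its mini-batch mean. The function names and the placeholder format `<SEG_0>` are hypothetical, and the entity-based heuristic that decides which span to replace is not shown.

```python
import math
from typing import List, Sequence


def replace_segment(query: List[str], segment: List[str], placeholder: str) -> List[str]:
    """Replace the first occurrence of `segment` in `query` with one placeholder token."""
    n = len(segment)
    if n == 0:
        return list(query)
    for start in range(len(query) - n + 1):
        if query[start:start + n] == segment:
            return query[:start] + [placeholder] + query[start + n:]
    return list(query)  # segment not found: leave the query unchanged


def token_loss(gold_probs: Sequence[float]) -> float:
    """Mean negative log-probability of the gold output tokens of one utterance."""
    return -sum(math.log(p) for p in gold_probs) / len(gold_probs)


def batch_loss(utterance_losses: Sequence[float]) -> float:
    """Mean of per-utterance losses over a mini-batch."""
    return sum(utterance_losses) / len(utterance_losses)
```

After replacement, the copied segment is a single target token, so the decoder is trained to emit it in one step, matching how segments are generated at inference time.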

Reasoning with Anonymized Tokens
An important practical consideration for generation in ATIS and other database domains is reasoning about database values, such as entities, times, and dates. For example, the first utterance in Figure 2 includes two entities and a date reference. With limited data, learning both to reason about a large number of entities and to resolve dates is challenging for neural network models. Following previous work (Dong and Lapata, 2016; Iyer et al., 2017), we address this with anonymization, where the data is pre- and post-processed to abstract over tokens that can be heuristically resolved to tokens in the query language. In contrast to previous work, we design a special scoring function for anonymized tokens to reflect how they are used in the input utterances. Figure 4 illustrates pre-processing in ATIS. For example, we use a temporal semantic parser to resolve dates (e.g., next Monday) and replace them with day, month, and year placeholders. To anonymize database entries, we use a dictionary compiled from the database (e.g., to map Seattle to SEATTLE). The full details of the anonymization procedure are provided in the supplementary material. Following pre-processing, the model reasons about encoding and generation of anonymized tokens (e.g., CITY#1) in addition to regular output tokens and query segments from the interaction history. Anonymized tokens are typed (e.g., CITY), map to a token in the query language (e.g., 'BOSTON'), and appear both in input utterances and generated queries. We modify our encoder and decoder embedding functions ($\phi^x$ and $\phi^y$) to map anonymized tokens to the embeddings of their types (e.g., CITY). The type embeddings in $\phi^x$ and $\phi^y$ are separate. Using the types only, while ignoring the indices, avoids learning biases that arise from the arbitrary ordering of the tokens in the training data.
[Figure 4: An example of date and entity anonymization pre-processing for $\bar{x}_1$ and $\bar{y}_1$ in Figure 2.]
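A simplified sketch of dictionary-based entity anonymization and its inverse. It assumes single-token entity names and a hypothetical dictionary from surface tokens to (type, database value) pairs; multi-word names and date resolution with a temporal parser are out of scope here.

```python
from typing import Dict, List, Tuple

EntityDict = Dict[str, Tuple[str, str]]  # surface token -> (type, database value)


def anonymize(utterance: List[str], query: List[str], entities: EntityDict):
    """Replace known entity tokens with typed, indexed placeholders (e.g. CITY#1)."""
    counters: Dict[str, int] = {}
    mapping: Dict[str, str] = {}  # database value -> placeholder
    anon_utterance: List[str] = []
    for tok in utterance:
        if tok in entities:
            etype, value = entities[tok]
            if value not in mapping:
                counters[etype] = counters.get(etype, 0) + 1
                mapping[value] = f"{etype}#{counters[etype]}"
            anon_utterance.append(mapping[value])
        else:
            anon_utterance.append(tok)
    # The same placeholder replaces the matching database value in the query.
    anon_query = [mapping.get(tok, tok) for tok in query]
    return anon_utterance, anon_query, mapping


def deanonymize(query: List[str], mapping: Dict[str, str]) -> List[str]:
    """Post-processing: map placeholders in a generated query back to database values."""
    reverse = {placeholder: value for value, placeholder in mapping.items()}
    return [reverse.get(tok, tok) for tok in query]
```

Because the encoder and decoder embed only the type (CITY), the indices in CITY#1 and CITY#2 carry no learned meaning; telling them apart is left to the attention-based scoring.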
However, it does not allow distinguishing between entries with the same type for generation decisions; for example, the common case where multiple cities are mentioned in an interaction. We address this by scoring anonymized tokens based on the magnitude of attention assigned to them at generation step $k$. The attention magnitude is computed from the encoder hidden states. This computation considers both the decoder state and the location of the anonymized tokens in the input utterances to account for how they are used in the interaction. The probability of an anonymized token $w$ at generation step $k$ is:

$$P(y_{i,k} = w \mid \bar{x}_i, \bar{y}_{i,1:k-1}, \bar{I}[:i-1]) \propto \sum_{t=i-h}^{i} \sum_{j : x_{t,j} = w} \exp\left(s_k(t, j)\right)$$

where $s_k(t, j)$ is the attention score computed in Equation 8. This probability is normalized together with the probabilities in Equations 7 and 9 to form the complete output probability.
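The scoring rule for anonymized tokens can be sketched as follows, assuming the attention scores $s_k(t, j)$ have already been computed (for example with Equation 8) and that anonymized tokens follow a hypothetical `TYPE#index` naming convention. Each anonymized token receives the summed exponentiated scores of the input positions where it appears, and these weights are normalized together with the exponentiated vocabulary logits. Segment scores (Equation 9) are omitted for brevity.

```python
import math
from typing import Dict, List


def is_anonymized(token: str) -> bool:
    """Hypothetical convention: anonymized tokens look like CITY#1 or DAY_NUMBER#2."""
    prefix, sep, index = token.partition("#")
    return bool(sep) and prefix.isupper() and index.isdigit()


def anonymized_token_weights(utterances: List[List[str]],
                             scores: List[List[float]]) -> Dict[str, float]:
    """Sum exp(s_k(t, j)) over every position j of utterance t that holds token w."""
    weights: Dict[str, float] = {}
    for tokens, row in zip(utterances, scores):
        for token, s in zip(tokens, row):
            if is_anonymized(token):
                weights[token] = weights.get(token, 0.0) + math.exp(s)
    return weights


def joint_probabilities(vocab_logits: Dict[str, float],
                        anon_weights: Dict[str, float]) -> Dict[str, float]:
    """Normalize regular-token scores and anonymized-token weights together."""
    unnormalized = {w: math.exp(v) for w, v in vocab_logits.items()}
    unnormalized.update(anon_weights)  # anonymized tokens are outside the regular vocabulary
    total = sum(unnormalized.values())
    return {w: u / total for w, u in unnormalized.items()}
```

Two cities of the same type thus receive different probabilities depending on how strongly the decoder attends to their mentions.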

Experimental Setup
Hyperparameters, architecture details, and other experimental choices are detailed in the supplementary material.

Data We use ATIS (Hemphill et al., 1990; Dahl et al., 1994) to evaluate our approach. The data was originally collected using wizard-of-oz experiments, and annotated with SQL queries. Each interaction was based on a scenario given to a user. We observed that the original data split shares scenarios between the train, development, and test splits. This introduces biases, where travel patterns that appeared during training repeat in testing. For example, a model trained on the original data split often correctly resolves the exact date referenced by on Saturday with no pre-processing or access to the document date. We evaluate this overfitting empirically in the supplementary material. We re-split the data to avoid this bias. We evenly distribute scenarios across splits so that each split contains both scenarios with many and few representative interactions. The new split follows the original split sizes with 1148/380/130 train/dev/test interactions. Table 1 shows data statistics. The system uses a SQL database of 27 tables and 162K entries. 96.6% of the queries require at least one join, and 93% at least two joins.

The most related work on ATIS to ours is Miller et al. (1996), which we discuss in Section 3. The most related corpora to ATIS are SCONE (Long et al., 2016) and SequentialQA (Iyyer et al., 2017). SCONE contains micro-domains consisting of stack- or list-like elements. The formal representation is linguistically motivated and the majority of queries include a single binary predicate. All interactions include five turns. SequentialQA contains sequences of questions on a single Wikipedia table.
Interactions are on average 2.9 turns long, and were created by re-phrasing a question from a context-independent corpus (Pasupat and Liang, 2015). In contrast, ATIS uses a significantly larger database, requires generating complex queries with multiple joins, includes longer interactions, and was collected through interaction with users. The supplementary material contains an analysis of the contextual phenomena observed in ATIS.

Pre-processing We pre-process the data to identify and anonymize entities (e.g., cities), numbers, times, and dates. We use string matching heuristics to identify entities and numbers, and identify and resolve times and dates using UWTime (Lee et al., 2014). When resolving dates, we use the original interaction date as the document time. The supplementary material details this process.

Metrics We evaluate using query accuracy, strict denotation accuracy, and relaxed denotation accuracy. Query accuracy is the percentage of predicted queries that match the reference query. Strict denotation accuracy is the percentage of predicted queries that execute to exactly the same table as the reference query. Unlike strict accuracy, relaxed accuracy also gives credit to a predicted query that fails to execute if the reference table is empty. When the utterance is ambiguous and there are multiple gold queries, we consider the query or table correct if it matches any of the gold labels.

Systems We evaluate four systems: (a) SEQ2SEQ-0: the baseline encoder-decoder model (Section 4.1); (b) SEQ2SEQ-H: encoder-decoder with attention on current and previous utterances (Section 4.2); (c) S2S+ANON: encoder-decoder with attention on previous utterances and anonymization scoring (Section 6); and (d) FULL: the complete approach including segment copying (Section 4.4). For FULL, we evaluate with predicted and gold (FULL-GOLD) previous queries, and without attention on previous utterances (FULL-0). All models except SEQ2SEQ-0 and FULL-0 use $h = 3$ previous utterances.
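The three metrics defined above can be sketched as per-prediction predicates, assuming a failed execution is represented by `None` and tables are compared as lists of rows (row order is treated as significant here, which is an assumption). Multiple gold queries or tables are handled by accepting a match with any of them; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Table = List[Tuple]  # rows returned by executing a query


def query_correct(pred: str, golds: Sequence[str]) -> bool:
    """Query accuracy: exact match with any gold query."""
    return any(pred == gold for gold in golds)


def strict_correct(pred_table: Optional[Table], gold_tables: Sequence[Table]) -> bool:
    """Strict denotation: pred_table is None when the predicted query failed to execute."""
    return pred_table is not None and any(pred_table == gold for gold in gold_tables)


def relaxed_correct(pred_table: Optional[Table], gold_tables: Sequence[Table]) -> bool:
    """Relaxed denotation: a failed execution also counts if the reference table is empty."""
    if pred_table is None:
        return any(len(gold) == 0 for gold in gold_tables)
    return strict_correct(pred_table, gold_tables)


def accuracy(flags: Sequence[bool]) -> float:
    """Percentage of predictions judged correct."""
    return 100.0 * sum(flags) / len(flags)
```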
We limit segment copying to segments that appear in the most recent query only. Unless specifically ablated, all experiments use pre-processing.

Results
Table 2 shows development and test results. We run each experiment five times and report mean and standard deviation. The main metric we focus on is strict denotation accuracy. The relatively low performance of SEQ2SEQ-0 demonstrates the need for context in this task. Attending on recent history significantly increases performance. Both SEQ2SEQ models score anonymized tokens as regular vocabulary tokens. Adding anonymized token scoring further increases performance (S2S+ANON). FULL-0 and FULL add segment copying and the turn-level encoder. The relatively high performance of FULL-0 shows that replacing attention over previous utterances with segment copying maintains and even improves the system effectiveness. However, FULL, which combines both, provides the best performance. This shows the benefit of redundancy in accessing contextual information. Unlike the other systems, both FULL and FULL-0 suffer from cascading errors due to selecting query segments from previously incorrect predictions. The higher FULL-GOLD performance illustrates the influence of error propagation. While part of this error could be mitigated by having both attention and segment copying, this behavior is unlikely to be learned with supervised learning, where errors are never observed.
Ablations show that all components contribute to the system performance. Performance drops when using a concatenation-based encoder instead of the turn-level encoder (−turn-level enc.; Section 4.3). Removing the copying of query segments from the interaction history lowers performance (−query segments; Section 4.4). Treating indexed anonymized tokens as regular tokens, rather than using attention-based scoring and type embeddings, lowers performance (−anon. scoring; Section 6). Finally, pre-processing, which includes anonymization, is critical (−pre-processing).
Figure 5(a) shows the performance as interactions progress. All systems show a drop in performance after the first utterance, which is always context-independent. As expected, SEQ2SEQ-0 shows the biggest drop. The FULL approach is the most stable as the interaction progresses. Figure 5(b) shows the performance as we decrease the number $h$ of previous utterances used for attention. Without the turn-level encoder and segment copying (SEQ2SEQ-H and S2S+ANON), performance decreases significantly as $h$ decreases. In contrast, the FULL model shows a smaller decrease (1.5%). The supplementary material includes an attention analysis demonstrating the importance of previous-utterance attention. However, attending over fewer utterances improves inference speed: FULL-0 is 30% faster than FULL.
Finally, although we re-split the data early in development because of scenario sharing between train and test, and used only the new split during development, we also evaluate on the original split (Table 3). We report mean and standard deviation over three trials. The high performance of S2S+ANON potentially indicates that it benefits more from the differences between the splitting procedures.

Analysis
We analyze errors made by the full model on thirty development interactions. When analyzing the output of FULL, we focus on error propagation and analyze predictions that resulted in an incorrect table when using FULL, but a correct table when using FULL-GOLD. 56.7% are due to selection of a segment that contained an incorrect constraint. 43.4% of the errors are caused by a necessary segment missing during generation. 93.0% of all predictions are valid SQL and follow the database schema. We also analyze the errors of FULL-GOLD. We observe that 30.0% of errors are due to generating constraints that were not mentioned by the user. Other common errors include generating relevant constraints with incorrect values (23.3%) and missing constraints (23.3%).
We also evaluate our model's ability to recover long-distance references while constraints are added, changed, or removed, and when target attributes change. The supplementary material includes the analysis details. In general, the model resolves references well. However, it fails to recover constraints mentioned in the past following a user focus state change (Grosz and Sidner, 1986).

Discussion
We study models that recover context-dependent executable representations from user utterances by reasoning about interaction history. We observe that our segment-copying models suffer from error propagation when extracting segments from previously generated queries. This could be mitigated by training a model to ignore erroneous segments and recover by relying on attention for generation. However, because supervised learning does not expose the model to erroneous states, a different learning approach is required. Our analysis demonstrates that our model is relatively insensitive to interaction length, and is able to recover both explicit and implicit references to previously mentioned entities and constraints. Further study is required of user focus change, an important phenomenon that is relatively rare in ATIS.