A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling

A spoken language understanding (SLU) system includes two main tasks: slot filling (SF) and intent detection (ID). Joint modeling of the two tasks is becoming a trend in SLU. However, existing joint models do not establish bi-directional interrelated connections between the intent and slots. In this paper, we propose a novel bi-directional interrelated model for joint intent detection and slot filling. We introduce an SF-ID network that establishes direct connections between the two tasks so that they promote each other mutually. In addition, we design an entirely new iteration mechanism inside the SF-ID network to enhance the bi-directional interrelated connections. Experimental results show that, compared with the state-of-the-art model, our model achieves relative improvements in sentence-level semantic frame accuracy of 3.79% on the ATIS dataset and 5.42% on the Snips dataset.


Introduction
Spoken language understanding plays an important role in spoken dialogue systems. SLU aims to extract semantics from user utterances: concretely, it identifies the intent and captures the semantic constituents. These two tasks are known as intent detection and slot filling, respectively (Tur and De Mori, 2011). For instance, Table 1 shows the sentence 'what flights leave from phoenix', sampled from the ATIS corpus. Each word in the sentence corresponds to one slot label, and a single intent is assigned to the whole sentence. Traditional pipeline approaches handle the two tasks separately. Intent detection is treated as a semantic classification problem that predicts the intent label; general approaches such as support vector machines (SVM) (Haffner et al., 2003) and recurrent neural networks (RNN) (Lai et al., 2015) can be applied. Slot filling is regarded as a sequence labeling task; popular approaches include conditional random fields (CRF) (Raymond and Riccardi, 2007) and long short-term memory (LSTM) networks (Yao et al., 2014).
Because the performance of pipeline approaches is limited by error propagation, the trend is to develop joint models (Chen et al., 2016a; Zhang and Wang, 2016) for the intent detection and slot filling tasks. Liu and Lane (2016) proposed an attention-based RNN model; however, it only applies a joint loss function to link the two tasks implicitly. Hakkani-Tür et al. (2016) introduced an RNN-LSTM model in which explicit relationships between the slots and intent are not established. Goo et al. (2018) proposed a slot-gated model that applies intent information to the slot filling task and achieves superior performance, but slot information is not used in the intent detection task, so bi-directional direct connections are still not established. In fact, the slots and intent are correlated, and the two tasks can mutually reinforce each other. This paper proposes an SF-ID network consisting of an SF subnet and an ID subnet. The SF subnet applies intent information to the slot filling task, while the ID subnet uses slot information in the intent detection task. In this way, bi-directional interrelated connections between the two tasks are established. Our contributions are summarized as follows: 1) We propose an SF-ID network to establish an interrelated mechanism for the slot filling and intent detection tasks. Specifically, a novel ID subnet is proposed to apply slot information to the intent detection task.

Proposed Approaches
This section first introduces how we integrate the context of slots and intent using an attention mechanism. It then presents an SF-ID network that establishes direct connections between intent and slots. The model architecture, based on a bi-directional LSTM (BLSTM), is shown in Figure 2.

Integration of Context
In SLU, word tags are determined not only by the corresponding terms but also by the context (Chen et al., 2016b). The intent label is also related to every element in the utterance. To capture such dependencies, an attention mechanism is introduced. Slot filling: The $i$-th slot context vector $c^{i}_{slot}$ is computed as the weighted sum of the BLSTM hidden states $(h_1, \ldots, h_T)$:

$$c^{i}_{slot} = \sum_{j=1}^{T} \alpha_{i,j}\, h_j \tag{1}$$

where the attention weight $\alpha$ is obtained in the same way as in (Liu and Lane, 2016). Intent detection: The intent context vector $c_{inte}$ is calculated in the same way as $c_{slot}$, except that only one intent label is generated for the whole sentence.
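The context integration in (1) can be sketched in plain Python. This is a minimal sketch, not the authors' code: the attention score here is a dot product $h_i \cdot h_j$ followed by a softmax, a hypothetical simplification standing in for the Liu and Lane (2016) attention, and the intent query vector (the last hidden state) is likewise an assumption. All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: Vector, vectors: List[Vector]) -> Vector:
    # sum_j weights[j] * vectors[j], computed component by component.
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def slot_contexts(hidden: List[Vector]) -> List[Vector]:
    """c_slot^i = sum_j alpha_{i,j} h_j for every position i, as in (1).

    HYPOTHETICAL score: alpha_{i,j} = softmax_j(h_i . h_j).
    """
    return [weighted_sum(softmax([dot(h_i, h_j) for h_j in hidden]), hidden)
            for h_i in hidden]


def intent_context(hidden: List[Vector], query: Vector) -> Vector:
    """One sentence-level context vector c_inte (query choice is an assumption)."""
    return weighted_sum(softmax([dot(query, h_j) for h_j in hidden]), hidden)


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy BLSTM states h_1..h_3
c_slot = slot_contexts(hidden)                 # one context vector per word
c_inte = intent_context(hidden, hidden[-1])    # one context vector per sentence
```

Because the weights come from a softmax, each context vector is a convex combination of the hidden states, which is what "weighted sum" in (1) amounts to.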

SF-ID Network
The SF-ID network consists of an SF subnet and an ID subnet. The order of the SF and ID subnets can be customized. Depending on this order, the model has two modes: SF-First and ID-First. The subnet executed first positively influences the subnet executed second through an intermediate vector.

SF-First Mode
In the SF-First mode, the SF subnet is executed first. We apply the intent context vector $c_{inte}$ and the slot context vector $c_{slot}$ in the SF subnet and generate the slot reinforce vector $r_{slot}$. The newly formed vector $r_{slot}$ is then fed to the ID subnet to bring in slot information. SF subnet: The SF subnet applies the intent and slot information (i.e., $c_{inte}$ and $c_{slot}$) in the calculation of a correlation factor $f$, which indicates the relationship between the intent and slots. This correlation factor $f$ is defined by (2). In addition, we introduce a slot reinforce vector $r_{slot}$ defined by (3), which is the vector passed to the ID subnet.
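The data flow through the SF subnet can be sketched as follows. Because the exact forms of (2) and (3) are not given above, this sketch assumes an attention-style scalar factor per word, $f_i = \sum_k V_k \tanh(c^{i}_{slot,k} + (W q)_k)$, and a reinforce vector $r^{i}_{slot} = f_i \, c^{i}_{slot}$. $V$, $W$, $q$, and the function names are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(W: Matrix, x: Vector) -> Vector:
    # Plain matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sf_subnet(c_slot: List[Vector], q: Vector, V: Vector, W: Matrix) -> List[Vector]:
    """Return slot reinforce vectors r_slot^i = f_i * c_slot^i.

    q is c_inte on the first pass (hypothetical parameter name).
    ASSUMED correlation factor, for illustration only:
        f_i = sum_k V[k] * tanh(c_slot^i[k] + (W q)[k])
    """
    projected = matvec(W, q)
    r_slot = []
    for c_i in c_slot:
        f_i = sum(v * math.tanh(c + p) for v, c, p in zip(V, c_i, projected))
        r_slot.append([f_i * c for c in c_i])  # scale the context vector by f_i
    return r_slot
```

Passing the latest intent reinforce vector as `q` instead of $c_{inte}$ gives the iterated variant described under the iteration mechanism.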
ID subnet: We introduce a novel ID subnet that applies slot information to the intent detection task. We believe that the slots represent word-level information while the intent represents sentence-level information, and that this hybrid information can benefit intent detection. The slot reinforce vector $r_{slot}$ is fed to the ID subnet to generate the reinforce vector $r$, defined by:

$$r = \sum_{i=1}^{T} \alpha_i \, r^{i}_{slot} \tag{4}$$

where the weight $\alpha_i$ of $r^{i}_{slot}$ is computed by (5) and (6). We also introduce an intent reinforce vector $r_{inte}$, computed as the sum of the reinforce vector $r$ and the intent context vector $c_{inte}$:

$$r_{inte} = r + c_{inte} \tag{7}$$
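A matching sketch of the ID subnet follows. The weighted sum in (4) and the addition in (7) follow the text above; the score that produces $\alpha_i$ stands in for (5) and (6) and is an assumption: a softmax over $u \cdot \tanh(r^{i}_{slot})$ with a hypothetical parameter vector $u$.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def id_subnet(r_slot: List[Vector], c_inte: Vector, u: Vector) -> Vector:
    """Return the intent reinforce vector r_inte = r + c_inte, as in (7).

    r = sum_i alpha_i * r_slot^i, as in (4). The score behind alpha_i is an
    ASSUMED stand-in for (5)-(6): alpha = softmax_i(u . tanh(r_slot^i)).
    """
    scores = [sum(uk * math.tanh(x) for uk, x in zip(u, r_i)) for r_i in r_slot]
    alpha = softmax(scores)
    dim = len(c_inte)
    r = [sum(a * r_i[k] for a, r_i in zip(alpha, r_slot)) for k in range(dim)]
    return [rk + ck for rk, ck in zip(r, c_inte)]  # add the intent context
```

When every $r^{i}_{slot}$ is identical, the attention weights cannot change $r$, so $r_{inte}$ reduces to that shared vector plus $c_{inte}$; this is a convenient sanity check for any concrete choice of (5) and (6).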
Iteration Mechanism: The intent reinforce vector $r_{inte}$ can also be fed back into the SF subnet. Because $r_{inte}$ contains hybrid information about both the intent and slots, it can improve the effect of the correlation factor $f$, so (2) can be replaced by (8), in which $r_{inte}$ takes the place of $c_{inte}$. With the updated correlation factor $f$, a new slot reinforce vector $r_{slot}$ is obtained. The ID subnet then takes the new $r_{slot}$ and outputs a new $r_{inte}$. Once both the SF subnet and the ID subnet have been updated, one iteration is complete. In theory, the interaction between the SF subnet and the ID subnet can be repeated indefinitely; we call this the iteration mechanism of our model. The intent and slot reinforce vectors act as the links between the SF subnet and the ID subnet, and their values change continuously during the iteration process.
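The control flow of the iteration mechanism in SF-First mode can be sketched independently of how the subnets are parameterized. In this minimal sketch, the two subnets are passed in as functions with hypothetical signatures `sf(c_slot, q) -> r_slot` and `id_(r_slot, c_inte) -> r_inte`, so any concrete form of the subnet equations can be plugged in.

```python
from typing import Callable, List, Tuple

Vector = List[float]
SFSubnet = Callable[[List[Vector], Vector], List[Vector]]  # (c_slot, q) -> r_slot
IDSubnet = Callable[[List[Vector], Vector], Vector]        # (r_slot, c_inte) -> r_inte


def sf_first(c_slot: List[Vector], c_inte: Vector, sf: SFSubnet, id_: IDSubnet,
             num_iters: int) -> Tuple[List[Vector], Vector]:
    """Run num_iters SF -> ID iterations and return the final (r_slot, r_inte).

    The first SF pass is conditioned on c_inte; every later pass is conditioned
    on the latest r_inte, i.e. r_inte replaces c_inte in the correlation factor.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    q = c_inte
    for _ in range(num_iters):
        r_slot = sf(c_slot, q)        # SF subnet: new slot reinforce vectors
        r_inte = id_(r_slot, c_inte)  # ID subnet: new intent reinforce vector
        q = r_inte                    # feed r_inte back into the SF subnet
    return r_slot, r_inte
```

In this sketch, the iteration number studied in the experiments corresponds to `num_iters`; the ID subnet always adds the original $c_{inte}$, following (7).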
After the iteration mechanism, $r_{inte}$ and $r_{slot}$ participate in the final prediction of the intent and slots, respectively. For the intent detection task, the intent reinforce vector $r_{inte}$ and the last hidden state $h_T$ of the BLSTM are used in the final intent prediction:

$$y_{inte} = \mathrm{softmax}\left(W^{inte}_{hy}\, \mathrm{concat}(h_T, r_{inte})\right) \tag{9}$$

For the slot filling task, the hidden state $h_i$ combined with its corresponding slot reinforce vector $r^{i}_{slot}$ is used in the $i$-th slot label prediction. The final expression without the CRF layer is:

$$y^{i}_{slot} = \mathrm{softmax}\left(W^{slot}_{hy}\, \mathrm{concat}(h_i, r^{i}_{slot})\right) \tag{10}$$
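Equations (9) and (10) amount to a linear layer over a concatenated feature vector followed by a softmax. A minimal sketch, assuming the weight matrices are nested lists and that $h_T$ is simply the last element of the hidden-state list (a simplification for a BLSTM); the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W: Matrix, h: Vector, r: Vector) -> Vector:
    # softmax(W concat(h, r)); list "+" concatenates h and r.
    features = h + r
    logits = [sum(w * x for w, x in zip(row, features)) for row in W]
    return softmax(logits)


def predict_intent(W_inte: Matrix, hidden: List[Vector], r_inte: Vector) -> Vector:
    # Eq. (9): the last hidden state h_T plus the intent reinforce vector.
    return predict(W_inte, hidden[-1], r_inte)


def predict_slots(W_slot: Matrix, hidden: List[Vector],
                  r_slot: List[Vector]) -> List[Vector]:
    # Eq. (10): one label distribution per word, without the CRF layer.
    return [predict(W_slot, h_i, r_i) for h_i, r_i in zip(hidden, r_slot)]
```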

ID-First Mode
In the ID-First mode, the ID subnet is executed before the SF subnet. In this case, the calculation in the ID subnet differs in the first iteration. ID subnet: Unlike the SF-First mode, the reinforce vector $r$ is obtained from the hidden states and the context vectors of the BLSTM, so (4), (5), and (6) are replaced accordingly. The intent reinforce vector $r_{inte}$ is still defined by (7) and is fed to the SF subnet. SF subnet: The SF subnet receives $r_{inte}$, and the correlation factor $f$ is calculated in the same way as in (8). All other details are the same as in the SF-First mode. Iteration Mechanism: The iteration mechanism in the ID-First mode is almost the same as in the SF-First mode, except for the order of the two subnets.
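The ID-First ordering differs only in its first ID pass, which has no $r_{slot}$ yet and works from the hidden states and context vectors instead. In the sketch below, `attend_hidden` is a hypothetical stand-in for the replaced versions of (4) to (6): it scores each $h_i$ against $c^{i}_{slot}$ with a dot product, which is an assumption, not the paper's formula. `sf` and `id_` are caller-supplied subnet functions with hypothetical signatures `sf(c_slot, q) -> r_slot` and `id_(r_slot, c_inte) -> r_inte`.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
SFSubnet = Callable[[List[Vector], Vector], List[Vector]]  # (c_slot, q) -> r_slot
IDSubnet = Callable[[List[Vector], Vector], Vector]        # (r_slot, c_inte) -> r_inte


def softmax(scores: Vector) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_hidden(hidden: List[Vector], c_slot: List[Vector], c_inte: Vector) -> Vector:
    """First ID pass (HYPOTHETICAL form): r = sum_i alpha_i h_i, r_inte = r + c_inte.

    alpha = softmax_i(h_i . c_slot^i) is an ASSUMED score.
    """
    scores = [sum(a * b for a, b in zip(h_i, c_i)) for h_i, c_i in zip(hidden, c_slot)]
    alpha = softmax(scores)
    r = [sum(a * h[k] for a, h in zip(alpha, hidden)) for k in range(len(c_inte))]
    return [rk + ck for rk, ck in zip(r, c_inte)]  # (7) still applies


def id_first(hidden: List[Vector], c_slot: List[Vector], c_inte: Vector,
             sf: SFSubnet, id_: IDSubnet, num_iters: int) -> Tuple[List[Vector], Vector]:
    """Run num_iters ID -> SF iterations and return the final (r_slot, r_inte)."""
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    r_inte = attend_hidden(hidden, c_slot, c_inte)  # iteration 1: no r_slot yet
    r_slot = sf(c_slot, r_inte)                      # SF subnet conditioned on r_inte
    for _ in range(num_iters - 1):
        r_inte = id_(r_slot, c_inte)  # later iterations use r_slot as in SF-First
        r_slot = sf(c_slot, r_inte)
    return r_slot, r_inte
```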

CRF Layer
Slot filling is essentially a sequence labeling problem. For sequence labeling, it is beneficial to consider the correlations between neighboring labels. Therefore, we add a CRF layer on top of the SF subnet outputs to jointly decode the best label sequence for the utterance.

Experiments

Table 3: Analysis of separate subnets and their interaction effects

Evaluation Metrics: Three evaluation metrics are used in the experiments. For the slot filling task, the F1-score is used. For the intent detection task, accuracy is used. In addition, the sentence-level semantic frame accuracy (sentence accuracy) indicates the overall performance on both tasks; it is the proportion of sentences in the corpus whose slots and intent are both correctly predicted.
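Intent accuracy and sentence accuracy, as defined above, can be computed as follows. A minimal sketch, assuming each example is an `(intent, slot_labels)` pair; the label strings in the usage are made up. Slot F1 is omitted because the text does not specify whether it is computed over tokens or over slot chunks.

```python
from typing import List, Sequence, Tuple

Example = Tuple[str, Sequence[str]]  # (intent label, one slot label per word)


def _check(gold: List[Example], pred: List[Example]) -> None:
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")


def intent_accuracy(gold: List[Example], pred: List[Example]) -> float:
    _check(gold, pred)
    return sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)


def sentence_accuracy(gold: List[Example], pred: List[Example]) -> float:
    """Fraction of sentences whose intent AND all slot labels are correct."""
    _check(gold, pred)
    correct = sum(
        1 for (g_int, g_slots), (p_int, p_slots) in zip(gold, pred)
        if g_int == p_int and list(g_slots) == list(p_slots)
    )
    return correct / len(gold)


gold = [("flight", ["O", "O", "B-city"]), ("fare", ["O", "B-city"])]
pred = [("flight", ["O", "O", "B-city"]), ("fare", ["O", "O"])]
print(intent_accuracy(gold, pred), sentence_accuracy(gold, pred))  # 1.0 0.5
```

The second sentence has the correct intent but one wrong slot label, so it counts toward intent accuracy and not toward sentence accuracy; this is why sentence accuracy is the strictest of the three metrics.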
Training Details: In our experiments, the layer size of the BLSTM networks is set to 64. During training, the Adam optimizer (Kingma and Ba, 2014) is used. The learning rate is updated by $\eta_t = \eta_0 / (1 + p t)$ with a decay rate of $p = 0.05$ and an initial learning rate of $\eta_0 = 0.01$, where $t$ denotes the number of completed steps.

Model Performance: The performance of the models is given in Table 2, which shows that our model outperforms the baselines in all three aspects: slot filling (F1), intent detection (Acc), and sentence accuracy (Acc). In particular, on the sentence-level semantic frame results, the relative improvement is around 3.79% and 5.42% for ATIS and Snips, respectively, indicating that the SF-ID network can significantly benefit SLU performance by introducing the bi-directional interrelated mechanism between the slots and intent.

Analysis of Separate Subnets: We analyze the effect of the separate subnets; the results are given in Table 3. These experiments are conducted with the CRF layer added. As we can see, both the model with only the SF subnet and the model with only the ID subnet achieve better results than the BLSTM model. Therefore, we believe that both the SF subnet and the ID subnet contribute to the performance improvement. We also analyze the setting with independent SF and ID subnets, that is, with no interaction between them; it also obtains good results. However, the SF-ID network, which allows the two subnets to interact with each other, achieves better results still. This is because the bi-directional interrelated mechanism helps the two subnets promote each other mutually, which improves performance on both tasks.

Figure 3: Effect of iteration number on the model performance in SF-First mode

Analysis of Model Mode: In Table 2, it can be seen that the ID-First mode achieves better performance in the slot filling task.
This is because the ID-First mode effectively treats slot filling as the more important task: its SF subnet can utilize the intent information output by the ID subnet. Similarly, the SF-First mode performs better in the intent detection task. Overall, the difference between the two modes is minor.

Iteration Mechanism: The effect of the iteration mechanism is shown in Figure 3. These experiments are conducted in SF-First mode. Sentence accuracy is used as the performance measure because it reflects the overall model performance. It increases gradually and reaches its maximum when the iteration number is three on both the ATIS and Snips datasets, indicating the effectiveness of the iteration mechanism. This gain may be attributed to the iteration mechanism enhancing the connections between intent and slots. Beyond that point, sentence accuracy stabilizes with a minor drop. On balance, the iteration mechanism with a proper iteration number can benefit SLU performance.

CRF Layer: Table 2 shows that the CRF layer has a positive effect on overall model performance. This is because the CRF layer can obtain the most probable label sequence at the sentence level. However, the CRF layer mainly targets sequence labeling problems, so the improvement on the slot filling task clearly exceeds that on the intent detection task. In general, the performance is improved by the CRF layer.
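At inference time, a linear-chain CRF layer selects the highest-scoring label sequence, which is typically done with Viterbi decoding. The sketch below shows only that decoding step, under assumptions: per-token emission scores (for example, unnormalized outputs of the slot predictor) and a label-to-label transition matrix are given; start and end transitions and CRF training are omitted, and the function name is illustrative.

```python
from typing import List, Tuple


def viterbi(emissions: List[List[float]],
            transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Return the best label-index path and its total score.

    emissions[t][y]: score of label y at position t.
    transitions[a][b]: score of moving from label a to label b.
    """
    if not emissions:
        raise ValueError("empty sequence")
    n = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each label
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for y in range(n):
            # Best previous label for ending at label y at position t.
            prev = max(range(n), key=lambda a: score[a] + transitions[a][y])
            new_score.append(score[prev] + transitions[prev][y] + emissions[t][y])
            ptr.append(prev)
        score = new_score
        backptrs.append(ptr)
    last = max(range(n), key=lambda y: score[y])
    path = [last]
    for ptr in reversed(backptrs):  # follow back-pointers from the end
        path.append(ptr[path[-1]])
    return path[::-1], score[last]


# Labels: 0 = O, 1 = B-city, 2 = I-city; the O -> I-city transition is penalized.
emissions = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi(emissions, transitions))  # ([0, 1], 3.0)
```

A greedy per-token argmax would pick I-city at the second position here; the transition penalty makes the decoder prefer B-city instead, which is the kind of neighboring-label correlation the CRF layer exploits.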

Conclusion
In this paper, we propose a novel SF-ID network that provides a bi-directional interrelated mechanism for the intent detection and slot filling tasks. We also propose an iteration mechanism to enhance the interrelated connections between the intent and slots. The bi-directional interrelated model helps the two tasks promote each other mutually. Our model substantially outperforms the baselines on two public datasets. This bi-directional interrelated mechanism between slots and intent provides guidance for future SLU work.