Samvaadhana: A Telugu Dialogue System in Hospital Domain

In this paper, we build a dialogue system for the hospital domain in Telugu, a resource-poor Dravidian language. The system handles a range of hospital- and doctor-related queries. Our main aim is to present an approach for modelling a dialogue system in a resource-poor language by combining linguistic and domain knowledge. Focusing on the question answering aspect, we identify Question Classification and Query Processing as the two most important parts of the dialogue system. Our method combines deep learning techniques for question classification with rule-based computational analysis for query processing. Because no automated evaluation tool exists for dialogue systems in Telugu, the system was evaluated by human judges. As shown in the results, it achieves a high overall rating and captures context accurately.


Introduction
A dialogue system is a computer system that communicates with human beings in natural language, in either written or spoken form. Dialogue systems have been explored rigorously over the past few years, with considerable advances, but most of the work is limited to English. This is probably due mainly to the lack of resources, domain expertise and tools in other languages. Dialogue systems can be broadly classified into two kinds: Task Oriented Dialogue Systems and Non-task Oriented Dialogue Systems (Chen et al., 2017). Task oriented, or domain-specific, dialogue systems handle queries related to a particular task or a fixed domain. Their main purpose is to provide users with information or help about the chosen domain. Non-task Oriented, or Generic, Dialogue Systems, on the other hand, are modelled to have natural and extended conversations with human beings; they can handle queries from multiple domains and can act as assistants.
In this paper, we attempt to model a domain-specific dialogue system which answers various queries related to hospitals and their doctors in Telugu. Telugu is an agglutinative South Indian language which belongs to the Dravidian language family. It is spoken mainly in Southern India and is the third most spoken language in India, with approximately 93 million speakers. It is a morphologically rich and highly inflectional language.
Our approach to modelling a domain-specific dialogue system shows that even with limited resources, such as insufficient data and unavailable linguistic tools, taking suitable measures and creating simple computational tools can lead to the required results. Our dialogue system has two main parts, namely Question Classification and Query Processing.
Question Classification: In this phase, with the help of a question classifier, the question posed by the user is classified into one of the predefined categories which have been designed using the domain knowledge depending on the aim and intention of the question.
The data required for training the question classifier was created manually. This is feasible because the dialogue system is domain-specific, which implies that the questions will only be related to a fixed number of categories. In the hospital domain, questions will mostly be related to categories such as the timings and availability of the doctor, the specialization of the doctor, the location of the hospital and so on. This results in a limited number of question classes overall.
Query Processing: Once the category of the question is known, we process the question using Named Entity Recognition (NER) and extract all the details required to answer a question of that category. If the information is sufficient to answer the question, it is used to build an SQL query that retrieves the data required to generate a template-answer. If the information is not sufficient, the user is asked to supply the missing information, after which an SQL query is generated. Apart from question classification and query processing, context handling is another important task handled by our dialogue system. It is the main factor that differentiates a Dialogue system from a Question Answering system, and it helps the conversation seem natural.

Related Work
Dialogue systems are a field of rigorous ongoing research, and many novel systems have already been developed for English. Dialogue systems differ according to the purpose they serve. One of the very first dialogue systems is ELIZA (Weizenbaum, 1966), which was a deterministic rule-based system. It was one of the first systems to facilitate conversation between man and computer in natural language. Another such early rule-based dialogue system was PARRY (Colby et al., 1971). It was the first dialogue system to pass the Turing Test.
Other systems, such as those of Chung (2004), Zue et al. (2000) and Ferguson and Allen (1998), are mixed-initiative and domain-specific. They operate on and deliver information related only to a particular domain. In contrast, there are also generic dialogue system architectures which can adapt to different domains; Allen et al. (2000) and Galescu et al. (2018) propose such architectures.
Another kind of dialogue system is the data-driven dialogue system, which mines conversations from an already available dialogue corpus. Serban et al. (2015a), Jafarpour and Burges (2010), Ritter et al. (2011) and Leuski and Traum (2011) describe data-driven systems. They mainly extract the relevant response using Information Retrieval techniques.
Yet another kind of dialogue system, such as that of Fujie et al. (2019), works mainly with user feedback combined with some other technique. This helps the dialogue system learn and evolve. There are also some notable dialogue systems, such as those of Vinyals and Le (2015), Ritter et al. (2010), Serban et al. (2015b) and Mutiwokuziva et al. (2017), which are based on neural networks and deep learning.
In Telugu, the first dialogue system is (Nandi Reddy and Bandyopadhyay, 2006) and it uses computational rules and frames for answer generation. Another dialogue system in Telugu is (Ch. Sravanthi et al., 2015). The authors use various complex linguistic properties of the question to understand the meaning of the query and then process it accordingly.

About The Database
This is a domain-specific dialogue system that answers questions about hospitals and doctors in the Gachibowli area. The database therefore contains information about these hospitals and is used in the last stage of the architecture to generate the template-answer. We created a database with the details of four major hospitals in Gachibowli, namely Continental hospital, Sunshine hospital, Himagiri hospital and Care hospital. The database mainly contains the following information:
• Name of the doctor
• Hospital in which the doctor is working
• Qualification of the doctor
• Experience of the doctor
• Specialization of the doctor (multiple fields, also includes the department in which they are working)
• Recommendation Rating of the doctor
• Consultation fees of the doctor
• Days of availability of the doctor
• Timings of availability of the doctor
On the basis of the information available in the database, the following question categories were defined, according to the aim of the question, for the question classification task:
• Information about the hospitals in the localities
  - Number of hospitals
  - List of all the hospitals
  - Address of the hospital

Dataset for Question Classification
There is a lack of conversational dialogue data in Telugu, yet any deep learning technique requires a considerable amount of data for training. We therefore created 388 natural language questions initially. Since the categories of the questions asked are finite, the questions posed are also limited. However, 388 questions are not sufficient for training a question classifier with 11 classes. Therefore, we performed Data Augmentation, which produced a considerable amount of question data that could be used for training the classifier. This idea is inspired by (Fadaee et al., 2017) and has been modified according to our requirements.
Data augmentation is done by making slight changes in the already present data to create more data. Even when there is a slight change in the sentence, the system always considers it as a different sentence and that is how the dataset grows. The attribute values like doctor name, hospital name, time and day, were replaced with new values and the tenses were changed to generate new questions which finally become a part of the dataset. There are a total of 28837 questions in this dataset after performing data augmentation. Data Augmentation is done for making the system robust.
For the training and testing phases of the classifier, the initial manually created data (388 questions) was split in a ratio of 80% (310 questions) for training and 20% (78 questions) for testing. The training and testing data were then augmented as described above. It is important to note that we first split the manually written data and only then perform data augmentation on each part separately. This ensures that augmented variants of the same original question do not appear in both the training and the testing data, which is necessary for proper training and evaluation of the question classifier.
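The split-then-augment procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `{doctor}`/`{day}` placeholder syntax and the English example question are all assumptions made for illustration, and only attribute substitution is covered (the tense changes described above are not).

```python
import itertools
import random


def split_seed_questions(questions, train_ratio=0.8, seed=0):
    """Shuffle the manually written questions and split them BEFORE augmenting."""
    shuffled = list(questions)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def augment(templates, slot_values):
    """Create new questions by substituting every combination of attribute values.

    `templates` are questions with hypothetical placeholders such as {doctor};
    `slot_values` maps each placeholder name to its replacement values.
    """
    augmented = set()
    for template in templates:
        slots = [name for name in slot_values if "{" + name + "}" in template]
        for combo in itertools.product(*(slot_values[name] for name in slots)):
            augmented.add(template.format(**dict(zip(slots, combo))))
    return sorted(augmented)


if __name__ == "__main__":
    seeds = ["When is Dr. {doctor} available on {day}?"]  # illustrative seed
    values = {"doctor": ["A", "B"], "day": ["Monday", "Tuesday"]}
    for question in augment(seeds, values):
        print(question)
```

Calling `augment` separately on the two halves returned by `split_seed_questions` keeps every variant of a seed question on one side of the split.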

Question Classification
In this phase, the question posed by the user is classified into one of the predefined categories depending on the aim of the question. We first obtain a vector representation of the question with the help of word embeddings. Let the number of words in the question be $N$, and let the $i$-th word in the question $q$ be $q_i$. Each of these words is embedded into a vector with the help of an embedding matrix $W$. Let the vector representation of the $i$-th word be $x_i$; stacking $x_1, \dots, x_N$ gives the final two-dimensional matrix representation $X$ of the question.
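The embedding lookup can be sketched in a few lines of Python. The vocabulary, the toy three-dimensional vectors and the unknown-word index below are assumptions for illustration only; the paper does not specify the embedding dimension or how unseen words are handled.

```python
def embed_question(tokens, vocab, W, unk_index=0):
    """Return X: one embedding row per token, i.e. an N x d matrix as nested lists.

    `vocab` maps a word to its row index in the embedding matrix `W`;
    words missing from `vocab` fall back to `unk_index` (an assumption).
    """
    return [list(W[vocab.get(token, unk_index)]) for token in tokens]


if __name__ == "__main__":
    # Toy values, purely illustrative.
    vocab = {"<unk>": 0, "doctor": 1, "fee": 2}
    W = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    X = embed_question(["doctor", "fee", "unknown"], vocab, W)
    print(len(X), len(X[0]))  # N rows, d columns
```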

Experiments
Multiple experiments using various deep learning models and machine learning approaches were performed for the question classification task. The results are shown in Table 1.

Support Vector Machine
SVM (Cortes and Vapnik, 1995) is one of the most popular machine learning classifiers. The question representation X is used as the input to the SVM, so the final question representations are the main features on which the SVM is trained.

Convolutional Neural Network
A word embedding based CNN model (Kim, 2014) is used. The matrix X is given as input to the CNN, followed by a fully connected layer and finally a softmax layer. A filter of size 4 is used for the convolutions.

Bidirectional LSTM
A single-layer Bidirectional LSTM (Schuster and Paliwal, 1997) is used for question classification.
The representation of the question X is given as input to the Bidirectional LSTM layer. The output of this layer is fed into a dense layer, and finally softmax is applied. The hidden dimension of the Bi-LSTM is 64. The dropout rate is set to 0.4 to avoid overfitting.

Long Short Term Memory
A single-layer LSTM (Hochreiter and Schmidhuber, 1997) model has been implemented for classification. The input to the LSTM layer is the concatenated representation of the question X; the output is passed to a dense layer and, finally, softmax is applied to predict the question category. The hidden dimension of the LSTM is 32. The model was trained for a total of 10 epochs and the dropout rate is set to 0.2.
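The dense-plus-softmax output step shared by the neural models can be written out as follows. The symbols $h$ (the output of the recurrent or convolutional layer), $W_d$ and $b_d$ (the dense layer's weights and bias) are notation assumed here for illustration; the 11 classes are the question categories handled by the system.

```latex
\hat{y} = \operatorname{softmax}(W_d h + b_d), \qquad
\operatorname{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j=1}^{11} e^{z_j}}, \quad k = 1, \dots, 11
```

The predicted category is then the index $k$ with the largest $\hat{y}_k$.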

Named Entity Recognition
After the question is classified, the category assigned by the classifier and the question posed by the user are analyzed together to extract the information required for SQL query generation. This information consists predominantly of named entities, so we use Named Entity Recognition (NER). The following named entities were defined:
1. Name of the Doctor
2. Name of the Hospital
3. Time
4. Date or Day
5. Name of the Locality
6. Hospital domain related Technical Terms
For answering the majority of the questions, the required information consists mostly of named entities from the above list. There is no readily available computational tool for Named Entity Recognition in Telugu for our domain. Therefore, as proposed in (Srikanth and Narayana Murthy, 2019), a hybrid model is designed which combines heuristics and rules, based on the nature of the language and on patterns of occurrence, for identifying named entities. Here, heuristics refers to simple probable cases. For instance, in the context of the question, the Name of the Doctor is very likely to be the two words right after the doctor or Dr. tag, and the name of the hospital is likely to appear right before the hospital tag. Tags like a.m. and p.m. can be used as a clue that the intended time appears right before them. There can be some ambiguities, but since the data is domain-specific, they are less likely to arise.
Apart from such heuristics, some rules were designed on the basis of the nature of the language for identifying the named entities. For example, when the case marker 'ki' occurs, it is mostly preceded by a time expression. Similarly, 'lo' is a case marker associated with location. We also have some definitive rules, such as one keyed on 'gAaru'. These rules and heuristics are not always accurate, but when they are combined with the prior knowledge of the question category, the required named entities are most likely to be found.
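The positional heuristics described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' tool: the tag sets, the function name and the English example question are assumptions, and a real system would operate on Telugu tokens.

```python
# Hypothetical tag sets; the actual tags used by the system are not given.
DOCTOR_TAGS = {"dr.", "dr", "doctor"}
HOSPITAL_TAGS = {"hospital"}
TIME_TAGS = {"a.m.", "p.m.", "am", "pm"}


def extract_entities(tokens):
    """Apply the positional heuristics to a tokenised question."""
    entities = {}
    for i, token in enumerate(tokens):
        low = token.lower()
        if low in DOCTOR_TAGS and "doctor" not in entities:
            name = tokens[i + 1:i + 3]  # the two words right after the tag
            if name:
                entities["doctor"] = " ".join(name)
        elif low in HOSPITAL_TAGS and i > 0 and "hospital" not in entities:
            entities["hospital"] = tokens[i - 1]  # the word right before the tag
        elif low in TIME_TAGS and i > 0 and "time" not in entities:
            entities["time"] = tokens[i - 1] + " " + token
    return entities


if __name__ == "__main__":
    question = "Is Dr. Anusha Meka available at Continental hospital at 5 p.m."
    print(extract_entities(question.split()))
```

A question such as "Is Dr. Anusha available" would yield "Anusha available" as the doctor name, which is the kind of ambiguity that knowledge of the question category and the database of names helps to resolve.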

Check for Answer Retrieval
After performing NER on the input question, the next aim is to generate an SQL query for the question with the respective attribute values and then retrieve the answer. However, if the context does not contain sufficient data, the SQL query cannot be generated, so it is important to check this first. The system can handle 11 categories of questions. For each category, there is a set of attributes which are mandatory for answering the question. If they are not present in the context, the system goes back to the user and asks for the required information. When the user responds, the context is updated. If the context is now sufficient for answering the question, an SQL query is generated; otherwise the same process is repeated until all the required information is available. This is illustrated with real-time examples from our system in Figure 3.
From the example in Figure 2, it is clear from the first question that the conversation is about Dr. Anusha Meka. As a continuation, the user asks questions such as 'How much experience does she have?' or 'How much is the consultation fee?'. The dialogue system must understand how these questions relate to the first question and must carry that information as context while answering them. Knowing that these questions are about Dr. Anusha is the pre-context captured by the system, and the further questions are answered accordingly. This is done primarily by maintaining the context at every level of the dialogue. When a new question arrives without the necessary contextual information, the system goes back to the available context, looks up the attribute values and fills in the missing attributes required to answer the question. When a new question names a doctor different from the one in the previous context, the question is assumed to belong to a different context, so the previous context is flushed and the new context from this question is registered. With this process, context is captured and the output seems more natural and realistic, which lets the system and the user engage in a normal and complete conversation that is close to real-world human-human conversation.
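The sufficiency check and the ask-until-complete loop can be sketched as below. The category names, the mandatory attributes listed for them, the `ask_user` callback and the `max_turns` safety limit are all hypothetical; the paper does not list the mandatory attributes per category.

```python
# Hypothetical mapping from question category to mandatory attributes.
REQUIRED = {
    "doctor_timings": ["doctor", "hospital"],
    "consultation_fee": ["doctor"],
}


def missing_attributes(category, context):
    """List the mandatory attributes that the context does not yet provide."""
    return [attr for attr in REQUIRED[category] if not context.get(attr)]


def fill_until_complete(category, context, ask_user, max_turns=5):
    """Keep asking the user for missing attributes until the query can be built."""
    context = dict(context)  # do not mutate the caller's context
    for _ in range(max_turns):
        missing = missing_attributes(category, context)
        if not missing:
            break
        answer = ask_user(missing[0])
        if answer:
            context[missing[0]] = answer
    if missing_attributes(category, context):
        raise ValueError("required information was not provided")
    return context


if __name__ == "__main__":
    # A canned reply stands in for the real user prompt.
    result = fill_until_complete(
        "doctor_timings", {"doctor": "Anusha Meka"}, lambda attr: "Demo Hospital"
    )
    print(result)
```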
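The context-maintenance behaviour described above (carry attributes forward, flush when a different doctor is named) can be sketched as follows. The dictionary-based context and the function name are assumptions for illustration, and "Ravi Kumar" is a made-up name.

```python
def update_context(context, new_entities):
    """Merge the entities of the latest question into the running context.

    If the question names a doctor different from the one in the context,
    the previous context is flushed before the new entities are registered.
    """
    new_doctor = new_entities.get("doctor")
    old_doctor = context.get("doctor")
    if new_doctor and old_doctor and new_doctor != old_doctor:
        context = {}
    merged = dict(context)
    merged.update({key: value for key, value in new_entities.items() if value})
    return merged


if __name__ == "__main__":
    ctx = update_context({}, {"doctor": "Anusha Meka", "day": "Monday"})
    ctx = update_context(ctx, {})  # follow-up question: context is kept
    print(ctx)
    ctx = update_context(ctx, {"doctor": "Ravi Kumar"})  # new doctor: flushed
    print(ctx)
```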

SQL Query Generation and Answer Retrieval
A total of 11 question categories are handled, and each question frame has a definite, fixed SQL query. Once the question is completely processed and the information required for answering it is available, that information is placed into the attribute blanks of the SQL query. The query is then run against the SQL database in which all the information regarding the hospitals is stored. The attribute values required to build the template-answer are retrieved from the database and, finally, the template-answer is generated and returned to the user.
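A minimal sketch of the fixed-query-per-category idea, using Python's built-in `sqlite3` module, is given below. The table layout, column names, category name, answer wording and the demo row are all invented for illustration, and the sketch fills the attribute blanks with parameterised `?` placeholders, which is a design choice of this example rather than something stated in the paper.

```python
import sqlite3

# Hypothetical fixed query, parameter list and answer template per category.
QUERY_TEMPLATES = {"consultation_fee": "SELECT fee FROM doctors WHERE name = ?"}
PARAMS = {"consultation_fee": ["doctor"]}
ANSWER_TEMPLATES = {
    "consultation_fee": "The consultation fee of Dr. {doctor} is {0}."
}


def make_demo_db():
    """Build an in-memory database with one made-up row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doctors (name TEXT, hospital TEXT, fee INTEGER)")
    conn.execute(
        "INSERT INTO doctors VALUES (?, ?, ?)", ("Asha Rao", "Demo Hospital", 500)
    )
    return conn


def answer(conn, category, context):
    """Fill the query's blanks from the context and build the template-answer."""
    params = [context[name] for name in PARAMS[category]]
    row = conn.execute(QUERY_TEMPLATES[category], params).fetchone()
    if row is None:
        return None
    return ANSWER_TEMPLATES[category].format(*row, **context)


if __name__ == "__main__":
    print(answer(make_demo_db(), "consultation_fee", {"doctor": "Asha Rao"}))
```

Passing the values as parameters instead of pasting them into the query string keeps misspelled or unusual user input from altering the SQL.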

Other Simple Handled Issues
Apart from the detailed framework presented above, there is a need to handle some challenging linguistic issues to enhance the dialogue system and make the conversations more natural.

Anaphora Resolution
As a part of context handling, it is important for the system to understand various kinds of references. If there is a pronoun in a question, the system should understand what the pronoun actually refers to. For pronoun handling, a simple rule-based anaphora resolution is modelled. For example, if there is a pronoun referring to a female, like 'Ame' (she), then the system looks for a female doctor in the available context.
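The gender-based rule can be sketched as follows. Only 'Ame' (she) comes from the text; the masculine pronoun 'Ayana', the list-of-pairs context and the exact-token matching are assumptions made for illustration (inflected pronoun forms are not handled).

```python
# 'Ame' (she) is from the text; 'Ayana' (he) is an assumed counterpart.
PRONOUN_GENDER = {"Ame": "female", "Ayana": "male"}


def resolve_pronoun(tokens, context_doctors):
    """Return the most recently mentioned doctor whose gender matches a pronoun.

    `context_doctors` is a list of (name, gender) pairs, most recent last.
    Returns None when the question has no known pronoun or nothing matches.
    """
    for token in tokens:
        gender = PRONOUN_GENDER.get(token)
        if gender is None:
            continue
        for name, doctor_gender in reversed(context_doctors):
            if doctor_gender == gender:
                return name
    return None


if __name__ == "__main__":
    doctors = [("Ravi Kumar", "male"), ("Anusha Meka", "female")]
    # "PLACEHOLDER" stands in for the rest of the Telugu question.
    print(resolve_pronoun(["Ame", "PLACEHOLDER"], doctors))
```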

Resolution of ambiguity in names
It is quite likely that two or more doctors share the same first name, and users generally address a doctor by the first name. In such a case, the system must determine which of the doctors the user is referring to. To do so, it prompts the user to select the doctor from a list of doctors having that first name.
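A small sketch of this disambiguation step is shown below. The function names, the `choose` callback that stands in for the user prompt, and the doctor names are hypothetical.

```python
def candidates_for(first_name, doctor_names):
    """Return all doctors whose first name matches, ignoring case."""
    wanted = first_name.lower()
    return [name for name in doctor_names if name.split()[0].lower() == wanted]


def resolve_doctor(first_name, doctor_names, choose):
    """Pick the doctor directly if unambiguous, otherwise let the user choose.

    `choose` receives the list of matching names and returns the selected one.
    """
    matches = candidates_for(first_name, doctor_names)
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    return choose(matches)


if __name__ == "__main__":
    names = ["Asha Rao", "Asha Reddy", "Ravi Kumar"]  # made-up names
    # Here the "user" simply picks the first option offered.
    print(resolve_doctor("asha", names, lambda options: options[0]))
```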

Handling Spelling Mistakes
Users can easily misspell the name of a doctor or hospital, because proper nouns can have many pronunciations and correspondingly many spellings. Therefore, to find out what exactly is being referred to, character-level matching is performed: a similarity score based on Levenshtein distance (Miller et al., 2009) is calculated between the user's spelling and all the names in the database. The name with the highest similarity score is chosen as the correct spelling, provided that its score passes the cutoff.
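The matching step can be sketched in Python as follows. The Levenshtein distance itself is standard, but the normalisation of the distance into a similarity score and the cutoff value of 0.7 are assumptions, since the paper does not state how the score or the cutoff are defined.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def best_match(query, names, cutoff=0.7):
    """Return the most similar name, or None if no score reaches the cutoff."""
    best_name, best_score = None, 0.0
    for name in names:
        longest = max(len(query), len(name)) or 1
        score = 1 - levenshtein(query.lower(), name.lower()) / longest
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= cutoff else None


if __name__ == "__main__":
    hospitals = ["Continental", "Sunshine", "Himagiri", "Care"]
    print(best_match("Sunshin", hospitals))
```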

Question Classification
Several models were used for the task of Question Classification, and their accuracies are reported in Table 1. LSTM outperforms all the other algorithms. High accuracy matters here because a misclassified question will eventually lead to a wrong output.

Dialogue System
There is no automated evaluation tool available for evaluating a dialogue system in Telugu. Hence, the system was manually evaluated by 8 people, all of whom were native Telugu speakers. A special User Interface was created for easy evaluation of the system. After every answer from the system, the evaluator was expected to mark the response as 'correct', 'not sure' or 'incorrect'. Each evaluator assessed the system for about 20-30 dialogues (here, a dialogue is a conversation between the user and the system until the answer is retrieved). A total of 195 responses were recorded. Table 2 shows the ratings given by the evaluators on various aspects for judging the overall performance of the system. The scale is 0-5, where 0 is Poor and 5 is Excellent. Table 3 shows the accuracy metrics.

Conclusion
This work combines deep learning techniques with rule-based computational techniques. Though this approach is domain-specific, it can be extended to other domains; doing so only requires the creation of some domain-specific data and some domain-specific rules and heuristics. Even when little data is available, simple techniques like Data Augmentation together with a standard classifier give good results and serve the required purpose. This can be particularly helpful for resource-poor languages. Given the vast applications of dialogue systems, this is one step closer to creating dialogue systems in resource-poor languages.

Future Work
Our future work will mainly focus on Telugu-English Code-Mixed data, as it is more commonly used in Telugu speaking regions.
We will also focus more on error handling, that is, identifying a completely irrelevant question as irrelevant, and we will try to handle Out-of-Vocabulary (OOV) words. We also look forward to designing better heuristics for handling spelling mistakes. Further, using the recommendation ratings available in the database, we will try to incorporate a doctor recommendation system as a part of this Dialogue System. The objective would be to recommend a doctor according to the patient's request or even based on diseases or symptoms. Apart from that, we would like to make this a multi-domain dialogue system which contains information from multiple domains and supports switching between domains within a conversation.