MedWriter: Knowledge-Aware Medical Text Generation

Exploiting domain knowledge to guarantee the correctness of generated text has been a hot topic in recent years, especially in highly specialized domains such as medicine. However, most recent works consider only the information in unstructured text rather than the structured information in a knowledge graph. In this paper, we focus on the medical topic-to-text generation task and adapt a knowledge-aware text generation model to the medical domain. The model, named MedWriter, not only introduces specific knowledge from an external medical knowledge graph (MKG) but is also capable of learning graph-level representations. We conduct experiments on a medical literature dataset collected from medical journals, in which each item has a set of topic words, an abstract of a medical article, and a corresponding knowledge graph from CMeKG. Experimental results demonstrate that incorporating the knowledge graph into the generation model improves the quality of the generated text and that MedWriter consistently outperforms the competitor methods.


Introduction
Medical text generation has been a hot topic recently, with work on electronic medical record (EMR) generation (Guan et al., 2018), medical question generation, and clinical note generation (Melamud and Shivade, 2019). However, compared to research in the general domain, there is still much room for exploration, especially with the assistance of a domain-specific knowledge graph.

Intuitively, a medical knowledge graph (MKG) is essential to guarantee the correctness of generated text, especially in highly specialized domains. However, most recent works do not make full use of the MKG. Lee et al. (2018) adopt an encoder-decoder model to generate free text in electronic health records, and Guan et al. (2018) propose a GAN-based framework trained with the REINFORCE algorithm to generate synthetic EMR text; neither utilizes external medical knowledge. Lee et al. (2019) use a model pretrained by leveraging the medical concept unique identifiers from the UMLS to improve the quality of generated clinical text. However, they view each triplet as an instance and adopt the Skip-gram algorithm (Mikolov et al., 2013) to acquire pretrained concept embeddings, which ignores the relationships between medical entities.

In this paper, we focus on knowledge-aware medical text generation, formulated as a topic-to-text task. We first collect a medical literature dataset from medical journals that contains more than 50,000 topic-text pairs. Each pair has a set of keywords describing the topic and a relevant abstract as the target text. For each pair, we collect the corresponding knowledge from CMeKG, a large-scale Chinese medical knowledge graph. An example is shown in Figure 1.

Figure 1 (abstract of the example item): Objective: Explore the therapeutic effect of celecoxib combined with bone trauma treatment instrument on knee arthritis pain. Methods: 108 patients with unilateral early and mid-stage knee osteoarthritis treated in the orthopedic clinic of Liaocheng Third People's Hospital from January 2015 to January 2017 were randomly divided into an observation group and a contrast group, with 54 cases in each group. The observation group takes celecoxib orally with the application of a bone trauma treatment device; the contrast group just takes celecoxib orally. The clinical effects of the two groups were compared. Results: At the last follow-up, the pain score of the observation group was lower than that of the contrast group (t=3.21, p=0.00). The knee function of the observation group was better than that of the contrast group (t=3.74, p=0.00). The total effective rate in the observation group was 88.89%, higher than the 77.78% in the contrast group (χ2=4.70, p=0.03). Conclusion: Celecoxib combined with bone trauma treatment device has obvious clinical effect, effectively reduces pain score, can improve knee function, and is worthy of clinical application.

We then adapt a knowledge-aware neural generation model for this task, named MedWriter, which consists of three components: a topic encoder, a graph encoder, and a decoder. The topic encoder acquires the representation of the topic words, while the graph encoder exploits the specific information from the MKG. The model thus combines the information of the topic words with medical knowledge. Afterwards, we use a decoder with the copy mechanism (Gu et al., 2016) to generate medical text. Experimental results demonstrate that incorporating the knowledge graph into the generation model improves the quality of the generated text and that MedWriter consistently outperforms the competitor methods.

Method
Given a set of topic words $K = \{w_1, w_2, \dots, w_s\}$ and a knowledge graph represented as a set of triples $G = \{g_1, g_2, g_3, \dots\}$, where each triple $g_i = \langle s_i, p_i, o_i \rangle$ consists of a subject, a predicate, and an object, our goal is to generate a natural language text $Y = \{y_1, y_2, y_3, \dots\}$ that is relevant to the topic, grammatically correct, and informative.

Topic Encoder
We first convert each keyword $w_i$ into a word embedding $e(w_i)$ through an embedding matrix $M \in \mathbb{R}^{l \times d}$, where $l$ denotes the vocabulary size and $d$ denotes the dimension of the word embeddings. A bidirectional GRU (Cho et al., 2014) is then employed to transform the keywords into a distributed representation, where $[;]$ denotes the concatenation operation and $e(w_t)$ denotes the word embedding of the $t$-th keyword.
The last hidden states of the forward and backward GRU networks are concatenated as the representation of the entire keyword set, $r_K$.
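A standard bidirectional GRU formulation consistent with this description can be written as follows. The arrow notation and the per-step symbol $h_t$ are assumptions made here for illustration and may differ from the authors' exact notation; $r_K$ is the keyword representation later used to initialize the decoder.

```latex
\begin{aligned}
\overrightarrow{h}_t &= \overrightarrow{\mathrm{GRU}}\big(e(w_t),\ \overrightarrow{h}_{t-1}\big), \\
\overleftarrow{h}_t  &= \overleftarrow{\mathrm{GRU}}\big(e(w_t),\ \overleftarrow{h}_{t+1}\big), \\
h_t &= [\overrightarrow{h}_t ;\ \overleftarrow{h}_t], \\
r_K &= [\overrightarrow{h}_s ;\ \overleftarrow{h}_1].
\end{aligned}
```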

Graph Encoder
For each knowledge graph, following previous graph-based works, we first perform a Levi graph transformation (Beck et al., 2018), in which each labeled edge in $G$ is replaced by two unlabeled edges, and then add reverse and self-loop edges to the Levi graph. For instance, given a triple $\langle s, p, o \rangle$, the transformation yields $\langle s, \rightarrow, p \rangle$, $\langle p, \rightarrow, o \rangle$, $\langle o, \rightarrow, p' \rangle$, $\langle p', \rightarrow, s \rangle$, and their self-loop connections, where $p'$ is the reverse edge of $p$. In this way, both entities and relations can be viewed as vertices without losing any information. In addition, a global vertex is added that connects to all entity vertices in order to aggregate information across disconnected parts of the graph. Thus, the original knowledge graph can be represented as an unlabeled graph $G = \{V, E\}$, where $V = \{v_1, v_2, \dots, v_{x-1}, v_g\}$ is the list of entities, relations, and the global node $v_g$; $E$ is an adjacency matrix $M \in \mathbb{R}^{x \times x}$ that describes the connections; and $x$ is the total number of vertices in the graph.

The graph encoder is composed of a stack of identical layers similar to those of Vaswani et al. (2017), each of which has a multi-head attention sub-layer followed by a feed-forward network sub-layer. Each sub-layer is equipped with a residual connection (He et al., 2016) and layer normalization (Ba et al., 2016). Using the same operation as in the topic encoder, the vertices are converted to embedding representations $e(v_i)$.
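The Levi graph transformation described in this section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `_rev` suffix for reverse-relation vertices, the `<global>` vertex name, the sharing of one vertex per relation label, and the bidirectional edges between the global vertex and the entities are all assumptions.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Edge = Tuple[str, str]

GLOBAL = "<global>"  # hypothetical name for the global vertex v_g


def levi_transform(triples: List[Triple]) -> Tuple[List[str], Set[Edge]]:
    """Turn labeled triples into vertices and unlabeled directed edges."""
    vertices: List[str] = []
    entities: List[str] = []
    edges: Set[Edge] = set()

    def add_vertex(name: str) -> None:
        if name not in vertices:
            vertices.append(name)

    for s, p, o in triples:
        p_rev = p + "_rev"  # assumed naming for the reverse relation p'
        for name in (s, p, o, p_rev):
            add_vertex(name)
        for entity in (s, o):
            if entity not in entities:
                entities.append(entity)
        # s -> p -> o for the original direction, o -> p' -> s for the reverse.
        edges.update({(s, p), (p, o), (o, p_rev), (p_rev, s)})

    # Global vertex linked to every entity (both directions is an assumption).
    add_vertex(GLOBAL)
    for entity in entities:
        edges.add((GLOBAL, entity))
        edges.add((entity, GLOBAL))

    # Self-loop on every vertex.
    for name in vertices:
        edges.add((name, name))
    return vertices, edges


def adjacency_matrix(vertices: List[str], edges: Set[Edge]) -> List[List[int]]:
    """Build the x-by-x 0/1 matrix; row i marks the targets of edges from vertex i."""
    index = {name: i for i, name in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


# Illustrative triple only; it is not taken from CMeKG.
example_vertices, example_edges = levi_transform(
    [("celecoxib", "treats", "knee osteoarthritis")]
)
example_matrix = adjacency_matrix(example_vertices, example_edges)
```

For a single triple this yields five vertices (two entities, the relation, its reverse, and the global vertex), so relations take part in attention exactly as entities do.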
Following a procedure similar to that of Koncel-Kedziorski et al. (2019), to obtain a contextual representation for each vertex $v_i$, we adopt a multi-head attention mechanism that attends over the vertices adjacent to $v_i$ in $G$. The mechanism linearly projects the attention inputs several times, each time with different parameters. All inputs of the attention function come from $V$. In the multi-head self-attention computation, $N_i$ denotes the neighbourhood of $v_i$; $n$ denotes the number of heads; $d_h$ denotes the dimension of each head; and $W$, $W^t_1$, and $W^t_2$ are learnable parameters. The final output $r_V$ of one layer is then obtained by $r_V = \mathrm{FFN}(\mathrm{MulHeadAtt}(V))$, where $\mathrm{FFN}$ is a feed-forward network consisting of two linear transformations with a ReLU activation. Since the identical layers are stacked several times, with the output of the previous layer fed into the current layer as input, we take the output of the last layer as the final encoding representation.
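The idea of attending only over a vertex's neighbourhood can be illustrated with a small sketch. It shows a single head with scaled dot-product scores and omits the learned projections, the residual connection, layer normalization, and the feed-forward sub-layer, so it is a simplified stand-in and not the paper's exact formulation. The function name and the convention that row `i` of the adjacency matrix marks the vertices that `v_i` attends to are assumptions.

```python
import math
from typing import List

Vector = List[float]


def neighbour_attention(embeddings: List[Vector],
                        adjacency: List[List[int]]) -> List[Vector]:
    """Single-head attention where vertex i only attends to j with adjacency[i][j] == 1."""
    dim = len(embeddings[0])
    outputs: List[Vector] = []
    for i, query in enumerate(embeddings):
        neighbours = [j for j, linked in enumerate(adjacency[i]) if linked]
        if not neighbours:
            # No neighbours (cannot happen once self-loops are added): keep the input.
            outputs.append(list(query))
            continue
        # Scaled dot-product score between v_i and each neighbour.
        scores = [
            sum(q * k for q, k in zip(query, embeddings[j])) / math.sqrt(dim)
            for j in neighbours
        ]
        # Numerically stable softmax restricted to the neighbourhood N_i.
        peak = max(scores)
        exps = [math.exp(score - peak) for score in scores]
        total = sum(exps)
        context = [0.0] * dim
        for weight, j in zip(exps, neighbours):
            for d in range(dim):
                context[d] += (weight / total) * embeddings[j][d]
        outputs.append(context)
    return outputs


# Toy input: vertex 0 sees only itself, vertex 1 sees itself and vertex 2.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_adjacency = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
toy_output = neighbour_attention(toy_embeddings, toy_adjacency)
```

Restricting the softmax to the neighbourhood is what lets the graph structure, and not only the set of vertices, shape each representation.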

Decoder
We use an attention-based GRU network as the decoder, initialized with the concatenation of the topic representation and the global vertex representation, $[r_K; r_{v_g}]$. At the $t$-th time step, the hidden state $h_t$ is calculated by $h_t = \mathrm{GRU}(h_{t-1}, e(y_{t-1}), c_{t-1})$, where $h_{t-1}$ is the hidden state of the previous step, $e(y_{t-1})$ is the embedding of the previous output, and $c_{t-1}$ is the context embedding of the previous step. The context embedding $c$ consists of two parts, $c_K$ and $c_V$, which attend over the keywords and the knowledge graph respectively.
The attention that produces $c_V$ uses the learnable parameters $W_3$ and $W_4$, and the computation of $c_K$ is similar to that of $c_V$. Meanwhile, we also adopt the copy mechanism (Gu et al., 2016) to directly select tokens from the keywords and the knowledge graph. The probability $p$ of copying is computed as $p = \sigma(W[h_t; c_t] + b)$. The final probability distribution is then $(1 - p) \cdot P_{gen} + p \cdot P_{copy}$, where $P_{gen}$ is a probability distribution over all words in the vocabulary, calculated by two linear layers followed by a softmax function, and $P_{copy}$ is a probability distribution for copying a word from the inputs, based on the attention scores over $[K; V]$.
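The mixture of generation and copy distributions can be sketched as below. The function and argument names are hypothetical, and the sketch assumes the copy distribution is given directly by normalized attention weights over the source tokens (keywords and graph vertices).

```python
from typing import Dict, List


def mix_copy_distribution(copy_gate: float,
                          gen_probs: Dict[str, float],
                          source_tokens: List[str],
                          attention: List[float]) -> Dict[str, float]:
    """Return (1 - p) * P_gen + p * P_copy as a token -> probability mapping."""
    # Scale the vocabulary distribution by (1 - p).
    final = {token: (1.0 - copy_gate) * prob for token, prob in gen_probs.items()}
    # Add p times the attention mass of each source token; tokens outside the
    # vocabulary can still receive probability this way.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + copy_gate * weight
    return final


# Toy numbers chosen for illustration only.
mixed = mix_copy_distribution(
    copy_gate=0.25,
    gen_probs={"the": 0.6, "pain": 0.4},
    source_tokens=["celecoxib", "pain"],
    attention=[0.8, 0.2],
)
```

In this toy case "celecoxib" is absent from the generation vocabulary yet still receives probability mass through copying, which is the main benefit for rare medical terms.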

Dataset
To support the medical text generation task, we collect a Chinese medical literature dataset from medical journals. The dataset consists of keyword-abstract pairs drawn from published medical articles. Each pair has a set of keywords describing the topic and an abstract, which is a piece of text related to that topic. The original pairs, however, have no corresponding knowledge graph, so we draw on CMeKG, a large-scale Chinese medical knowledge graph. First, we map keywords to entities in CMeKG. In addition to exact matching, we conduct fuzzy matching by calculating the similarity between a keyword and candidate entities. Given a keyword, we select several candidate entities using an inverted index that we built, and then use the Word Mover's Distance (WMD) algorithm (Kusner et al., 2015) to compute the similarity between the keyword and each candidate entity. The character embeddings are pretrained on a large corpus of medical literature. Among the candidates whose similarity score exceeds 0.7, we keep the entity with the highest score. Afterwards, given the resulting set of entities, every pair of entities is used to search CMeKG, and we keep the exactly matched triples. In addition, we treat fuzzy matching as a new relation and keep all <keyword, fuzzy matching, entity> triples. Finally, we obtain a dataset that contains more than 50,000 items. Each item has a set of topic keywords, an abstract as the text, and a corresponding knowledge graph derived from CMeKG. The statistics of the dataset are shown in Table 1.
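The keyword-to-entity mapping can be sketched as follows. This is an illustration under stated assumptions: the character-level inverted index is one plausible design, and `difflib.SequenceMatcher` is used as a stand-in similarity function in place of the WMD-based score with pretrained character embeddings. Only the 0.7 threshold comes from the text, and the entity names are illustrative.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional, Set

THRESHOLD = 0.7  # similarity cutoff stated in the text


def build_inverted_index(entities: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each character to the entities containing it (assumed index design)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entity in entities:
        for char in entity:
            index[char].add(entity)
    return index


def similarity(a: str, b: str) -> float:
    """Stand-in for the WMD-based similarity; NOT the measure used in the paper."""
    return SequenceMatcher(None, a, b).ratio()


def link_keyword(keyword: str,
                 index: Dict[str, Set[str]],
                 entities: Set[str]) -> Optional[str]:
    """Return the exact match, else the best fuzzy match scoring above THRESHOLD."""
    if keyword in entities:
        return keyword
    candidates: Set[str] = set()
    for char in keyword:
        candidates |= index.get(char, set())
    best, best_score = None, THRESHOLD
    for candidate in sorted(candidates):  # sorted for deterministic tie-breaking
        score = similarity(keyword, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Illustrative entity set; not taken from CMeKG.
kg_entities = {"knee osteoarthritis", "celecoxib", "hypertension"}
kg_index = build_inverted_index(kg_entities)
exact = link_keyword("celecoxib", kg_index, kg_entities)
fuzzy = link_keyword("knee osteoarthritis pain", kg_index, kg_entities)
missing = link_keyword("xyz", kg_index, kg_entities)
```

A fuzzy match found this way would then be recorded as a <keyword, fuzzy matching, entity> triple.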

Competitor Methods
To validate the effectiveness of incorporating a knowledge graph into the generation model, we compare MedWriter with two competitor methods.
The first method, named Seq2Seq, is an attention-based sequence-to-sequence model (Sutskever et al., 2014) that uses only the topic words as input to generate text.
The second method, named GraphSeq, is a variant of Seq2Seq that uses not only the topic words but also a linearized knowledge graph. Borrowing the idea from Konstas et al. (2017), we flatten the knowledge graph into a linear sequence according to the order in which the entities appear in the text. A second sequence encoder is employed to encode this sequence.
When decoding, both competitor methods are equipped with the copy mechanism.
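One plausible reading of the GraphSeq linearization is sketched below. Ordering triples by the first occurrence of their subject in the text, and emitting each triple as subject, predicate, object, are assumptions made for illustration; the paper does not spell out these details.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def linearize_graph(triples: List[Triple], text: str) -> List[str]:
    """Flatten triples into one token sequence, ordered by where subjects occur in text."""

    def first_position(triple: Triple) -> int:
        position = text.find(triple[0])
        # Subjects that never occur in the text are placed at the end.
        return position if position >= 0 else len(text)

    tokens: List[str] = []
    for subject, predicate, obj in sorted(triples, key=first_position):
        tokens.extend([subject, predicate, obj])
    return tokens


# Illustrative text and triples; not taken from the dataset.
sample_text = "celecoxib reduces pain in knee osteoarthritis"
sample_triples = [
    ("knee osteoarthritis", "symptom", "pain"),
    ("celecoxib", "treats", "knee osteoarthritis"),
]
linear_tokens = linearize_graph(sample_triples, sample_text)
```

Such a sequence keeps the entities and relations but discards the connectivity between triples, which is the information the graph encoder in MedWriter is meant to preserve.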

Settings
The model is trained to minimize the negative log-likelihood of the training set using SGD with a learning rate of 0.15. The hidden size of the GRU is set to 512. The graph encoder is a stack of 6 identical layers, and multi-head attention uses 4 parallel attention heads. The dimensions of the embedding layer and the attention sub-layer are set to 512, while the intermediate dimension of the feed-forward sub-layer is set to 2048. The vocabulary is truncated to 50,000 words. The batch size is set to 32. We train the model for 30 epochs and select the checkpoint that achieves the best performance on the validation set.
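For reference, the reported settings can be collected in one configuration object. The class and field names are hypothetical, and the derived per-head dimension assumes the attention dimension is split evenly across heads, as in the standard Transformer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MedWriterConfig:
    """Hyperparameters as reported in the text; field names are illustrative."""
    optimizer: str = "sgd"
    learning_rate: float = 0.15
    gru_hidden_size: int = 512
    graph_encoder_layers: int = 6
    attention_heads: int = 4
    embedding_dim: int = 512
    attention_dim: int = 512
    feed_forward_dim: int = 2048
    vocab_size: int = 50_000
    batch_size: int = 32
    epochs: int = 30

    @property
    def head_dim(self) -> int:
        # Assumption: each head receives an equal slice of the attention dimension.
        return self.attention_dim // self.attention_heads


config = MedWriterConfig()
```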

Metrics
For evaluation, we adopt the BLEU (Papineni et al., 2002) and ROUGE (Lin, 2004) metrics. BLEU is an n-gram overlap measure that is widely adopted in text generation tasks; we report BLEU1, BLEU2, and BLEU3. ROUGE is another common measure for automatically assessing the quality of generated text. We report the F1 score for ROUGE-L, which is based on the longest common subsequence (LCS) between the reference and the candidate.
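A minimal sketch of the ROUGE-L F1 computation over token lists is given below. It follows the usual LCS-based definition with equal weight on precision and recall; tokenization and any preprocessing used in the actual evaluation are not specified in the text and are left out here.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_f1(reference: List[str], candidate: List[str]) -> float:
    """F1 of LCS-based precision (vs. candidate) and recall (vs. reference)."""
    if not reference or not candidate:
        return 0.0
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy sentences for illustration; the LCS is "the score was lower" (4 tokens).
score = rouge_l_f1(
    "the pain score was lower".split(),
    "the score was much lower".split(),
)
```

Because the LCS need not be contiguous, ROUGE-L rewards candidates that keep the reference's word order even when extra words are inserted.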

Results
As shown in Table 2, the Seq2Seq method achieves the worst performance in terms of both BLEU and ROUGE, since it uses only the topic keywords. The GraphSeq method outperforms Seq2Seq because it uses the medical knowledge graph, even though the graph is viewed as a sequence consisting of entities and relations; in this setting the knowledge graph improves performance only slightly. Compared to the competitor methods, MedWriter improves performance by at least +3.36 BLEU1 points and +3.77 ROUGE-L points, while GraphSeq improves on Seq2Seq by only +1.0 BLEU1 points and +0.8 ROUGE-L points. This indicates that incorporating the knowledge graph into the generation model, including not only the entities and relations but also the graph structure, is indeed conducive to the medical text generation task. Just as a triple encodes the relation between entities, the knowledge graph also encodes the relations between triples, and the experimental results support this point.

Figure 3 shows an example of the text generated by MedWriter.

Figure 3 (text generated by MedWriter): Objective: Investigate the effect of Irbesartan combined with nifedipine sustained-release tablets in the treatment of diabetic patients with hypertension. Methods: 60 patients with diabetes and hypertension treated in our hospital from January 2017 to January 19 were selected as the research object. It was divided into a contrast group and an observation group, 30 cases each. The contrast group was treated with nifedipine sustained-release tablets, and the observation group was added with nifedipine sustained-release tablets. The treatment effects of the two groups were compared. Results: The total effective rate of treatment in the observation group (95.00%) was significantly higher than that in the contrast group (80.00%), and the difference was statistically significant (p<0.05). The total effective rate of treatment in the observation group was significantly higher than that in the contrast group, the difference was statistically significant (p<0.05). Conclusion: Irbesartan combined with nifedipine sustained-release tablets has a significant effect on diabetic patients with hypertension, and is worthy of clinical promotion.

The generated text is of good syntactic and semantic quality, apart from some repetition. The topic words and their relations are also described in the text, which demonstrates that MedWriter is able to model the knowledge graph and learn the information contained in it.

Although MedWriter achieves good performance, many issues remain unsolved. Medical literature typically contains many medical indicators and their corresponding values. If the model generates a correct description of an indicator but a wrong value, the entire generated text may be meaningless or even hazardous in the medical domain. For example, in the generated text, although 95% is higher than 80%, which conforms to the description, the numerical values are not necessarily accurate, even though such values are very important and appear often in medical text. How to generate the right numerical value for the corresponding term is therefore an important and challenging problem, which we will explore in future research.

Conclusion
We use a medical knowledge graph to facilitate medical text generation. We collect a Chinese medical literature dataset with corresponding knowledge graphs and adapt an encoder-decoder model equipped with a graph encoder to the medical topic-to-text generation task. Experimental results demonstrate the effectiveness of incorporating the knowledge graph into the generation model, which outperforms the competitor methods. This work is a preliminary attempt at knowledge-aware medical text generation. In the future, we plan to conduct further research on applying natural language generation technology to the medical domain.