Data2Text Studio: Automated Text Generation from Structured Data

Data2Text Studio is a platform for automated text generation from structured data. It is equipped with a Semi-HMMs model that automatically extracts high-quality templates and corresponding trigger conditions from parallel data, which improves the interactivity and interpretability of the generated text. In addition, several easy-to-use tools are provided for developers to edit the templates of pre-trained models, and APIs are released so that developers can call a pre-trained model to generate texts in third-party applications. We conduct experiments on the ROTOWIRE dataset for template extraction and text generation. The results show that our model achieves improvements on both tasks.


Introduction
Data-to-text generation, i.e., a technology that takes structured data as input and produces text that adequately and fluently describes this data as output, has various applications in the generation of sports news (Chen and Mooney, 2008; Kim and Mooney, 2010; Mei et al., 2016; Wiseman et al., 2017), product descriptions (Wang et al., 2017), weather reports (Liang et al., 2009; Angeli et al., 2010; Mei et al., 2016) and short biographies (Lebret et al., 2016; Chisholm et al., 2017). In another scenario, it is possible, albeit a little awkward, for a virtual assistant like Microsoft Cortana to read out structured data when responding to users' queries. It is more user-friendly for the assistant to identify the essential part of the structured data and read it out in natural language, which makes it easier to understand. In these cases, generating texts with human writers is inefficient and expensive, while an automatic text generation system would be helpful.
There are two main challenges for data-to-text generation systems: 1) Interactivity: a developer should be able to customize the text generation model and control the generated texts. 2) Interpretability: the generated texts should be consistent with the structured data. For example, we can say "with a massive 8 GB of memory" for a laptop computer, while "a massive 2 GB" is inappropriate. Rule-based approaches (Moore and Paris, 1993; Hovy, 1993; Reiter and Dale, 2000; Belz, 2007; Bouayad-Agha et al., 2011) encode domain knowledge into the generation system and produce high-quality texts, but constructing such a system is expensive and depends heavily on domain experts. Statistical approaches reduce development time by learning rules from historical data (Langkilde and Knight, 1998; Liang et al., 2009; Duboue and McKeown, 2003). However, statistical approaches are prone to generating texts with mistakes, because they do not model which specific phrases are appropriate under which application conditions.
To address the second challenge, we propose a Semi-HMMs model to automatically extract templates and corresponding trigger conditions from parallel training data. Trigger conditions are explicit latent semantic annotations between paired structured data and texts. They support learning how to use specific phrases under particular conditions, and thus improve the interactivity and interpretability of the generated text compared to traditional template-based methods. More importantly, obtaining trigger conditions automatically from the alignment distribution could significantly reduce human editing workload compared with commercial systems, e.g., WordSmith, Arria and Quill. For example, although WordSmith provides functional tools to help developers create templates and generation rules, developers still need to create the rules manually from scratch.
For the first challenge, we demonstrate Data2Text Studio, a platform equipped with the proposed Semi-HMMs model, to help developers generate texts from structured data in their own applications. Currently, the system provides several pre-trained models covering different domains: sports headline generation, resume generation, product description generation, etc. Developers can also train their own models by uploading parallel data. After model training, developers can revise the model, preview the generated texts, or call the APIs to generate texts in third-party applications.
We conduct experiments on the ROTOWIRE dataset (Wiseman et al., 2017) to evaluate the performance of template extraction and overall text generation. The results show that our model achieves improvements on both tasks. The rest of this paper is organized as follows: Section 2 describes the architecture of Data2Text Studio. Section 3 presents the main algorithm. Section 4 shows the experiment results.
First, developers need to upload parallel data, which consists of texts and the corresponding structured data, to train the model; the training components then extract the templates and corresponding trigger conditions from the training data automatically. Second, developers can use the built-in tools to further revise the extracted templates and trigger conditions manually. Finally, developers can preview the texts generated by the customized model, and APIs are provided to generate texts in bulk or in third-party applications. In the following, we introduce these modules in detail.

Model Training
We adopt a template-based solution for Data2Text Studio. It can generate texts with high accuracy and fluency, which can be used directly in business applications. Several previous studies (Liang et al., 2009; Wang et al., 2017) can be applied to extract templates from parallel data. To address the challenges introduced in Section 1, we propose a Semi-HMMs model to extract templates and corresponding trigger conditions from parallel data (see Section 3.1 for the algorithm). Fig. 2 presents an example of the extracted templates from NBA Headline parallel data, which consists of the scoreboard and the corresponding news.
Figure 3: (a) Template revision. The center part shows the template with slots, and the bottom part shows the trigger conditions. (b) Generated texts preview. Multiple headlines are generated for the same game to ensure variety.

Model Revision
The trained model provides a better starting point for developers, who avoid creating a model from scratch. If necessary, developers can revise the trained model by editing the extracted templates and their corresponding trigger conditions. Fig. 3a shows the interface for template editing. Three mechanisms are designed to manage templates and corresponding trigger conditions: 1) Data slot: the input structured data is filled into the slot to generate texts. 2) Synonyms: a list of phrases, one of which is chosen randomly during the generation process. 3) Branch: a trigger condition that defines the usage scenario for a specific phrase. Our Semi-HMMs model in Section 3.1 can learn such data slots and trigger conditions automatically, and developers can also revise them if necessary.
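The three mechanisms can be illustrated with a minimal Python sketch. The class names (`Slot`, `Synonyms`, `Branch`), the `render` helper, the field names and the 15-point threshold are all hypothetical and chosen for illustration; they are not the Studio's internal representation.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Slot:
    field: str  # name of the structured-data field copied into the text


@dataclass
class Synonyms:
    phrases: List[str]  # one phrase is chosen at random at generation time


@dataclass
class Branch:
    # (condition, phrase) pairs; the first condition that holds selects the phrase
    cases: List[Tuple[Callable[[Dict[str, object]], bool], str]]
    default: str


Part = Union[str, Slot, Synonyms, Branch]


def render(template: List[Part], record: Dict[str, object], rng: random.Random) -> str:
    """Fill a template with one structured-data record."""
    out = []
    for part in template:
        if isinstance(part, Slot):
            out.append(str(record[part.field]))
        elif isinstance(part, Synonyms):
            out.append(rng.choice(part.phrases))
        elif isinstance(part, Branch):
            out.append(next((p for cond, p in part.cases if cond(record)), part.default))
        else:
            out.append(part)  # literal text
    return " ".join(out)


# Hypothetical headline template; the threshold of 15 is an assumed trigger condition.
headline = [
    Slot("winner"),
    Branch(cases=[(lambda r: r["winner_pts"] - r["loser_pts"] >= 15, "blow out")],
           default="beat"),
    Slot("loser"),
    Synonyms(["at home", "on home court"]),
]

game = {"winner": "Spurs", "loser": "Lakers", "winner_pts": 110, "loser_pts": 90}
text = render(headline, game, random.Random(0))
```

Keeping the branch condition as data attached to a phrase, rather than as code around the template, is what lets a developer edit it in a tool such as the one in Fig. 3a.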

Text Generation
Given the structured data, the system will generate corresponding texts with the trained model. Fig. 3b shows an example for NBA headline generation. The left-hand side shows the input structured data, which contains the attributes of the game. The right-hand side shows multiple generated texts for this game to help developers check the quality of the generated texts.

API for Third-Party Applications
To use the text generation service in third-party applications, an API is created for each trained model. Once the structured data is posted through the API, the system will deliver the generated text back to third-party applications automatically. In this way, developers can leave the development work for a text generation model in the Data2Text Studio. Fig. 4 shows three application scenarios: sports headline generation, user profile generation based on LinkedIn data and car insight generation.
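As a rough illustration of this request/response flow, the sketch below posts one record as JSON using only the Python standard library. The endpoint URL, the payload schema, the bearer-token header and the `text` response field are all assumptions made for the example; the actual Data2Text Studio API is not specified in this paper.

```python
import json
import urllib.request

# Hypothetical endpoint: placeholder host and path, not the real service.
API_URL = "https://example.invalid/models/nba-headline/generate"


def build_request(record: dict, api_key: str) -> urllib.request.Request:
    """Wrap one structured-data record in a JSON POST request (assumed schema)."""
    body = json.dumps({"data": record}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed authentication scheme
        },
    )


def generate(record: dict, api_key: str, timeout: float = 10.0) -> str:
    """Send the record and return the generated text (assumed 'text' response field)."""
    with urllib.request.urlopen(build_request(record, api_key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["text"]


# Building the request does not touch the network; generate() would.
request = build_request(
    {"winner": "Spurs", "loser": "Lakers", "winner_pts": 110, "loser_pts": 90},
    api_key="YOUR_KEY",
)
```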

The Proposed Algorithm
In this section, we introduce the proposed algorithm for template extraction and for mining the corresponding trigger conditions.

Template Extraction
A main challenge of template extraction is the alignment between text and structured data. We adopt the model of Liang et al. (2009), which presents a 3-tier HMM to automatically align words to the fields of structured data. These aligned words could be strings, like brand names, or numbers copied from the data. Another challenge is lexical choice, which refers to choosing contextually appropriate words to express non-linguistic data. For example, in a basketball game report, the author tends to use "blow out" only when the score difference is very large. Lexical choice is subtle and differs from author to author, so we enrich the alignment model with a Gaussian emission probability from words to numbers in the data.
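To make the Gaussian emission concrete, the following sketch fits a mean and standard deviation for one phrase from the numbers it is softly aligned to, and then scores a new number. The function names, the variance floor and the toy data are assumptions for illustration; the exact parameterization used by the model is given in Qin et al. (2018).

```python
import math
from typing import Sequence, Tuple


def fit_gaussian(values: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """Weighted maximum-likelihood mean and standard deviation."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(max(var, 1e-6))  # small floor keeps the density finite


def log_emission(value: float, mean: float, std: float) -> float:
    """Log-density of a Gaussian: how well a number explains the phrase."""
    z = (value - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


# Toy score differences from sentences containing "blow out", with soft
# alignment weights (e.g. posteriors from an E-step); the last one is unaligned.
diffs = [15.0, 17.0, 19.0, 3.0]
posteriors = [1.0, 1.0, 1.0, 0.0]
mean, std = fit_gaussian(diffs, posteriors)

large_margin = log_emission(18.0, mean, std)
small_margin = log_emission(3.0, mean, std)
```

Under the fitted parameters, a large score difference receives a much higher emission score for this phrase than a small one, which is the signal the alignment model uses for lexical choice.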
The garbage collection problem is severe in the original model of Liang et al. (2009): many words that should remain unaligned (i.e., aligned to null) are wrongly aligned to infrequent fields. Here we incorporate the Posterior Regularization framework proposed by Graça et al. (2010), which can add constraints to models with latent variables while keeping the model tractable. In practice, we set a lower bound on the number of unaligned words, which significantly alleviates the garbage collection problem.
In a nutshell, we propose a generative model, $P_s(w, \pi \mid l)$, where $s$ is the world state, namely the structured data, $w$ is the observed words, $\pi$ is the segmentation of the words, and $l$ represents the tags, which could be fields of the structured data (e.g., Team Name) or simple operations on specific fields (e.g., score difference). Let $c$ be the segments of sentence $w$ induced by $\pi$. We further make a Markov assumption and factorize the model over segments, where $c_t$ represents the segment at time step $t$, which is annotated with tag $l_t$. For different types of fields, we use different methods to model $P_s(c_t \mid l_t)$.
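A factorization consistent with these definitions can be sketched as follows. The first-order transition term $P_s(l_t \mid l_{t-1})$ and the joint form over $l$ are assumptions made for illustration; the exact form is given in Qin et al. (2018).

```latex
% Illustrative semi-Markov factorization (assumed form, not quoted from the paper).
% T is the number of segments induced by \pi, and l_0 is a start symbol.
P_s(w, \pi, l) \;=\; \prod_{t=1}^{T} P_s(l_t \mid l_{t-1}) \; P_s(c_t \mid l_t)
```

Here each factor pairs a tag transition with the emission of a whole segment $c_t$, which may span several words, rather than a single word as in a standard HMM.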
During training, our goal is to maximize the complete data likelihood over $D$, the whole training data. Once the model has been trained, we use Viterbi-like dynamic programming to perform MAP inference, segmenting the texts and assigning the most likely tag to each span.
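The decoding step can be sketched as a semi-Markov Viterbi search over (segment end, tag) states. This is a generic illustration rather than the authors' implementation: the function signature, the maximum segment length and the toy scoring functions are all assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Segment = Tuple[Tuple[str, ...], str]  # (words in the span, tag)


def semi_markov_viterbi(
    words: Sequence[str],
    tags: Sequence[str],
    log_emit: Callable[[Tuple[str, ...], str], float],
    log_trans: Callable[[str, str], float],
    max_len: int = 3,
) -> List[Segment]:
    """Return the highest-scoring segmentation and tagging of `words`."""
    n = len(words)
    # best[i][tag]: best log-score for words[:i] whose last segment has tag `tag`
    best: List[Dict[str, float]] = [dict() for _ in range(n + 1)]
    back: List[Dict[str, Tuple[int, str]]] = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = 0.0  # start symbol
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            span = tuple(words[start:end])
            for prev, prev_score in best[start].items():
                for tag in tags:
                    score = prev_score + log_trans(prev, tag) + log_emit(span, tag)
                    if score > best[end].get(tag, -math.inf):
                        best[end][tag] = score
                        back[end][tag] = (start, prev)
    # Follow back-pointers from the best final tag.
    segments: List[Segment] = []
    end, tag = n, max(best[n], key=best[n].get)
    while end > 0:
        start, prev = back[end][tag]
        segments.append((tuple(words[start:end]), tag))
        end, tag = start, prev
    return segments[::-1]


# Toy scores (hypothetical): known phrases match their tag, other single words are NULL.
LEXICON = {("blew", "out"): "SCORE_DIFF", ("Spurs",): "TEAM_NAME", ("Lakers",): "TEAM_NAME"}


def toy_emit(span: Tuple[str, ...], tag: str) -> float:
    if LEXICON.get(span) == tag:
        return 0.0
    if tag == "NULL" and len(span) == 1 and span not in LEXICON:
        return -1.0
    return -10.0


def toy_trans(prev: str, tag: str) -> float:
    return 0.0  # uniform transitions for the toy example


words = "The Spurs blew out the Lakers".split()
segments = semi_markov_viterbi(words, ["TEAM_NAME", "SCORE_DIFF", "NULL"], toy_emit, toy_trans)
```

In the toy run, the two-word span "blew out" is kept as one segment tagged with the score-difference operation, while function words fall to the NULL tag.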
We derive an expectation-maximization (EM) algorithm to perform maximum likelihood estimation, and introduce a soft statistical regularization to guide the model towards a better solution. Specifically, we design a special NULL tag for unaligned words, and we "encourage" it to annotate at least half of the words. For more details, please refer to Qin et al. (2018).

Trigger Mechanism
As proposed in Section 3.1, we use a Gaussian distribution to model the probability of alignment between numerical values and phrases. Hence our model can tell us not only where a word comes from, but also the distribution of the numbers it is aligned to. For example, after training, our model successfully aligns "blow out" to the score difference, and shows that the mean score difference is 17 when this phrase is used. With this information, we can set a "trigger" on the aligned words. A trigger is a scheme that determines under what conditions a template can be used. For example, templates with "blow out" aligned to the score difference can only be used when the score difference is around 17, where "blew out" has a higher probability than "defeated". We thus obtain a rule that selects "blew out" rather than "defeated" when the score difference is around 17. With such rules, our model is able to use different words under various conditions. Now that the templates and triggers are ready for use, for text generation we fill the templates with structured data under the corresponding applicable trigger conditions.
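One way to turn the learned Gaussians into a trigger is to pick, for a given score difference, the phrase whose density is highest. The sketch below uses Python's `statistics.NormalDist`; only the mean of 17 for "blew out" comes from the text, while the standard deviations and the parameters for "defeated" are hypothetical.

```python
from statistics import NormalDist

# Assumed per-phrase distributions over the score difference.
PHRASE_MODELS = {
    "blew out": NormalDist(mu=17.0, sigma=4.0),  # mean from the text, sigma assumed
    "defeated": NormalDist(mu=6.0, sigma=4.0),   # hypothetical parameters
}


def pick_phrase(score_diff: float) -> str:
    """Choose the phrase whose Gaussian gives the score difference the highest density."""
    return max(PHRASE_MODELS, key=lambda phrase: PHRASE_MODELS[phrase].pdf(score_diff))


def trigger_threshold(lo: int = 0, hi: int = 40) -> int:
    """Smallest integer score difference at which "blew out" becomes the preferred phrase."""
    for diff in range(lo, hi + 1):
        if pick_phrase(diff) == "blew out":
            return diff
    raise ValueError("no score difference in range triggers the phrase")


threshold = trigger_threshold()
```

The resulting threshold can be stored as the branch condition of the template, where a developer may inspect and adjust it.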

Experiments
In this section, we report the performance of the proposed model on template extraction and on overall text generation, both evaluated on the ROTOWIRE subset of the Wiseman et al. (2017) dataset.

Template Extraction Evaluation
We compare our model against the system of Liang et al. (2009) as the baseline. It is difficult to evaluate the accuracy of tag assignment on the whole dataset, since the executable tags are not annotated in the original data. We therefore recruit three human annotators who are familiar with basketball games to label a random sample of 300 sentences from the test set. The annotators were asked to judge whether each word span is related to the table, and which label it is related to. Finally, we calculate the precision and recall of non-NULL tag assignments at the word level. The results are shown in Table 1. We observe that our initial model outperforms the baseline system in recall, while posterior regularization substantially reduces distraction from irrelevant information that should be tagged as NULL, without sacrificing recall.
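The word-level metric can be sketched as below, assuming gold and predicted tags are given as parallel per-word lists. The function name and the toy tag sequences are illustrative, not the evaluation script used for Table 1.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[str], pred: Sequence[str],
                     null_tag: str = "NULL") -> Tuple[float, float]:
    """Word-level precision and recall over non-NULL tag assignments."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have one tag per word")
    correct = sum(1 for g, p in zip(gold, pred) if p != null_tag and p == g)
    predicted = sum(1 for p in pred if p != null_tag)  # words the model aligned
    relevant = sum(1 for g in gold if g != null_tag)   # words annotators aligned
    precision = correct / predicted if predicted else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall


# Toy example: the first word is wrongly aligned instead of being left NULL.
gold = ["NULL", "TEAM_NAME", "SCORE_DIFF", "SCORE_DIFF", "NULL", "TEAM_NAME"]
pred = ["TEAM_NAME", "TEAM_NAME", "SCORE_DIFF", "SCORE_DIFF", "NULL", "TEAM_NAME"]
precision, recall = precision_recall(gold, pred)
```

A word wrongly aligned to a field lowers precision but leaves recall untouched, which is why the garbage collection problem shows up mainly in the precision column.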

Overall Text Generation Evaluation
We also test the performance of the extracted templates in overall text generation, comparing with the baseline using the same heuristics described in Section 3.2. To generate document-level texts, we first generate a sentence describing the scoreline for every game, followed by three sentences describing other information about team performance. Then, while ensuring that no template is used repeatedly, we choose the template with the highest score for each of the top ten players sorted by game points. We report automatic metrics including BLEU and the relation-extraction-based metrics proposed by Wiseman et al. (2017): precision and number of unique relations in generation (RG), precision and recall for content selection (CS), and content ordering (CO) score. Besides these automatic metrics for various aspects of NLG, we also conduct a human evaluation of information correctness (ratings on a 1-5 scale, higher is better). We ask four human raters who are fluent in English and familiar with basketball to rate the outputs for 30 random games. Results are shown in Table 2, with Kendall's W measuring the inter-annotator agreement. We observe that templates derived from our model outperform those from the baseline system.
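For reference, Kendall's W can be computed from a raters-by-items rating matrix as sketched below. The function names and the toy ratings are illustrative; tied ratings receive average ranks, and the tie correction is omitted for brevity.

```python
from typing import List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank items from 1..n, giving tied scores the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Kendall's coefficient of concordance for m raters (rows) and n items (columns)."""
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2.0
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Toy 1-5 correctness ratings from four raters on five generated outputs.
toy_ratings = [
    [5, 4, 2, 3, 1],
    [5, 3, 2, 4, 1],
    [4, 5, 1, 3, 2],
    [5, 4, 2, 3, 1],
]
agreement = kendalls_w(toy_ratings)
```

W ranges from 0 (no agreement among raters) to 1 (identical rankings).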

Conclusion and Future Work
To summarize, Data2Text Studio is a platform for automated text generation from structured data. It not only provides several pre-trained models that can generate high-quality texts from data, but also makes it easy to train new models by uploading parallel data. In addition, the system is equipped with the proposed Semi-HMMs model, which extracts templates and corresponding trigger conditions from parallel data automatically and supports learning how to use specific phrases under particular conditions. Experiment results on the ROTOWIRE dataset show that the proposed model outperforms the baseline for template extraction and text generation.
In the future, we will integrate more powerful pre-trained models into this system in terms of data domain and text fidelity. For the template extraction model, we will learn more complex grounding rules to enhance the model power.