Story-level Text Style Transfer: A Proposal

Text style transfer aims to change the style of the input text to the target style while preserving the content to some extent. Previous work on this task operates on the sentence level. We aim to work on story-level text style transfer to generate stories that preserve the plot of the input story while exhibiting a strong target style. The challenge in this task compared to previous work is that the structure of the input story, consisting of named entities and their relations with each other, needs to be preserved, and that the generated story needs to remain consistent after the target style is applied. We plan to explore three methods: a BERT-based method, a Story Realization method, and a Graph-based method.


Introduction
Text style transfer has been extensively explored by the NLP community on the sentence level. In previous work, researchers defined the style of a sentence as one or more of its attributes, including but not limited to sentiment (Xu et al., 2018; John et al., 2019; Liao et al., 2018), formality (Jain et al., 2018; Rao and Tetreault, 2018), and factuality. The goal is to change the specified attribute or attributes of the input sentence to the target attribute or attributes, for example, changing a positive sentence into a negative one while keeping its key information. There are also works on transferring Shakespearean English to modern English and back (Xu et al., 2012; Jhamtani et al., 2017).
In this paper, we propose methods to transfer text style on the story level. The task takes a story as input and generates a story in the target style with the main plot of the input story preserved. In our work, we define style as the setting of the story, which reveals its temporal and geographical background. For example, if a story starts with a boy receiving a package containing parchments and a robe delivered by an owl, a good guess is that this is a magic story most likely taken from or inspired by Harry Potter. If we want to change this story into the Alice in Wonderland style, an ideal output may be a story about a girl receiving a package containing an invitation to a tea party from a rabbit.
Compared with sentence-level text style transfer, our proposed work faces more challenges. It is impractical to collect parallel stories that have the same plot or structure but differ in settings. To deal with this, we break down the task into two parts. First, we explore methods to build a structural representation of the original story to preserve the main plot, including leading roles and their connections. Second, we generate a story given the retrieved information or graph and the target style.

Related Work
The work we propose is closely related to previous work on event extraction from text, which gives us a structural representation of the input story, and on text generation from events, which produces the story in the target style.
Graph Extraction from Text Generating text on the story level from events ideally requires the events to be organized as a structural representation; otherwise the plot will not be consistent. While manually constructing graphs is expensive, there are multiple approaches to automatically constructing graphs from stories, including Named Entity Recognition (NER), knowledge graphs, and other text graph generation methods. While NER has been studied for a long time, the task of extracting named entities together with labelled semantic relations between them leaves much room for exploration. Most previous work on extracting entities together with relations either extract them separately and [...]
Figure 2 illustrates how a graph convolutional network (GCN) works. GCN is a variant of convolutional neural networks (CNNs) that works on graphs. The representation of each node is updated based on its adjacent nodes.
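As a rough illustration of the neighbourhood update performed by a GCN, the following sketch implements one simplified graph-convolution step in plain Python. The mean-aggregation rule with self-loops, the ReLU nonlinearity, and the names `gcn_layer`, `adjacency`, and `weight` are assumptions made for illustration; they are not taken from GraphRel or from any particular GCN implementation.

```python
def gcn_layer(features, adjacency, weight):
    """One simplified GCN step: average each node with its neighbours,
    apply a linear map, then ReLU.

    features:  list of node feature vectors (n x d_in)
    adjacency: dict mapping node index -> list of neighbour indices
    weight:    d_in x d_out matrix given as a list of rows
    """
    n = len(features)
    d_in = len(features[0])
    d_out = len(weight[0])
    updated = []
    for node in range(n):
        # Include the node itself (self-loop) so its own features are kept.
        group = [node] + list(adjacency.get(node, []))
        mean = [sum(features[j][k] for j in group) / len(group) for k in range(d_in)]
        # Linear transform followed by ReLU.
        out = [
            max(0.0, sum(mean[k] * weight[k][c] for k in range(d_in)))
            for c in range(d_out)
        ]
        updated.append(out)
    return updated


# Toy graph: node 0 -- node 1 -- node 2 (undirected path).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(features, adjacency, identity)
```

With the identity weight matrix, each row of `hidden` is simply the average of a node's features and those of its neighbours, which makes the effect of the aggregation easy to inspect. Stacking several such layers lets information travel across longer paths in the graph.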
Text Generation from Graph Due to the variety of graphs and information loss of long-distance dependencies, it is hard to generate coherent stories that span across multiple sentences from a graph. Koncel-Kedziorski et al. (2019) proposed a novel graph transformer to alleviate this problem by leveraging the relational structure of graphs without setting linearization or hierarchical constraints.
The use of GCNs for text generation from graphs is enjoying growing popularity among researchers. Marcheggiani and Perez-Beltrachini (2018) used GCNs to build an encoder that calculates the representation of each node in a directed graph. After adding residual connections and dense connections between the GCN layers, they used an LSTM decoder. Guo et al. (2019) built Densely Connected Graph Convolutional Networks to address the issue of learning deeper GCNs, and achieved better results on graph-to-sequence learning and AMR-to-text generation than previous methods.

Proposed Methodology
Our goal is to adapt the original story to the target setting. A well-known example of this kind of adaptation is the New York theatre production Sleep No More, which strips the story of Macbeth of its original time setting and places it in a 1930s hotel called the McKittrick.

Data Set
The data sets ideal for our proposed work need to satisfy the following requirements. First, each corpus needs to have an abundant amount of text in the same style. Second, the styles of the corpora should differ significantly from each other, to the extent that a snippet from a corpus is enough for people to tell which corpus it comes from. To satisfy these requirements, we choose the Harry Potter series and the Game of Thrones series as our corpora. The former consists of 1,084,170 words and the latter consists of 1,736,054 words.
We select paragraphs between 100 and 200 words from each corpus and use GraphRel to automatically build graphs from the text.
For each method described in the next section, we use different training data. For the BERT-based method, we use the story corpora as training data. For the Story Realization method, we use the selected paragraphs and the corresponding extracted [...]
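The paragraph selection step can be sketched as follows. This is a minimal example under stated assumptions: paragraphs are taken to be separated by blank lines, words are counted by whitespace splitting, and both bounds are treated as inclusive. The function name `select_paragraphs` is hypothetical, and the graph-building step with GraphRel is not shown.

```python
def select_paragraphs(text, min_words=100, max_words=200):
    """Return paragraphs whose whitespace-token count lies in [min_words, max_words]."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p for p in paragraphs if min_words <= len(p.split()) <= max_words]


# Toy corpus with paragraphs of 50, 150, and 250 words.
corpus = "\n\n".join(" ".join(["word"] * n) for n in (50, 150, 250))
selected = select_paragraphs(corpus)
```

Only the 150-word paragraph of the toy corpus passes the filter.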

Models
We plan to experiment with the following three methods. The first two methods serve as baselines.
BERT-based Method This method will be based on Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018). In this method, we first use the corpus in the target style to fine-tune BERT. Then we build a vocabulary for the target corpus, setting the threshold of minimum occurrence to 20. We examine each word in the input story to see whether it is included in the vocabulary of the target corpus. If it is not, we use the fine-tuned BERT to mask and predict it; such words are processed one by one. The BERT-based method serves as our baseline model as it modifies the input story sentence by sentence instead of as a whole.
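The vocabulary filtering and mask-and-predict loop might look roughly like the sketch below. The tokenizer, the function names, and in particular the `predict_masked` callable are assumptions introduced for illustration: `predict_masked` stands in for the fine-tuned BERT model, which is not implemented here, and a stub is used in its place to show the control flow.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the proposal does not specify one)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(corpus_text, min_count=20):
    """Keep words that occur at least min_count times in the target corpus."""
    counts = Counter(tokenize(corpus_text))
    return {word for word, count in counts.items() if count >= min_count}


def positions_to_mask(sentence, vocabulary):
    """Return (index, word) pairs for tokens missing from the target vocabulary."""
    return [(i, w) for i, w in enumerate(tokenize(sentence)) if w not in vocabulary]


def transfer_sentence(sentence, vocabulary, predict_masked):
    """Replace out-of-vocabulary tokens one by one.

    predict_masked(tokens, index) is a hypothetical callable standing in for
    the fine-tuned BERT model; it returns a replacement for tokens[index].
    """
    tokens = tokenize(sentence)
    for index, _ in positions_to_mask(sentence, vocabulary):
        tokens[index] = predict_masked(tokens, index)
    return " ".join(tokens)


# Toy target corpus in which every word occurs exactly 20 times.
target_corpus = " ".join(["the boy spent a whole day playing card games"] * 20)
vocab = build_vocabulary(target_corpus, min_count=20)
source = "The boy spent a whole day playing video games."
to_mask = positions_to_mask(source, vocab)
# Stub predictor used only to show the control flow.
rewritten = transfer_sentence(source, vocab, lambda tokens, i: "card")
```

In a real setting the stub would be replaced by a call to a masked language model, and casing and punctuation would need to be restored after prediction.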
In simpler cases where we only wish to change the era of the story and have no other requirements, we can append a phrase indicating the era to the original sentences. For example, when we mask video in the sentence 'The boy spent a whole day playing video games.', BERT (large-cased version) correctly predicts the word to be video. If we add the phrase 'on the first day of the 18th century', the prediction becomes card, which matches the time setting.
Story Realization Method Ammanabrolu et al. (2019) proposed an ensemble-based model to generate sentences given plot events. Applying it to our task involves two steps. First, we need to extract events from the input story. This can be done through Named Entity Recognition (in this work we will use AllenNLP NER) and by finding VerbNet (Kipper-Schuler, 2005) classes of verbs and WordNet (Miller, 1995) synsets for nouns recognized as events. The next step is to expand these events into a story. We plan to experiment with the ensemble model by Ammanabrolu et al. (2019), which is reported to combine the strengths of the retrieve-and-edit method (Hashimoto et al., 2018), the template filling method, and sequence-to-sequence models with finite state machine decoding, Monte Carlo beam decoding, and vanilla beam decoding. This method first conducts event-to-event generation to include more events before generating the output story. Figure 1 illustrates how this method works.
Here we need to note that some extracted entities or relations may be words outside the vocabulary of the target-style corpus (out-of-target-corpus-vocabulary words). For example, computer is not in the Harry Potter corpus. We replace each such word with the word that has the same part of speech and is closest to it in the word embedding space trained on the target corpus, using Euclidean distance.
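A minimal sketch of this replacement rule is given below. The embeddings and part-of-speech tags are made-up toy values, and the function name `nearest_in_vocabulary` is hypothetical. How a vector for the out-of-vocabulary word itself is obtained is not specified in the text; here it is simply passed in as an argument.

```python
import math


def nearest_in_vocabulary(word_vector, word_pos, target_embeddings, target_pos):
    """Return the target-vocabulary word with the same part of speech whose
    embedding has the smallest Euclidean distance to word_vector."""
    best_word, best_distance = None, math.inf
    for candidate, vector in target_embeddings.items():
        # Skip candidates whose part of speech differs from the source word.
        if target_pos.get(candidate) != word_pos:
            continue
        distance = math.dist(word_vector, vector)
        if distance < best_distance:
            best_word, best_distance = candidate, distance
    return best_word


# Made-up 2-d embeddings and POS tags, for illustration only.
target_embeddings = {"quill": [0.9, 0.1], "owl": [0.1, 0.9], "fly": [0.8, 0.2]}
target_pos = {"quill": "NOUN", "owl": "NOUN", "fly": "VERB"}
computer_vector = [0.8, 0.2]  # hypothetical vector for the out-of-vocabulary word
replacement = nearest_in_vocabulary(computer_vector, "NOUN", target_embeddings, target_pos)
```

In the toy data, "fly" is the closest vector overall but is filtered out by the part-of-speech constraint, so the nearest noun is returned instead.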
We expect that, compared with the BERT-based method, the Story Realization method will perform better in terms of creativity but worse in terms of content preservation.
Graph-based Method In this method, a scheme for replacing out-of-target-corpus-vocabulary words similar to the one in the Story Realization method should be applied to the input story. Then we plan to experiment with graph transformers and other graph-to-text generators trained on our data sets, compare their performance on our task, and examine the possibility of improving their performance by making modifications. Specifically, in the text-to-graph step we explore using graph neural networks. We plan to start with GraphRel, the GCN-based state-of-the-art entity and relation extraction model, to convert the input story to a graph. Figure 3 illustrates how the Graph-based Method works. The input is an extract from the novel Educated. A graph containing key information is built from the input story. Some modification is done to replace out-of-target-corpus-vocabulary words. We expect the output to preserve the structure of the input story while being creative and consistent. Towards this goal, we plan to experiment with different GCN structures for text generation.

Evaluation
We plan to evaluate our generated stories using perplexity and human evaluation, with an emphasis on the latter considering the creative nature of this task.
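For reference, perplexity can be computed from per-token log-probabilities as sketched below. The choice of language model used to score the generated stories is not specified in this proposal, so the log-probabilities here are supplied directly; the function name `perplexity` and the toy values are assumptions for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the average negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Four tokens, each assigned probability 0.25 by a hypothetical language model.
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

A model that assigns probability 0.25 to every token yields a perplexity of about 4, matching the intuition that it is choosing uniformly among four options at each step.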
The generated stories will be evaluated by linguists on the following aspects: grammar and fluency; main plot preservation; strength of the target style; creativeness. Each aspect will be given a score between 1 and 5, with 1 representing total failure, 2 representing barely acceptable, 3 representing acceptable, 4 representing good, and 5 representing the most satisfying performance.
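One possible way to summarize the human ratings is to average the 1 to 5 scores per aspect across annotators, as in the sketch below. Averaging is an assumption; the proposal does not state how scores will be aggregated. The aspect keys, the function name `aggregate_ratings`, and the example ratings are invented for illustration.

```python
from statistics import mean

ASPECTS = ("grammar_fluency", "plot_preservation", "style_strength", "creativeness")


def aggregate_ratings(ratings):
    """Average 1-5 scores per aspect across annotators.

    ratings: list of dicts, one per annotator, mapping aspect -> integer score.
    """
    for rating in ratings:
        for aspect in ASPECTS:
            if not 1 <= rating[aspect] <= 5:
                raise ValueError(f"{aspect} score out of range: {rating[aspect]}")
    return {aspect: mean(r[aspect] for r in ratings) for aspect in ASPECTS}


# Invented ratings from two annotators for one generated story.
example = [
    {"grammar_fluency": 4, "plot_preservation": 3, "style_strength": 5, "creativeness": 4},
    {"grammar_fluency": 5, "plot_preservation": 3, "style_strength": 4, "creativeness": 2},
]
summary = aggregate_ratings(example)
```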

Summary
We propose to explore text style transfer on the story level. The main challenge lies in preserving the main plot while generating consistent and meaningful text in the target style. We plan to focus mostly on studying the possible application of GCNs to this task. We will perform extensive experiments and report results in future work.