Where Have I Heard This Story Before? Identifying Narrative Similarity in Movie Remakes

People can identify correspondences between narratives in everyday life. For example, an analogy with the Cinderella story may be made in describing the unexpected success of an underdog in seemingly different stories. We present a new task and dataset for story understanding: identifying instances of similar narratives from a collection of narrative texts. We present an initial approach for this problem, which finds correspondences between narratives in terms of plot events, and resemblances between characters and their social relationships. Our approach yields an 8% absolute improvement in performance over a competitive information-retrieval baseline on a novel dataset of plot summaries of 577 movie remakes from Wikipedia.


Introduction
The ability to automatically understand narratives has been a long-standing goal of AI. Humans routinely invoke narratives to share information, learn normative behavior, and to make sense of the world (Gottschall, 2012; Miller and Mitchell, 1983). They accept narratives that adhere to familiarity and personal experiences, and reinterpret those that appear unfamiliar (Herman, 2003). In this work, we present a new task for narrative understanding: identifying instances of similar narratives. The ability to recognize similar narratives can be valuable for tasks such as QA and information retrieval, and furnish tools towards analyzing collections of real or fictional narratives. For example, given a news story, a digital archives analyst might identify similar stories from the past.
A major bottleneck in computationally exploring narrative similarity is the limited availability of annotated data for analyses and evaluation. A contribution of this work is a dataset of plot summaries of movies, which include movie pairs that have been identified as remakes (see Sec. 3). Our working hypothesis is that re-tellings of similar stories would retain prominent elements in terms of narrative theme, even while they look superficially different. Figure 1 shows an example of two such movie summaries, condensed here for brevity. Our approach for identifying similar narratives infers alignments between pairs of narratives using a story-kernel that takes into account two kinds of likenesses: (1) plot similarity, and (2) correspondences between characters in the narratives (based on attributes such as name, gender, prominence in the narrative, and social relationships with other characters). While our data and problem formulation do not accommodate all aspects of narrative similarity, and our approach is relatively simple (for example, it doesn't model temporal or sentiment trajectories), we believe they capture many substantial aspects of the phenomenon and serve as a useful starting point for research into the problem.
Figure 1: Two movie plot summaries, with The Vanishing (1993) on the right. Note that (1) plot similarity and (2) characters and their relationships are significant elements in determining this similarity.
Our contributions are:
1. We introduce the problem of characterizing narrative similarity in movie remakes, and formulate this as a ranking task.
2. We create a dataset of 577 narratives for this task, mined from plot summaries of movie remakes from Wikipedia.
3. We present a story-kernel that quantifies narrative similarity by considering correspondences between narratives using a character-centric approach. We empirically evaluate the story-kernel and its various components, and demonstrate its utility.

Related Work
The field of computational narratology has focused on algorithmic understanding and generation of narratives (Mani, 2012; Richards et al., 2009). Much previous work has attempted to understand narratives either from the perspective of their (i) sequences of events (Schank and Abelson, 1975; Chambers and Jurafsky, 2009) or plot units (McIntyre and Lapata, 2010; Goyal et al., 2010; Finlayson, 2012), or from the perspective of (ii) characters (Wilensky, 1978) or personas in a narrative (Propp, 1968; Bamman et al., 2013, 2014; Valls-Vargas et al., 2014). Elsner (2012) explores the plot structure of novels to distinguish original texts from synthetically altered versions of the same. Some recent approaches have also focused on modeling relationships between literary characters (Chaturvedi, 2016; Iyyer et al., 2016), and their social networks (Elson et al., 2010; Agarwal et al., 2013; Krishnan and Eisenstein, 2015). Other research has focused on characterizing narratives in terms of their structure. In particular, seminal formalisms such as plot units (Lehnert, 1981) and Story Grammars (Rumelhart, 1980) have been used to analyze story plots. A significant issue with almost all such frameworks is that they are either largely conceptual, or depend on careful manual annotations of features about narrative plot elements (Elsner, 2012; Elson, 2012; Finlayson and Winston, 2006), which makes them unamenable to comprehensive empirical analysis. While some of the above approaches explore prototypical patterns that characterize narratives (Nguyen et al., 2013) and narrative similarity (Fisseni and Löwe, 2012), they do not address the issue of automatically comparing narratives. Also noteworthy in this context is the Aarne-Thompson classification system (Aarne and Thompson, 1961), which has been extensively used in the analysis of folk-tales to organize types of stories, based on an index of motifs. Our work is most closely related to that of Nguyen et al. (2014), who attempt to understand the various dimensions that experts and non-experts consider while judging narrative similarity.

Movie Remakes Dataset
We present a dataset for evaluating narrative-level similarity of texts. Our assumption is that movie remakes are re-tellings of the same story and retain prominent narrative elements. Hence, a good measure of narrative similarity should evaluate summaries of movie remakes as being similar to each other. To this end, we present a dataset of movie plots extracted from Wikipedia.
In particular, we scraped lists of movies from the 'Lists of film remakes' page on Wikipedia, which consist of entries of movies considered remakes of previous movies. Since some movies have multiple remakes, we obtain clusters of movie plots, each of which shares the same narrative theme. For each movie, we extract the text of its corresponding Wikipedia plot summary from the CMU Movie summary dataset (Bamman et al., 2013). In some cases, the remakes are close to the originals at a surface level, whereas in other cases, they diverge at a surface level, and may also significantly differ in the narrative. These clusters were then manually pruned to remove errors, and the statistics of the curated dataset are shown in Table 1. In particular, we observe that the average summary is quite long (564 words), which would make human annotations of similarity for such narratives difficult.
NLP pre-processing: We processed texts of movie summaries using the BookNLP pipeline (Bamman et al., 2014) to get dependency parses, and identify major characters. We also assigned each character the gender most frequently assigned to that character's mentions across the story by Stanford CoreNLP (Manning et al., 2014).
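The majority-vote gender assignment described above can be sketched as follows. This is a minimal illustration, assuming the per-mention gender labels have already been produced by an upstream tool; the function name `majority_gender` and the `"unknown"` fallback are hypothetical and not part of the described pipeline.

```python
from collections import Counter


def majority_gender(mention_genders, default="unknown"):
    """Return the most frequent gender label among a character's mentions.

    `mention_genders` is an iterable of labels (one per mention); None entries
    stand for mentions with no gender assigned and are ignored.
    The `default` fallback is an assumption of this sketch.
    """
    counts = Counter(g for g in mention_genders if g is not None)
    if not counts:
        return default
    # most_common(1) returns a list with the single (label, count) pair
    return counts.most_common(1)[0][0]


# Toy usage with made-up mention labels
print(majority_gender(["female", "male", "female"]))
```

Ties are broken by first occurrence in this sketch; the source does not say how ties are handled.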

Identifying Narrative Similarity
Our approach's core consists of a story-kernel, $S(s_i, s_j)$, that characterizes the similarity between two narratives, $s_i$ and $s_j$. The story-kernel has the following two components: (i) Plot Kernel, which incorporates surface similarity between plots of the two stories (in terms of the principal events and entities), and (ii) Character Alignment Kernel, which considers correspondences in terms of character attributes and relationships.
Plot Kernel: A simple measure for narrative similarity can incorporate lexical similarities between textual descriptions of two narratives. However, our goal is to identify narratives that have similar plot structure, rather than incidental surface-level matches in their summaries. Therefore, we focus only on events, and entities and their properties. We model events mentioned in a story by identifying all verbs occurring in the text of the narrative. We capture entities and their properties by identifying nouns and the adjectives that modify them. As mentioned earlier, our approach specifically models characters as a separate component in the story-kernel. Hence, at this stage, we only consider text entities that do not represent a character mention. We represent the plot of a narrative using a bag-of-words representation of its events and entities (and their characteristics) as described above. We then define $S_{plot}(s_i, s_j)$ as the cosine similarity between these representations for narratives $s_i$ and $s_j$.
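The plot kernel described above amounts to a cosine similarity over term counts. The following is a minimal sketch, assuming the verbs, non-character nouns, and modifying adjectives have already been extracted by a parser; the function names and the toy term lists are illustrative assumptions, not the authors' code or data.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty representation is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def plot_kernel(terms_i, terms_j):
    """S_plot: cosine similarity between bag-of-words plot representations.

    Each argument is a list of already-extracted event and entity terms
    (hypothetical input format for this sketch).
    """
    return cosine(Counter(terms_i), Counter(terms_j))


# Toy usage with made-up term lists
story_a = ["kidnap", "search", "girlfriend", "bury", "obsessed"]
story_b = ["kidnap", "search", "girlfriend", "rescue"]
print(plot_kernel(story_a, story_b))
```

Raw counts are used here for simplicity; the source does not state whether any term weighting is applied.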
Character Alignment Kernel: This component compares two narratives by aligning characters of one with similar characters in the other. Specifically, we align each character, $c_i$, of a story, $s_i$, to a character, $c_j$, of the other story, $s_j$. This alignment is based on a similarity score, $S(c_i, c_j)$, between the two characters (defined later). The goal of this joint alignment is to maximize the average alignment score of characters in the narrative pair:

$$S_{char}(s_i, s_j) = \max_{x} \frac{\sum_{c_i \in s_i,\, c_j \in s_j} x_{c_i c_j}\, S(c_i, c_j)}{N}$$

subject to the alignment constraints $\sum_{c_i} x_{c_i c_j} = 1, \forall c_j$ and $\sum_{c_j} x_{c_i c_j} = 1, \forall c_i$. Here, $x$ is a binary matrix indicating character alignments, and $N$ is the total number of aligned characters from the two narratives. These constraints ensure that each character is aligned to one, and only one, character from the other story. This combinatorial optimization can be solved in polynomial time by modifying the Hungarian assignment algorithm (Kuhn, 1955). When two stories have different number of characters, the extra unaligned characters are aligned to a special null character from the other story.
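The alignment objective above can be illustrated with a small sketch. Note that this uses exhaustive search over permutations rather than the Hungarian algorithm, so it is only practical for a handful of characters. It also assumes that an alignment to the null character scores 0 and that $N$ is the size of the padded character set; neither choice is stated in the text. The similarity values in the usage example are made up.

```python
from itertools import permutations


def char_alignment_kernel(chars_i, chars_j, sim):
    """Best average similarity over one-to-one character alignments.

    The shorter character list is padded with None (the null character).
    Returns (score, aligned_pairs). Brute force: O(n!) in the number of characters.
    """
    n = max(len(chars_i), len(chars_j))
    if n == 0:
        return 0.0, []
    padded_i = list(chars_i) + [None] * (n - len(chars_i))
    padded_j = list(chars_j) + [None] * (n - len(chars_j))

    def score(a, b):
        # Assumption of this sketch: aligning with the null character contributes 0
        return 0.0 if a is None or b is None else sim(a, b)

    best_total, best_pairs = -1.0, []
    for perm in permutations(padded_j):
        total = sum(score(a, b) for a, b in zip(padded_i, perm))
        if total > best_total:
            best_total, best_pairs = total, list(zip(padded_i, perm))
    return best_total / n, best_pairs


# Toy usage: hypothetical similarity values, not results from the paper
toy = {("Rex", "Jeff"): 0.9, ("Saskia", "Diane"): 0.8, ("Lieneke", "Rita"): 0.7}
toy_sim = lambda a, b: toy.get((a, b), 0.1)
score, pairs = char_alignment_kernel(
    ["Rex", "Saskia", "Lieneke", "Raymond"], ["Jeff", "Diane", "Rita"], toy_sim
)
print(score, pairs)
```

For realistic cast sizes, a polynomial-time assignment solver would replace the permutation loop while keeping the same padding scheme.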
In the above description, the similarity between two (non-null) characters, $S(c_i, c_j) \in [0, 1]$, is defined as a convex combination of their similarities along (i) name, (ii) gender, (iii) prominence in the story, and (iv) attributes and social relationships with other characters.
Here, (1) $S_{name}(c_i, c_j)$ is an indicator function that identifies if two character names are matching strings. It prefers aligning characters with same names, and can be a strong but shallow signal.
(2) $S_{gender}(c_i, c_j)$ prefers alignments of characters with the same gender; i.e. $S_{gender}(c_i, c_j) = 1$ if gender of $c_i$ is the same as of $c_j$, and 0 otherwise.
(3) $S_{prom}(c_i, c_j)$ aligns characters with similar prominence. E.g., it avoids matching a protagonist with a side-character. We compute the prominence of a character, $prom(c)$, as simply the fraction of mentions that refer to this character. We then define $S_{prom}(c_i, c_j)$ based on the prominence of the two characters.
(4) $S_{reln}(c_i, c_j)$ considers how similar the two characters are in terms of attributes, and their relationship to other characters. For example, characters described with positive traits, or friends of the protagonist in one story are likely to be better matched to similar characters in other narratives. We model the relationship between two characters (from the same narrative) by extracting features describing actions in which they participate and adjectives describing character attributes (Chaturvedi et al., 2017). E.g. we identify the actions using verbs that have the two character mentions as their agents (identified using 'nsubj' and 'agent' dependency relations), and patients (using 'dobj' and 'nsubjpass' relations). We then represent a character's relationship with all other characters in the narrative using these features. Finally, we compute the relationship-based similarity, $S_{reln}(c_i, c_j)$, between two characters, $c_i$ and $c_j$, as their cosine similarity in this feature space.
Figure 2: Performance of various approaches on Narrative Similarity task
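The four character-level components can be combined as in the sketch below. The weights are the character-component weights reported in Table 2; everything else is an assumption made for illustration: the `Character` container, the feature-dictionary format, and in particular the form used for the prominence similarity (one minus the absolute difference in prominence), which the text does not specify.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Character:
    name: str
    gender: str
    prominence: float  # fraction of all character mentions that refer to this character
    relations: dict = field(default_factory=dict)  # relationship/attribute feature -> count


# Component weights as listed in Table 2 (they sum to 1, giving a convex combination)
WEIGHTS = {"name": 0.4, "gender": 0.1, "prominence": 0.1, "relationship": 0.4}


def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def character_similarity(ci, cj, w=WEIGHTS):
    """Convex combination of name, gender, prominence and relationship similarity."""
    s_name = 1.0 if ci.name == cj.name else 0.0
    s_gender = 1.0 if ci.gender == cj.gender else 0.0
    # Assumed form: the source only says characters of similar prominence are preferred
    s_prom = 1.0 - abs(ci.prominence - cj.prominence)
    s_reln = cosine(ci.relations, cj.relations)
    return (w["name"] * s_name + w["gender"] * s_gender
            + w["prominence"] * s_prom + w["relationship"] * s_reln)


# Toy usage with made-up feature values
rex = Character("Rex", "male", 0.40, {"search:agent": 2, "love:agent": 1})
jeff = Character("Jeff", "male", 0.35, {"search:agent": 1, "love:agent": 1})
print(character_similarity(rex, jeff))
```

Because each component lies in [0, 1] and the weights sum to 1, the combined score also stays in [0, 1].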
The story kernel is then defined as:
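The defining equation is not present in this text. One form consistent with the mixing parameter α mentioned in the evaluation and with the component weights listed in Table 2 is a linear interpolation of the two kernels. The expression below is an assumption offered for orientation, including the choice that α weights the plot kernel; it is not a confirmed reproduction of the original formula.

```latex
% Assumed form: alpha weights the plot kernel, (1 - alpha) the character kernel
S(s_i, s_j) = \alpha \, S_{plot}(s_i, s_j) + (1 - \alpha) \, S_{char}(s_i, s_j),
\qquad \alpha \in [0, 1]
```

Under this reading, the weights 0.7 for the plot component and 0.3 for the character component would correspond to α = 0.7.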

Evaluation
For our experiment, we tune parameters on 20% of the data and use the remaining data (466 movies) for the test set. In order to keep the test set completely distinct from the types of stories used for parameter tuning, this split was performed at the cluster level. Given a test story, we output the most similar story from the dataset. For evaluation, we compute P@1 (precision at 1) as follows: The output is deemed correct only if the predicted movie belongs to the same remake cluster, and incorrect otherwise. Figure 2 shows the performance of our approach in identifying narrative similarity. Here, BoW refers to a baseline approach that uses all words in the movie summary (after stopword removal and lemmatization) as feature representation, and uses cosine similarity for retrieval. This approach achieves a P@1 performance of 0.558. The second column corresponds to using $S_{plot}$ alone (does not include character mentions), and performs slightly worse than BoW, suggesting that character names do indicate remakes in our data. Adding character names in the plot kernel (third column) improves the performance significantly above BoW, indicating substantial value in focusing on narrative elements such as events and entities, rather than the entire text. The fourth column shows an ablated variant of the approach that separately adds a character kernel score to $S_{plot}$ based on character names alone, but does not incorporate correspondences based on gender, prominence and character relationships. We combine this alternative character-kernel with the plot-based kernel in the same manner (using a mixing parameter α tuned on the development set). This indicates that it helps to have a separate component dedicated to characters while solving this task. The final column shows the full model, which leads to a significant further improvement in performance to 0.637, reflecting an 8% absolute improvement over the baseline model. This indicates significant value in modeling multiple facets of character attributes and relationships. We observed similar trends on the development set. Table 2 shows the weights for individual components of our kernel (tuned on the development set). These results validate our assumption that both plot and character similarity are distinct and important facets in evaluating narrative similarity.

Table 2: Weights for individual components of the story-kernel (tuned on the development set)

| Component | Weights |
| --- | --- |
| $S_{plot}$ | 0.7 |
| $S_{character}$ | 0.3 |
| $S_{character}$-name | 0.4 |
| $S_{character}$-gender | 0.1 |
| $S_{character}$-prominence | 0.1 |
| $S_{character}$-relationship | 0.4 |

Qualitative Results and Error Analysis: Figure 3 shows an illustrative example of character alignment using our story-kernel for the movie summaries shown in Figure 1. Note that the stories do not share any character names. Our approach aligns the protagonists of the two narratives, Rex and Jeff. It also aligns their respective kidnapped girlfriends, Saskia and Diane, and their new girlfriends, Lieneke and Rita. However, it aligns Saskia's kidnapper, Raymond, with a null character, even though the movie's summary mentions Diane's kidnapper, Barney Cousins. In this case, the NLP pipeline does not identify Barney Cousins as an animate character, possibly due to his unusual name.
As a result, the method received as input a summary in which only three characters were identified for the story on the right. Nevertheless, it correctly identifies the story on the right as most similar to the story on the left. An error analysis reveals that, apart from missed character identification, other NLP pipeline errors, such as missed coreference, are major sources of errors.
Figure 3: Example of aligned characters from the two movies in Figure 1
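The P@1 protocol used in this evaluation (retrieve the single most similar story and check whether it falls in the same remake cluster) can be sketched as follows. This is an illustrative sketch: the function signature, the choice to retrieve from all other stories in the collection, and the toy scores are assumptions, not the authors' evaluation code.

```python
def precision_at_1(test_ids, all_ids, cluster_of, kernel):
    """Fraction of test stories whose top-ranked other story is in the same cluster.

    `kernel(q, d)` returns a similarity score; `cluster_of` maps story id -> cluster id.
    Assumption: candidates are all other stories in `all_ids`.
    """
    if not test_ids:
        return 0.0
    correct = 0
    for q in test_ids:
        candidates = [d for d in all_ids if d != q]
        best = max(candidates, key=lambda d: kernel(q, d))
        correct += cluster_of[best] == cluster_of[q]
    return correct / len(test_ids)


# Toy usage: two remake clusters with made-up pairwise scores
ids = ["a1", "a2", "b1", "b2"]
clusters = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
scores = {
    frozenset(("a1", "a2")): 0.9, frozenset(("b1", "b2")): 0.2,
    frozenset(("a1", "b1")): 0.3, frozenset(("a2", "b2")): 0.1,
    frozenset(("a1", "b2")): 0.4, frozenset(("a2", "b1")): 0.5,
}
toy_kernel = lambda q, d: scores[frozenset((q, d))]
print(precision_at_1(ids, ids, clusters, toy_kernel))
```

In this toy setup the two "A" stories retrieve each other, while both "B" stories retrieve an "A" story, so half of the queries count as correct.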

Conclusion
We introduce an objective task, dataset and approach for quantitative evaluation of narrative similarity. Our approach, which compares narratives based on plot and character correspondences, takes a step towards addressing this problem. However, the general problem of narrative similarity can have further complexities. For example, narrative similarity can be abstract and rely on deeper reasoning (e.g., the subliminal resistance of temptation of power in 'The Lord of The Rings'). Such aspects are beyond the scope of current NLP tools, but may guide future explorations. Future work can also explore other domains (e.g., newswire and literary fiction) and evaluate character and event alignments between narratives based on established ground truths.