FanfictionNLP: A Text Processing Pipeline for Fanfiction

Fanfiction presents an opportunity as a data source for research in NLP, education, and social science. However, answering specific research questions with this data is difficult, since fanfiction contains more diverse writing styles than formal fiction. We present a text processing pipeline for fanfiction, with a focus on identifying text associated with characters. The pipeline includes modules for character identification and coreference, as well as the attribution of quotes and narration to those characters. Additionally, the pipeline contains a novel approach to character coreference that uses knowledge from quote attribution to resolve pronouns within quotes. For each module, we evaluate the effectiveness of various approaches on 10 annotated fanfiction stories. This pipeline outperforms tools developed for formal fiction on the tasks of character coreference and quote attribution.


Introduction
A growing number of natural language processing tools and approaches have been developed for fiction (Agarwal et al., 2013; Bamman et al., 2014; Iyyer et al., 2016; Sims et al., 2019). These tools generally focus on published literary works, such as collections of novels. We present an NLP pipeline for processing fanfiction, amateur writing from fans of TV shows, movies, books, games, and comics.
Fanfiction writers creatively change and expand on plots, settings, and characters from original media, an example of "participatory culture" (Jenkins, 1992; Tosenberger, 2008). The community of fanfiction readers and writers, now largely online, has been studied for its mentorship and support for writers and for the broad representation of LGBTQ+ characters and relationships in fan-written stories (Lothian et al., 2007; Dym et al., 2019). Fanfiction presents an opportunity as a data source for research in a variety of fields, from those studying learning in online communities to social science analysis of how community norms develop in an LGBTQ-friendly environment. For NLP researchers, fanfiction provides a large source of literary text with metadata, and has already been used in applications such as authorship attribution (Kestemont et al., 2018) and character relationship classification (Kim and Klinger, 2019).
There is a vast amount of fanfiction in online archives. As of March 2021, over 7 million stories were hosted on just one fanfiction website, Archive of Our Own, and there exist other online archives of similar or even larger sizes. We present a pipeline that enables structured insight into this vast amount of text by identifying sets of characters in fanfiction stories and attributing narration and quotes to these characters.
Knowing who the characters are and what they do and say is essential for understanding story structure (Bruce, 1981; Wall, 1984). Such processing is also useful for researchers in the humanities and social sciences investigating identification with characters and the representation of characters of diverse genders, sexualities, and ethnicities (Green et al., 2004; Kasunic and Kaufman, 2018; Felski, 2020). The presented pipeline, which extracts text related to characters in fanfiction, can assist researchers building NLP tools for literary domains, as well as those analyzing characterization in fields such as digital humanities. For example, the pipeline could be used to explore how characters are voiced and described differently when cast in queer versus straight relationships.
The presented pipeline contains three main modules: character coreference resolution, quote attribution, and extraction of "assertions", narration that relates to particular characters. We incorporate new and existing methods into the pipeline that perform well on an annotated set of 10 fanfiction stories. This includes a novel method using quote attribution information to resolve first- and second-person pronouns within quotes.
Figure 1: Fanfiction NLP pipeline overview. From the text of a fanfiction story, the pipeline assigns character mentions to character clusters (character coreference). It then attributes assertions and quotes to each character, optionally using the quote attribution output to improve coreference resolution within quotes (see Section 3.3).
Fanfiction is written by amateur writers of all ages and education levels worldwide, so it contains much more variety in style and genre than formal fiction. It is not immediately clear that techniques for coreference resolution or quote attribution that perform well on news data or formal fiction will be effective in the informal domain of fanfiction. We demonstrate that this pipeline outperforms existing tools designed for formal fiction on the tasks of character coreference resolution and quote attribution (Bamman et al., 2014).
Contributions. We contribute a fanfiction processing pipeline that outperforms prior work designed for formal fiction. The pipeline includes novel interleaving of coreference and quote attribution to improve the resolution of first- and second-person pronouns within quotes in narrative text. We also introduce an evaluation dataset of 10 fanfiction stories with annotations for character coreference, as well as for quote detection and attribution.

Fanfiction and NLP
Data from fanfiction has been used in NLP research for a variety of tasks, including authorship attribution (Kestemont et al., 2018), action prediction (Vilares and Gómez-Rodríguez, 2019), fine-grained entity typing (Chu et al., 2020), and tracing the sources of derivative texts (Shen et al., 2018). Computational work focusing on characterization in fanfiction includes the work of Milli and Bamman (2016), who find that fanfiction writers are more likely to emphasize female and secondary characters. Using data from WattPad, a platform that includes fanfiction along with original fiction, Fast et al. (2016) find that portrayals of gendered characters generally align with mainstream stereotypes.
We are not aware of any text processing system for fanfiction specifically, though BookNLP (Bamman et al., 2014) is commonly used as an NLP system for formal fiction. We evaluate our pipeline's approaches to character coreference resolution and quote attribution against BookNLP, as well as against other task-specific approaches, on an evaluation dataset of fanfiction.

Fanfiction Processing Pipeline
We introduce a publicly available pipeline for processing fanfiction. This pipeline is a command-line tool developed in Python. From the text of a fanfiction story, the pipeline extracts a list of characters, each mention of a character, as well as what each character does and says (Figure 1). More specifically, the pipeline first performs character coreference resolution, extracting character mentions and attributing them to character clusters with a single standardized character name (Section 3.1). After coreference, the pipeline outputs quotes uttered by each character using a sieve-based approach from Muzny et al. (2017) (Section 3.2). These quote attribution results are optionally used to aid the resolution of first- and second-person pronouns within quotes to improve coreference output (Section 3.3). In parallel with quote attribution, the pipeline extracts "assertions", topically coherent segments of text that mention a character (Section 3.4).

Character Coreference Module
The story text is first passed through the coreference resolution module, which extracts mentions of characters and attributes them to character clusters. These mentions include alternative forms of names, pronouns, and anaphoric references such as "the bartender". Each cluster is then given a single standardized character name.
Coreference Resolution. We use SpanBERT-base (Joshi et al., 2020), a neural method with state-of-the-art performance on formal text, for coreference resolution. This model uses SpanBERT-base embeddings to create mention representations and employs Lee et al. (2017)'s approach to calculate the coreferent pairs. SpanBERT-base was originally trained on OntoNotes (Pradhan et al., 2012). However, we further fine-tune SpanBERT-base on LitBank, a dataset with coreference annotations for works of literature in English, a domain more similar to fanfiction. The model takes the raw story text as input, identifies spans of text that mention characters, and outputs clusters of mentions that refer to the same character.
Character Standardization. We then assign representative character names for each coreference cluster. These names are simply the most frequent capitalized name variant, excluding pronouns and address terms, such as sir. If there are no capitalized terms in the cluster or if there are only pronouns and address terms, the most frequent mention is chosen as the name.
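The naming rule above can be illustrated with a minimal sketch. The pronoun and address-term lists, the function name `standardize_name`, and the use of an initial capital letter as the test for a "capitalized name variant" are assumptions made for illustration; the pipeline's actual lists and capitalization handling are not given in the text.

```python
from collections import Counter

# Illustrative stop lists (assumptions, not the pipeline's actual lists).
PRONOUNS = {"i", "me", "my", "mine", "you", "your", "yours", "he", "him", "his",
            "she", "her", "hers", "they", "them", "their", "it", "its",
            "we", "us", "our"}
ADDRESS_TERMS = {"sir", "ma'am", "madam", "miss", "mister"}


def standardize_name(mentions):
    """Pick a representative name for one non-empty coreference cluster."""
    def excluded(mention):
        low = mention.lower()
        return low in PRONOUNS or low in ADDRESS_TERMS

    # Candidates: capitalized mentions that are not pronouns or address terms.
    capitalized = [m for m in mentions if m[:1].isupper() and not excluded(m)]
    # Fall back to all mentions when no capitalized candidate exists.
    pool = capitalized if capitalized else mentions
    # Among equally frequent mentions, Counter keeps the first one encountered.
    return Counter(pool).most_common(1)[0][0]
```

For a cluster such as `["Tony", "he", "Tony Stark", "Tony"]` this sketch selects `"Tony"`; for a cluster with no capitalized candidates, such as `["the bartender", "he", "the bartender"]`, it falls back to the most frequent mention.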
Post-processing. SpanBERT-base resolves all entity mentions. In order to focus solely on characters, we post-process the cluster outputs. We remove plural pronouns (we, they, us, our, etc.) and noun phrases, demonstrative pronouns (that, this), as well as it mentions. We also remove clusters whose standardized representative names are not named entities and have head words that are not descendants of person in WordNet (Miller, 1995). Thus clusters with standardized names such as "the father" are kept (since they are descendants of person in WordNet), yet clusters with names such as "his workshop" are removed.
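A rough sketch of this filtering step follows. It is a simplification under stated assumptions: the person test is a toy word list standing in for the WordNet lookup, the head word is approximated by the last token of the name, named entities are passed in as a precomputed set, and only pronoun-like mentions are dropped (plural noun phrases are not handled). All names here (`filter_clusters`, `is_person_head`, `TOY_PERSON_NOUNS`) are hypothetical.

```python
PLURAL_PRONOUNS = {"we", "us", "our", "ours", "they", "them", "their", "theirs"}
DEMONSTRATIVES = {"that", "this"}
IT_FORMS = {"it", "its"}
DROPPED_MENTIONS = PLURAL_PRONOUNS | DEMONSTRATIVES | IT_FORMS

# Toy stand-in for "head word is a descendant of person in WordNet".
TOY_PERSON_NOUNS = {"father", "bartender", "boy", "woman", "friend"}


def is_person_head(name):
    # Approximates the head word by the last token; a parser would be more robust.
    return name.lower().split()[-1] in TOY_PERSON_NOUNS


def filter_clusters(clusters, named_entities, person_test=is_person_head):
    """clusters: dict mapping a standardized name to its list of mention strings.
    named_entities: set of standardized names recognized as named entities."""
    kept = {}
    for name, mentions in clusters.items():
        # Drop clusters that are neither named entities nor person nouns.
        if name not in named_entities and not person_test(name):
            continue
        # Drop plural pronouns, demonstratives, and "it" mentions.
        remaining = [m for m in mentions if m.lower() not in DROPPED_MENTIONS]
        if remaining:
            kept[name] = remaining
    return kept
```

Passing the person test as a parameter keeps the sketch independent of any particular lexical resource; a WordNet-backed function could be supplied in its place.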
For each character cluster, a standardized name and a list of the mentions remaining after post-processing are produced, along with pointers to the position of each mention in the text. This coreference information is then used as input to the quote attribution and assertion extraction modules.

Quote Attribution Module
To extract quotes, we simply extract any spans between quotation marks, a common approach in literary texts (O'Keefe et al., 2012). For the wide variety of fanfiction, we recognize a broader set of quotation marks than are recognized in BookNLP's approach for formal fiction.
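As an illustration of span extraction between quotation marks, the sketch below matches three kinds of paired marks with a regular expression. The particular set of marks is an assumption (the text does not list which marks the pipeline recognizes), and `extract_quotes` is a hypothetical name. Single quotes are left out here because they collide with apostrophes, and an unclosed mark will pair with the next mark of the same kind.

```python
import re

# Opening mark -> closing mark. This set is an assumption for illustration.
QUOTE_PAIRS = {
    '"': '"',            # straight double quotes
    "\u201c": "\u201d",  # curly double quotes
    "\u00ab": "\u00bb",  # guillemets
}

# One alternative per pair: opening mark, any run without either mark, closing mark.
QUOTE_PATTERN = re.compile(
    "|".join(
        f"{re.escape(o)}[^{re.escape(o)}{re.escape(c)}]*{re.escape(c)}"
        for o, c in QUOTE_PAIRS.items()
    )
)


def extract_quotes(text):
    """Return (start, end, quote_text) for each quoted span, marks included."""
    return [(m.start(), m.end(), m.group(0)) for m in QUOTE_PATTERN.finditer(text)]
```

The character offsets returned with each span make it possible to align quotes with mention positions from coreference.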
The pipeline attributes quotes to characters with the deterministic approach of Muzny et al. (2017), which uses sieves such as looking for character mentions that are the head words of known speech verbs. We use a standalone re-implementation of this approach by Sims and Bamman (2020) that allows using the pipeline's character coreference as input. Muzny et al. (2017)'s approach assigns quotes to character mentions and then to character clusters. We simply assign quotes to the names of these selected character clusters.

Quote Pronoun Resolution Module
Recent advances in coreference resolution, such as the SpanBERT-base system incorporated in the pipeline, leverage contextualized word embeddings to compute mention representations and to cluster these mentions from pairwise or higher-order comparisons. They also concatenate features such as the distance between the compared mentions to their representations. However, these approaches do not capture the change in point of view caused by quotes within narratives, so they suffer when resolving first- and second-person pronouns within quotes. To alleviate this issue, we introduce an optional step in the pipeline that uses the output from quote attribution to inform the resolution of first- and second-person pronouns within quotes.
Prior work (Almeida et al., 2014) proposed a joint model for entity-level quotation attribution and coreference resolution, exploiting correlations between the two tasks. However, in this work, we propose an interleaved setup that is modular and allows the user of the pipeline to use independent off-the-shelf pre-trained models of their choice for both coreference resolution and quote attribution.
More specifically, once the quote attribution module predicts the position of each quote ($q_i$) and its associated speaker ($s_i$), the first-person pronouns within the quote (e.g. I, my, mine, me) are resolved to the speaker of that quote, $s_i$. For second-person pronouns (e.g. you, your, yours), we assume that they point to the addressee of the quote ($a_i$), which is resolved to be the speaker of the nearest quote before the current quote that has a different speaker ($a_i = s_{i-j}$ such that $s_{i-j} \neq s_i$). We only consider the previous 5 quotes to find $a_i$.
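The resolution rule can be sketched as follows, reading the addressee constraint as "the nearest preceding quote whose speaker differs from the current speaker" (an interpretation, since a quote is not normally addressed to its own speaker). The pronoun lists contain only the examples given in the text, the tokenization is deliberately naive, and the function names are hypothetical.

```python
import re

FIRST_PERSON = {"i", "me", "my", "mine"}
SECOND_PERSON = {"you", "your", "yours"}
WINDOW = 5  # number of preceding quotes searched for the addressee


def find_addressee(speakers, i, window=WINDOW):
    """Speaker of the nearest earlier quote whose speaker differs from speakers[i]."""
    for j in range(1, window + 1):
        if i - j < 0:
            break
        if speakers[i - j] != speakers[i]:
            return speakers[i - j]
    return None  # no addressee found within the window


def resolve_quote_pronouns(quotes):
    """quotes: list of (speaker, text) pairs in story order.
    Returns, per quote, a list of (pronoun, resolved_character) pairs."""
    speakers = [speaker for speaker, _ in quotes]
    resolved = []
    for i, (speaker, text) in enumerate(quotes):
        addressee = find_addressee(speakers, i)
        links = []
        for token in re.findall(r"[A-Za-z]+", text):
            low = token.lower()
            if low in FIRST_PERSON:
                links.append((token, speaker))
            elif low in SECOND_PERSON and addressee is not None:
                links.append((token, addressee))
        resolved.append(links)
    return resolved
```

Second-person pronouns are left unresolved when no different speaker appears in the window, so in a full system they would keep whatever cluster the coreference model assigned.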
Since there are no sieves for quote attribution that consider pronouns within quotes, the improved coreference within quotes from this optional step does not affect quote attribution. Thus, this "cycle" of character coreference, then quote attribution, then improved character coreference, need only be run once. However, the improved coreference resolution could impact which assertions are associated with characters.

Assertion Extraction Module
After coreference, the pipeline also extracts what we describe as "assertions", topically coherent segments of text that mention a character. The motivation for this is to identify longer spans of exposition and narrative that relate to characters for building embedding representations for these characters. Parsing these assertions would also facilitate the extraction of descriptive features such as verbs for which characters are subjects and adjectives used to describe characters.
To identify such spans of text that relate to characters, we first segment the text with a topic segmentation approach called TextTiling (Hearst, 1997). We then assign segments (with quotes removed) to characters if they contain at least one mention of the character within the span. If multiple characters are mentioned, the span is included in extracted assertions for each of the characters.
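The assignment step can be sketched as below, assuming the topic segmenter has already produced character-offset spans and coreference has produced mention offsets with standardized names. The quote-stripping regex covers only straight and curly double quotes, and whether mentions inside quotes should count toward assignment is not stated in the text; this sketch counts any mention within the segment span. `assign_assertions` is a hypothetical name.

```python
import re

# Straight or curly double-quoted spans (an illustrative subset of marks).
QUOTE_RE = re.compile(r'"[^"]*"|\u201c[^\u201c\u201d]*\u201d')


def assign_assertions(story, segment_spans, mentions):
    """story: full text. segment_spans: list of (start, end) character offsets.
    mentions: list of (offset, character_name) pairs from coreference.
    Returns a dict mapping each character name to its assertion texts."""
    assertions = {}
    for start, end in segment_spans:
        names = {name for pos, name in mentions if start <= pos < end}
        if not names:
            continue  # segment mentions no character
        # Remove quoted text, then normalize whitespace.
        narration = " ".join(QUOTE_RE.sub(" ", story[start:end]).split())
        for name in sorted(names):
            assertions.setdefault(name, []).append(narration)
    return assertions
```

A segment that mentions several characters is added to each of their lists, matching the rule described above.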

Fanfiction Evaluation Dataset
To evaluate our pipeline, we annotate a dataset of 10 publicly available fanfiction stories for all mentions of characters and quotes attributed to these characters, which is similar in size to the test set used in LitBank. We select these stories from Archive of Our Own, a large fanfiction archive that is maintained and operated by a fan-centered non-profit organization, the Organization for Transformative Works (Fiesler et al., 2016). To capture a representative range of fanfiction, we chose one story from each of the 10 most popular fandoms on Archive of Our Own when we collected data in 2018 (Table 1). Fandoms are fan communities organized around a particular original media source. For each fandom, we randomly sampled a story in English that has fewer than 5000 words and does not contain explicit sexual or violent content. Two of the authors annotated the 10 stories for each of the tasks of character coreference and quote attribution. All annotators were graduate students working in NLP. Statistics on this evaluation dataset and the annotations can be found in Table 2.
These stories illustrate the expanded set of challenges and variety in fanfiction. In one story, all of the characters meet clones of themselves as male if they are female, or female if they are male. This is a variation on the practice of "genderswapping" characters in fanfiction (McClellan, 2014). Coreference systems can struggle to keep up with characters with the same name but different genders. Another story in our test set is a genre of fanfiction called "songfic", which intersperses song lyrics into the narrative. These song lyrics often contain pronouns such as I and you that do not refer to any character.
For quote attribution, challenges in the test set include a variation of quotation marks, sometimes used inconsistently. There is also great variation in the number of indirect quotes without clear quotatives such as "she said". This can be a source of ambiguity in published fiction as well, but we find a large variety of styles in fanfiction. One fanfiction story in our evaluation dataset, for example, contains many implicit quotes in conversations among three or more characters, which can be difficult for quote attribution.
Annotation details and inter-annotator agreement for this evaluation dataset are described below. An overview of inter-annotator agreement is provided in Table 3.

Character Coreference Annotation
To annotate character mentions in our evaluation dataset, annotators (two of the authors) were instructed to identify and group all mentions of singular characters, including pronouns, generic phrases that refer to characters such as "the boy", and address terms. Possessive pronouns were also annotated, with nested mentions for phrases such as <char1><char2>his</char2> sister</char1>. Determiners and prepositional phrases attached to nouns were annotated, since they can specify characters and contribute to characterization. For example, <char1>an old friend of <char2>my</char2> parents</char1>. Note that "parents" is not annotated in this example since it does not refer to a singular character. Appositives were annotated, while relative clauses ("the woman who sat on the left") and phrases after copulas ("he was a terrible lawyer") were not annotated, as we found them to act more as descriptions of characters than mentions.
After extracting character mentions, annotators grouped these mentions into character clusters that refer to the same character in the story. Note that since we focus on characters, we do not annotate other non-person entities usually included in coreference annotations. Full annotation guidelines are available online.
To create a unified set of gold annotations, we resolved disagreements between annotators in a second round of annotation. The final test set of 10 annotated stories contains 2,808 annotated character mentions.
In Table 3, we first provide inter-annotator agreement on extracting the same spans of text as character mentions by comparing BIO labeling at the token level. Tokens that begin a mention are labeled B, tokens that are inside or end a mention are labeled I, and all other tokens are labeled O. Which mentions are identified affects the agreement of attributing those mentions to characters. For this reason, we provide two attribution agreement scores. First, we calculate agreement on mentions annotated by either annotator, with a NULL character annotation if one annotator did not annotate a mention (Attribution (all) in Table 3). We also calculate agreement only for character mentions annotated by both annotators (Attribution (agreed) in Table 3). Character attribution was labeled as matching if there was significant overlap between primary character names chosen for each cluster by annotators; there were no disagreements on this.
Table 3: Inter-annotator agreement (Cohen's κ) between two annotators for each task, averaged across 10 fics. Extraction (BIO) is agreement on extracting the same spans of text (not attributing them to characters) with token-level BIO annotation. Attribution (all) refers to attribution of spans to characters where missed spans receive a NULL character attribution. Attribution (agreed) refers to attribution of spans that both annotators marked.
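The token-level BIO comparison can be sketched as below, using the standard definition of Cohen's κ (observed agreement corrected for the agreement expected from each annotator's label frequencies). The span format and function names are assumptions for illustration, and flat BIO labels cannot represent the nested mentions described earlier, so this sketch assumes non-overlapping spans.

```python
from collections import Counter


def bio_labels(n_tokens, spans):
    """spans: list of (start, end) token indices, end exclusive, non-overlapping."""
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = "B"
        for k in range(start + 1, end):
            labels[k] = "I"
    return labels


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length, non-empty label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Per-story κ values computed this way would then be averaged across stories, as in the reported agreement scores.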

Quote Attribution Annotation
Two of the authors annotated all quotes that were said aloud or written by a singular character, and attributed them to a list of characters determined from the character coreference annotations. Annotation was designed to focus on characters' voices as displayed in the stories. Thus characters' thoughts were not annotated as quotes, nor were imagined or hypothetical utterances. We also chose not to annotate indirectly reported quotes, such as "the friend said I was very strange" since this could be influenced more by the character or narrator reporting the quote than the original character who spoke it. However, we did annotate direct quotes that are reported by other characters.
Inter-annotator agreement on quote attribution was 0.89 Cohen's κ on the set of all quotes annotated by any annotator (see Table 3). Attribution agreement on the set of quote spans identified by both annotators was very high, 0.98 κ. Token-level BIO agreement for marking spans as quotes was 0.97 κ. The final test set of 10 stories contains 876 annotated quotes.

Pipeline Evaluation
We evaluate the pipeline against BookNLP, as well as other state-of-the-art approaches for coreference resolution and quote attribution.

Character Coreference Evaluation
We evaluate the performance of the character coreference module on our 10 annotated fanfiction stories using the CoNLL metric (Pradhan et al., 2012; the average of MUC, B³, and CEAFe) and LEA metric (Moosavi and Strube, 2016). We compare our approach against different state-of-the-art approaches used for coreference resolution in the past. Along with BookNLP's approach, we consider the Stanford CoreNLP deterministic coreference model (CoreNLP (dcoref); Raghunathan et al., 2010; Recasens et al., 2013; Lee et al., 2011) and the CoreNLP statistical model (CoreNLP (coref); Clark and Manning, 2015) as traditional baselines. As a neural baseline, we evaluate the more recently proposed BERT-base model (Joshi et al., 2019), which replaces the original GloVe embeddings (Pennington et al., 2014) with BERT (Devlin et al., 2019) in Lee et al. (2017)'s coreference resolution approach.
Micro-averaged results across the 10 annotated stories are shown in Table 4. The FanfictionNLP approach is SpanBERT-base fine-tuned on LitBank, with the post-hoc removal of non-person and plural mentions and clusters (as described in Section 3.1). Note that these results are without the quote pronoun resolution module described in Section 3.3. Traditional approaches like BookNLP and CoreNLP (dcoref, coref) perform significantly worse than the neural models, especially on recall. Neural models that are further fine-tuned on LitBank (OL) outperform the ones that are only trained on OntoNotes (O). This suggests that further training the model on literary text data does indeed improve its performance on fanfiction narrative. Furthermore, the SpanBERT-base approaches outperform their BERT-base counterparts with an absolute improvement of 4-5 CoNLL F1 percentage points and 6 LEA F1 percentage points. Post-hoc removal of non-person and plural entities improves CoNLL precision on characters by more than 12 percentage points over SpanBERT-base OL.
Table 4 legend. L: Model is also fine-tuned on the LitBank corpus. FanfictionNLP is the SpanBERT-base OL model with post-hoc removal of non-person entities. Note that none of the approaches had access to our fanfiction data. These results are without the quote pronoun resolution module described in Section 3.3.

Quote Attribution Evaluation
Using our expanded set of quotation marks, we reach 96% recall and 95% precision of extracted quote spans, micro-averaged over the 10 test stories, compared with 25% recall and 55% precision for BookNLP. For attributing these extracted quotes to characters, we report average F1, precision, and recall under different coreference inputs (Table 5). To determine correct quote attributions, the canonical name for the character cluster attributed by systems to each quote is compared with the gold attribution name for that quote. A match is assigned if a) an assigned name has only one word, which matches any word in the gold cluster name (such as Tony and Tony Stark), or b) if more than half of the words in the name match between the two character names, excluding titles such as Ms. and Dr. Name-matching is manually checked to ensure no system is penalized for selecting the wrong name within a correct character cluster. Any quote that a system fails to extract is considered a mis-attribution (an attribution to a NULL character).
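The two name-matching rules can be sketched as follows. Several details are assumptions made for illustration: the title list, case-insensitive comparison, and measuring "more than half of the words" against the longer of the two names (the text does not say which name the fraction refers to). `names_match` is a hypothetical name, and the manual check described above is not modeled.

```python
# Illustrative title list (an assumption; the text names only "Ms." and "Dr.").
TITLES = {"mr", "mrs", "ms", "miss", "dr", "prof"}


def _tokens(name):
    # Lowercase words with surrounding periods and commas stripped.
    stripped = (w.strip(".,").lower() for w in name.split())
    return [w for w in stripped if w]


def names_match(system_name, gold_name):
    sys_raw, gold_raw = _tokens(system_name), _tokens(gold_name)
    # Rule (a): a one-word system name matches any word of the gold name.
    if len(sys_raw) == 1:
        return sys_raw[0] in gold_raw
    # Rule (b): more than half of the words overlap, ignoring titles.
    sys_words = {w for w in sys_raw if w not in TITLES}
    gold_words = {w for w in gold_raw if w not in TITLES}
    if not sys_words or not gold_words:
        return False
    shared = sys_words & gold_words
    return len(shared) > max(len(sys_words), len(gold_words)) / 2
```

Under these assumptions, `names_match("Tony", "Tony Stark")` holds by rule (a), while "Tony Stark" and "Howard Stark" share only one of two words and therefore do not match under rule (b).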
As baselines, we consider BookNLP and the approach of He et al. (2013), who train a RankSVM model supervised on annotations from the novel Pride and Prejudice.
Table 5: Quote attribution evaluation scores. Scores are reported using the respective system's coreference (system coreference), with gold character coreference supplied (gold coreference) and with gold character and gold quote spans supplied (gold quote extraction). Attribution is calculated by a character name match to the gold cluster name. If a quote span is not extracted by a system, it is counted as a mis-attribution. Micro-averages across the 10-story test set are reported. We include Muzny et al. (2017)'s approach in the FanfictionNLP pipeline.
The quality of character coreference affects quote attribution. If an entire character is not identified, there is no chance for the system to attribute a quote to that character. If a system attributes a quote to the nearest character mention and that mention is not attributed to the correct character cluster, the quote attribution will likely be incorrect. For this reason, we evaluate quote attribution with different coreference settings. System coreference in Table 5 refers to quote attribution performance when using the respective system's coreference. That is, BookNLP's coreference was evaluated with BookNLP's quote attribution and FanfictionNLP's coreference with FanfictionNLP's quote attribution. We test He et al. (2013)'s approach with the same coreference input as FanfictionNLP. Evaluations are also reported with gold character coreference, as well as with gold character coreference and with gold quote extractions, to measure attribution without the effects of differences in quote extraction accuracy. The deterministic approach of Muzny et al. (2017), incorporated in the pipeline, outperforms both BookNLP and He et al. (2013)'s RankSVM classifier in this informal narrative domain.

Quote Pronoun Resolution Module Evaluation
We test our approach for resolving pronouns within quotes (Section 3.3) on character coreference on the fanfiction evaluation set. We show results using gold quote attribution as an upper bound of the prospective improvement, and using quote attributions predicted by Muzny et al. (2017)'s approach adopted in the fanfiction pipeline. As shown in Table 6, post-hoc resolution of first-person (I) and second-person (you) pronouns with perfect quote annotation information (Gold QuA) substantially improves the overall performance of coreference resolution across both CoNLL and LEA F1 scores (by 1.6 and 3.5 percentage points respectively). Similarly, coreference resolution using information from a state-of-the-art quote attribution system (Muzny et al., 2017) also results in statistically significant, although smaller, improvements across both metrics (by 0.3 percentage points and 0.8 percentage points respectively) on the 10 fanfiction stories. These results suggest that our approach is able to leverage the quote attribution outputs (speaker information) to resolve the first- and second-person pronouns within quotations. It does so by assuming that the text within a quote is from the point of view of the speaker of the quote, as attributed by the quote attribution system.
Table 7 shows the qualitative results on three consecutive quotes from one of the stories in our fanfiction dataset. For the first two quotations, FanfictionNLP incorrectly resolves your/you to the character Caitlin. However, FanfictionNLP + I + You correctly maps the mentions to Cisco. In the third example, we find that FanfictionNLP + I + You (Muzny QuA) does not perform correct resolution as the speaker output by the quote attribution module is incorrect. This shows the dependence of this algorithm on quality quote attribution predictions.
Table 7: Coreference resolution of first- and second-person pronouns in three consecutive quotes from one of the fanfiction stories in our dataset. Results show the impact of the Quote Attribution predictions on the performance of the algorithm described in Section 3.3.

Assertion Extraction Qualitative Evaluation
There is no counterpart to the pipeline's assertion extraction in BookNLP or other systems. Qualitatively, the spans identified by TextTiling include text that relates to characterization beyond simply selecting sentences that mention characters, and with more precision than selecting whole paragraphs that mention characters.
For example, our approach captured sentences that described how characters were interpreting their environment. In one fanfiction story in our test set, a character "could see stars and planets, constellations and black holes. Everything was distant, yet reachable." Such sentences do not contain character mentions, but certainly contribute to character development and contain useful associations made with characters.
These assertions also capture narration that mentions interactions between characters, but which may not mention any one character individually. In another fanfiction story in which two wizards are dueling, extracted assertions for each character include, "Their wands out, pointed at each other, each shaking with rage." These associations are important to characterization, but fall outside sentences that contain individual character mentions.

Ethics
Though most online fanfiction is publicly available, researchers must consider how users themselves view the reach of their content (Fiesler and Proferes, 2018). Anonymity and privacy are core values of fanfiction communities; this is especially important since many participants identify as LGBTQ+ (Fiesler et al., 2016; Dym et al., 2019). We informed Archive of Our Own, with our contact information, when scraping fanfiction and modified fanfiction examples given in this paper for privacy. We urge researchers who may use the fanfiction pipeline we present to consider how their work engages with fanfiction readers and writers, and to honor the creativity and privacy of the community and individuals behind this "data".

Conclusion
We present a text processing pipeline for the domain of fanfiction, stories that are written by fans and inspired by original media. Large archives of fanfiction are available online and present opportunities for researchers interested in community writing practices, narrative structure, fan culture, and online communities. The presented text processing pipeline allows researchers to extract and cluster mentions of characters from fanfiction stories, along with what each character does (assertions) and says (quotes).
We assemble state-of-the-art NLP approaches for each module of this processing pipeline and evaluate them on an annotated test set, outperforming a pipeline developed for formal fiction on character coreference and quote attribution. We also present improvements in character coreference with a post-processing step that uses information from quote attribution to resolve first- and second-person pronouns within quotes. Our hope is that this pipeline will be a step toward enabling structured analysis of the text of fanfiction stories, which contain more variety than published, formal fiction. The pipeline could also be applied to other formal or informal narratives outside of fanfiction, though we have not evaluated it in other domains.