Detecting Everyday Scenarios in Narrative Texts

Script knowledge consists of detailed information about everyday activities. Such information is often taken for granted in text and needs to be inferred by readers. Script knowledge is therefore a central component of language comprehension. Previous work on representing scripts is mostly based on extensive manual effort or limited to scenarios that occur with sufficient redundancy in large corpora. We introduce the task of scenario detection, in which we identify references to scripts. In this task, we address a wide range of different scripts (200 scenarios) and attempt to identify all references to them in a collection of narrative texts. We present a first benchmark dataset and a baseline model that tackles scenario detection using techniques from topic segmentation and text classification.


Introduction
According to Grice's (1975) theory of pragmatics, people tend to omit basic information when participating in a conversation (or writing a story), under the assumption that left-out details are already known or can be inferred from commonsense knowledge by the hearer (or reader). Consider the following text fragment about eating in a restaurant from an online blog post:

Example 1.1 (. . . ) we drove to Sham Shui Po and looked for a place to eat. (. . . ) [O]ne of the restaurants was fully seated [so we] chose another. We had 4 dishes - Cow tripe stir fried with shallots, ginger and chili. 1000-year-old-egg with watercress and omelet. Then another kind of tripe and egg - all crispy on the top and soft on the inside. Finally calamari stir fried with rock salt and chili. Washed down with beers and tea at the end. (. . . )

The text in Example 1.1 obviously talks about a restaurant visit, but it omits many events that are involved in eating in a restaurant, such as finding a table, sitting down, or ordering food, as well as participants such as the waiter, the menu, and the bill. A human reader of the story will naturally assume, based on their commonsense knowledge, that all these ingredients have their place in the reported event, although the text leaves them completely implicit. For text understanding machines that lack appropriate commonsense knowledge, however, this implicitness poses a non-trivial challenge.
Writing and understanding narrative texts makes particular use of a specific kind of commonsense knowledge, referred to as script knowledge (Schank and Abelson, 1977). Script knowledge is about prototypical everyday activities, called scenarios. Given a specific scenario, the associated script knowledge enables us to infer omitted events that happen before and after an explicitly mentioned event, as well as its associated participants. In other words, this knowledge can help us obtain more complete text representations, as required for many language comprehension tasks.
There has been some work on script parsing (Ostermann et al., 2017, 2018c), i.e., associating texts with script structure given a specific scenario. Unfortunately, only limited previous work exists on determining which scenarios are referred to in a text or text segment (see Section 2). Our dataset is, to the best of our knowledge, the first collection of narrative texts annotated at the sentence level with the scripts they instantiate.
In this paper, we describe first steps towards the automatic detection and labeling of scenario-specific text segments. Our contributions are as follows: • We define the task of scenario detection and introduce a benchmark dataset of annotated narrative texts, with segments labeled according to the scripts they instantiate (Section 3). To the best of our knowledge, this is the first dataset of its kind. The corpus is publicly available for scientific research purposes at http://www.sfb1102.uni-saarland.de/?page_id=2582.
• As a benchmark model for scenario detection, we present a two-stage model that combines established methods from topic segmentation and text classification (Section 4).
• Finally, we show that the proposed model achieves promising results but also reveals some of the difficulties underlying the task of scenario detection (Section 5).

Motivation and Background
A major line of research has focused on identifying specific events across documents, for example, as part of the Topic Detection and Tracking (TDT) initiative (Allan et al., 1998; Allan, 2012). The main subjects of the TDT initiative are instances of world events such as the Cuban Riots in Panama. In contrast, everyday scenarios and their associated sequences of event types, as dealt with in this paper, have so far only been the subject of individual research efforts focusing either on acquiring script knowledge, constructing story corpora, or script-related downstream tasks. Below we describe significant previous work in these areas in more detail.
Script knowledge. Scripts are descriptions of prototypical everyday activities such as eating in a restaurant or riding a bus (Schank and Abelson, 1977). Different lines of research attempt to acquire script knowledge. Early researchers attempted to handcraft script knowledge (Mueller, 1999; Gordon, 2001). Another line of research focuses on the collection of scenario-specific script knowledge in the form of event sequence descriptions (ESDs) via crowdsourcing (Singh et al., 2002; Gupta and Kochenderfer, 2004; Li et al., 2012; Raisig et al., 2009; Regneri et al., 2010; Wanzare et al., 2016). Table 1 summarizes various script knowledge bases. Our work lies between both lines of research and may help to connect them: we take an extended set of specific scenarios as a starting point and attempt to identify instances of those scenarios in a large-scale collection of narrative texts.
Textual resources. Previous work created script-related resources by crowdsourcing stories that instantiate script knowledge of specific scenarios. For example, Modi et al. (2016) and Ostermann et al. (2018a, 2019) asked crowd-workers to write stories that include mundane aspects of scripts "as if explaining to a child". The collected datasets, InScript and MCScript, are useful as training instances of narrative texts that refer to scripts. However, the texts are somewhat unnatural and atypical because of their explicitness and the requirement that workers tell a story related to one single scenario only. Gordon and Swanson (2009) employed statistical text classification in order to collect narrative texts about personal stories. The resulting Spinn3r dataset (Burton et al., 2009) contains about 1.5 million stories. Spinn3r has been used to extract script information (Rahimtoroghi et al., 2016; see below). In this paper, we use the Spinn3r personal stories corpus as the source for our data collection and annotation. The bottom part of Table 1 summarizes various script-related resources. The large datasets come with no scenario labels, while the crowdsourced datasets only have scenario labels at story level. Our work provides a finer-grained scenario labeling at sentence level.
Script-related tasks. Several tasks have been proposed that require or test computational models of script knowledge. For example, Kasch and Oates (2010) and Rahimtoroghi et al. (2016) propose and evaluate methods that automatically create event schemas, extracted from scenario-specific texts. Ostermann et al. (2017) propose models that identify and label mentions of events from specific scenarios in corresponding texts. Finally, Ostermann et al. (2018b) present an end-to-end evaluation framework that assesses the performance of machine comprehension models using script knowledge. Scenario detection is a prerequisite for tackling such tasks, because the application of script knowledge requires awareness of the scenario a text segment is about.

Task and Data
We define scenario detection as the task of identifying segments of a text that are about a specific scenario and classifying these segments accordingly.
For the purpose of this task, we view a segment as a consecutive part of text that consists of one or more sentences. Each segment can be assigned none, one, or multiple labels.
Scenario labels. As a set of target labels, we collected scenarios from all scenario lists available in the literature (see Table 1). During revision, we discarded scenarios that are too vague and general (e.g. childhood) or atomic (e.g. switch on/off lights), admitting only reasonably structured activities. Based on a sample annotation of Spinn3r stories, we further added 58 new scenarios, e.g. attending a court hearing and going skiing, to increase coverage. We deliberately included narrowly related scenarios that stand in a relation of specialisation (e.g. going shopping and shopping for clothes) or in a subscript relation (e.g. flying in an airplane and checking in at the airport). These cases are challenging to annotators because segments may refer to different scenarios at the same time.
Although our scenario list is incomplete, it is representative of the structural problems that can occur during annotation. Our scenarios have varying degrees of complexity and cover a wide range of everyday activities. The complete list of scenarios is provided in Appendix B.
Dataset. As a benchmark dataset, we annotated 504 texts from the Spinn3r corpus. To make sure that our dataset contains a sufficient number of relevant sentences, i.e., sentences that refer to scenarios from our collection, we selected texts that have a high affinity to at least one of these scenarios. We approximate this affinity using a logistic regression model fitted to texts from MCScript, based on LDA topics (Blei et al., 2003) as features to represent a document.
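The affinity-based text selection described above can be sketched as follows. This is an illustrative scikit-learn reconstruction, not the authors' implementation: the function names, the topic count, and the use of the maximum scenario probability as the affinity score are assumptions.

```python
# Sketch of the text-selection step: LDA topic distributions serve as
# document features for a logistic regression scenario classifier, and the
# classifier's confidence is read as the document's scenario "affinity".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

def fit_affinity_model(train_texts, train_labels, n_topics=50):
    """Fit LDA topics on scenario-labeled texts (e.g. MCScript), then a
    scenario classifier on top of the document-topic distributions."""
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(train_texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    topics = lda.fit_transform(counts)  # one topic distribution per document
    clf = LogisticRegression(max_iter=1000).fit(topics, train_labels)
    return vectorizer, lda, clf

def affinity(texts, vectorizer, lda, clf):
    """Highest scenario probability per text; used to rank candidate
    stories so that high-affinity ones can be selected for annotation."""
    topics = lda.transform(vectorizer.transform(texts))
    return clf.predict_proba(topics).max(axis=1)
```

Documents whose affinity exceeds a chosen cutoff would then be passed on to the annotators.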

Annotation
We follow standard methodology for natural language annotation (Pustejovsky and Stubbs, 2012). Each text is independently annotated by two annotators (student assistants), who use an agreed-upon set of guidelines that was built iteratively together with the annotators. For each text, the annotators had to identify segments referring to a scenario from the scenario list and assign scenario labels. If a segment refers to more than one script, they were allowed to assign multiple labels. We worked with a total of four student assistants and used the WebAnno annotation tool (de Castilho et al., 2016).
The annotators labeled 504 documents, consisting of 10,754 sentences. On average, the annotated documents were 35.74 sentences long. A scenario label could be either one of our 200 scenarios or None to capture sentences that do not refer to any of our scenarios.
Guidelines. We developed a set of more detailed guidelines for handling different issues related to segmentation and classification, which is detailed in Appendix A. (The scenario collection was jointly extended together with the authors of MCScript; Ostermann et al., 2018a, 2019.) A major challenge when annotating segments is deciding when to count a sentence as referring to a particular scenario. For the task addressed here, we consider a segment only if it explicitly realizes aspects of script knowledge that go beyond an evoking expression (i.e., more than one event and participant need to be explicitly realized). Example 3.1 below shows a text segment with minimal scenario information for going grocery shopping, with two events mentioned. In Example 3.2, only the evoking expression is mentioned, hence this example is not annotated.
Example 3.1 going grocery shopping ...We also stopped at a small shop near the hotel to get some sandwiches for dinner...
Example 3.2 paying for gas ... A customer was heading for the store to pay for gas or whatever,...

Statistics
Agreement. To measure agreement, we looked at sentence-wise label assignments for each double-annotated text. We counted agreement if the same scenario label was assigned to a sentence by both annotators. As an indication of chance-corrected agreement, we computed Kappa scores (Cohen, 1960). A Kappa of 1 means that both annotators provided identical (sets of) scenario labels for each sentence. When calculating raw agreement, we counted agreement if at least one identical scenario label was assigned by both annotators. Table 2 shows the Kappa and raw (in italics) agreement scores for each pair of annotators. On average, the Kappa score was 0.61, ranging from 0.57 to 0.64. The average raw agreement score was 0.70, ranging from 0.65 to 0.72. The Kappa value indicates relatively consistent annotations across annotators, even though the task was challenging. We used fuzzy matching to calculate agreement in span between segments that overlap by at least one token. Table 3 shows pairwise agreement scores between annotators (relative agreement on segment spans, computed between annotated segments that overlap by at least one token and are assigned the same scenario label). On average, the annotators achieve 67% agreement on segment spans. This shows considerable segment overlap whenever both annotators agreed that a particular scenario is referenced.
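The sentence-level Kappa computation can be sketched as follows. This is a simplified reconstruction that assumes exactly one label per sentence, whereas the paper's annotators could assign label sets; the function is illustrative, not the authors' code.

```python
# Cohen's kappa over two annotators' per-sentence scenario labels.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences of equal
    length (one label per sentence, a simplifying assumption)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

Raw agreement corresponds to the `observed` term alone, before chance correction.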
Analysis. Figure 1 shows to what extent the annotators agreed on the scenario labels. The None cases accounted for 32% of the sentences. Our scenario list is far from complete: although we selected stories with high affinity to our scenarios, other scenarios (not on our scenario list) may still occur in the stories. Sentences referring to such other scenarios were annotated as None cases.
The None label was also used for sentences that described topics related to, but not directly part of, the script being referenced: for instance, sentences not part of the narration but of a different discourse mode (e.g. argumentation, report), or sentences where no specific script events are mentioned. About 20% of the sentences had Single annotations, where only one annotator indicated a scenario reference. 47% of the sentences were assigned some scenario label(s) by both annotators (Identical, At least one, Different). Fewer than 10% of the sentences received Different scenario labels in cases where both annotators assigned scenario labels. This occurred frequently with scenarios that are closely related (e.g. going to the shopping center, going shopping) or scenarios in a subscenario relation (e.g. flying in a plane, checking in at the airport) that share script events and participants. In about 7% of the sentences, both annotators agreed on At least one scenario label. The remaining 30% of the sentences were assigned Identical (sets of) scenario labels by both annotators.

Figure 1: Absolute counts of sentence-level annotations that involve the same (Identical), overlapping (At least one) or disagreeing (Different) labels; also shown are the numbers of sentences that received a label from only one annotator (Single) or no label at all (None).

Adjudication and Gold Standard
The annotation task is challenging, and so are gold-standard creation and adjudication. We combined automatic merging and manual adjudication (by the main author of the paper) as two steps of gold-standard creation, to minimize manual post-processing of the dataset. We automatically merged annotated segments that are identical or overlapping and have the same scenario label, thus maximizing segment length. Consider the two annotations shown in Example 3.3: one annotator labeled the whole text as growing vegetables, the other identified the two bold-face sequences as growing vegetables instances and left the middle part out. The result of the merger is the maximal growing vegetables chain, i.e., the full text. Taking the maximal chain ensures that all relevant information is included, even though the annotators may not have agreed on what is script-relevant.

Example 3.3 growing vegetables
The tomato seedlings Mitch planted in the compost box have done really well and we noticed flowers on them today. Hopefully we will get a good (. . . ) It has rained and rained here for the past month so that is doing the garden heaps of good. We bought some organic herbs seedlings recently and now have some thyme, parsley, oregano and mint growing in the garden. We also planted some lettuce and a grape vine. We harvested our first crop of sweet potatoes a week or so ago (. . . )

The adjudication guidelines were deliberately designed such that the adjudicator could not easily overrule the double annotations. The segmentation could not be changed, and only the labels provided by the annotators were available for labeling. Since segment overlap is handled automatically, manual adjudication only needs to deal with label disagreement. The two main cases are (1) a segment has been labeled by only one annotator, and (2) a segment has been assigned different labels by its two annotators. In case (1), the adjudicator had to take a binary decision: accept the labeled segment, or discard it. In case (2), the adjudicator had three options: decide for one of the two labels, or accept both of them.
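The automatic merging of same-label, overlapping segments described above amounts to per-label interval merging over sentence spans. A minimal sketch, where a segment is represented as an illustrative `(start, end, label)` triple over inclusive sentence indices:

```python
# Merge identical or overlapping same-label segments into maximal segments.
from collections import defaultdict

def merge_segments(segments):
    """Standard interval merging, applied separately per scenario label.
    `segments` is a list of (start_sentence, end_sentence, label) triples
    with inclusive bounds; overlapping same-label spans are unioned."""
    by_label = defaultdict(list)
    for start, end, label in segments:
        by_label[label].append((start, end))
    merged = []
    for label, spans in by_label.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlap between inclusive ranges
                cur_end = max(cur_end, end)
            else:
                merged.append((cur_start, cur_end, label))
                cur_start, cur_end = start, end
        merged.append((cur_start, cur_end, label))
    return sorted(merged)
```

For the growing vegetables case, merging one annotator's full-text span with the other annotator's two partial spans yields the single maximal chain.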
Gold standard. The annotation process resulted in 2070 single segment annotations. 69% of the single segment annotations were automatically merged to create gold segments. The remaining segments were adjudicated, and relevant segments were added to the gold standard. Our final dataset consists of 7152 sentences (contained in 895 segments) with gold scenario labels. Of the 7152 gold sentences, 1038 (15%) have more than one scenario label. 181 scenarios (out of 200) occur as gold labels in our dataset, 179 of which are referred to in at least 2 sentences. Table 4 shows example scenarios (the remaining scenarios are listed in Appendix B) and the distribution of scenario labels: the number of documents that refer to the given scenario, the number of gold sentences and segments referring to the given scenario, and the average segment length (in sentences) per scenario. 16 scenarios are referred to in more than 100 gold sentences, 105 scenarios in at least 20 gold sentences, and 60 scenarios in fewer than 20 gold sentences. Figure 2 shows the distribution of segments and scenario references per text in the gold standard. On average, there are 1.8 segments per text, and 44% of the texts refer to at least two scenarios.

Benchmark model
Texts typically consist of different passages that refer to different scenarios. When human hearers or readers come across an expression that evokes a particular script, they try to map verbs or clauses in the subsequent text to script events, until they encounter lexical material that is clearly unrelated to the script and may evoke a different scenario. Scenario identification, scenario segmentation, and script parsing are subtasks of story comprehension, which ideally work in close mutual interaction. In this section, we present a model for scenario identification that is much simpler in several respects: a two-step model consisting of a segmentation and a classification component. For segmentation, we assume that a change in scenario focus can be modeled by a shift in lexical cohesion. We identify segments that might be related to specific scripts or scenarios via topic segmentation, assuming that scenarios can be approximated as distributions over topics. After segmentation, a supervised classifier component is used to predict the scenario label(s) for each of the found segments. Our results show that the script segmentation problem can be solved in principle, and we propose our model as a benchmark model for future work.
Segmentation. The first component of our benchmark model reimplements a state-of-the-art unsupervised method for topic segmentation, called TopicTiling (Riedl and Biemann, 2012). TopicTiling (TT) uses latent topics inferred by a Latent Dirichlet Allocation (LDA, Blei et al. (2003)) model to identify segments (i.e., sets of consecutive sentences) referring to similar topics. The TT segmenter outputs topic boundaries between sentences where topic shifts occur. Boundaries are computed based on coherence scores: scores close to 1 indicate strong topic similarity, while values close to 0 indicate minimal topic similarity. A window parameter determines the block size, i.e. the number of sentences to the left and right that are considered when calculating coherence scores. To discover segment boundaries, all local minima in the coherence scores are identified using a depth score (Hearst, 1994). A threshold µ − σ/x is used to estimate the number of segments, where µ is the mean and σ the standard deviation of the depth scores, and x is a weight parameter for setting the threshold. Segment boundaries are placed at positions with depth scores greater than the threshold.
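A minimal sketch of this segmentation procedure follows. It is not the authors' or Riedl and Biemann's implementation: the cosine-coherence blocks, the Hearst-style climb-to-peak depth scores, and the default value of `x` are illustrative choices.

```python
# TopicTiling-style segmentation over per-sentence topic vectors.
import numpy as np

def depth_scores(coherence):
    """Depth of each coherence gap: climb to the nearest peak on each side
    while the scores keep rising, and sum the two drops (Hearst-style)."""
    scores = []
    for i, c in enumerate(coherence):
        left = right = c
        j = i - 1
        while j >= 0 and coherence[j] >= left:
            left = coherence[j]
            j -= 1
        j = i + 1
        while j < len(coherence) and coherence[j] >= right:
            right = coherence[j]
            j += 1
        scores.append((left - c) + (right - c))
    return np.array(scores)

def segment_boundaries(sent_topics, window=2, x=2.0):
    """Place a boundary before sentence g whenever the depth score at that
    gap exceeds the threshold mu - sigma / x over all depth scores."""
    sent_topics = np.asarray(sent_topics, dtype=float)
    coherence = []
    for g in range(1, len(sent_topics)):
        left = sent_topics[max(0, g - window):g].mean(axis=0)
        right = sent_topics[g:g + window].mean(axis=0)
        denom = np.linalg.norm(left) * np.linalg.norm(right) + 1e-12
        coherence.append(left @ right / denom)  # cosine coherence at gap g
    depths = depth_scores(coherence)
    threshold = depths.mean() - depths.std() / x
    return [g + 1 for g, d in enumerate(depths) if d > threshold]
```

On a document whose sentences switch cleanly from one topic to another, the single deep coherence minimum at the switch is the only gap above the threshold.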
Classification. We view the scenario classification subtask as a supervised multi-label classification problem. Specifically, we implement a multi-layer perceptron classifier in Keras (Chollet et al., 2015): an input layer with 100 neurons and ReLU activation, followed by an intermediate layer with dropout (0.2), and finally an output layer with sigmoid activations. We optimize a cross-entropy loss using Adam. Because multiple labels can be assigned to one segment, we train several one-vs-all classifiers, resulting in one classifier per scenario. We also experimented with different features and feature combinations to represent text segments: term frequencies weighted by inverted document frequency (tf.idf, Salton and McGill, 1986), topic features derived from LDA (see above), and word embeddings. We found the performance with tf.idf features to be the best.
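The one-vs-all setup over tf.idf features can be sketched with scikit-learn standing in for the Keras model described in the paper. The hidden-layer size, ReLU activation, and Adam optimizer mirror the description; dropout has no direct scikit-learn equivalent and is omitted, and all names are illustrative.

```python
# One-vs-all scenario classifiers over tf.idf segment representations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MultiLabelBinarizer

def train_scenario_classifier(segments, label_sets):
    """Fit one binary MLP per scenario on tf.idf features.
    `label_sets` holds a set of scenario labels per training segment."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(segments)
    binarizer = MultiLabelBinarizer()
    Y = binarizer.fit_transform(label_sets)  # multi-label indicator matrix
    clf = OneVsRestClassifier(
        MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                      solver="adam", max_iter=500, random_state=0))
    clf.fit(X, Y)
    return vectorizer, binarizer, clf

def predict_scenarios(segment, vectorizer, binarizer, clf, top_n=1):
    """Return the top-n scenario labels for one text segment."""
    probs = clf.predict_proba(vectorizer.transform([segment]))[0]
    ranked = probs.argsort()[::-1][:top_n]
    return [binarizer.classes_[i] for i in ranked]
```

`top_n` corresponds to the evaluation setting in which the top n scenarios are considered for sentences with n gold labels.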

Experiments
The experiments and results presented in this section are based on our annotated dataset for scenario detection described in Section 3.

Experimental setting
Preprocessing and model details. We represent each input to our model as a sequence of lemmatized content words, in particular nouns and verbs (including verb particles). This is achieved by preprocessing each text using Stanford CoreNLP (Chen and Manning, 2014).
Segmentation. Since the segmentation model is unsupervised, we can use all data from both the MCScript and Spinn3r personal stories corpora to build the LDA model. As input to the TopicTiling segmenter, each sentence is represented by a vector in which each component represents the weight of a topic from the LDA model (i.e. the value of the i-th component is the normalized weight of the words in the sentence whose most relevant topic is the i-th topic). For the segmentation model, we tune the number of topics (200) and the window size (2) on an artificial development dataset, created by merging segments from multiple MCScript documents.
Classification. We train the scenario classification model on the scenario labels provided in MCScript (one per text). For training and hyperparameter selection, we split the MCScript dataset (see Section 2) into a training and a development set, as indicated in Table 5. We additionally use 18 documents from our scenario detection data (Section 3) to tune a classification threshold. The remaining 486 documents are held out exclusively for testing (see Table 5). Since we train separate classifiers for each scenario (one-vs-all classifiers), we obtain a probability distribution over how likely a sentence refers to each scenario. We use entropy to measure the degree of scenario content in a sentence: sentences with entropy values above the threshold are considered as not referencing any scenario (None cases), while sentences with lower entropy values reference some scenario.
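The entropy-based None decision can be sketched as follows. The threshold value in the usage is illustrative; in the paper it is tuned on the 18 held-out documents.

```python
# High-entropy (near-uniform) scenario distributions are mapped to None.
import numpy as np

def entropy(probs):
    """Shannon entropy (bits) of a scenario probability vector."""
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()          # normalize the per-scenario scores
    p = p[p > 0]             # ignore zero entries (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())

def filter_none(prob_rows, threshold):
    """For each sentence's scenario distribution, return None when the
    distribution is too flat (entropy above threshold), otherwise the
    index of the most probable scenario."""
    return [None if entropy(p) > threshold else int(np.argmax(p))
            for p in prob_rows]
```

A sharply peaked distribution keeps its argmax scenario, while a near-uniform one is rejected as None.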
Baselines. We experiment with three informed baselines. As a lower bound for the classification task, we compare our model against the baseline sent maj, which assigns the majority label to all sentences. To assess the utility of segmentation, we compare against two baselines that use our proposed classifier but not the segmentation component: the baseline sent tf.idf treats each sentence as a separate segment, and random tf.idf splits each document into random segments.

Table 6: Results for the scenario detection task
Evaluation. We evaluate scenario detection performance at the sentence level using micro-averaged precision, recall and F1-score. We consider the top-1 predicted scenario for sentences with only one gold label (including the None label), and the top-n scenarios for sentences with n gold labels. For sentences with multiple scenario labels, we take partial matches into account and count each label proportionally. Assume the gold labels are washing one's hair and taking a bath, and the classifier predicts taking a bath and getting ready for bed. Taking a bath is correctly predicted and accounts for 0.5 true positives (TP), while washing one's hair is incorrectly missed and thus accounts for 0.5 false negatives (FN). Getting ready for bed is incorrectly predicted and accounts for 1 false positive (FP). We additionally provide separate results for the segmentation component, based on standard segmentation evaluation metrics.
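The proportional counting in the example can be sketched with a small helper (a hypothetical function, reconstructing only the convention illustrated above):

```python
# Fractional TP/FN/FP counting for one sentence with multiple gold labels.
def partial_match_counts(gold, predicted):
    """Each gold label contributes 1/|gold| TP if predicted and 1/|gold| FN
    otherwise; each spurious prediction counts as one full FP, matching the
    worked example in the evaluation description."""
    gold, predicted = set(gold), set(predicted)
    weight = 1.0 / len(gold)
    tp = weight * len(gold & predicted)   # correctly predicted gold labels
    fn = weight * len(gold - predicted)   # missed gold labels
    fp = float(len(predicted - gold))     # predictions outside the gold set
    return tp, fn, fp
```

Summing these counts over all sentences yields the micro-averaged precision, recall, and F1-score.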

Results
We present the micro-averaged results for scenario detection in Table 6. The sent maj baseline achieves an F1-score of only 6%, as the majority class forms only a small part of the dataset (4.7%). Our TT model with tf.idf features surpasses both baselines that perform segmentation only naively (26% F1) or randomly (37% F1). This result shows that scenario detection works best with predicted segments that are informative and topically consistent.
We estimated an upper bound for the classifier by taking into account the predicted segments from the segmentation step but, during evaluation, only considering the sentences with gold scenario labels (TT tf.idf (Gold)), ignoring sentences with the None label. We see an improvement in precision (54%), showing that the classifier correctly predicts the right scenario label for sentences with gold labels, while the full setting also includes sentences that may be on topic but do not directly reference a given scenario.
To estimate the performance of the TT segmenter individually, we run TT on an artificial development set, created by merging segments from different scenarios in MCScript. We evaluate TT using two standard topic segmentation evaluation metrics, P_k (Beeferman et al., 1999) and WindowDiff (WD, Pevzner and Hearst (2002)). Both metrics express the probability of a segmentation error, so lower values indicate better performance. We compute the average performance over several runs. TT attains a P_k of 0.28 and a WD of 0.28. The low segmentation error suggests that the TT segmenter does a good job of predicting scenario boundaries.
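The P_k metric reported here can be sketched as follows, with segmentations given as lists of segment lengths in sentences. This is a simplified illustrative reading of the Beeferman et al. definition, with the conventional default of k set to half the mean reference segment size.

```python
# P_k segmentation error: probability that two positions k apart are
# judged same-segment by one segmentation and different-segment by the other.
def p_k(reference, hypothesis, k=None):
    """`reference` / `hypothesis` are lists of segment lengths that cover
    the same number of sentences; returns the error rate over all probes."""
    def labels(seg_lengths):
        out = []
        for seg_id, length in enumerate(seg_lengths):
            out.extend([seg_id] * length)
        return out

    ref, hyp = labels(reference), labels(hypothesis)
    if k is None:  # default: half the mean reference segment size
        k = max(1, round(len(ref) / (2 * len(reference))))
    errors = 0
    probes = len(ref) - k
    for i in range(probes):
        same_ref = ref[i] == ref[i + k]
        same_hyp = hyp[i] == hyp[i + k]
        errors += same_ref != same_hyp
    return errors / probes
```

WindowDiff is computed analogously, but compares the number of boundaries inside each probe window instead of a same-segment indicator.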

Discussion
Even for a purpose-built model, scenario detection is a difficult task. This is partly to be expected, as the task requires the assignment of one (or more) of 200 possible scenario labels, some of which are hard to distinguish. Many errors are due to misclassifications between scenarios that share script events as well as participants and that are usually mentioned in the same text: for example, sending food back in a restaurant requires and involves participants from eating in a restaurant. Table 7 shows the 10 most frequent misclassifications of our best model, TT tf.idf (F1). These errors account for 16% of all incorrect label assignments (in the 200-by-200 confusion matrix). The 100 most frequent misclassifications account for 63% of all incorrect label assignments. In a quantitative analysis, we calculated the commonalities between scenarios in terms of the pointwise mutual information (PMI) between scenario labels in the associated stories:

PMI(s1, s2) = log [ p(s1, s2) / (p(s1) p(s2)) ]    (1)

where the probability of a scenario is given by the document frequency of the scenario divided by the number of documents.
Scenarios that tend to co-occur in texts have higher PMI scores. We observe that the scenario-wise recall and F1-scores of our classifier are negatively correlated with PMI scores (Pearson correlations of −0.33 and −0.17, respectively). These correlations confirm the greater difficulty of distinguishing between scenarios that are highly related to other scenarios.
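The document-level PMI estimate in Equation (1) can be computed as follows (an illustrative sketch; `doc_label_sets` stands for the gold scenario labels of each document):

```python
# PMI between two scenario labels, estimated from document frequencies.
import math

def scenario_pmi(doc_label_sets, s1, s2):
    """log2 [ p(s1, s2) / (p(s1) p(s2)) ], where each probability is the
    fraction of documents carrying the label(s) in question."""
    n = len(doc_label_sets)
    p1 = sum(s1 in d for d in doc_label_sets) / n
    p2 = sum(s2 in d for d in doc_label_sets) / n
    p12 = sum(s1 in d and s2 in d for d in doc_label_sets) / n
    return math.log2(p12 / (p1 * p2))
```

Label pairs that co-occur more often than their marginal frequencies predict get a positive score; independent labels score near zero.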
On the positive side, we observe that scenario-wise precision and F1-score are positively correlated with the number of gold sentences annotated with the respective scenario label (Pearson correlation of 0.50 and 0.20, respectively). As one would expect, our approach seems to perform better on scenarios that appear at higher frequency. Table 8 shows the 10 scenarios for which our approach achieves the best results.

Table 9: Example top 10 scenario-words

order pizza | laundry   | gardening | barbecue
pizza       | clothes   | tree      | invite
order       | dryer     | plant     | guest
delivery    | laundry   | hole      | grill
decide      | washer    | water     | friend
place       | wash      | grow      | everyone
deliver     | dry       | garden    | beer
tip         | white     | dig       | barbecue
phone       | detergent | dirt      | food
number      | start     | seed      | serve
minute      | washing   | soil      | season
Scenario approximation using topics. We performed an analysis to qualitatively examine how far topic distributions, as used in our segmentation model, actually approximate scenarios. For this analysis, we computed an LDA topic model using only the MCScript dataset. We created scenario-topics by looking at all the prevalent topics in documents from a given scenario. Table 9 shows the top 10 words for each scenario, extracted from the scenario-topics. As can be seen, the topics capture some of the most relevant words for the different scenarios.

Summary
In this paper, we introduced the task of scenario detection and curated a benchmark dataset for automatic scenario segmentation and identification. We proposed a benchmark model that automatically segments and identifies text fragments referring to a given scenario. While our model achieves promising first results, it also reveals some of the difficulties in detecting script references. Scenario detection is an important first step towards large-scale data-driven script induction for tasks that require the application of script knowledge. We are hopeful that our data and model will form a useful basis for future work.

Appendix A: Annotation Guidelines

5. One passage of text can be associated with more than one scenario label.
• A passage of text is associated with two or more related scenarios, i.e. scenarios that often coincide or occur together (see Example A.11). • A shorter passage of text referring to a given scenario is nested in a longer passage of text referring to a more general scenario. The nested text passage is then associated with both the general and the specific scenario (see Example A.12).
6. For a given text passage, if you do not find a full match in the scenario list but a scenario that is related and similar in structure, you may annotate it (see Example A.13).

Rules of thumb for annotation
1. Do not annotate if no progress through events is made, i.e. the text just mentions the scenario but no clear script events are addressed.
Example A.1 short text with events and participants addressed
feeding a child: ... Chloe loves to stand around babbling just generally keeping anyone amused as long as you bribe her with a piece of bread or cheese first.
going grocery shopping: ... but first stopped at a local shop to pick up some cheaper beer. We also stopped at a small shop near the hotel to get some sandwiches for dinner.
Example A.2 scenario is just mentioned
cooking pasta: And a huge thanks to Megan & Andrew for a fantastic dinner, especially their first ever fresh pasta making effort of salmon filled ravioli - a big winner.
riding on a bus, flying in a plane: and then catch a bus down to Dublin for my 9:30AM flight the next morning.
We decide to stop at Bob Evan's on the way home and feed the children.
Example A.3 scenario is implied but no events are addressed
answering the phone: one of the citizens nodded and started talking on her cell phone. Several of the others were also on cell phones
taking a photograph: Here are some before and after shots of Brandon. The first 3 were all taken this past May. I just took this one a few minutes ago.
Example A.4 different discourse mode that is not narration, e.g., informational or argumentative; no specific events are mentioned
writing a letter
A long time ago, years before the Internet, I used to write to people from other countries. This people I met through a program called Pen Pal. I would send them my mail address, name, languages I could talk and preferences about my pen pals. Then I would receive a list of names and address and I could start sending them letters. ...

2. When a segment refers to more than one scenario (either related scenarios, or scenarios where one is more general than the other) and there is only a weak reference to one of them, annotate the text with the scenario that has the stronger or more plausible reference.
Example A.5 one scenario is weakly referenced
visiting a doctor, taking medicine (taking medicine is weakly referenced)
Now another week passes and I get a phone call and am told that the tests showed i had strep so i go in the next day and see the doc and he says that i don '
Example A.6 two mentions of a scenario annotated as one segment
writing a letter
I asked him about a month ago to write a letter of recommendation for me to help me get a library gig. After bugging him on and off for the past month, as mentioned above, he wrote me about a paragraph. I was sort of pissed as it was quite generic and short.
I asked for advice, put it off myself for a week and finally wrote the letter of recommendation myself. I had both Evan and Adj. take a look at it-and they both liked my version.

3. A segment may contain intervening text (a separator) that does not itself refer to the scenario. Include the separator in the segment if it is short or if it relates to the scenario being addressed; exclude it if it is long.
Example A.7 a separator referring to a topic related to the current scenario is included
writing an exam
The Basic Science Exam (practice board exam) that took place on Friday April 18 was interesting to say the least. We had 4 hours to complete 200 questions, which will be the approximate time frame for the boards as well. I was completing questions at a good pace for the first 1/3 of the exam, slowed during the second 1/3 and had to rush myself during the last 20 or so questions to complete the exam in time.
separator: Starting in May, I am going to start timing myself when I do practice questions so I can get use to pacing. There was a lot of information that was familiar to me on the exam (which is definitely a good thing) but it also showed me that I have a LOT of reviewing to do.
Monday April 21 was the written exam for ECM. This exam was surprisingly challenging. For me, the most difficult part were reading and interpreting the EKGs. I felt like once I looked at them, everything I knew just fell out of my brain. Fortunately, it was a pass/fail exam and I passed.
Example A.8 a long separator is excluded
going to the beach
Today , on the very last day of summer vacation , we finally made it to the beach . Oh , it 's not that we hadn 't been to a beach before . We were on a Lake Michigan beach just last weekend . And we 've stuck our toes in the water at AJ 's and my lake a couple of times . But today , we actually planned to go . We wore our bathing suits and everything . We went with AJ 's friend D , his brother and his mom .
separator: D and AJ became friends their very first year of preschool when they were two . They live in the next town over and we don 't see them as often as we would like . It 's not so much the distance , which isn 't far at all , but that the school and athletic schedules are constantly conflicting . But for the first time , they are both going back to school on the same day . So we decided to celebrate the end of summer together .
going to the beach It nearly looked too cold to go this morning ' the temperature didn 't reach 60 until after 9 :00. The lake water was chilly , too cool for me , but the kids didn 't mind . They splashed and shrieked with laughter and dug in the sand and pointed at the boat that looked like a hot dog and climbed onto the raft and jumped off and had races and splashed some more . D 's mom and I sat in the sun and talked about nothing in particular and waved off seagulls .
Example A.9 a short separator is included
throwing a party
... My wife planned a surprise party for me at my place in the evening -I was told that we 'd go out and that I was supposed to meet her at Dhobi Ghaut exchange at 7 .
separator: But I was getting bored in the office around 5 and thought I 'd go home - when I came home , I surprised her ! She was busy blowing balloons , decorating , etc with her friend . I guess I ruined it for her . But the fun part started here -She invited my sister and my cousin ...

visiting sights
Before getting to the museum we swung by Notre Dame which was very beautiful . I tried taking some pictures inside Notre Dame but I dont think they turned out particularly well . After Notre Dame , Paul decided to show us the Crypte Archeologioue .
separator: This is apparently French for parking garage there are some excellent pictures on Flickr of our trip there .
Also on the way to the museum we swung by Saint Chapelle which is another church . We didnt go inside this one because we hadnt bought a museum pass yet but we plan to return later on in the trip

4. Similarly to intervening text (separators), there may be text before or after a segment that is a motivation, precondition, or postcondition for the application of the script currently being referred to. Leave the text out if it is long; include it if it is short or if it relates to the scenario being addressed.
Example A.10 the first one or two sentences introduce the topic
getting a haircut
I AM , however , upset at the woman who cut his hair recently . He had an appointment with my stylist (the one he normally goes to ) but I FORGOT about it because I kept thinking that it was a different day than it was . When I called to reschedule , she couldn 't get him in until OCTOBER (?!?!?!) ...
baking a cake
I tried out this upside down cake from Bill Grangers , Simply Bill . As I have mentioned before , I love plums am always trying out new recipes featuring them when they are in season . I didnt read the recipe properly so was surprised when I came to make it that it was actually cooked much in the same way as a tarte tartin , ie making a caramel with the fruit in a frying pan first , then pouring over the cake mixture baking in the frypan in the oven before turning out onto a serving plate , the difference being that it was a cake mixture not pastry ....

5. If a text passage refers to several related scenarios, e.g. "renovating a room" and "painting a wall", "laying flooring in a room", "papering a room"; or "working in the garden" and "growing vegetables", annotate all the related scenarios.
Example A.11 segment referring to related scenarios
growing vegetables, working in the garden
The tomato seedlings Mitch planted in the compost box have done really well and we noticed flowers on them today. Hopefully we will get a good crop. It has rained and rained here for the past month so that is doing the garden heaps of good. We bought some organic herbs seedlings recently and now have some thyme, parsley, oregano and mint growing in the garden. We also planted some lettuce and a grape vine. ...
6. If part of a longer text passage refers to a scenario that is more specific than the scenario currently being talked about, annotate the nested text passage with all referred scenarios.
Example A.12 nested segment
preparing dinner
I can remember the recipe, it's pretty adaptable and you can add or substitute the vegetables as you see fit!! One Pot Chicken Casserole 750g chicken thigh meat, cut into big cubes olive oil for frying 1
preparing dinner, chopping vegetables
large onion, chopped 3 potatoes, waxy is best 3 carrots 4 stalks of celery, chopped 2 cups of chicken stock 2 zucchini, sliced large handful of beans 300 ml cream 1 or 2 tablespoons of wholegrain mustard salt and pepper parsley, chopped
The potatoes and carrots need to be cut into chunks. I used chat potatoes which are smaller and cut them in half, but I would probably cut a normal potato into quarters. Heat the oil in a large pan and then fry the chicken in batches until it is well browned...

7. If you do not find a full match for a text segment in the scenario list, but a scenario that is related and similar in its structure, you may annotate it.
Example A.13 topic similarity
• Same structure of scenario, e.g. going fishing for leisure or for work share the same core events of going fishing
• Baking something with flour (baking a cake, baking Blondies, ...)
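
The rules above imply an annotation format in which a text span can carry several scenario labels (rule 5), segments can nest when a more specific scenario is embedded in a more general one (rule 6), and short separators remain inside a segment. A minimal sketch of such a record follows; the class and field names are our own illustration, not the dataset's actual release format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ScenarioSegment:
    """One annotated span, with character offsets into the source story."""
    start: int
    end: int
    labels: List[str]  # rule 5: a span may carry several scenario labels
    nested: List["ScenarioSegment"] = field(default_factory=list)  # rule 6: more specific spans

# A "preparing dinner" segment with a nested "chopping vegetables" span,
# mirroring Example A.12 (offsets are illustrative).
seg = ScenarioSegment(
    start=0, end=400,
    labels=["preparing dinner"],
    nested=[ScenarioSegment(start=120, end=180,
                            labels=["preparing dinner", "chopping vegetables"])],
)
print(seg.labels, seg.nested[0].labels)
```

Representing nesting explicitly, rather than flattening to one label per span, preserves the distinction the guidelines draw between general and specific scenario references.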