Lori Moon


2020

GLUCOSE: GeneraLized and COntextualized Story Explanations
Nasrin Mostafazadeh | Aditya Kalyanpur | Lori Moon | David Buchanan | Lauren Berkowitz | Or Biran | Jennifer Chu-Carroll
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

When humans read or listen, they make implicit commonsense inferences that frame their understanding of what happened and why. As a step toward AI systems that can build similar mental models, we introduce GLUCOSE, a large-scale dataset of implicit commonsense causal knowledge, encoded as causal mini-theories about the world, each grounded in a narrative context. To construct GLUCOSE, we drew on cognitive psychology to identify ten dimensions of causal explanation, focusing on events, states, motivations, and emotions. Each GLUCOSE entry includes a story-specific causal statement paired with an inference rule generalized from the statement. This paper details two concrete contributions. First, we present our platform for effectively crowdsourcing GLUCOSE data at scale, which uses semi-structured templates to elicit causal explanations. Using this platform, we collected a total of ~670K specific statements and general rules that capture implicit commonsense knowledge about everyday situations. Second, we show that existing knowledge resources and pretrained language models do not include or readily predict GLUCOSE’s rich inferential content. However, when state-of-the-art neural models are trained on this knowledge, they can start to make commonsense inferences on unseen stories that match humans’ mental models.
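To make the entry structure described above concrete, the following is a minimal sketch in Python of how a single GLUCOSE-style entry, pairing a story-specific causal statement with its generalized inference rule, might be represented; the example story, field names, and connective string are illustrative assumptions, not the released dataset's actual schema.

# A minimal sketch of one GLUCOSE-style entry; the field names, the example
# story, and the ">Causes/Enables>" connective string are illustrative
# assumptions, not the released dataset's schema.
entry = {
    "story": "Karen put a cake in the oven. She forgot to set the timer. "
             "The cake burned.",
    "dimension": 1,  # one of the ten causal dimensions (events, states, motivations, emotions)
    "specific_statement": "Karen forgets to set the timer "
                          ">Causes/Enables> the cake burns",
    "general_rule": "Someone_A forgets to set a timer "
                    ">Causes/Enables> Something_A (that is baking) burns",
}

# The specific statement is grounded in the narrative context; the general
# rule abstracts it into a reusable commonsense inference.
print(entry["general_rule"])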

2018

Gold Standard Annotations for Preposition and Verb Sense with Semantic Role Labels in Adult-Child Interactions
Lori Moon | Christos Christodoulopoulos | Cynthia Fisher | Sandra Franco | Dan Roth
Proceedings of the 27th International Conference on Computational Linguistics

This paper describes the augmentation of an existing corpus of child-directed speech. The resulting corpus is a gold-standard labeled corpus for supervised learning of semantic role labels in adult-child dialogues. Semantic role labeling (SRL) models assign semantic roles to sentence constituents, thus indicating who has done what to whom (and in what way). The current corpus is derived from the Adam files in the Brown corpus (Brown, 1973) of the CHILDES corpora, and augments the partial annotation described in Connor et al. (2010). It provides labels for both semantic arguments of verbs and semantic arguments of prepositions. The semantic role labels and senses of verbs follow PropBank guidelines (Kingsbury and Palmer, 2002; Gildea and Palmer, 2002; Palmer et al., 2005), and those for prepositions follow Srikumar and Roth (2011). The corpus was annotated by two annotators. Inter-annotator agreement is given separately for prepositions and verbs, and for adult speech and child speech. Overall, across child and adult samples, including verbs and prepositions, the kappa score for sense is 72.6; for the number of semantic-role-bearing arguments, 77.4; for identical semantic role labels on a given argument, 91.1; and for the span of semantic role labels, 93.9. The sense and number of arguments were often open to multiple interpretations in child speech, due to the rapidly changing discourse and the omission of constituents in production. Annotators used a discourse context window of ten sentences before and ten sentences after the target utterance to determine the annotation labels. The derived corpus is available in CHAT (MacWhinney, 2000) and XML formats.
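For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Cohen's kappa for two annotators in plain Python; the label sequences are hypothetical and not drawn from the corpus.

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # Observed agreement: fraction of items both annotators labeled identically.
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's marginal label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    # Kappa corrects observed agreement for agreement expected by chance.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical verb-sense labels from two annotators over six tokens.
ann1 = ["put.01", "put.01", "go.02", "go.01", "see.01", "see.01"]
ann2 = ["put.01", "put.02", "go.02", "go.01", "see.01", "see.01"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.786

Note that kappa can be substantially lower than raw percent agreement when a few labels dominate, which is why the paper reports kappa alongside category-specific breakdowns.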

2016

Selective Annotation of Modal Readings: Delving into the Difficult Data
Lori Moon | Patricija Kirvaitis | Noreen Madden
Linguistic Issues in Language Technology, Volume 14, 2016 - Modality: Logic, Semantics, Annotation, and Machine Learning

Modal auxiliaries have different readings, depending on the context in which they occur (Kratzer, 1981). Several projects have attempted to classify uses of modal auxiliaries in corpora according to their reading using supervised machine learning techniques (e.g., Rubinstein et al., 2013; Ruppenhofer & Rehbein, 2012). In each study, traditional taxonomic labels, such as ‘epistemic’ and ‘deontic’, are used by human annotators to label instances of modal auxiliaries in a corpus. In order to achieve higher agreement among annotators, results in these previous studies are reported after collapsing some of the initial categories. The results show that human annotators have fairly good agreement on some of the categories, such as whether or not a use is epistemic, but poor agreement on others. They also show that annotators agree more on modals such as might than on modals such as could. In this study, we used traditional taxonomic categories on sentences containing modal auxiliary verbs that were randomly extracted from the English Gigaword 4th edition corpus (Parker et al., 2009). The lowest inter-annotator agreement using traditional taxonomic labels occurred with uses of could, with raw agreement of 42%−48% (κ = 0.196−0.259), compared to might, for instance, with raw agreement of 98%. In response to the low numbers, rather than collapsing traditional categories, we tried a new method of classifying uses of could with respect to where the reading situates the eventuality being described relative to the speech time. For example, the sentence ‘Jess could swim.’ is about a swimming eventuality in the past leading up to the time of speech, if it is read as an ability. The sentence is about a swimming eventuality in the future, if it is read as a statement about a possibility. The classification labels we propose are crucial in separating uses of could that have actuality inferences (Bhatt, 1999; Hacquard, 2006) from uses that do not. For the temporal location of the event described by a use of could, using four category labels, we achieved 73%−90% raw agreement (κ = 0.614−0.744). Sequence of tense contexts (Abusch, 1997) are a major factor in the difficulty of determining the temporal properties present in uses of could. Among three annotators, we achieved raw agreement scores of 89%−96% (κ = 0.779−0.919) on identification of sequence of tense contexts. We discuss the role of our findings with respect to textual entailment.