Commonsense Reasoning for Natural Language Processing

Commonsense knowledge, such as knowing that “bumping into people annoys them” or “rain makes the road slippery”, helps humans navigate everyday situations seamlessly. Yet, endowing machines with such human-like commonsense reasoning capabilities has remained an elusive goal of artificial intelligence research for decades. In recent years, commonsense knowledge and reasoning have received renewed attention from the natural language processing (NLP) community, yielding exploratory studies in automated commonsense understanding. We organize this tutorial to provide researchers with the critical foundations and recent advances in commonsense representation and reasoning, in the hopes of casting a brighter light on this promising area of future research. In our tutorial, we will (1) outline the various types of commonsense (e.g., physical, social), and (2) discuss techniques to gather and represent commonsense knowledge, while highlighting the challenges specific to this type of knowledge (e.g., reporting bias). We will then (3) discuss the types of commonsense knowledge captured by modern NLP systems (e.g., large pretrained language models), and (4) present ways to measure systems’ commonsense reasoning abilities. We will finish with (5) a discussion of various ways in which commonsense reasoning can be used to improve performance on NLP tasks, exemplified by an (6) interactive session on integrating commonsense into a downstream task.


Introduction
Commonsense knowledge, such as knowing that "bumping into people annoys them" or "rain makes the road slippery", helps humans navigate everyday situations seamlessly (Apperly, 2010). Yet, endowing machines with such human-like commonsense reasoning capabilities has remained an elusive goal of artificial intelligence research for decades (Gunning, 2018).
Commonsense knowledge and reasoning have received renewed attention from the natural language processing (NLP) community in recent years, yielding multiple exploratory research directions into automated commonsense understanding. Recent efforts to acquire and represent commonsense knowledge have resulted in large knowledge graphs, acquired through extractive methods (Speer et al., 2017) or crowdsourcing (Sap et al., 2019a). Simultaneously, a large body of work on integrating reasoning capabilities into downstream tasks has emerged, allowing the development of smarter dialogue and question answering agents.
Recent advances in large pretrained language models (e.g., Devlin et al., 2019; Liu et al., 2019b), however, have pushed machines closer to human-like understanding capabilities, calling into question whether machines should directly model commonsense through symbolic integrations. But despite these impressive performance improvements in a variety of NLP tasks, it remains unclear whether these models are performing complex reasoning, or if they are merely learning complex surface correlation patterns (Davis and Marcus, 2015; Marcus, 2018). This difficulty in measuring the progress in commonsense reasoning using downstream tasks has yielded increased efforts at developing robust benchmarks for directly measuring commonsense capabilities in multiple settings, such as social interactions (Sap et al., 2019b; Rashkin et al., 2018a) and physical situations (Zellers et al., 2019; Talmor et al., 2019).
We hope that in the future, machines will develop the kind of intelligence required to, for example, properly assist humans in everyday situations (e.g., a chatbot that anticipates the needs of an elderly person; Pollack, 2005). Current methods, however, are still not powerful or robust enough to be deployed in open-domain production settings, despite the clear improvements provided by large-scale pretrained language models. This shortcoming is partially due to inadequacies in acquiring, understanding and reasoning about commonsense knowledge, topics which remain understudied by the larger NLP, AI, and Vision communities relative to their importance in building AI agents. We organize this tutorial to provide researchers with the critical foundations and recent advances in commonsense, in the hopes of casting a brighter light on this promising area of future research.
In our tutorial, we will (1) outline the various types of commonsense (e.g., physical, social), and (2) discuss techniques to gather and represent commonsense knowledge, while highlighting the challenges specific to this type of knowledge (e.g., reporting bias). We will also (3) discuss the types of commonsense knowledge captured by modern NLP systems (e.g., large pretrained language models), (4) review ways to incorporate commonsense knowledge into downstream task models, and (5) present various benchmarks used to measure systems' commonsense reasoning abilities.

Description
What is commonsense? The tutorial will start with a brief overview of what commonsense is, how it is defined in the literature, and how humans acquire it (Moore, 2013; Baron-Cohen et al., 1985). We will discuss notions of social commonsense (Burke, 1969; Goldman, 2015) and physical commonsense (Hayes, 1978; McRae et al., 2005). We will cover the differences between taxonomic and inferential knowledge (Davis and Marcus, 2015; Pearl and Mackenzie, 2018), and differentiate commonsense knowledge from related concepts (e.g., script learning; Schank and Abelson, 1975; Chambers and Jurafsky, 2008).
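The taxonomic/inferential distinction above can be made concrete with a minimal sketch. The relation names and example statements below are illustrative placeholders, not the schema of any particular resource:

```python
# Illustrative (hypothetical) examples of the two kinds of knowledge:
# taxonomic facts describe what things *are*; inferential rules describe
# what *follows* from events (social or physical).
taxonomic = [
    ("dog", "IsA", "animal"),        # category membership
    ("wheel", "PartOf", "bicycle"),  # part-whole structure
]
inferential = [
    ("PersonX bumps into PersonY", "Effect", "PersonY feels annoyed"),  # social
    ("it rains", "Causes", "the road becomes slippery"),                # physical
]

def relations(triples):
    """Collect the relation types used in a set of (head, relation, tail) triples."""
    return {rel for _, rel, _ in triples}

print(relations(taxonomic))
print(relations(inferential))
```

Note how the taxonomic relations connect static categories, while the inferential ones connect events to their consequences; this is the distinction the tutorial will develop.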
How to represent commonsense? We will review existing methods for representing commonsense, most of which focus solely on English. At first, symbolic logic approaches were the main representation type (Forbus, 1989; Lenat, 1995). While these are still in use today (Davis, 2017; Gordon and Hobbs, 2017), computational advances have allowed for more data-driven knowledge collection and representation (e.g., automatic extraction; Etzioni et al., 2008; Zhang et al., 2016; Elazar et al., 2019). We will cover recent approaches that use natural language to represent commonsense (Speer et al., 2017; Sap et al., 2019a), while noting the challenges that come with using data-driven methods (Gordon and Van Durme, 2013; Jastrzebski et al., 2018).
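A small sketch of the two storage styles discussed above: symbolic triples versus free-text verbalizations. The relation names and templates here are made up for illustration and do not reproduce any real knowledge base's schema:

```python
# Hypothetical templates mapping symbolic relations to natural-language
# statements, in the spirit of resources that store commonsense as free text.
TEMPLATES = {
    "UsedFor": "{head} is used for {tail}",
    "Causes": "{head} causes {tail}",
}

def verbalize(head, relation, tail):
    """Turn a symbolic (head, relation, tail) triple into a sentence."""
    return TEMPLATES[relation].format(head=head, tail=tail)

print(verbalize("umbrella", "UsedFor", "staying dry"))
# -> "umbrella is used for staying dry"
```

The trade-off the tutorial covers: triples are easy to query and deduplicate, while natural-language statements are easier to crowdsource and to feed to language models.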
What do machines know? Pretrained language models (LMs) have recently been described as "rediscovering the NLP pipeline" (Tenney et al., 2019a), i.e., replacing previous dedicated components of the traditional NLP pipeline, from low- and mid-level syntactic and semantic tasks (POS tagging, parsing, verb agreement; e.g., Peters et al., 2018; Jawahar et al., 2019; Shwartz and Dagan, 2019, inter alia) to high-level semantic tasks such as named entity recognition, coreference resolution and semantic role labeling (Tenney et al., 2019b; Liu et al., 2019a). We will discuss recent investigations into pretrained LMs' ability to capture world knowledge (Petroni et al., 2019; Logan et al., 2019) and learn or reason about commonsense (Feldman et al., 2019).
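Probes of this kind typically pose cloze queries and check which completion the LM prefers. A minimal self-contained sketch of the idea follows; a real probe would score the blank with a pretrained masked LM, whereas here a toy unigram-frequency "model" stands in so the example runs without any model download:

```python
# Sketch of cloze-style knowledge probing (in the spirit of Petroni et al., 2019).
# The corpus and candidates are invented; the toy scorer is a stand-in for an LM.
from collections import Counter

corpus = "rain makes the road slippery . rain makes the grass wet".split()
unigram = Counter(corpus)  # toy "language model": raw word frequencies

def fill_blank(template, candidates):
    """Return the candidate the (toy) model scores highest for the blank."""
    return max(candidates, key=lambda w: unigram[w])

answer = fill_blank("rain makes the road [MASK]", ["slippery", "purple"])
print(answer)  # -> "slippery": attested in the toy corpus, unlike "purple"
```

The probing question is then whether a preference like this reflects commonsense knowledge or merely corpus co-occurrence, which is exactly the ambiguity discussed above.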
How to incorporate commonsense knowledge into downstream models? Given that a large number of NLP applications are designed to require commonsense reasoning, we will review efforts to integrate such knowledge into NLP tasks. Various works have looked at directly encoding commonsense knowledge from structured KBs as additional inputs to a neural network in generation (Guan et al., 2018), dialogue, QA (Mihaylov and Frank, 2018; Bauer et al., 2018; Lin et al., 2019; Weissenborn et al., 2017; Musa et al., 2019), and classification (Chen et al., 2018; Paul and Frank, 2019) tasks. For applications without available structured knowledge bases, researchers have relied on commonsense aggregated from corpus statistics pulled from unstructured text (Tandon et al., 2018; Lin et al., 2017; Li et al., 2018; Banerjee et al., 2019). More recently, rather than providing relevant commonsense as an additional input to neural networks, researchers have looked into indirectly encoding commonsense knowledge into the parameters of neural networks, either through pretraining on commonsense knowledge bases (Zhong et al., 2018) or explanations (Rajani et al., 2019), or by using multi-task objectives with commonsense relation prediction (Xia et al., 2019).
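The "knowledge as an additional input" recipe can be sketched in a few lines: retrieve KB triples relevant to the input, verbalize them, and prepend them to what the model reads. The tiny KB, the keyword-matching retriever, and the `[SEP]` convention below are all simplifying assumptions for illustration:

```python
# Hedged sketch of knowledge-augmented model input. The KB is made up,
# and retrieval is naive keyword matching on triple heads.
KB = [
    ("umbrella", "UsedFor", "staying dry"),
    ("rain", "Causes", "wet roads"),
]

def retrieve(question):
    """Return KB triples whose head word appears in the question."""
    tokens = set(question.lower().split())
    return [t for t in KB if t[0] in tokens]

def augment(question):
    """Concatenate verbalized retrieved knowledge with the question."""
    facts = " ; ".join(f"{h} {r} {t}" for h, r, t in retrieve(question))
    return f"{facts} [SEP] {question}" if facts else question

print(augment("Why carry an umbrella in the rain ?"))
```

Real systems replace each piece with something stronger (embedding-based retrieval, attention over knowledge, graph encoders), but the overall shape of the pipeline is the same.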
How to measure machines' commonsense reasoning abilities? We will explain that, despite their design, many natural language understanding (NLU) tasks hardly require machines to reason about commonsense (LoBue and Yates, 2011; Schwartz et al., 2017). This prompted efforts to create benchmarks carefully designed to be impossible to solve without commonsense knowledge (Roemmele et al., 2011; Levesque, 2011).
In response, recent work has focused on using crowdsourcing and automatic filtering to design large-scale benchmarks while maintaining negative examples that are adversarial to machines (Zellers et al., 2018). We will review recent benchmarks that have emerged to assess whether machines have acquired physical and social commonsense (e.g., Zellers et al., 2019; Talmor et al., 2019; Sap et al., 2019b).
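The adversarial filtering idea above can be sketched as a single pass: discard the negative endings a weak discriminator classifies correctly, keeping only the ones that fool it. Real pipelines iterate this with retrained models; here the discriminator is a stand-in heuristic (length difference from the true ending), chosen only so the example is self-contained:

```python
# Single-round sketch in the spirit of adversarial filtering (Zellers et al., 2018).
# The discriminator below is a toy stand-in, not a trained model.
def is_easy(negative, positive):
    """Toy discriminator: flags a negative whose length differs a lot
    from the true ending (a stylistic artifact a model could exploit)."""
    return abs(len(negative) - len(positive)) > 10

def adversarial_filter(positive, negatives):
    """Keep only the negatives the weak discriminator fails to reject."""
    return [n for n in negatives if not is_easy(n, positive)]

pos = "she opened her umbrella"
negs = ["no", "she closed her laptop"]
print(adversarial_filter(pos, negs))  # -> ['she closed her laptop']
```

The surviving negatives resemble the true ending on surface features, which forces models toward the underlying commonsense distinction rather than stylistic shortcuts.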

Schedule
Talk 1 (15 min.) will introduce and motivate this tutorial and discuss the long-term vision for NLP commonsense research.
Talk 2 (20 min.) will focus on the question "Do pre-trained language models capture commonsense knowledge?" and review recent work that studies what such models already capture due to their pre-training, what they can be fine-tuned to capture, and what types of knowledge are not captured.
Talk 3 (20 min.) will discuss ways of defining and representing commonsense, covering established symbolic methods and recent efforts for natural language representations.
Talk 4 (20 min.) will discuss neural and symbolic models of commonsense reasoning, focusing on models based on external knowledge integration for downstream tasks.
If time permits, we will end the first half with an interactive session and a preview to the second half.

Break (30 min.)
Talk 5 (20 min.) will continue the discussion on neural and symbolic models of commonsense knowledge representation, focusing on COMET, a language model trained on commonsense knowledge graphs. We will present its utility in a zero-shot model for a downstream commonsense question answering task.
Talk 6 (25 min.) will focus on temporal commonsense: how to represent it, how to incorporate it into downstream models, and how to test it.
Talk 7 (20 min.) will discuss ways to assess machine commonsense abilities, and challenges in developing benchmarks for such evaluations.
Concluding discussion (10 min.) will summarize the remaining challenges of commonsense research, and wrap up the tutorial.

Prerequisites
We will not expect attendees to be familiar with previous research on commonsense knowledge representation and reasoning, but participants should be familiar with:
• Machine learning and deep learning - recent neural network architectures (e.g., RNN, CNN, Transformers), as well as large pretrained language models (e.g., BERT, GPT, GPT-2).
• Natural language processing tasks - understanding the basic problem to solve in tasks such as question answering (QA), natural language generation (NLG), textual entailment/natural language inference (NLI), etc.

Presenters
Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, and a Fellow of the AAAS, the ACM, AAAI, and the ACL. In 2017, Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. He was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR) and a program co-chair of AAAI, ACL and CoNLL. Dan has presented several tutorials at conferences including ACL, on entity linking, temporal reasoning, transferable representation learning, and more.