Representation, Learning and Reasoning on Spatial Language for Downstream NLP Tasks

Understanding spatial semantics expressed in natural language can become highly complex in real-world applications. These include language grounding, navigation, visual question answering, and more generic human-machine interaction and dialogue systems. In many such downstream tasks, explicit representation of spatial concepts and relationships can improve the capabilities of machine learning models in reasoning and deep language understanding. In this tutorial, we overview cutting-edge research results and existing challenges related to spatial language understanding, including semantic annotations, existing corpora, symbolic and sub-symbolic representations, qualitative spatial reasoning, spatial common sense, and deep and structured learning models. We discuss recent results on the above-mentioned applications that need spatial language learning and reasoning, and highlight the research gaps and future directions.


Description
This tutorial provides an overview of the cutting-edge research on spatial language understanding. However, we also cover background material from various perspectives, given that the ACL community has not paid enough attention to this topic over the last two decades; only very recently have a few emerging lines of research revisited the importance of spatial language in various NLP tasks. One of the essential functions of natural language is to express spatial relationships between objects. Linguistic constructs can encode highly complex, relational structures of objects, spatial relations between them, and patterns of motion through space relative to some reference point. Spatial language understanding is useful in many research areas and real-world applications, and has recently attracted the attention of various sub-communities at the intersection of natural language processing, computer vision, and robotics. The complexity of spatial language understanding, and its importance in downstream tasks that involve grounding language in the physical world, has become to some extent evident to the NLP research community. Compared to other semantically specialized linguistic tasks, standardizing tasks related to spatial language appears more challenging: it is harder to obtain an agreeable set of concepts and relationships together with a formal, domain-independent spatial meaning representation (Pustejovsky et al., 2011; Kordjamshidi et al., 2010; Mani, 2009; Pustejovsky, 2017; Dan et al., 2020). Compare this, for example, with the recent work on temporal relations within computational linguistics. This situation has made research results on spatial language learning and reasoning diverse, task-specific and, to some extent, not comparable.
While formal meaning representation is a general issue for language understanding, formalizing spatial concepts and building formal reasoning and machine learning models based on those constitute challenging research problems with a wealth of prior foundational work that can be exploited and linked to language understanding.
In this tutorial, we overview four themes: 1) spatial semantic representation; 2) spatial information extraction; 3) qualitative spatial representation and reasoning; and 4) downstream applications of spatial semantic extraction and spatial reasoning, including language grounding, robotics, navigation, dialogue systems, and tasks that require combining vision and language.
The semantic representation section covers the works that have attempted to arrive at a common set of basic concepts and relationships (Bateman, 2010; Hois and Kutz, 2011), as well as to make existing corpora interoperable (Pustejovsky et al., 2011; Mani and Pustejovsky, 2012; Kordjamshidi et al., 2017; Kordjamshidi, 2013). We discuss the existing qualitative and quantitative representation and reasoning models that can be used to investigate the interoperability of machine learning and reasoning over spatial semantics (Cohn et al., 1997). Spatial language meaning representation includes research related to cognitively and linguistically motivated spatial semantic representations, spatial knowledge representation and spatial ontologies, qualitative and quantitative representation models used for formal meaning representation, and various spatial annotation schemes and efforts for creating specialized corpora. We discuss various datasets that either focus on spatial annotations or on downstream tasks that need spatial language learning and reasoning, in particular natural language visual reasoning data (Suhr et al., 2017, 2018). Moreover, continuous meaning representations for spatial concepts are another aspect highlighted in the tutorial (e.g., Deruyttere et al.).
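To make the qualitative side concrete, a calculus such as RCC-8 (in the tradition of Cohn et al., 1997) supports inference via a composition table over its eight base relations. The sketch below is purely illustrative: it hard-codes only a tiny, well-known fragment of the full 8x8 table, and the helper function is a hypothetical name, not an API from any spatial reasoning library.

```python
# Illustrative fragment of qualitative spatial reasoning with RCC-8.
# Only a few entries of the full 8x8 composition table are shown; the
# helper name `compose` is a hypothetical choice for exposition.

RCC8 = {"DC", "EC", "PO", "TPP", "NTPP", "TPPi", "NTPPi", "EQ"}

# COMPOSITION[(r1, r2)] = possible relations between A and C,
# given A r1 B and B r2 C (a small, uncontroversial subset).
COMPOSITION = {
    ("NTPP", "NTPP"): {"NTPP"},  # non-tangential proper part is transitive
    ("TPP", "NTPP"): {"NTPP"},
    ("NTPP", "EQ"): {"NTPP"},    # EQ acts as an identity
}

def compose(r1: str, r2: str) -> set:
    """Possible relations A?C given A r1 B and B r2 C."""
    return COMPOSITION.get((r1, r2), set(RCC8))  # unknown pair: no constraint

# "The book is inside the box" (NTPP), "the box is inside the room" (NTPP):
print(compose("NTPP", "NTPP"))  # {'NTPP'}: the book is inside the room
```

A full reasoner would iterate such compositions to closure over a constraint network; the point here is only that qualitative calculi give language-derived relations a formal inference mechanism.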
We overview the state of the art for extracting spatial information from language, both abstract semantic extraction (Kordjamshidi et al., 2011; Kordjamshidi and Moens, 2015) and extraction driven by various target tasks and applications. We discuss machine learning models used in the related work, including structured output prediction models, deep learning architectures, and probabilistic graphical models.
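The core extraction task can be pictured as recovering (trajector, spatial indicator, landmark) triplets from text. The toy extractor below is only a pattern-matching sketch of that output format, under assumed sentence templates; actual spatial role labeling systems (e.g., Kordjamshidi et al., 2011) use structured learning over syntactic and lexical features, not regular expressions.

```python
# Toy pattern-based extractor for spatial role triplets, purely
# illustrative of the output structure; the indicator list and the
# copula-based pattern are simplifying assumptions.
import re

# A few common spatial indicators (longer patterns listed first).
INDICATORS = ["on top of", "on", "in", "under", "above", "behind"]

def extract_spatial_triplet(sentence: str):
    """Return (trajector, indicator, landmark) or None."""
    for ind in INDICATORS:
        m = re.search(rf"(?:the\s+)?(\w+)\s+is\s+{ind}\s+(?:the\s+)?(\w+)",
                      sentence.lower())
        if m:
            return m.group(1), ind, m.group(2)
    return None

print(extract_spatial_triplet("The book is on the table"))
# ('book', 'on', 'table')
```

The learned systems discussed in the tutorial replace these brittle patterns with classifiers over candidate spans, but the target representation is the same kind of triplet.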
Finally, we overview the usage of spatial semantics in various downstream tasks and killer applications, including language grounding, navigation, self-driving cars, robotics (Tellex et al., 2011; Kollar et al., 2010), dialogue systems (Kelleher and Kruijff, 2006) and human-machine interaction, as well as geographical information systems and knowledge graphs (Stock et al., 2013; Mai et al., 2020). Spatial semantics is closely tied to visualizing natural language and grounding language in perception; it is central to dealing with configurations in the physical world and motivates combining vision and language for richer spatial understanding. The related tasks include text-to-scene conversion; image captioning; spatial and visual question answering; and spatial understanding in multimodal settings (Rahgooy et al., 2018) for robotics, navigation tasks, and language grounding (Thomason et al., 2018).
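In vision-and-language settings, grounding a spatial expression ultimately means checking it against geometry, e.g., deciding whether "the lamp is above the table" holds for two detected bounding boxes. The function below is a minimal sketch; the box format and the simple overlap rule are illustrative assumptions, not a standard from any grounding system.

```python
# Minimal sketch of grounding a spatial relation into 2D geometry.
# Box format (x_min, y_min, x_max, y_max), with y growing downward as
# in image coordinates, and the decision rule are assumptions here.

def is_above(a, b):
    """True if box a lies entirely above box b and they overlap in x."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    vertically_above = ay1 <= by0          # a's bottom edge above b's top edge
    x_overlap = ax0 < bx1 and bx0 < ax1    # share some horizontal extent
    return vertically_above and x_overlap

lamp = (40, 10, 60, 30)
table = (30, 50, 90, 80)
print(is_above(lamp, table))  # True: "the lamp is above the table"
```

Real multimodal models learn soft versions of such predicates from data, but crisp geometric checks like this remain useful as supervision signals and evaluation probes.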
Current research using end-to-end monolithic deep models fails to solve complex tasks that need deep language understanding and reasoning capabilities (Hudson and Manning, 2019). Throughout this tutorial, we will highlight the importance of combining learning and reasoning for spatial language understanding, and its influence on the semantic representation, the type of learning models, and the performance on various applications. Regarding the question of reasoning, we (a) point out the role of qualitative and quantitative formal representations in supporting spatial reasoning over natural language, and the possibility of learning such representations from data to support compositionality and inference (Hudson and Manning, 2018; Hu et al., 2017); and (b) examine how continuous representations contribute to supporting reasoning and alternative hypothesis formation in learning (Krishnaswamy et al., 2019). We point to cutting-edge research that shows the influence of explicit representation of spatial entities and concepts (Hu et al., 2019; Liu et al., 2019).
The main goal of this tutorial is to bring these related efforts from different communities and application domains together into one unified treatment, and to identify the challenges, open problems, and future directions for spatial language understanding.

Outline
The tutorial will cover the following syllabus:

• Spatial Representations
  - Linguistic corpora and semantic annotations
  - Spatial knowledge representation and spatial calculi models
  - Distributed representations

• Spatial Information Extraction
  - Spatial entity and relation extraction
  - Spatial ontology population
  - Considering domain knowledge and pragmatics in spatial extractions

• Spatial Semantic Grounding
  - Combining vision and language (symbolic and multimodal embeddings)
  - Capturing spatial common sense
  - Grounding language in 2D and 3D physical worlds
  - Generating referring expressions

• Spatial Reasoning
  - Overview of natural language and visual reasoning tasks and data
  - Modeling compositionality and spatial reasoning in (deep) learning models

• Downstream Tasks
  - Spatial concepts in dialogue systems
  - Spatial reasoning for QA and VQA
  - HRI, navigation and way-finding instructions
  - Corpus-based GIS systems

Prerequisites and reading list
Familiarity with machine learning and natural language processing will be helpful for this tutorial. Our selected reading list is as follows.

Acknowledgements
This project is supported by the National Science Foundation (NSF) CAREER award #1845771.