Discourse Analysis and Its Applications

Discourse processing is a suite of Natural Language Processing (NLP) tasks that uncover linguistic structures in texts at several levels and thereby support many downstream applications. These tasks involve identifying the topic structure, the coherence structure, the coreference structure, and, for conversational discourse, the conversation structure. Taken together, these structures can inform text summarization, machine translation, essay scoring, sentiment analysis, information extraction, question answering, and thread recovery. The tutorial starts with an overview of basic concepts in discourse analysis: monologue vs. conversation, synchronous vs. asynchronous conversation, and key linguistic structures in discourse analysis. We also give an overview of the linguistic structures and corresponding discourse analysis tasks that discourse researchers are generally interested in, as well as the key applications on which these discourse structures have an impact.


Motivation
Discourse analysis has been a fundamental problem in the ACL community, where the focus is on developing tools that automatically model language phenomena beyond individual sentences. As methods become more effective and flexible in the ongoing neural revolution, analysis and interpretability beyond the sentence level are of particular interest for many core language processing tasks like language modeling (Ji et al., 2016) and applications such as machine translation and its evaluation (Sennrich, 2018; Läubli et al., 2018; Joty et al., 2017), text categorization (Ji and Smith, 2017), and sentiment analysis (Nejat et al., 2017). With the advent of Internet technologies, new forms of discourse are emerging (e.g., emails and discussion forums) that pose a novel set of challenges for computational models.
Furthermore, most computational models for discourse analysis are going through a paradigm shift from traditional statistical models to deep neural models. Considering all these novel aspects at once, this tutorial is timely for the community: it provides attendees with an up-to-date, critical overview of existing approaches and their evaluations, applications, and future challenges.

Tutorial Outline
We start with an overview of basic concepts in discourse analysis: monologue vs. conversation, synchronous vs. asynchronous conversation, and key linguistic structures in discourse analysis. Attendees then learn about coherence structure and discourse parsers. We give a critical overview of different discourse theories and of the available datasets annotated according to these formalisms. We cover methods for RST- and PDTB-style discourse parsing, including traditional methods along with the most recent work using deep neural networks, and we interpret them and compare their performance on benchmark datasets.
Next, we discuss coherence models that evaluate monologues and conversations based on their coherence. We then show applications (evaluation tasks) of coherence models and discourse parsers. Special attention is paid to emerging applications of discourse analysis such as machine translation and its evaluation, sentiment analysis, and abstractive summarization.
In the final part of the tutorial, we cover conversational structures (e.g., speech acts, thread structure), computational methods to extract such structures, and their utility in downstream applications (e.g., conversation summarization). Again, evaluation metrics and approaches will be discussed and compared. We conclude with an interactive discussion of future challenges for discourse analysis and its applications. In the following, we give a detailed breakdown of the tutorial content.

New emerging applications
Link to the Slides
Our tutorial slides will be made available at https://ntunlpsg.github.io/project/acl19tutorial/

Prerequisites
Prior knowledge of basic machine learning, NLP (e.g., parsing methods, machine translation), and deep learning models is essential to understand the content of this tutorial.

Similar Tutorial
We gave a similar tutorial (shorter version) at the 2018 IEEE International Conference on Data Mining (ICDM-2018), a top conference in data mining.

Instructors
Dr. Shafiq Joty is an Assistant Professor at the School of Computer Science and Engineering, NTU. He is also a senior research manager at the Salesforce AI Research lab. He holds a PhD in Computer Science from the University of British Columbia. His work has primarily focused on developing discourse analysis tools (e.g., discourse parser, coherence model, topic model, dialogue act recognizer), and exploiting these tools effectively in downstream applications like machine translation, summarization, and sentiment analysis. Apart from discourse and its applications, he has also developed novel machine learning models for question answering, machine translation, image/video captioning, visual question answering, and opinion analysis. His work has appeared in major journals and conferences such as CL, JAIR, CSL, ACL, EMNLP, NAACL, IJCAI, CVPR, ECCV, and ICWSM. He served as an area chair for ACL-2019 (QA track) and EMNLP-2019 (Discourse track) and as a senior program committee member for IJCAI 2019. Shafiq is a recipient of the NSERC CGS-D scholarship and the Microsoft Research Excellent Intern award.
Dr. Giuseppe Carenini is a Professor in Computer Science at UBC. Giuseppe has broad interdisciplinary interests. His work on NLP and information visualization to support decision making has been published in over 100 peer-reviewed papers (including best paper at UMAP-14 and ACM-TiiS-14). He was an area chair for ACL'09 ("Sentiment Analysis, Opinion Mining, and Text Classification"), NAACL'12 and EMNLP'19 ("Summarization and Generation"), and ACL'19 (Discourse), and a Program Co-Chair for IUI 2015 and SigDial 2016. He has also co-edited an ACM-TIST Special Issue on "Intelligent Visual Interfaces for Text Analysis". In 2011, he published a co-authored book, "Methods for Mining and Summarizing Text Conversations". He has also collaborated extensively with industrial partners, including Microsoft and IBM. He was awarded a Google Research Award, an IBM CASCON Best Exhibit Award, and a Yahoo Faculty Research Award in 2007, 2010, and 2016, respectively.
Dr. Raymond T. Ng is a Professor in Computer Science and the Director of the Data Science Institute at UBC. His main research area for the past two decades has been data mining, with a specific focus on health informatics and text mining. He has published over 180 peer-reviewed publications on data clustering, outlier detection, OLAP processing, health informatics and text mining. He is the recipient of two best paper awards: one from the 2001 ACM SIGKDD conference, the premier data mining conference in the world, and one from the 2005 ACM SIGMOD conference, one of the top database conferences worldwide. For the past decade, he has co-led several large-scale genomic projects funded by Genome Canada, Genome BC and industrial collaborators. Since the inception of the PROOF Centre of Excellence, which focuses on biomarker development for end-stage organ failures, he has held the position of Chief Informatics Officer of the Centre. From 2009 to 2014, he was the associate director of the NSERC-funded strategic network on business intelligence. Since 2016, he has been the holder of the Canadian Research Chair on Data Science and Analytics.
Dr. Gabriel Murray is an Associate Professor in Computer Information Systems at the University of the Fraser Valley (UFV). His background is in computational linguistics and multimodal speech and language processing. He holds a PhD in Informatics from the University of Edinburgh, completed under the supervision of Drs. Steve Renals and Johanna Moore. His research has focused on various aspects of multimodal conversational data, including automatic summarization and sentiment detection for group discussions. Recent research also focuses on predicting group performance and participant affect in conversational data. In 2011, Dr. Murray co-authored the book "Methods for Mining and Summarizing Text Conversations".