Final Call for Papers, Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE @ ACL-IJCNLP 2021)

Event Notification Type: 
Call for Papers
Abbreviated Title: 
CASE 2021
Location: 
Online
Thursday, 5 August 2021 to Friday, 6 August 2021
Contact: 
ahurriyetoglu@ku.edu.tr
Submission Deadline: 
Monday, 26 April 2021

Today, the unprecedented quantity of easily accessible data on social, political, and economic processes offers ground-breaking potential for guiding data-driven analysis in the social and human sciences and for informing policy-making processes. The need for precise, high-quality information about a wide variety of events, ranging from political violence, environmental catastrophes, and conflict to international economic and health crises, has rapidly escalated (Porta and Diani, 2015; Coleman et al. 2014). Governments, multilateral organizations, local and global NGOs, and social movements have an increasing demand for this data to prevent or resolve conflicts, provide relief for those who are afflicted, or improve the lives of and protect citizens in a variety of ways. For instance, the Black Lives Matter protests[1] and the conflict in Syria[2] are only two examples where such data must be used to understand, analyze, and improve real-life situations.

Event extraction has long been a challenge for the natural language processing (NLP) community, as it requires sophisticated methods for defining event ontologies, creating language resources, and developing algorithmic approaches (Pustejovsky et al. 2003; Boroş, 2018; Chen et al. 2021). Social and political scientists have been working for decades to create socio-political event databases such as ACLED, EMBERS, GDELT, ICEWS, MMAD, PHOENIX, POLDEM, SPEED, TERRIER, and UCDP, following similar steps. These projects, and new ones, increasingly rely on machine learning (ML) and NLP methods to deal better with the vast amount and variety of data in this domain (Hürriyetoğlu et al. 2020). Automation offers scholars not only the opportunity to improve existing practices, but also to vastly expand the scope of data that can be collected and studied, thus potentially opening up new research frontiers within the field of socio-political events, such as political violence & social movements. However, automated approaches also suffer from major issues such as bias, generalizability, class imbalance, training data limitations, and ethical concerns, all of which can drastically affect the results and their use (Lau and Baldwin 2020; Bhatia et al. 2020; Chang et al. 2019). Moreover, the results of automated systems for collecting socio-political event information may not be comparable to each other or may not be of sufficient quality (Wang et al. 2016; Schrodt 2020).

Socio-political events are varied and nuanced. Both the political context and the local language used may affect whether and how they are reported. Therefore, all steps of information collection (event definition, language resources, and manual or algorithmic steps) may need to be constantly updated, leading to a series of challenging questions: Are events related to minority groups represented well? Are new types of events covered? Are the event definitions and their operationalization comparable across systems (Hürriyetoğlu 2019, 2020a, 2020b)? This workshop aims to seek answers to these kinds of questions, to inspire innovative technological and scientific solutions for tackling the aforementioned issues, and to quantify the quality of automated event extraction systems. Moreover, the workshop will foster a deeper understanding of the performance of the computational tools used and the usability of the resulting socio-political event datasets.

Call for Papers

We invite contributions from researchers in computer science, NLP, ML, AI, the socio-political sciences, conflict analysis and forecasting, and peace studies, as well as computational social science scholars involved in the collection and utilization of socio-political event data. Social and political scientists will be interested in reporting and discussing their approaches and in observing what state-of-the-art text processing systems can achieve for their domain. Computational scholars will have the opportunity to illustrate the capacity of their approaches in this domain and to benefit from being challenged by real-world use cases. Academic workshops dedicated to event information in general, or to analyzing text in specific domains such as health, law, finance, and the biomedical sciences, have significantly accelerated progress in their respective topics and fields. However, there is no comparable effort for handling socio-political events. We hope to fill this gap and contribute to the social and political sciences in a similar spirit. We invite work on all aspects of automated coding of socio-political events from mono- or multi-lingual text sources. This includes (but is not limited to) the following topics:

Extracting events in and beyond a sentence
Training data collection and annotation processes
Event coreference detection
Event-event relations, e.g., subevents, main events, causal relations
Event dataset evaluation in light of reliability and validity metrics
Defining, populating, and facilitating event schemas and ontologies
Automated tools and pipelines for event collection related tasks
Lexical, syntactic, and pragmatic aspects of event information manifestation
Development and analysis of rule-based, ML, hybrid, and human-in-the-loop approaches for creating event datasets
COVID-19 related socio-political events
Applications of event databases
Online social movements
Bias and fairness of the sources and event datasets
Estimating what is missing in event datasets using internal and external information
Novel event detection
Release of new event datasets
Ethics, misinformation, privacy, and fairness concerns pertaining to event datasets
Copyright issues on event dataset creation, dissemination, and sharing
Qualities of the event information on various online and offline platforms

Submissions

This call solicits full papers reporting original and unpublished research on the topics listed above. Papers should emphasize completed results rather than planned work and should clearly indicate the state of completion of the reported results. Submissions should be between 4 and 8 pages in total, plus unlimited pages of references. Final versions of accepted papers will be given one additional page of content (up to 9 pages plus references) so that reviewers’ comments can be taken into account.

Authors are also invited to submit short papers not exceeding 4 pages (plus two additional pages for references). Short papers should describe:

a small, focused contribution;
work in progress;
a negative result;
a position paper;
a report on shared task participation.

Papers should be submitted via the START page of the workshop (https://www.softconf.com/acl2021/w22_case2021) in PDF format, in compliance with the ACL 2021 author guidelines provided at https://2021.aclweb.org/calls/papers.

The reviewing process will be double-blind, so papers must not include the authors’ names and affiliations. Each submission will be reviewed by at least three members of the program committee. If you do include any author names on the title page, your submission will be automatically rejected. In the body of your submission, you should also eliminate all direct references to your own previous work.

The workshop proceedings will be published in the ACL Anthology.

Important Dates for Regular Papers

April 26, 2021: Workshop Paper Due Date

May 28, 2021: Notification of Acceptance

June 7, 2021: Camera-ready papers due

August 5-6, 2021: Workshop Dates

Note: All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).

Keynotes
Kristine Eck, Uppsala University

Machine Learning in Conflict Studies: Reflections on Ethics, Collaboration, and Ongoing Challenges

Advances in machine learning are nothing short of revolutionary in their potential to analyze massive amounts of data and, in doing so, create new knowledge bases. But there is a responsibility in wielding the power to analyze these data, since the public attributes a high degree of confidence to results that are based on big datasets.

In this keynote, I will first address our ethical imperative as scholars to “get it right.” This imperative relates not only to model precision but also to the quality of the underlying data, and to whether the models inadvertently reproduce or obscure political biases in the source material. In considering the ethical imperative to get it right, it is also important to define what is “right”: what is considered an acceptable threshold for classification success needs to be understood in light of the project’s objectives.

I then reflect on the different topics and data which are sourced in this field. Much of the existing research has focused on identifying conflict events (e.g. battles), but scholars are also increasingly turning to ML approaches to address other facets of the conflict environment.

Conflict event extraction has long been a challenge for the natural language processing (NLP) community because it requires sophisticated methods for defining event ontologies, creating language resources, and developing algorithmic approaches. NLP machine-learning tools are ill-adapted to the complex, often messy, and diverse data generated during conflicts. Relative to other types of NLP text corpora, conflicts tend to generate less textual data, and texts are generated non-systematically. Conflict-related texts are often lexically idiosyncratic and tend to be written differently across actors, periods, and conflicts. Event definition and adjudication present tough challenges in the context of conflict corpora.

Topics which rely on other types of data may be better-suited to NLP and machine learning methods. For example, Twitter and other social media data lend themselves well to studying hate speech, public opinion, social polarization, or discursive aspects of conflictual environments. Likewise, government-produced policy documents have typically been analyzed with historical, qualitative methods but their standardized formats and quantity suggest that ML methods can provide new traction. ML approaches may also allow scholars to exploit local sources and multi-language sources to a greater degree than has been possible.

Many challenges remain, and these are best addressed in collaborative projects which build on interdisciplinary expertise. Classification projects need to be anchored in the theoretical interests of scholars of political violence if the data they produce are to be put to analytical use. There are few ontologies for classification that adequately reflect conflict researchers’ interests, which highlights the need for conceptual as well as technical development.

***

Kristine Eck is an Associate Professor at the Department of Peace and Conflict Research at Uppsala University, where she serves as the Director of the Uppsala Rotary Peace Center. Her research interests concern coercion and resistance, including human rights, police misconduct, state surveillance, and conflict data production. She served as the Director of the Uppsala Conflict Data Program (UCDP) 2017-2018 and has been a Visiting Researcher at Oxford University, Copenhagen University, the University of Notre Dame, and Kobe University. Dr. Eck’s research has been funded by the Swedish Research Council, the Swedish Foundation for Humanities and Social Sciences, and the Norwegian Foreign Ministry.

Elizabeth Boschee, University of Southern California

Events on a Global Scale: Towards Language-Agnostic Event Extraction

Event extraction is a challenging and exciting task in the world of machine learning & natural language processing. The breadth of events of possible interest, the speed at which surrounding socio-political event contexts evolve, and the complexities involved in generating representative annotated data all contribute to this challenge. One particular dimension of difficulty is the intrinsically global nature of events: many downstream use cases for event extraction involve reporting not just in a few major languages but in a much broader context. The languages of interest for even a fixed task may still shift from day to day, e.g. when a disease emerges in an unexpected location.

Early approaches to multi-lingual event extraction (e.g. ACE) relied wholly on supervised data provided in each language of interest. Later approaches leveraged the success of machine translation to side-step the issue, simply translating foreign-language content into English and deploying English models on the result (often leaving some significant portion of the original content behind). Most recently, however, the community has begun to show significant progress in applying zero-shot transfer techniques to the problem: developing models using supervised English data but decoding in a foreign language without translation, typically using embedding spaces specifically designed to capture multi-lingual semantic content.

In this talk I will discuss multiple dimensions of these promising new approaches and the linguistic representations that underlie them. I will compare them with approaches based on machine translation (as well as with models trained using in-language training data, where available), and discuss their strengths and weaknesses in different contexts, including the amount of English/foreign bitext available and the nature of the target event ontology. I will also discuss possible future directions with an eye toward improving the quality of event extraction no matter where in the world its source originates.

***

Elizabeth Boschee is the Director of the Boston office of the University of Southern California’s Information Sciences Institute and a Senior Supervising Computer Scientist in the Emerging Activities division. Her current efforts focus on cross-lingual information extraction, retrieval, and summarization, specifically targeting low or zero-resource settings, e.g. cross-lingual settings with <1M words of bitext or event extraction from non-English languages with only English training data. Prior to joining ISI, Boschee spent 17 years at BBN Technologies. As a Lead Scientist there, she was the chief architect of the BBN ACCENT event coder, the technology behind the W-ICEWS event data, which more than doubled the precision (while still increasing recall) of the previously deployed solution for CAMEO event coding.

[1] http://protestmap.raceandpolicing.com, accessed on September 28, 2020.
[2] https://www.cartercenter.org/peace/conflict_resolution/syria-conflict-re..., accessed on September 28, 2020.