The 2nd Computational Linguistics Scientific Document Summarization Shared Task

Event Notification Type: 
Call for Abstracts
Abbreviated Title: 
CL SciSumm '16
Location: 
Rutgers University
Thursday, 23 June 2016
State: 
New Jersey
Country: 
USA
City: 
Newark
Contact: 
Muthu Kumar Chandrasekaran
Kokil Jaidka
Min-Yen Kan
Submission Deadline: 
Wednesday, 30 March 2016

You are invited to participate in the CL-SciSumm Shared Task at BIRNDL 2016. The shared task is on automatic paper summarization in the Computational Linguistics (CL) domain. The output summaries will be of two types: faceted summaries of the traditional self-summary (the abstract) and of the community summary (the collection of citation sentences, or 'citances'). We also propose to group the citances by the facets of the text that they refer to.

This task follows up on the successful CL Pilot Task conducted as a part of the BiomedSumm Track at the Text Analysis Conference 2014 (TAC 2014). In that task, a training corpus of ten topics from CL research papers was released. Participants were invited to enter their systems in a task-based evaluation. Nine teams from four countries expressed an interest in participating in the shared task; three teams submitted system descriptions and findings. We also released the SciSumm14 manually annotated dataset, comprising ACL Computational Linguistics research papers and summaries. It offers a community summary of a reference paper based on its collection of citing sentences ('citances'). Furthermore, each citance is mapped to its referenced text in the reference paper and tagged with the information facet it represents. In our proposed shared task, we will extend this by releasing pairs of training and test datasets, each pair comprising the annotated citing sentences for a research paper and the summaries of that paper.

The CL-SciSumm 2016 corpus is expected to be of interest to a broad community including those working in computational linguistics and natural language processing, text summarization, discourse structure in scholarly discourse, paraphrase, textual entailment and text simplification.

We have secured support for the costs of the shared task annotation from Microsoft Research Asia. The National University of Singapore will be primarily responsible for the task's oversight.

Tasks
Given: A topic consisting of a Reference Paper (RP) and up to ten Citing Papers (CPs) that all contain citations to the RP. In each CP, the text spans (i.e., citances) that pertain to a particular citation to the RP have been identified.

Task 1A: For each citance, identify the spans of text (cited text spans) in the RP that most accurately reflect the citance. These are of the granularity of a sentence fragment, a full sentence, or several consecutive sentences (no more than 5).
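One common entry point for Task 1A is lexical matching between the citance and candidate sentences in the RP. The sketch below is an illustrative baseline only, not an official system: it ranks RP sentences by Jaccard word overlap with a citance. All function and variable names are our own illustrative choices.

```python
import re

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return set(t for t in re.split(r"\W+", text.lower()) if t)

def rank_cited_spans(citance, rp_sentences):
    """Score each RP sentence by Jaccard overlap with the citance,
    highest score first."""
    c_tokens = tokenize(citance)
    scores = []
    for i, sent in enumerate(rp_sentences):
        s_tokens = tokenize(sent)
        union = c_tokens | s_tokens
        jaccard = len(c_tokens & s_tokens) / len(union) if union else 0.0
        scores.append((jaccard, i))
    return sorted(scores, reverse=True)

# Toy example: sentence 1 shares the most vocabulary with the citance.
rp = [
    "We present a new parser for dependency grammars.",
    "Evaluation shows a 2% absolute improvement over the baseline.",
    "Future work includes multilingual experiments.",
]
citance = "Their dependency parser improved over the baseline by 2%."
best_score, best_idx = rank_cited_spans(citance, rp)[0]
```

A real system would likely combine such surface overlap with semantic similarity, since citances often paraphrase rather than quote the reference paper.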

Task 1B: For each cited text span, identify what facet of the paper it belongs to, from a predefined set of facets.
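Task 1B can be approached as a supervised or rule-based classification over the cited text span. The sketch below uses simple keyword cues; the facet names here are hypothetical placeholders for illustration, not the task's official facet inventory.

```python
import re

# Hypothetical facet labels and cue words, for illustration only.
FACET_KEYWORDS = {
    "method": {"propose", "algorithm", "model", "approach", "using"},
    "result": {"accuracy", "improvement", "outperforms", "f1", "score"},
    "aim": {"goal", "aim", "investigate", "study"},
}

def classify_facet(span_text, default="method"):
    """Return the facet whose keyword set best matches the span,
    falling back to a default facet when no cue word matches."""
    tokens = set(re.findall(r"[a-z0-9]+", span_text.lower()))
    best_facet, best_hits = default, 0
    for facet, keywords in FACET_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best_facet, best_hits = facet, hits
    return best_facet

facet = classify_facet("The model outperforms the baseline in accuracy.")
```

In practice, participants would train a classifier on the facet-annotated training set rather than hand-pick keywords, but the interface (span text in, facet label out) is the same.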

Evaluation: Task 1 will be scored by the overlap of text spans in the system output against the gold standard created by human annotators.
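The overlap scoring can be pictured as set comparison between system and gold spans. The sketch below treats cited text spans as sets of sentence IDs and computes precision, recall and F1; the official metric may differ in detail, so this is an illustration of the idea only.

```python
def span_overlap_f1(system_ids, gold_ids):
    """F1 of the overlap between system and gold sentence-ID sets."""
    system_ids, gold_ids = set(system_ids), set(gold_ids)
    if not system_ids or not gold_ids:
        return 0.0
    overlap = len(system_ids & gold_ids)
    precision = overlap / len(system_ids)
    recall = overlap / len(gold_ids)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: the system picked sentences 4 and 5; gold annotation says 5 and 6.
# One of two system sentences is correct, one of two gold sentences is found.
score = span_overlap_f1({4, 5}, {5, 6})
```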

Registration
Organizations wishing to participate in the CL Shared Task track at BIRNDL 2016 are invited to register on EasyChair:

https://easychair.org/conferences/?conf=birndl2016

by 30 March 2016. Participants are advised to register as soon as possible in order to receive timely access to evaluation resources, including development and testing data. Registration for the task does not commit you to participation, but it helps us plan. All participants who submit system runs are welcome to present their system at the BIRNDL Workshop.

Dissemination of CL-SciSumm work and results other than in the workshop proceedings is welcomed, but the conditions of participation specifically preclude any advertising claims based on these results. Any questions about conference participation may be sent to the organizers mentioned below.

Corpus
https://github.com/WING-NUS/scisumm-corpus

The corpus for the CL-SciSumm task has been created by randomly sampling ten documents from the ACL Anthology corpus and selecting their citing papers. The training, development and testing sets will be made publicly available at the GitHub link above on the dates specified below.

The training set of 10 articles is already available for download and can be used by participants to pilot their systems. The development set of 10 articles, part of the same corpus, will be released in April; participants can add it to the training set to tune their system parameters. Finally, the test set of 10 articles will be released in May. System outputs on the test set should be submitted to the task organizers, who will collate the final results for presentation at the workshop.

Call for Participation released - February 2016
Training Set Released - February 2016
Registration and Short System Descriptions Due - 30 March 2016
Development Set Posted - 8 April 2016
Notification of Acceptance of Presentation Proposals - 22 April 2016
Test Set Released - 7 May 2016
System Runs and Preliminary System Reports Due in EasyChair - 20 May 2016
Camera-Ready Contributions Due in EasyChair - 3 June 2016
Participant Presentations at BIRNDL Workshop - 23 June 2016 in Newark, New Jersey, USA

Background
The CL-SciSumm task provides resources to encourage research in a promising direction of scientific paper summarization, which considers the set of citation sentences (i.e., "citances") that reference a specific paper as a (community-created) summary of a topic or paper (Nanba, Kando and Okumura, 2011; Qazvinian and Radev, 2008). The citances for a reference paper are considered a synopsis of its key points, key contributions and importance within an academic community. The advantage of using citances is that they are embedded with meta-commentary and offer a contextual, interpretative layer to the cited text. The drawback, however, is that though a collection of citances offers a view of the cited paper, it does not consider the context of the target user (Sparck Jones, 2007; Teufel and Moens, 2002; Nenkova and McKeown, 2011; Jaidka, Khoo and Na, 2013a), verify the claim of the citation, or provide context from the reference paper in terms of the type of information cited or where it appears in that paper.

CL-SciSumm explores the summarization of scientific research in the computational linguistics domain. An ideal summary of computational linguistics research papers would summarize previous research by drawing comparisons and contrasts between their goals, methods and results, as well as distill the overall trends in the state of the art and their place in the larger academic discourse. Literature surveys and review articles in CL do help readers gain a gist of the state of the art in research on a topic. However, literature survey writing is labor-intensive, and a literature survey is not always available for every topic of interest. What is needed are resources that automate the synthesis and updating of summaries of CL research papers.

Existing scientific summarization systems have automatically generated related-work sections for a target paper by instantiating a hierarchical topic tree (Hoang and Kan, 2010), generating model citation sentences (Mohammad et al., 2009) or implementing a literature review framework (Jaidka et al., 2013). However, the limited availability of evaluation resources and human-created summaries constrains research in this area. The goal of the CL-SciSumm Shared Task is to highlight the challenges and relevance of the scientific summarization problem, support research in automatic scientific document summarization and provide evaluation resources to push the current state of the art.

BIRNDL 2016 Workshop
http://wing.comp.nus.edu.sg/birndl-jcdl2016

The BIRNDL 2016 workshop will be held on 23 June 2016 in Newark, New Jersey, USA. The workshop is a forum both for the presentation of results (including failure analyses and system comparisons) and for more lengthy system presentations describing techniques used, experiments run on the data, and other issues of interest to NLP researchers. Shared Task participants who wish to give a presentation during the workshop will submit a short abstract describing the experiments they performed. As there is a limited amount of time for oral presentations, the abstracts will be used to determine which participants are asked to speak and which will present in a poster session.