The Second Workshop on Evaluating Vector-Space Representations for NLP

Event Notification Type: 
Call for Papers
Abbreviated Title: 
RepEval 2017
Location: 
EMNLP 2017
Friday, 8 September 2017
Country: 
Denmark
City: 
Copenhagen
Contact: 
Sam Bowman
Yoav Goldberg
Felix Hill
Angeliki Lazaridou
Omer Levy
Roi Reichart
Anders Søgaard
Submission Deadline: 
Wednesday, 14 June 2017

CALL FOR PAPERS

==========================================================================================
===RepEval 2017: The Second Workshop on Evaluating Vector-Space Representations for NLP===
==========================================================================================

Mission Statement: To foster the development of new and improved ways of measuring the quality and understanding the properties of vector space representations in NLP.

Time & Location: Friday, 8 September 2017, Copenhagen, Denmark (EMNLP 2017 workshop).

Website: https://repeval2017.github.io/

===Motivation===

Models that learn real-valued vector representations of words, phrases, sentences, and even documents are ubiquitous in today's NLP landscape. These representations are usually obtained by training a model on large amounts of unlabeled data, and then employed in NLP tasks and downstream applications. While such representations should ideally be evaluated according to their value in these applications, doing so is laborious, and it can be hard to rigorously isolate the effects of different representations for comparison. There is therefore a need for evaluation via simple and generalizable proxy tasks. To date, these proxy tasks have mainly focused on lexical similarity and relatedness, and do not capture the full spectrum of interesting linguistic properties that are useful for downstream applications. This workshop challenges its participants to propose methods and/or design benchmarks for evaluating the next generation of vector space representations, for presentation and detailed discussion at the event.

===Submissions===

We encourage researchers at all levels of experience to consider contributing to the discussion at RepEval by making a short submission to either of two tracks:

=Shared Task=

Starting this year, RepEval will feature a shared task for evaluating general-purpose sentence representations. This year’s task will be natural language inference (also known as recognizing textual entailment, or RTE) in the style of SNLI: a three-class balanced classification problem over sentence pairs. The shared task will feature a new, dedicated dataset that spans several genres of text. It will include two evaluations: a standard in-domain evaluation, in which the training and test data are drawn from the same sources, and a cross-domain evaluation, in which the training and test data differ substantially. The cross-domain evaluation will test the ability of submitted systems to learn representations of sentence meaning that capture broadly useful features.

More details available online: https://repeval2017.github.io/shared/
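
As an illustration of the two evaluations described above, the following is a minimal sketch of how accuracy might be reported separately for in-domain and cross-domain test examples. It is not the official scoring code. The field names (`genre`, `label`), the genre identifiers, and the function name `accuracy_by_split` are hypothetical and chosen only for this example. The three label names follow SNLI.

```python
from collections import defaultdict

# The three SNLI-style classes.
LABELS = ("entailment", "neutral", "contradiction")


def accuracy_by_split(examples, predictions, train_genres):
    """Return accuracy for in-domain and cross-domain test examples.

    examples:     list of dicts with hypothetical keys "genre" and "label"
    predictions:  list of predicted labels, aligned with `examples`
    train_genres: set of genres that appeared in the training data
    """
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must be the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        if predicted not in LABELS:
            raise ValueError(f"unknown label: {predicted!r}")
        # An example is in-domain if its genre was seen during training.
        split = "in_domain" if example["genre"] in train_genres else "cross_domain"
        total[split] += 1
        correct[split] += int(predicted == example["label"])

    # Only splits that actually contain examples are reported.
    return {split: correct[split] / total[split] for split in total}


if __name__ == "__main__":
    # Toy data; "genre_a" and "genre_b" are placeholder genre names.
    test_examples = [
        {"genre": "genre_a", "label": "entailment"},
        {"genre": "genre_a", "label": "neutral"},
        {"genre": "genre_b", "label": "contradiction"},
        {"genre": "genre_b", "label": "neutral"},
    ]
    system_output = ["entailment", "contradiction", "contradiction", "neutral"]
    print(accuracy_by_split(test_examples, system_output, {"genre_a"}))
```

Reporting the two numbers side by side makes the gap between in-domain and cross-domain performance visible, which is the quantity the cross-domain evaluation is meant to expose.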

=Proposals=

A proposal submission should propose a novel method for evaluating representations. It does not have to construct an actual dataset, but it should describe a way (or several alternative ways) of collecting one. Proposals are expected to provide roughly 5-10 examples in the manuscript as a proof of concept.

In addition, each proposal should explicitly mention:
* Which type of representation it evaluates (e.g. word, sentence, document)
* For which downstream application(s) it functions as a proxy
* Any linguistic/semantic/psychological properties it captures

Among other important points, proposals should take the following into consideration:
* If the task captures some linguistic phenomenon via annotators, what evidence is there that it is robustly observed in humans (e.g., inter-annotator agreement)?
* How easy would it be for other researchers to accurately reproduce the evaluation (not necessarily the dataset)?
* Will the dataset be cost-effective to produce?
* Is a specific family of models expected to perform particularly well (or poorly) on the task? In other words, which types of models is this evaluation targeted at?
* How should the evaluation's results be interpreted?
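
For the inter-annotator agreement point above, one common chance-corrected measure for two annotators is Cohen's kappa. The sketch below is only an illustration of how such evidence might be computed. The call does not prescribe any particular agreement statistic, and the function name `cohen_kappa` and the toy labels are assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout;
        # kappa is undefined, and this sketch treats it as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy annotations for four items (hypothetical labels).
    annotator_1 = ["yes", "yes", "no", "no"]
    annotator_2 = ["yes", "no", "no", "no"]
    print(cohen_kappa(annotator_1, annotator_2))
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the analogous choice.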

We hope that one or more of these proposals will evolve into next year’s shared task (RepEval 2018).

=Submission Format=

Submissions to both tracks should be 2-4 pages of content in EMNLP format, with an unlimited number of pages for references. For the proposal track, we encourage shorter content (2-3 pages), leaving more room for examples and their visualization.

===Important Dates===

=Shared Task=

By March 15: Training and development data available, draft data description paper available, competition begins
By May 1: Expert-tagged development data for error analysis available
June 1: Unlabeled test data available, evaluation period begins, Kaggle evaluation site opens
June 14 (GMT-11, 23:59:59): Evaluation period ends, system description papers and code packages due
June 16: Winners formally announced
July 3 (GMT-11, 23:59:59): Reviews due
July 6: Notification of presentation acceptance
July 21 (GMT-11, 23:59:59): Camera-ready papers due
September 8: Workshop at EMNLP 2017, Copenhagen: shared task poster session and selected short talks

=Proposals=

June 14 (GMT-11, 23:59:59): Proposal papers due
July 3 (GMT-11, 23:59:59): Reviews due
July 6: Acceptance notification
July 21 (GMT-11, 23:59:59): Camera-ready papers due

===Organizers===

Sam Bowman, New York University
Yoav Goldberg, Bar-Ilan University
Felix Hill, Google DeepMind
Angeliki Lazaridou, University of Trento
Omer Levy, University of Washington
Roi Reichart, Technion - Israel Institute of Technology
Anders Søgaard, University of Copenhagen