SemEval-2014 Task 3 - Cross-Level Semantic Similarity

Event Notification Type: Call for Papers
Saturday, 23 August 2014 to Sunday, 24 August 2014
Contact Email: 
David Jurgens
Taher Pilehvar
Roberto Navigli
Submission Deadline: Wednesday, 30 April 2014

SemEval 2014 - Task 3 Cross-Level Semantic Similarity

The aim of this task is to evaluate semantic similarity when comparing lexical items of different types, such as paragraphs, sentences, phrases, words, and senses.

Semantic similarity is an essential component of many applications in Natural Language Processing (NLP). This task provides an evaluation for semantic similarity across different types of text, which we refer to as lexical levels. Unlike prior SemEval tasks on textual similarity, which have focused on comparing texts of similar size, this task evaluates the case where a larger text must be compared with a smaller one, or even with a word sense. Specifically, this task encompasses four types of semantic similarity comparisons:
paragraph to sentence,
sentence to phrase,
phrase to word, and
word to sense.
Task 3 unifies multiple objectives from different areas of NLP under a single task, e.g., Paraphrasing, Summarization, and Compositional Semantics. One of the major motivations of this task is to produce systems that handle all comparison types, thereby freeing downstream NLP applications from needing to consider the type of text being compared.

Task participants will be provided with pairs of each comparison type and asked to rate how similar the meaning of the smaller item is to the overall meaning of the larger item. For example, given a sentence and a paragraph, a system would assess how similar the meaning of the sentence is to the meaning of the paragraph. Ideally, a high-similarity sentence would reflect the overall meaning of the paragraph.

For word-to-sense comparisons, two evaluation settings are used: (1) out-of-context and (2) in-context. In the out-of-context setting, a sense is paired with a word in isolation. In the in-context setting, a sense is compared with the meaning of a word as it is used in a particular context. Task 3 uses the WordNet 3.1 sense inventory.

Teams are free to participate in one, some, or all comparison types. Given the unified setting of the task, we especially encourage systems that handle all comparison types. However, we also allow specialized systems that target only a single comparison type.

Interested teams are encouraged to join the task’s mailing list for discussion and announcements.

Systems will be evaluated against human similarity scores using both rank-based and score-based comparisons. See the task’s Evaluation page for further details.
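As a hedged illustration of what "rank-based" and "score-based" comparison can mean, the sketch below assumes Pearson correlation as the score-based measure and Spearman rank correlation as the rank-based one. These are common choices for similarity evaluation, but the official measures and how they are aggregated across comparison types are defined on the task's Evaluation page, not here. The function names and example scores are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Score-based comparison (assumed): linear correlation of system vs. gold scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Rank-based comparison (assumed): Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical gold (human) and system similarity scores for five pairs.
    gold = [4.0, 3.25, 1.0, 0.5, 2.75]
    system = [3.6, 3.9, 1.2, 0.4, 2.0]
    print(round(pearson(system, gold), 3))
    print(round(spearman(system, gold), 3))  # 0.9
```

Reporting both measures rewards different behaviour: Spearman only checks whether a system orders the pairs as the annotators did, while Pearson also penalizes scores that are poorly calibrated against the human scale.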

The Task 3 trial data set is now available and contains tens of examples for each comparison type to use in building initial systems. The full training data will be released later in December. Please see the task’s Data page for further details.

Trial data ready: October 31, 2013
Training data ready: December 15, 2013
Evaluation period: March 15-30, 2014
Paper submission due: April 30, 2014 [TBC]
SemEval workshop: August 23-24, 2014, co-located with COLING and *SEM in Dublin, Ireland.

The SemEval-2014 Task 3 website includes details on the training data, evaluation, and examples of the comparison types:

If you are interested in the task, please join our mailing list for updates:

David Jurgens, Sapienza University of Rome, Italy
Mohammad Taher Pilehvar, Sapienza University of Rome, Italy
Roberto Navigli, Sapienza University of Rome, Italy