SemEval 2017 Task 1: Semantic Textual Similarity (STS)

Event Notification Type: Call for Participation
Organizers: Eneko Agirre, Daniel Cer, Mona Diab, Lucia Specia
Submission Deadline: Monday, 30 January 2017

Semantic Textual Similarity (STS) measures the degree of equivalence in the underlying semantics of paired snippets of text. While making such an assessment is trivial for humans, constructing algorithms and computational models that mimic human-level performance represents a difficult and deep natural language understanding problem. The 2017 STS shared task involves multilingual and cross-lingual evaluation of Arabic, Spanish and English data, as well as a surprise language track to explore methods for cross-lingual transfer.

STS evaluations have seen significant progress in methods targeted at a specific language such as English or Spanish. For the 2017 shared task, the emphasis is on building multilingual textual similarity models that can assess both same-language and cross-lingual sentence pairs. The primary evaluation for the shared task assesses methods over a combination of same-language text snippet pairs in Arabic, English and Spanish, as well as cross-lingual Arabic-English and Spanish-English pairs.

To encourage the development of methods that can be readily applied or adapted to new languages, we also provide an optional evaluation track with a surprise language that will only be announced at the beginning of the evaluation period. This optional track provides an opportunity to explore STS models capable of rapid cross-lingual transfer learning via mechanisms such as multilingual embeddings.

In addition to the multilingual primary evaluation and the surprise language track, a number of language-specific and language-pair-specific tracks are also provided. We hope that these tracks will give participants with particular linguistic expertise a chance to excel, as well as provide an opportunity to compare performance differences between multilingual and language-specific methods.

Task Definition

Given two sentences, participants are asked to produce a continuous-valued similarity score on a scale from 0 to 5, with 0 indicating that the semantics of the sentences are completely independent and 5 signifying semantic equivalence. Performance is assessed by computing the Pearson correlation between machine-assigned semantic similarity scores and human judgments.
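As a rough illustration of the evaluation metric, the Python sketch below computes the Pearson correlation between a list of system scores and the corresponding human (gold) scores. It is a minimal stand-alone implementation written for this description, not the official scoring script, and the example scores are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation r = cov(x, y) / (sd(x) * sd(y))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when either list is constant")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical system outputs and gold scores on the 0-5 STS scale.
    system_scores = [4.8, 3.1, 0.4, 2.2]
    gold_scores = [5.0, 3.0, 0.0, 2.5]
    print(round(pearson(system_scores, gold_scores), 3))
```

Because Pearson correlation is unchanged by any positive linear rescaling of the system scores, it rewards systems for tracking the relative differences in the human judgments rather than for reproducing the exact 0 to 5 values.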

Following the emphasis on building multilingual and cross-lingual models, the 2017 shared task is organized into the following seven tracks, covering both monolingual and cross-lingual pairings:

Track 0 - Primary: Combined evaluation of all announced monolingual and cross-lingual language pairings explored by the 2017 task: ar-ar, ar-en, es-es, es-en, and en-en. The primary track will not include the surprise language evaluation data.

Track 1 - Arabic-Arabic: Evaluation only on ar-ar pairs.

Track 2 - Arabic-English: Evaluation only on ar-en pairs.

Track 3 - Spanish-Spanish: Evaluation only on es-es pairs.

Track 4 - Spanish-English: Evaluation only on es-en pairs.

Track 5 - English-English: Evaluation only on en-en pairs.

Track 6 - Surprise language track (announced during the evaluation period).

For all language pairings, participants will be provided with two sentence-length snippets of text, s1 and s2. Systems use the two snippets to compute and return a continuous-valued semantic similarity score.
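For concreteness, the sketch below shows the shape of a system's input and output: it takes s1 and s2 and returns a continuous score kept within [0, 5]. The scoring method, a cosine over bag-of-words counts scaled by 5, is a deliberately naive, hypothetical baseline chosen for illustration; the function names `tokenize` and `overlap_similarity` are our own and not part of any task tooling.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of word characters; in Python 3, \w is
    # Unicode-aware, so accented Spanish letters and Arabic script are kept.
    return re.findall(r"\w+", text.lower())


def overlap_similarity(s1, s2):
    """Hypothetical baseline: map a sentence pair to a continuous score in [0, 5]."""
    c1, c2 = Counter(tokenize(s1)), Counter(tokenize(s2))
    # Dot product over the shared vocabulary of the two count vectors.
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(
        sum(v * v for v in c2.values())
    )
    if norm == 0:
        return 0.0  # at least one snippet has no tokens
    # Cosine of non-negative vectors lies in [0, 1]; clamp to absorb rounding.
    return min(5.0, max(0.0, 5.0 * dot / norm))


if __name__ == "__main__":
    print(overlap_similarity("A man is playing a guitar.", "A man plays the guitar."))
```

Applied to a cross-lingual pair, a purely lexical score like this one mostly sees disjoint vocabularies and returns values near 0, which is one reason cross-lingual pairs can require adapting the underlying model.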

The cross-lingual language pairings (ar-en, es-en) differ from the monolingual language pairings (ar-ar, es-es, en-en) only in that the two text snippets in each pair are written in different languages. The inclusion of cross-lingual STS pairs follows a successful pilot in 2016 that paired English and Spanish sentences. Depending on the approach used to compute the similarity scores, adapting the underlying model to handle the cross-lingual pairs may be more or less difficult.
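One way a single model can score monolingual and cross-lingual pairs with the same code path is to represent both sentences in a shared multilingual embedding space. The sketch below averages word vectors and maps their cosine similarity onto the 0 to 5 scale. The vectors in `SHARED_EMBEDDINGS` are hypothetical three-dimensional toy values invented for this example; a real system would substitute pretrained cross-lingual embeddings, and averaging is only one simple composition choice.

```python
import math

# Hypothetical toy embeddings (NOT real data): English and Spanish words that
# translate each other are placed near one another in one shared space.
SHARED_EMBEDDINGS = {
    "the": [0.05, 0.05, 0.05], "el": [0.05, 0.05, 0.06],
    "dog": [0.9, 0.1, 0.0], "perro": [0.88, 0.12, 0.02],
    "runs": [0.1, 0.9, 0.1], "corre": [0.12, 0.87, 0.1],
    "cat": [0.7, 0.0, 0.6], "gato": [0.69, 0.02, 0.6],
    "sleeps": [0.0, 0.2, 0.9], "duerme": [0.01, 0.22, 0.88],
}


def sentence_vector(tokens, table):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embedding_similarity(s1, s2, table=SHARED_EMBEDDINGS):
    """Score a pair on [0, 5]; the same path serves en-en and es-en pairs."""
    v1 = sentence_vector(s1.lower().split(), table)
    v2 = sentence_vector(s2.lower().split(), table)
    if v1 is None or v2 is None:
        return 0.0
    # Clip negative cosines to 0 and rounding overshoot to 5 to stay on the STS scale.
    return min(5.0, 5.0 * max(0.0, cosine(v1, v2)))


if __name__ == "__main__":
    print(embedding_similarity("The dog runs", "El perro corre"))  # es-en paraphrase
    print(embedding_similarity("The dog runs", "El gato duerme"))  # es-en, different meaning
```

With this design, handling a new language only requires vectors for its words in the shared space, which is the kind of rapid cross-lingual transfer the surprise language track is meant to explore.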

Participants are encouraged to review the successful approaches to monolingual and cross-lingual STS from prior years of the STS shared task (Agirre et al. 2016; Agirre et al. 2015; Agirre et al. 2014; Agirre et al. 2013; Agirre et al. 2012).

2017 Data

This year's shared task includes one evaluation set for each of the seven tracks described above. Each evaluation set consists of between 200 and 250 sentence pairs. Within each evaluation set, we will aim for an approximately balanced distribution of STS scores.

For training data, participants are encouraged to make use of all existing English, Spanish and cross-lingual English-Spanish data sets from prior STS evaluations. This includes all previously released trial, training and evaluation data.

Since this is the first year that Arabic is included in an STS evaluation, we will release training data for both monolingual Arabic and cross-lingual Arabic-English. Each training set will consist of thousands of pairs sourced from prior English STS evaluations.

As with the 2016 evaluation, participants are allowed and very much encouraged to train purely unsupervised models and model components on arbitrary data (e.g., unsupervised word embeddings).



To register, please complete the following form:

Website and trial data

For more details, including trial data, see the STS SemEval 2017 Task 1 webpage at:

Mailing list

Join the mailing list for task updates and discussion at:

Important dates

Trial data ready: Wed 21 Sep 2016
Arabic training data ready: Tue 01 Nov 2016
Evaluation start: Mon 09 Jan 2017
Evaluation end: Mon 30 Jan 2017
Results posted: Mon 06 Feb 2017
Paper submissions due: Mon 27 Feb 2017
Author notifications: Mon 03 Apr 2017
Camera ready submissions due: Mon 17 Apr 2017
SemEval workshop: Summer 2017

Organizers (alphabetical order)

Eneko Agirre, Daniel Cer, Mona Diab, Lucia Specia

References

Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau and Janyce Wiebe. SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation. Proceedings of SemEval 2016.

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Inigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria and Janyce Wiebe. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. Proceedings of SemEval 2015.

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau and Janyce Wiebe. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. Proceedings of SemEval 2014.

Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre and Weiwei Guo. *SEM 2013 shared task: Semantic Textual Similarity. Proceedings of *SEM 2013.

Eneko Agirre, Daniel Cer, Mona Diab and Aitor Gonzalez-Agirre. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. Proceedings of SemEval 2012.