FLORES 101 Large-scale Multilingual Translation Task @WMT + compute grants

Event Notification Type: Call for Participation
Location: in conjunction with EMNLP 2021
Dates: Wednesday, 10 November 2021 to Thursday, 11 November 2021
Submission Deadline: Tuesday, 31 August 2021

As part of our effort to support research on low-resource machine translation, we have a few announcements:

We will be releasing FLORES 101, a large evaluation benchmark for multilingual machine translation in over 100 languages;
we are organizing a multilingual machine translation task at WMT; and
we are supporting participants in the WMT large-scale multilingual task with compute credits, which we encourage people to apply for.

CONTEXT

Translation is a key technology for connecting people and ideas across language barriers. However, current translation technology works very well in only a few languages and covers only a few domains. Many people around the world still lack access, due in part to the lack of compute and data resources needed to create translation models.

A prerequisite for developing new modeling techniques is reliable evaluation. As a first step in this direction, in 2019 we started FLORES, which came with two evaluation datasets, for Nepali-English and Sinhala-English, and which we later expanded to include Pashto and Khmer.

FLORES 101
We're announcing the FLORES 101 evaluation benchmark: a full many-to-many evaluation dataset across over 100 languages, most of which are low-resource. True to the original multi-domain spirit of FLORES, this dataset consists of 3000 English sentences across several domains (news, books, and travel), all taken from Wikipedia, maintaining document-level context as well as document metadata such as topics and hyperlinks. These sentences are then professionally translated, and the translations undergo several rounds of thorough evaluation. You can see the full list of languages at the bottom of the page [here].

We are making the entire dev and devtest splits of FLORES 101 (2000 sentences total, aligned many-to-many) available to the research community on June 4th, 2021, along with a tech report describing the dataset in detail.
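
To give a sense of scale for many-to-many evaluation: with N languages, every ordered pair of distinct languages is a translation direction, so there are N x (N - 1) directions. The sketch below is only an illustration. It assumes exactly 101 languages (suggested by the benchmark's name; the text above says only "over 100") and uses placeholder language codes rather than the real language list.

```python
from itertools import permutations


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


# Placeholder codes, NOT the actual FLORES 101 language list (assumption).
languages = [f"lang{i:03d}" for i in range(101)]
directions = translation_directions(languages)
print(len(directions))  # 101 * 100 = 10100
```

Because the dev and devtest sentences are aligned across all languages, any of these directions can be evaluated from the same set of sentences.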

SHARED TASK
We want to continue encouraging the research community to work on low-resource translation. To that end, we are launching a multilingual machine translation track as part of WMT 2021. The evaluation campaign will start on June 4th, 2021.
To ensure robust and fair evaluation, we’ll keep the test split blind and not publicly accessible. Instead, we’ll host an evaluation server based on open-source code, which will enable us to track the progress of the community in low-resource languages.

Importantly, such a setup will enable comparison of models on several axes besides translation quality, such as compute and memory efficiency. The evaluation server will also be available starting June 2021.

We propose two small tracks (one for low-resource European languages and another for low-resource Southeast Asian languages) along with the full track of 100+ languages.

See the task page for more details.

IMPORTANT DATES
Release of training data: April 2021
Release of dev and devtest data: June 2021 (until then, we encourage participants to use a portion of the training set for validation purposes)
Evaluation server opening: June 4, 2021
Evaluation on final test set: August 9-13, 2021
Notification of results: August 15, 2021
Draft of system papers: August 31, 2021
Reviews due: September 6, 2021
Camera-ready version of system papers: September 15, 2021
WMT Conference: November 10-11, 2021
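
The suggestion above, to hold out a portion of the training set for validation until the dev and devtest data are released, could look roughly like the following sketch. The function name, the held-out fraction, and the seed are illustrative assumptions, not task requirements, and the toy sentence pairs stand in for a real parallel corpus.

```python
import random


def hold_out_validation(pairs, valid_fraction=0.01, seed=0):
    """Split parallel sentence pairs into (train, valid) lists.

    pairs: list of (source_sentence, target_sentence) tuples.
    valid_fraction and seed are illustrative defaults (assumptions).
    """
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# Toy data standing in for a real parallel corpus.
pairs = [(f"source sentence {i}", f"target sentence {i}") for i in range(1000)]
train, valid = hold_out_validation(pairs, valid_fraction=0.05)
print(len(train), len(valid))  # 950 50
```

Splitting at the level of sentence pairs keeps each source sentence with its translation, and fixing the seed makes the held-out set stable across runs.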

COMPUTE GRANTS

Finally, we encourage people to apply for compute grants so that GPU compute is less of a barrier to translation research. You can find more detailed information and apply for a compute grant [here]. Applications are now open. The deadline to apply is May 10, 2021, Anywhere on Earth.

If you want to learn more, please read below for more details, and feel free to reach out to flores [at] fb.com if you have any questions.