SIGMORPHON 2021 Shared Task 0 - Generalization Across Typologically Diverse Languages and Cognitively Plausible Morphological Inflection

Event Notification Type: 
Call for Participation
Abbreviated Title: 
SIGMORPHON 2021 Shared Task 0
Thursday, 5 August 2021 to Friday, 6 August 2021
Country: 
Thailand
City: 
Bangkok
Contact: 
Tiago Pimentel
Brian Leonard
Maria Ryskina
Sabrina Mielke
Coleman Haley
Eleanor Chodroff
Ryan Cotterell
Ekaterina Vylomova
Ben Ambridge
Submission Deadline: 
Tuesday, 4 May 2021

SIGMORPHON’s sixth installment of its inflection generation shared task will be divided into two parts:

  • Part 1: Generalization Across Typologically Diverse Languages
  • Part 2: Are We There Yet? A Shared Task on Cognitively Plausible Morphological Inflection

Please join our Google Group to stay up to date.

Click here to register for the task!

The shared task will be part of the SIGMORPHON workshop, co-located with ACL-IJCNLP 2021 in Bangkok, Thailand, on either August 5 or 6, 2021.

----------------------------------------------------------------------------------------------------------------------------------------

Part 1: Generalization Across Typologically Diverse Languages

Summary

For the first part of the shared task, participants will design a model that learns to generate morphological inflections from a lemma and a set of morphosyntactic features of the target form. Each language has its own training, development, and test splits. Training and development splits contain triples, each consisting of a lemma, a target form, and a set of morphological features, provided in the UniMorph format. Test splits only provide lemmas and morphological tags: the participants' models will need to predict the missing target form.
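As a rough illustration of the data layout described above, the sketch below parses one line of a split into a lemma, an optional target form, and a feature set. It assumes tab-separated fields with features joined by semicolons, as is usual for UniMorph files; the names `Entry` and `parse_line` and the sample English lines are hypothetical and are not taken from the released data.

```python
from typing import FrozenSet, NamedTuple, Optional


class Entry(NamedTuple):
    lemma: str
    form: Optional[str]        # None on test lines, where the form must be predicted
    features: FrozenSet[str]   # e.g. {"V", "PST"}


def parse_line(line: str) -> Entry:
    """Parse one line, assuming tab-separated fields (an assumption, see above)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:       # training/development: lemma, form, features
        lemma, form, tags = fields
    elif len(fields) == 2:     # test: lemma, features (form withheld)
        lemma, tags = fields
        form = None
    else:
        raise ValueError(f"unexpected number of fields: {len(fields)}")
    return Entry(lemma, form, frozenset(tags.split(";")))


# Hypothetical sample lines, for illustration only.
train_entry = parse_line("walk\twalked\tV;PST")
test_entry = parse_line("walk\tV;PST")
```

Here `train_entry.form` is `"walked"`, while `test_entry.form` is `None` and is what a participating model would need to fill in.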

The model should be general enough to work for natural languages of any typological patterning. For example, Tagalog verbs exhibit circumfixation; thus, a model with a strong inductive bias towards suffixing will likely not work well for Tagalog.
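To make the point about inductive bias concrete, here is a toy suffix-only rule learner. It is not one of the task baselines; the function names are hypothetical, and the German participle pair is our own example of circumfixation (ge-...-t), chosen for illustration rather than drawn from the task data.

```python
import os
from typing import Optional, Tuple

Rule = Tuple[str, str]  # (suffix removed from the lemma, suffix added in its place)


def suffix_rule(lemma: str, form: str) -> Rule:
    """Learn a suffix-rewrite rule by stripping the longest common prefix."""
    prefix = os.path.commonprefix([lemma, form])
    return lemma[len(prefix):], form[len(prefix):]


def apply_rule(lemma: str, rule: Rule) -> Optional[str]:
    """Apply a suffix rule, or return None if the lemma does not end in the old suffix."""
    old, new = rule
    if not lemma.endswith(old):
        return None
    return lemma[:len(lemma) - len(old)] + new


# Suffixing pattern: the learned rule generalizes to an unseen lemma.
english_rule = suffix_rule("walk", "walked")
english_guess = apply_rule("jump", english_rule)

# Circumfixing pattern: lemma and form share no prefix, so the "rule"
# memorizes the whole word and cannot apply to any other lemma.
german_rule = suffix_rule("spielen", "gespielt")
german_guess = apply_rule("machen", german_rule)
```

The English rule reduces to "append -ed" and carries over to new verbs, whereas the German rule degenerates into rewriting the entire word, so a model restricted to this hypothesis space has nothing to generalize from.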

As part of the task, we will release data for 50 new languages annotated in the UniMorph schema. The data for the 35 development languages are already available on the shared task website. These include a number of languages indigenous to Russia, such as Itelmen and Chukchi, as well as many languages from the Americas, such as Aymara and Seneca.

Timeline

Stage 1: Development Phase

  • February 28, 2021: Training and development splits for development languages released; we invite participants to report errors.
  • February 28, 2021: Neural and non-neural baselines for development languages released.
  • March 7, 2021: Development language data are frozen.

Stage 2: Generalization Phase

  • April 20, 2021: Training and development splits for surprise languages released. (This is not a zero-shot learning task. Participants will be given training data for all languages.)

Stage 3: Evaluation Phase

  • April 27, 2021: Test splits for all languages (both development and surprise) released.
  • May 4, 2021: Participants submit test predictions on all languages.

Stage 4: Write-up Phase

  • June 1, 2021: Participants’ system description papers due.
  • June 7, 2021: Camera-ready versions of participants’ system description papers due.

----------------------------------------------------------------------------------------------------------------------------------------

Part 2: Are We There Yet? A Shared Task on Cognitively Plausible Morphological Inflection

Summary

An open question in the use of neural networks for the study of language is the degree to which they resemble humans in how they generate language. In the realm of morphology, this question goes back 40 years to the infamous past-tense debate of the 1980s, in which one camp argued that humans use rule-based mechanisms and another argued that humans inflect words with a process closer to that of neural networks.

This shared task adopts the experimental paradigm introduced by Albright and Hayes (2003). We have created a large number of new nonce words in four languages: English, German, Portuguese and Russian. To the best of our knowledge, this will be the largest and most multilingual collection of nonce words in existence. The goal of the participants in the shared task is to design a model that morphologically inflects the nonce words according to the grammar of the given languages. As an example, consider the following nonce verbs that obey English phonotactics:

  • blad /blæd/
  • crast /kɹæst/
  • flink /flɪŋk/
  • pide /paɪd/
  • sprake /spɹeɪk/

In many cases, there is arguably more than one “correct” way to inflect these verbs according to English grammar because they are unattested. For instance, who is to say that the past tense of “fink” should be “finked” and not “fank”? For that reason, we have elicited human judgements (on Amazon’s Mechanical Turk) that capture native speakers’ preferences for specific past-tense inflections. The candidate set of potential inflections was generated through a linguist-in-the-loop procedure that made use of the state-of-the-art neural inflector from Wu et al. (2021).

The training data are attested inflections in English, German, Portuguese and Russian and are already available on the shared task website.

Timeline

  • February 25, 2021: Training data for English, German, Portuguese and Russian are released. In contrast to previous years’ shared tasks, the data are in IPA. We invite participants to report errors.
  • March 8, 2021: Neural and non-neural baselines for development languages released.
  • May 1, 2021: Development data for nonce inflections are released. (This includes human judgements.)
  • May 23, 2021: Test data for the nonce inflections are released. (This includes human judgements.)
  • June 1, 2021: Participants submit their system output.
  • June 7, 2021: Participants submit their system description papers.

----------------------------------------------------------------------------------------------------------------------------------------

Please find additional information at the following links: