Call For Participation -- NADI Shared Task 2026

Event Notification Type: 
Call for Papers

Dear All,
This year the Nuanced Arabic Dialect Identification (NADI) shared task covers a wide range of dialectal Arabic speech processing tasks, including new code-switched automatic speech recognition (ASR), text-to-speech, spoken language translation, and spoken language understanding tasks. Today we are pleased to announce the release of the NADI 2026 training data, evaluation scripts, and baselines, kicking off the main system development phase of the shared task.

To register please submit this form: https://docs.google.com/forms/d/e/1FAIpQLSc1aPjBgKV5lr_dTCfwd4gBsTc3mC1_....
To download the data see the resources section of our website: https://nadi.dlnlp.ai/2026/#resources.
The deadline to register is July 20th, 2026.

The following is an overview of the tasks and resources.

1.1 Noisy Country-Level ASR: Degraded audio quality has a noticeable impact on certain Arabic phonemes. This task aims to tackle the challenge of difficult, noisy data.

1.2 Mixed Dialect ASR: Oftentimes it is not known which dialect(s) may be present in a recording. This task asks participants to tackle this challenge in creative ways. While no training data is provided for this subtask, it is an open track, and participants may train on their own data or use the data from 1.1. The development data consists of 3k utterances in various dialects.

1.3 Code-Switched ASR: Code-switching remains a major challenge for dialectal Arabic (DA) ASR. This task focuses on code-switching between Tunisian, English, and French, with Tunisian as the matrix language. We provide 38 hours of training data alongside a 1-hour development set. Participants also have access to 1 hour of test data from the original release, which can be used as part of the validation process; new blind test data will be released for the final evaluation.

For all three of these ASR subtasks, our baseline is the well-established Whisper Large-v3 used as a zero-shot model.

2. Spoken Dialect ID: This year we focus on an out-of-domain spoken dialect ID task. Language and dialect ID models can be prone to overfitting to their training domain, limiting their applicability in real-world scenarios. This blind domain evaluation aims to test the generalizability of these models. For our baseline we provide a training script to finetune a pretrained ECAPA-TDNN language ID system on a 200-hour subset of the ADI-20 dataset. Training is unrestricted, and participants are free to train on the full ADI-17/20 datasets. Because this is a blind out-of-domain evaluation, we encourage participants to consider evaluating their models on selected data from other domains such as radio, read speech, and conversational telephone speech.

3. Text-To-Speech: Dialectal TTS systems are an emerging area of technology, and this year we are focusing on the fundamental audio quality of these TTS models. We provide example training scripts to get participants started on this task by training a baseline OmniVoice model, as well as evaluation scripts to measure the performance of their trained systems.

4. Spoken Language Translation: This task covers eight dialects, with English as the target language. Although no development set is provided, participants can reserve part of the training set for hyperparameter tuning and model selection. We use a Whisper Large-v3 translation baseline for comparison.
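
As a rough illustration of reserving part of the training set, the sketch below holds out a fixed fraction of utterances with a seeded shuffle so the split is reproducible. The utterance IDs, the 10% fraction, and the helper name split_train_dev are assumptions made for this example and are not part of the official release. Depending on the data, splitting by speaker or by dialect may be preferable to a plain random split.

```python
import random


def split_train_dev(items, dev_fraction=0.1, seed=13):
    """Return (train, dev) lists from a seeded, reproducible shuffle."""
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one item so the dev set is never empty.
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Hypothetical utterance IDs; the real data layout may differ.
utterance_ids = [f"utt_{i:04d}" for i in range(100)]
train_ids, dev_ids = split_train_dev(utterance_ids)
print(len(train_ids), len(dev_ids))
```

Keeping the seed fixed means the same held-out set is reused across experiments, so results from different hyperparameter settings remain comparable.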

5.1 Intent Recognition: The first of our two spoken language understanding tasks is to classify utterances using a set of 23 intent labels characteristic of common voice assistant queries, including two blind labels present only in the development set. We provide example code to finetune a Whisper small model for classification.

5.2 Slot Filling: The second of our two spoken language understanding tasks involves identifying the boundaries of the segments relevant to a specific intent and correctly categorizing each segment. We provide example code to adapt a Whisper small model for this task.

For both 5.1 and 5.2 we will be releasing new blind test sets as part of the final evaluation.

Looking forward to your participation!
NADI 2026 Organizers