The 2nd BabyLM Workshop

Event Notification Type: 
Call for Papers
Abbreviated Title: 
BabyLM 2026
Location: 
EMNLP 2026
Wednesday, 28 October 2026
Country: 
Hungary
City: 
Budapest
Contact: 
Leshem Choshen
Aaron Mueller
Submission Deadline: 
Wednesday, 15 July 2026

The goal of BabyLM is to bring together multiple disciplines to answer an enduring question: how can a computational system learn language from limited input? Cognitive scientists investigate this question by trying to understand how humans learn their native language during childhood. Computer scientists tackle it by attempting to build efficient machine-learning systems that accomplish this task. BabyLM aims to bridge these communities, encouraging the integration of insights from cognitive science into the design of more sample-efficient language models, while also using advances in language modeling architectures to generate new hypotheses and experimental paradigms for cognitive science.

The 2nd BabyLM Workshop will be co-located with EMNLP 2026 in Budapest, Hungary. We will accept two types of submission: challenge submissions and workshop submissions.

This year, the theme of the workshop is *Going beyond English*. Previous iterations of BabyLM have focused primarily on English; with the introduction of the new Multilingual track (see BabyLM Challenge below), we aim to inspire submissions on other languages. We hope the BabyBabelLM dataset can be a starting point for this, but we also encourage submissions that introduce new resources to foster progress on data-efficient modeling across diverse languages.

=== Workshop Topics ===
We invite submissions on topics including but not limited to the following:
* Data-efficient architectures and training techniques.
* Data curation for efficient training.
* Cognitively and linguistically inspired language modeling and evaluation.
* Small models (and scale comparisons).
* Relevant aspects of multimodality.
* Interaction with or feedback from teacher models during training.
* Second language acquisition, bilingualism or multilingualism.

The call for papers may be found here: https://arxiv.org/abs/2602.20092v2

=== BabyLM Challenge ===
The BabyLM Challenge (now in its fourth iteration) challenges participants to train language models on human-sized training corpora of up to 100 million words. This year’s challenge will remain largely the same as previous iterations, with the following notes:
* We are debuting a new multilingual track, in which participants are tasked with training models on a trilingual split of the BabyBabelLM dataset.
* We continue to offer the strict and strict-small tracks.
* We have folded last year’s multimodal and interactive tracks into these tracks.

=== Key Dates ===
We will accept submissions through ACL Rolling Review (ARR) or directly to the workshop via OpenReview. Paper submissions to the workshop can ignore competition entry deadlines. Our tentative timeline (subject to ARR and conference deadlines, to be released) is as follows:

* February: Call for papers and training data released
* Early April: Evaluation pipeline and baselines released
* May 25: ARR submission deadline
* Mid-July: Direct submission deadline
* Early August: Direct submission reviews due; ARR commitment deadline
* Mid-August: Decisions released
* Early September: Camera-ready due
* 24-29 October: Workshop @ EMNLP in Budapest (exact date TBA)

=== Contact ===
If you have any questions, please join the BabyLM participants’ Slack. The invite link is available on the BabyLM website, or you can join directly at the following link: https://join.slack.com/t/babylmchallenge/shared_invite/zt-3r0lfjm6d-9bZZ...