Chinese BabyLM Challenge

Event Notification Type: 
Call for Participation
Abbreviated Title: 
ChineseBabyLM
Location: 
Co-located with NLPCC 2026 (15th CCF International Conference on Natural Language Processing and Chinese Computing)
Country: 
China
City: 
Macau
Contact: 
Hai Hu
Siyuan Song
Submission Deadline: 
Thursday, 11 June 2026

Chinese BabyLM, co-located with NLPCC 2026, is the first shared task dedicated to sample-efficient pretraining for Chinese. The challenge is inspired by the BabyLM Challenge (first launched in 2023, now in its fourth year), which incentivizes research on pretraining language models under cognitively inspired data budgets.

Over the past several years, large language models have achieved remarkable success, driven largely by scaling up model parameters and training data. This data-hungry approach stands in stark contrast to human language acquisition: a typical child is exposed to fewer than 100 million words by age 13, yet achieves robust linguistic competence. Chinese poses distinct challenges for data-efficient modeling because of its logographic writing system, lack of explicit word boundaries, rich morphological compounding, and flexible syntactic structures.
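
To make the word-boundary point concrete, the short Python sketch below compares whitespace splitting on an English sentence and a Chinese sentence. It is an illustration only, not the challenge's official tokenization or word-counting method: the helper names and example sentences are made up for this note, and the character filter covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def whitespace_tokens(text):
    """Split on whitespace, the usual first approximation of English words."""
    return text.split()


def han_characters(text):
    """Return characters in the basic CJK Unified Ideographs block.

    Assumption for this sketch: extension blocks and punctuation are ignored.
    """
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


english = "the child reads a book"
chinese = "孩子在读一本书"  # roughly: "the child is reading a book"

print(len(whitespace_tokens(english)))  # 5
print(len(whitespace_tokens(chinese)))  # 1 (no spaces, so one "token")
print(len(han_characters(chinese)))     # 7
```

Whitespace splitting treats the whole Chinese sentence as one unit, so any word count for Chinese depends on a segmentation choice, while character-level units sidestep segmentation at the cost of longer sequences.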

Event Website: https://chinese-babylm.github.io/
Registration: https://forms.gle/bBVZmoov72dyj6eF8

Tracks

  • NLU Track: Natural language understanding tasks under limited data
  • Cognitive Modeling Track: Cognitively inspired evaluation
  • HANZI Track: Character-level and logographic modeling

Training Data
Approximately 100M words across six corpus categories, available on Hugging Face at https://huggingface.co/datasets/chinese-babylm-org/babylm-zho-100M

Evaluation
Two-phase protocol (open evaluation → hidden test set), with a leaderboard hosted on Hugging Face.

Key Dates

  • March 20, 2026: Task announcement, registration opens
  • April 15, 2026: Data & guidelines release
  • April 22, 2026: Baseline models & leaderboard go live
  • May 25, 2026: Registration deadline
  • June 11, 2026: Model submission deadline
  • June 20, 2026: Final results due
  • June 30, 2026: Winners announced

Organizers
Hai Hu (CityU HK), Siyuan Song (UT Austin), Zhiheng Qian (SJTU), Shaonan Wang (PolyU HK), Yunhao Zhang (CASIA), Hong'ao Zhu (UCSD), Renfen Hu (BNU), Xiaozhe Ji (BNU), Rui Wang (SJTU), Luan Li (SJTU), Linyang He (Columbia), Yingxin Lin (THU)