First Call for Papers
Re-Data: First Workshop on Responsibly Enabling Data for Foundation Models (COLM 2026).
https://re-data-colm2026.github.io.
As foundation models scale, available training data sources have been rapidly depleted. However, several forms of valuable data artifacts, such as medical records and legal and financial documents, are restricted from use in model training due to their sensitive nature. In addition, the strong reasoning capabilities of current generative models have opened the possibility of highly personalizable AI applications, but these remain bottlenecked by limited access to high-quality user data. Hence, it is of immense value to responsibly unlock these data sources (for example, through data transformation or constrained training paradigms) or to generate synthetic alternatives. In this workshop, we aim to bring together domain experts in data, privacy, model training, and legal policy to advance the frontier of responsibly leveraging such sensitive data with foundation models.
Topics of interest include (but are not limited to):
- Data Transformation: De-identification, Anonymization, Pseudonymization.
- Synthetic Data Generation: Controlled Regeneration, Data Diversity.
- Novel Training Paradigms: Differential Privacy (DP), Federated Learning, Architectural Solutions.
- Evaluation & Auditing: Privacy attack benchmarks, Utility-Privacy tradeoffs.
- Policy: Compliance, New regulations on data sharing.
Submissions are managed via OpenReview: submit here.
We invite long papers presenting novel research contributions (up to 8 pages) as well as short papers (up to 4 pages) reporting preliminary studies or negative results. Accepted papers are non-archival, and concurrent submissions are allowed. Please follow the COLM 2026 template.
Key Dates
Submissions open: May 27, 2026
Submission deadline: June 23, 2026
Acceptance notifications: July 24, 2026
Workshops day at COLM: October 9, 2026
All deadlines are 23:59 AoE (Anywhere on Earth).
Confirmed Speakers
Sewon Min, UC Berkeley
Alex Dimakis, UC Berkeley, Bespoke Labs
Nouha Dziri, Cohere Labs
Niloofar Mireshghallah, humans&, CMU
Organizing Committee
Anil Ramakrishna, Meta
Zheng Xu, Meta
Gautam Kamath, University of Waterloo, Vector Institute, and NYU
Kamalika Chaudhuri, Google DeepMind
Om Thakkar, OpenAI
Natalia Ponomareva, Google
Aleksandra Korolova, Princeton
*If you wish to join the program committee, you can sign up here.