WSDM Cup 2026 - Multilingual Retrieval
CALL FOR PARTICIPATION
Retrieval-Augmented Generation (RAG) systems can broaden the information available to users, since they are able to retrieve and synthesize information from documents in languages that the user does not necessarily understand. The ability to retrieve documents based solely on their relevance, regardless of language, is crucial if modern retrieval models are to cover perspectives from different parts of the world. Thus, WSDM Cup 2026 features a multilingual retrieval task.
Participants will develop systems that receive English queries and search a collection of about 10 million documents in Chinese (3.1M), Persian (2.2M), and Russian (4.6M). For each query, the system must produce a ranked list of 1,000 documents selected from the entire multilingual collection, ordered by likelihood of relevance to the topic. All systems should operate automatically, without human intervention. Submissions must be in the TREC run file format. Each team may make up to 5 submissions, which will be evaluated using nDCG@20.
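For teams unfamiliar with the submission format and metric, the following is a minimal Python sketch, not the organizers' official tooling. It assumes the conventional six-column TREC run layout (query ID, the literal Q0, document ID, rank, score, run tag) and computes nDCG@20 with linear gain and a log2 discount, as trec_eval does; the exact gain formulation used for the official evaluation is not stated here. The function names, document IDs, and run tag are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def format_trec_run(
    rankings: Dict[str, List[Tuple[str, float]]],
    run_tag: str,
    depth: int = 1000,
) -> List[str]:
    """Turn per-query (docid, score) pairs into TREC run file lines.

    Assumed layout: "<qid> Q0 <docid> <rank> <score> <run_tag>".
    """
    lines = []
    for qid, scored in rankings.items():
        # Sort by descending score; break ties by docid so output is deterministic.
        ordered = sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:depth]
        for rank, (docid, score) in enumerate(ordered, start=1):
            lines.append(f"{qid} Q0 {docid} {rank} {score:.6f} {run_tag}")
    return lines


def ndcg_at_k(ranked_docids: List[str], qrels: Dict[str, int], k: int = 20) -> float:
    """nDCG@k with linear gain and log2 discount (an assumption, see lead-in).

    qrels maps docid -> graded relevance for a single query; unjudged
    documents are treated as non-relevant.
    """
    dcg = sum(
        max(qrels.get(docid, 0), 0) / math.log2(position + 2)
        for position, docid in enumerate(ranked_docids[:k])
    )
    # Ideal ranking: the k highest positive grades in descending order.
    ideal_gains = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(position + 2) for position, g in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical scores for one query over documents in different languages.
    demo = {"q1": [("zho_doc1", 0.5), ("rus_doc2", 0.9), ("fas_doc3", 0.7)]}
    for line in format_trec_run(demo, run_tag="demo_run"):
        print(line)
    print(ndcg_at_k(["rus_doc2", "fas_doc3", "zho_doc1"], {"fas_doc3": 1}))
```

Because the ranked list is drawn from the whole collection, scores for Chinese, Persian, and Russian documents need to be comparable before sorting; the sketch simply assumes they already are.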
IMPORTANT DATES
November 17, 2025: Document collection, development/test queries, and the submission portal are available.
December 1, 2025: Online Q&A session, if needed.
February 2, 2026: Submissions due.
February 22-26, 2026: WSDM Conference; announcement of the winner and evaluation results. An overview technical report will be released along with the final results.