Submission deadline
31 July 2026
Large Language Models (LLMs) are rapidly reshaping the AI landscape, and their use as automated “judges” or evaluators has the potential to revolutionize how we assess a wide range of outputs, from document relevance in information retrieval to the quality of machine-generated text. Crucially, their benefits extend far beyond traditional, well-resourced tasks where human judgments are costly to obtain. LLMs open the door to generating labels and evaluations in so-called “low-resource” areas: niche or emerging domains where no strong annotation tradition exists and data are scarce. Examples include specialized biomedical subfields, environmental and climate-change text analysis, underrepresented languages, and emerging areas such as online behavioral research and misinformation detection. In these contexts, LLMs can help create high-quality labels, accelerate dataset development, and enable reliable system evaluation where traditional human annotation would be prohibitively difficult or expensive.
This collection aims to bring together cutting-edge research that addresses the significant opportunities and inherent challenges of this paradigm. We welcome contributions on foundational methods, novel prompting strategies, fine-tuning techniques, and frameworks for improving LLM evaluation capabilities.
Topics of interest include but are not limited to:
- Reliability and Bias: Studies on understanding, quantifying, and mitigating inherent biases in LLM-based judgments (e.g., position bias, self-preference) and on ensuring their reliability; a minimal illustration of one such mitigation appears after this list.
- Novel Applications: Papers exploring the use of LLM evaluators in diverse domains, including information retrieval, NLP, recommender systems, healthcare, computational social science, and education.
- Human-LLM Collaboration: Research on hybrid systems that combine human expertise with the scalability of LLMs to create more robust and efficient evaluation pipelines.
- Efficiency and Scalability: Techniques to make LLM-based evaluation more cost-effective and computationally efficient.
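To make the position-bias topic above concrete, the sketch below shows one widely used mitigation for pairwise LLM judging: the judge is queried twice with the candidate order swapped, and a verdict is accepted only if it survives the swap. This is a minimal sketch, not a prescribed method; the `query_judge` callable is a hypothetical placeholder for whatever LLM call a given system uses, and the prompt wording is likewise illustrative.

```python
# Minimal sketch: order-swapping to mitigate position bias in
# pairwise LLM-as-judge evaluation. `query_judge` is a hypothetical
# placeholder (prompt in, "A" or "B" out); everything else is plain Python.

from typing import Callable, Optional

PROMPT = (
    "You are a judge. Given a question and two candidate answers, "
    "reply with exactly 'A' or 'B' for the better answer.\n\n"
    "Question: {question}\nAnswer A: {first}\nAnswer B: {second}"
)

def judge_pair(
    query_judge: Callable[[str], str],
    question: str,
    answer_1: str,
    answer_2: str,
) -> Optional[int]:
    """Return 1 or 2 for a consistent winner, or None on disagreement.

    The judge is called twice with the candidate order swapped. If its
    verdict flips with the ordering, the comparison is treated as a tie
    (None) rather than trusting a position-biased verdict.
    """
    # Pass 1: answer_1 is shown in position A.
    v1 = query_judge(PROMPT.format(question=question, first=answer_1, second=answer_2))
    # Pass 2: answer_2 is shown in position A (positions swapped).
    v2 = query_judge(PROMPT.format(question=question, first=answer_2, second=answer_1))

    winner_1 = 1 if v1.strip().upper().startswith("A") else 2
    # In pass 2 the labels are reversed: "A" now refers to answer_2.
    winner_2 = 2 if v2.strip().upper().startswith("A") else 1

    return winner_1 if winner_1 == winner_2 else None
```

Returning None on disagreement lets a downstream pipeline treat the pair as a tie or escalate it to a human annotator, which connects naturally to the Human-LLM Collaboration topic above.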