“This is a Problem, Don’t You Agree?” Framing and Bias in Human Evaluation for Natural Language Generation

Stephanie Schoch, Diyi Yang, Yangfeng Ji


Abstract
Despite recent efforts reviewing current human evaluation practices for natural language generation (NLG) research, the lack of reported question wording and the potential for framing effects or cognitive biases to influence results have been widely overlooked. In this opinion paper, we detail three possible framing effects and cognitive biases that could affect human evaluation in NLG. On this basis, we call for increased transparency in human evaluation for NLG and propose the concept of human evaluation statements. We make several recommendations about design details to report that could potentially influence results, such as question wording, and suggest that reporting pertinent design details can increase both comparability across studies and reproducibility of results.
Anthology ID:
2020.evalnlgeval-1.2
Volume:
Proceedings of the 1st Workshop on Evaluating NLG Evaluation
Month:
December
Year:
2020
Address:
Online (Dublin, Ireland)
Editors:
Shubham Agarwal, Ondřej Dušek, Sebastian Gehrmann, Dimitra Gkatzia, Ioannis Konstas, Emiel Van Miltenburg, Sashank Santhanam
Venue:
EvalNLGEval
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
10–16
URL:
https://aclanthology.org/2020.evalnlgeval-1.2
Cite (ACL):
Stephanie Schoch, Diyi Yang, and Yangfeng Ji. 2020. “This is a Problem, Don’t You Agree?” Framing and Bias in Human Evaluation for Natural Language Generation. In Proceedings of the 1st Workshop on Evaluating NLG Evaluation, pages 10–16, Online (Dublin, Ireland). Association for Computational Linguistics.
Cite (Informal):
“This is a Problem, Don’t You Agree?” Framing and Bias in Human Evaluation for Natural Language Generation (Schoch et al., EvalNLGEval 2020)
PDF:
https://aclanthology.org/2020.evalnlgeval-1.2.pdf