Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation

Emiel van Miltenburg, Chris van der Lee, Thiago Castro-Ferreira, Emiel Krahmer


Abstract
NLG researchers often use uncontrolled corpora to train and evaluate their systems, using textual similarity metrics, such as BLEU. This position paper argues in favour of two alternative evaluation strategies, using grammars or rule-based systems. These strategies are particularly useful to identify the strengths and weaknesses of different systems. We contrast our proposals with the (extended) WebNLG dataset, which is revealed to have a skewed distribution of predicates. We predict that this distribution affects the quality of the predictions for systems trained on this data. However, this hypothesis can only be thoroughly tested (without any confounds) once we are able to systematically manipulate the skewness of the data, using a rule-based approach.
Anthology ID: 2020.evalnlgeval-1.3
Volume: Proceedings of the 1st Workshop on Evaluating NLG Evaluation
Month: December
Year: 2020
Address: Online (Dublin, Ireland)
Editors: Shubham Agarwal, Ondřej Dušek, Sebastian Gehrmann, Dimitra Gkatzia, Ioannis Konstas, Emiel van Miltenburg, Sashank Santhanam
Venue: EvalNLGEval
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 17–27
URL: https://aclanthology.org/2020.evalnlgeval-1.3
Cite (ACL): Emiel van Miltenburg, Chris van der Lee, Thiago Castro-Ferreira, and Emiel Krahmer. 2020. Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation. In Proceedings of the 1st Workshop on Evaluating NLG Evaluation, pages 17–27, Online (Dublin, Ireland). Association for Computational Linguistics.
Cite (Informal): Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation (van Miltenburg et al., EvalNLGEval 2020)
PDF: https://aclanthology.org/2020.evalnlgeval-1.3.pdf
Data: MS COCO