Structured Prediction via Learning to Search under Bandit Feedback

Amr Sharaf, Hal Daumé III


Abstract
We present BLS, an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting, in which the learner observes only a single loss for the whole output, and a more fine-grained setting, in which it observes a loss for every action. We find that the fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction strategy. We empirically compare a number of different algorithms and exploration methods and show the efficacy of BLS on sequence labeling and dependency parsing tasks.
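The two feedback cases described in the abstract can be illustrated with a minimal sketch. This is not the authors' BLS algorithm. It only shows the interaction protocol, under several assumptions of ours: the task is sequence labeling, the loss is a 0/1 (Hamming-style) loss per position, and exploration is plain epsilon-greedy. The function names `predict_with_exploration` and `bandit_feedback` are hypothetical.

```python
import random
from typing import List, Sequence, Union


def predict_with_exploration(
    scores: Sequence[Sequence[float]], epsilon: float, rng: random.Random
) -> List[int]:
    """Pick one action (label index) per position.

    With probability epsilon the action is drawn uniformly at random
    (exploration); otherwise the highest-scoring action is taken.
    Epsilon-greedy is an assumption made for illustration only.
    """
    actions = []
    for position_scores in scores:
        if rng.random() < epsilon:
            actions.append(rng.randrange(len(position_scores)))
        else:
            best = max(range(len(position_scores)), key=lambda a: position_scores[a])
            actions.append(best)
    return actions


def bandit_feedback(
    predicted: Sequence[int], gold: Sequence[int], per_action: bool
) -> Union[float, List[float]]:
    """Return feedback for the predicted output only.

    per_action=False: pure bandit case, one scalar loss for the whole output.
    per_action=True: fine-grained case, one loss for every action.
    The 0/1 loss per position is an assumed loss function.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    losses = [0.0 if p == g else 1.0 for p, g in zip(predicted, gold)]
    return losses if per_action else sum(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]  # toy per-position label scores
    output = predict_with_exploration(scores, epsilon=0.0, rng=rng)
    print(output, bandit_feedback(output, [1, 1, 1], per_action=False))
    print(bandit_feedback(output, [1, 1, 1], per_action=True))
```

In the pure bandit case the learner cannot tell which action caused the loss, whereas the per-action losses localize the error, which is the distinction the abstract draws.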
Anthology ID:
W17-4304
Volume:
Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Kai-Wei Chang, Ming-Wei Chang, Vivek Srikumar, Alexander M. Rush
Venue:
WS
Publisher:
Association for Computational Linguistics
Pages:
17–26
URL:
https://aclanthology.org/W17-4304
DOI:
10.18653/v1/W17-4304
Cite (ACL):
Amr Sharaf and Hal Daumé III. 2017. Structured Prediction via Learning to Search under Bandit Feedback. In Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing, pages 17–26, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Structured Prediction via Learning to Search under Bandit Feedback (Sharaf & Daumé III, 2017)
PDF:
https://aclanthology.org/W17-4304.pdf
Data:
Penn Treebank