Game-theoretic Vocabulary Selection via the Shapley Value and Banzhaf Index

Roma Patel, Marta Garnelo, Ian Gemp, Chris Dyer, Yoram Bachrach


Abstract
The input vocabulary and the representations learned are crucial to the performance of neural NLP models. Using the full vocabulary results in less explainable and more memory-intensive models, with the embedding layer often constituting the majority of model parameters. It is thus common to use a smaller vocabulary to lower memory requirements and construct more interpretable models. We propose a vocabulary selection method that views words as members of a team trying to maximize the model’s performance. We apply power indices from cooperative game theory, including the Shapley value and Banzhaf index, that measure the relative importance of individual team members in accomplishing a joint task. We approximately compute these indices to identify the most influential words. Our empirical evaluation examines multiple NLP tasks, including sentence and document classification, question answering and textual entailment. We compare to baselines that select words based on frequency, TF-IDF and regression coefficients under L1 regularization, and show that this game-theoretic vocabulary selection outperforms all baselines on a range of different tasks and datasets.
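
The abstract does not say how the indices are approximated, so the following is only a minimal sketch of one common approach, not the authors' implementation. It assumes Monte Carlo sampling: random permutations for the Shapley value and random coalitions for the Banzhaf index. The names `shapley_estimates`, `banzhaf_estimates`, `select_top_k` and `toy_value` are hypothetical. `toy_value` is a hand-made stand-in for "model performance when only these words are kept"; in practice it would be replaced by an evaluation of a trained model.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical type: maps a coalition of words to a performance score.
ValueFn = Callable[[FrozenSet[str]], float]


def shapley_estimates(words: Sequence[str], value: ValueFn,
                      num_permutations: int = 200, seed: int = 0) -> Dict[str, float]:
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = {w: 0.0 for w in words}
    order = list(words)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition: set = set()
        prev = value(frozenset(coalition))
        for w in order:
            coalition.add(w)
            cur = value(frozenset(coalition))
            totals[w] += cur - prev  # marginal contribution of w in this ordering
            prev = cur
    return {w: t / num_permutations for w, t in totals.items()}


def banzhaf_estimates(words: Sequence[str], value: ValueFn,
                      num_samples: int = 200, seed: int = 0) -> Dict[str, float]:
    """Estimate Banzhaf indices by averaging marginal gains over uniform random coalitions."""
    rng = random.Random(seed)
    totals = {w: 0.0 for w in words}
    for w in words:
        others = [x for x in words if x != w]
        for _ in range(num_samples):
            # Each other word joins the coalition independently with probability 1/2.
            subset = frozenset(x for x in others if rng.random() < 0.5)
            totals[w] += value(subset | {w}) - value(subset)
    return {w: t / num_samples for w, t in totals.items()}


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k words with the highest estimated index."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy game (an assumption for illustration only): each word has an individual
# weight, and "not" plus "good" together earn an extra interaction bonus.
WEIGHTS = {"good": 0.4, "bad": 0.3, "not": 0.0, "the": 0.0}


def toy_value(coalition: FrozenSet[str]) -> float:
    score = sum(WEIGHTS[w] for w in coalition)
    if {"not", "good"} <= coalition:
        score += 0.2
    return score


vocab = list(WEIGHTS)
shap = shapley_estimates(vocab, toy_value)
banz = banzhaf_estimates(vocab, toy_value)
print(select_top_k(shap, 2), select_top_k(banz, 2))
```

Each permutation costs one evaluation of `value` per word, so a real setup would need a cheap proxy for model performance or a small number of samples. In this sketch the Shapley estimates always sum to the value of the full vocabulary minus the value of the empty one, because the marginal gains telescope within every permutation.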
Anthology ID:
2021.naacl-main.223
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2789–2798
URL:
https://aclanthology.org/2021.naacl-main.223
DOI:
10.18653/v1/2021.naacl-main.223
Cite (ACL):
Roma Patel, Marta Garnelo, Ian Gemp, Chris Dyer, and Yoram Bachrach. 2021. Game-theoretic Vocabulary Selection via the Shapley Value and Banzhaf Index. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2789–2798, Online. Association for Computational Linguistics.
Cite (Informal):
Game-theoretic Vocabulary Selection via the Shapley Value and Banzhaf Index (Patel et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.223.pdf
Video:
https://aclanthology.org/2021.naacl-main.223.mp4
Data:
AG News, CoLA, GLUE, SNLI, SST, SST-2