Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation

Shudong Hao, Jordan Boyd-Graber, Michael J. Paul


Abstract
Multilingual topic models enable document analysis across languages through coherent multilingual summaries of the data. However, there is no standard and effective metric to evaluate the quality of multilingual topics. We introduce a new intrinsic evaluation of multilingual topic models that correlates well with human judgments of multilingual topic coherence as well as performance in downstream applications. Importantly, we also study evaluation for low-resource languages. Because standard metrics fail to accurately measure topic quality when robust external resources are unavailable, we propose an adaptation model that improves the accuracy and reliability of these metrics in low-resource settings.
Anthology ID:
N18-1099
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1090–1100
URL:
https://aclanthology.org/N18-1099
DOI:
10.18653/v1/N18-1099
Cite (ACL):
Shudong Hao, Jordan Boyd-Graber, and Michael J. Paul. 2018. Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1090–1100, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation (Hao et al., NAACL 2018)
PDF:
https://aclanthology.org/N18-1099.pdf
Video:
https://aclanthology.org/N18-1099.mp4