Multilingual Language Models Predict Human Reading Behavior

Nora Hollenstein, Federico Pirovano, Ce Zhang, Lena Jäger, Lisa Beinborn


Abstract
We analyze whether large language models are able to predict patterns of human reading behavior. We compare the performance of language-specific and multilingual pretrained transformer models in predicting reading time measures that reflect natural human sentence processing on Dutch, English, German, and Russian texts. This yields accurate models of human reading behavior, which indicates that transformer models implicitly encode relative importance in language in a way that is comparable to human processing mechanisms. We find that BERT and XLM models successfully predict a range of eye tracking features. In a series of experiments, we analyze the cross-domain and cross-language abilities of these models and show how they reflect human sentence processing.
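The abstract describes regressing token-level eye tracking features (e.g., reading times) onto representations from pretrained transformers. The sketch below illustrates the general idea of such a probe under simplifying assumptions: the contextual embeddings from BERT/XLM are replaced by random stand-in vectors, the gaze feature is synthetic, and a simple ridge regression is fit (the paper's actual fine-tuning setup differs; see the linked code repository for the authors' implementation).

```python
import numpy as np

# Hypothetical stand-ins: in the paper, token representations come from
# BERT/XLM and gaze features (e.g., total reading time) come from eye
# tracking corpora. Here both are simulated so the sketch is self-contained.
rng = np.random.default_rng(0)
n_tokens, dim = 200, 32
embeddings = rng.normal(size=(n_tokens, dim))      # stand-in contextual embeddings
true_w = rng.normal(size=dim)
reading_times = embeddings @ true_w + rng.normal(scale=0.1, size=n_tokens)

# Ridge regression probe: predict the gaze feature from the embeddings.
lam = 1e-2
w = np.linalg.solve(
    embeddings.T @ embeddings + lam * np.eye(dim),
    embeddings.T @ reading_times,
)
pred = embeddings @ w

# Report fit as R^2 (a real evaluation would use held-out data).
ss_res = np.sum((reading_times - pred) ** 2)
ss_tot = np.sum((reading_times - reading_times.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.3f}")
```

High R^2 on such a probe indicates that the representations linearly encode information predictive of the gaze feature, which is the kind of evidence the paper reports for transformer models and human reading behavior.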
Anthology ID:
2021.naacl-main.10
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
106–123
URL:
https://aclanthology.org/2021.naacl-main.10
DOI:
10.18653/v1/2021.naacl-main.10
Cite (ACL):
Nora Hollenstein, Federico Pirovano, Ce Zhang, Lena Jäger, and Lisa Beinborn. 2021. Multilingual Language Models Predict Human Reading Behavior. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 106–123, Online. Association for Computational Linguistics.
Cite (Informal):
Multilingual Language Models Predict Human Reading Behavior (Hollenstein et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.10.pdf
Video:
https://aclanthology.org/2021.naacl-main.10.mp4
Code:
DS3Lab/multilingual-gaze