Measuring Linguistic Diversity During COVID-19

Jonathan Dunn, Tom Coupe, Benjamin Adams


Abstract
Computational measures of linguistic diversity help us understand the linguistic landscape using digital language data. The contribution of this paper is to calibrate measures of linguistic diversity using restrictions on international travel resulting from the COVID-19 pandemic. Previous work has mapped the distribution of languages using geo-referenced social media and web data. The goal, however, has been to describe these corpora themselves rather than to make inferences about underlying populations. This paper shows that a difference-in-differences method based on the Herfindahl-Hirschman Index can identify the bias in digital corpora that is introduced by non-local populations. These methods tell us where significant changes have taken place and whether those changes lead to increased or decreased diversity. This is an important step in aligning digital corpora like social media with the real-world populations that have produced them.
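To make the two ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard definition of the Herfindahl-Hirschman Index (the sum of squared shares, here shares of each language in a sample) and the simplest two-group, two-period form of a difference-in-differences estimate. The function names, the language counts, and the split into "treated" and "control" regions are all hypothetical; the abstract does not specify how the paper sets these up.

```python
def hhi(language_counts):
    """Standard Herfindahl-Hirschman Index over language shares.

    language_counts maps a language code to an observation count.
    Returns a value in (0, 1]; higher values mean LESS diversity
    (1.0 means a single language accounts for every observation).
    """
    total = sum(language_counts.values())
    if total == 0:
        raise ValueError("no observations to compute shares from")
    return sum((n / total) ** 2 for n in language_counts.values())


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Two-group, two-period difference-in-differences on the HHI.

    Assumption (not from the paper): 'treated' is a region affected by
    travel restrictions and 'control' is a comparison region.
    A positive result means concentration rose more (diversity fell more)
    in the treated region than in the control region.
    """
    treated_change = hhi(treated_after) - hhi(treated_before)
    control_change = hhi(control_after) - hhi(control_before)
    return treated_change - control_change


# Hypothetical counts, invented purely for illustration.
treated_before = {"en": 50, "de": 50}
treated_after = {"en": 100}
control_before = {"en": 50, "mi": 50}
control_after = {"en": 50, "mi": 50}

estimate = diff_in_diff(treated_before, treated_after, control_before, control_after)
```

In this invented example the treated region moves from two equally used languages (HHI 0.5) to one (HHI 1.0) while the control region is unchanged, so `estimate` is 0.5, which would be read as a loss of diversity relative to the control. Judging whether such a change is significant needs a statistical test that this sketch leaves out.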
Anthology ID:
2020.nlpcss-1.1
Volume:
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
Month:
November
Year:
2020
Address:
Online
Editors:
David Bamman, Dirk Hovy, David Jurgens, Brendan O'Connor, Svitlana Volkova
Venue:
NLP+CSS
Publisher:
Association for Computational Linguistics
Pages:
1–10
URL:
https://aclanthology.org/2020.nlpcss-1.1
DOI:
10.18653/v1/2020.nlpcss-1.1
Cite (ACL):
Jonathan Dunn, Tom Coupe, and Benjamin Adams. 2020. Measuring Linguistic Diversity During COVID-19. In Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science, pages 1–10, Online. Association for Computational Linguistics.
Cite (Informal):
Measuring Linguistic Diversity During COVID-19 (Dunn et al., NLP+CSS 2020)
PDF:
https://aclanthology.org/2020.nlpcss-1.1.pdf
Optional supplementary material:
2020.nlpcss-1.1.OptionalSupplementaryMaterial.zip
Video:
https://slideslive.com/38940618