Dataset for Temporal Analysis of English-French Cognates

Esteban Frossard, Mickael Coustaty, Antoine Doucet, Adam Jatowt, Simon Hengchen


Abstract
Languages change over time and, thanks to the abundance of digital corpora, the computational analysis of their evolution has recently attracted considerable research attention. In this paper, we focus on creating a dataset to support the investigation of similarities in the evolution of different languages. In particular, we look into the similarities and differences in the use of corresponding words across time in English and French, two languages from different linguistic families yet with shared syntax and close contact. To this end, we select a set of cognates in both languages and study their frequency changes and correlations over time. We propose a new dataset for computational approaches to the synchronized diachronic investigation of language pairs, and subsequently show novel findings stemming from the cognate-focused diachronic comparison of the two chosen languages. To the best of our knowledge, the present study is the first in the literature to use computational approaches and large-scale data to perform a cross-language diachronic analysis.
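The abstract mentions studying frequency changes and correlations of cognates over time. As a minimal sketch of what such a comparison could look like, the snippet below normalizes per-period counts by corpus size and computes a Pearson correlation between the two resulting series. This is only an illustration under stated assumptions: the function names, the choice of Pearson correlation, and all numbers are hypothetical and are not taken from the paper or its dataset.

```python
from math import sqrt


def relative_frequencies(counts, totals):
    """Divide each period's raw word count by that period's corpus size."""
    if len(counts) != len(totals):
        raise ValueError("counts and totals must cover the same periods")
    return [c / t for c, t in zip(counts, totals)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented counts for one hypothetical cognate pair over four periods;
# these are NOT values from the dataset described above.
english_counts = [120, 150, 180, 240]
english_totals = [1_000_000, 1_100_000, 1_200_000, 1_500_000]
french_counts = [90, 130, 140, 200]
french_totals = [900_000, 1_000_000, 1_000_000, 1_300_000]

english_series = relative_frequencies(english_counts, english_totals)
french_series = relative_frequencies(french_counts, french_totals)
print(f"correlation: {pearson(english_series, french_series):.3f}")
```

Normalizing by corpus size before correlating keeps growth in the amount of digitized text from being mistaken for a change in word usage.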
Anthology ID:
2020.lrec-1.107
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
855–859
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.107
Cite (ACL):
Esteban Frossard, Mickael Coustaty, Antoine Doucet, Adam Jatowt, and Simon Hengchen. 2020. Dataset for Temporal Analysis of English-French Cognates. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 855–859, Marseille, France. European Language Resources Association.
Cite (Informal):
Dataset for Temporal Analysis of English-French Cognates (Frossard et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.107.pdf