The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study

Stefan Fischer, Jörg Knappen, Katrin Menzel, Elke Teich


Abstract
We present a new, extended version of the Royal Society Corpus (RSC), a diachronic corpus of scientific English now covering 300+ years of scientific writing (1665–1996). The corpus comprises 47 837 texts, primarily scientific articles, and is based on publications of the Royal Society of London, mainly its Philosophical Transactions and Proceedings. The corpus has been built on the basis of the FAIR principles and is freely available under a Creative Commons license, excluding copyrighted parts. We provide information on how the corpus can be found, the file formats available for download, and its accessibility via a web-based corpus query platform. We show a number of analytic tools that we have implemented for better usability and provide an example of using the corpus for linguistic analysis as well as examples of subsequent, external uses of earlier releases. We place the RSC against the background of existing English diachronic/scientific corpora, elaborating on its value for linguistic and humanistic study.
Anthology ID:
2020.lrec-1.99
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
794–802
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.99
Cite (ACL):
Stefan Fischer, Jörg Knappen, Katrin Menzel, and Elke Teich. 2020. The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 794–802, Marseille, France. European Language Resources Association.
Cite (Informal):
The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study (Fischer et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.99.pdf