Geoparsing the historical Gazetteers of Scotland: accurately computing location in mass digitised texts

Rosa Filgueira, Claire Grover, Melissa Terras, Beatrice Alex


Abstract
This paper describes work in progress on devising automatic and parallel methods for geoparsing large digital historical textual data by combining the strengths of three natural language processing (NLP) tools, the Edinburgh Geoparser, spaCy and defoe, and employing different tokenisation and named entity recognition (NER) techniques. We apply these tools to a large collection of nineteenth-century Scottish geographical dictionaries, and describe preliminary results obtained when processing this data.
Anthology ID:
2020.cmlc-1.4
Volume:
Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Piotr Bański, Adrien Barbaresi, Simon Clematide, Marc Kupietz, Harald Lüngen, Ines Pisetta
Venue:
CMLC
Publisher:
European Language Resources Association
Pages:
24–30
Language:
English
URL:
https://aclanthology.org/2020.cmlc-1.4
Cite (ACL):
Rosa Filgueira, Claire Grover, Melissa Terras, and Beatrice Alex. 2020. Geoparsing the historical Gazetteers of Scotland: accurately computing location in mass digitised texts. In Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora, pages 24–30, Marseille, France. European Language Resources Association.
Cite (Informal):
Geoparsing the historical Gazetteers of Scotland: accurately computing location in mass digitised texts (Filgueira et al., CMLC 2020)
PDF:
https://aclanthology.org/2020.cmlc-1.4.pdf