Allgemeine Musikalische Zeitung as a Searchable Online Corpus

Bernd Kampe, Tinghui Duan, Udo Hahn


Abstract
The massive digitization efforts for historical newspapers over the past decades have focused on mass-media sources whose primary readership was the general public. Much less attention has been paid to newspapers published for a more specialized audience, e.g., those aimed at scholarly or cultural exchange within much narrower intellectual communities, such as newspapers devoted to music criticism, the arts, or philosophy. Only a few of these specialized newspapers have been digitized so far, and they are usually poorly curated in terms of digitization quality, data formatting, completeness, redundancy (de-duplication), supply of metadata and, hence, searchability. This paper describes our approach to remedying these shortcomings for a major German-language newspaper of the Romantic Age, the Allgemeine Musikalische Zeitung (General Music Gazette). Here we focus on a workflow that copes with a posteriori digitization problems, inconsistent OCR output, and index building for searchability. In addition, we provide a user-friendly graphical interface, built with open-source Web presentation software, that enables content-centric access to this and other digital resources.
Anthology ID: 2020.lrec-1.122
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 969–976
Language: English
URL: https://aclanthology.org/2020.lrec-1.122
Cite (ACL): Bernd Kampe, Tinghui Duan, and Udo Hahn. 2020. Allgemeine Musikalische Zeitung as a Searchable Online Corpus. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 969–976, Marseille, France. European Language Resources Association.
Cite (Informal): Allgemeine Musikalische Zeitung as a Searchable Online Corpus (Kampe et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.122.pdf