Reconstructing Manual Information Extraction with DB-to-Document Backprojection: Experiments in the Life Science Domain

Mark-Christoph Müller, Sucheta Ghosh, Maja Rey, Ulrike Wittig, Wolfgang Müller, Michael Strube


Abstract
We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to the lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task.
Anthology ID:
2020.sdp-1.9
Volume:
Proceedings of the First Workshop on Scholarly Document Processing
Month:
November
Year:
2020
Address:
Online
Editors:
Muthu Kumar Chandrasekaran, Anita de Waard, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Eduard Hovy, Petr Knoth, David Konopnicki, Philipp Mayr, Robert M. Patton, Michal Shmueli-Scheuer
Venue:
sdp
Publisher:
Association for Computational Linguistics
Pages:
81–90
URL:
https://aclanthology.org/2020.sdp-1.9
DOI:
10.18653/v1/2020.sdp-1.9
Cite (ACL):
Mark-Christoph Müller, Sucheta Ghosh, Maja Rey, Ulrike Wittig, Wolfgang Müller, and Michael Strube. 2020. Reconstructing Manual Information Extraction with DB-to-Document Backprojection: Experiments in the Life Science Domain. In Proceedings of the First Workshop on Scholarly Document Processing, pages 81–90, Online. Association for Computational Linguistics.
Cite (Informal):
Reconstructing Manual Information Extraction with DB-to-Document Backprojection: Experiments in the Life Science Domain (Müller et al., sdp 2020)
PDF:
https://aclanthology.org/2020.sdp-1.9.pdf
Video:
https://slideslive.com/38940718