Suchet Chachra


2016

A Hybrid Approach to Generation of Missing Abstracts in Biomedical Literature
Suchet Chachra | Asma Ben Abacha | Sonya Shooshan | Laritza Rodriguez | Dina Demner-Fushman
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Readers usually rely on abstracts to identify relevant medical information from scientific articles. Abstracts are also essential to advanced information retrieval methods. More than 50 thousand scientific publications in PubMed lack author-generated abstracts, and the relevancy judgements for these papers have to be based on their titles alone. In this paper, we propose a hybrid summarization technique that aims to select the most pertinent sentences from articles to generate an extractive summary in lieu of a missing abstract. We combine i) health outcome detection, ii) keyphrase extraction, and iii) textual entailment recognition between sentences. We evaluate our hybrid approach and analyze the improvements of multi-factor summarization over techniques that rely on a single method, using a collection of 295 manually generated reference summaries. The obtained results show that the hybrid approach outperforms the baseline techniques with an improvement of 13% in recall and 4% in F1 score.
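
The abstract names three signals (health outcome detection, keyphrase extraction, and textual entailment between sentences) but does not say how they are combined. The following is only a minimal sketch of one possible multi-factor sentence ranker, not the authors' method. It assumes a weighted sum of the three signals. The names `Sentence`, `has_outcome`, `entails`, `keyphrase_score`, `extractive_summary`, and the default weights are all hypothetical, and the outcome flags and entailment counts are assumed to come from upstream components that are not shown.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    has_outcome: bool  # hypothetical flag from a health outcome detector
    entails: int       # hypothetical count of other sentences this one entails


def keyphrase_score(text, keyphrases):
    """Fraction of the keyphrases that occur in the sentence (case-insensitive)."""
    if not keyphrases:
        return 0.0
    lowered = text.lower()
    return sum(1 for k in keyphrases if k.lower() in lowered) / len(keyphrases)


def extractive_summary(sentences, keyphrases, k=2, weights=(1.0, 1.0, 0.5)):
    """Return the k highest-scoring sentences, kept in document order.

    The weighted sum is an assumption made for illustration only.
    """
    w_out, w_key, w_ent = weights
    scored = []
    for idx, s in enumerate(sentences):
        score = (
            w_out * float(s.has_outcome)
            + w_key * keyphrase_score(s.text, keyphrases)
            + w_ent * s.entails
        )
        scored.append((score, idx))
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i].text for i in sorted(i for _, i in top)]
```

Calling `extractive_summary(sentences, keyphrases, k=3)` on a list of `Sentence` objects returns the three selected sentences in their original order. Setting two of the three weights to zero reduces the sketch to a single-signal ranker, which is the kind of single-method baseline the abstract compares against.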