An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction

Wang Chen, Hou Pong Chan, Piji Li, Lidong Bing, Irwin King


Abstract
In this paper, we present a novel integrated approach for keyphrase generation (KG). Unlike previous work, which is either purely extractive or purely generative, we first propose a new multi-task learning framework that jointly learns an extractive model and a generative model. In addition to extracting keyphrases, the extractive model produces output that is used to rectify the copy probability distribution of the generative model, so that the generative model can better identify important content in the given document. Moreover, we retrieve documents similar to the given document from the training data and use their associated keyphrases as external knowledge that helps the generative model produce more accurate keyphrases. To further exploit the power of extraction and retrieval, we propose a neural-based merging module that combines and re-ranks the keyphrases predicted by the enhanced generative model, the keyphrases predicted by the extractive model, and the retrieved keyphrases. Experiments on five KG benchmarks demonstrate that our integrated approach outperforms state-of-the-art methods.
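The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so this example assumes Jaccard similarity over lowercased, whitespace-split tokens; the function names (`jaccard`, `retrieve_keyphrases`), the `top_k` parameter, and the toy corpus are all hypothetical and are not taken from the authors' released code.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_keyphrases(
    query: str,
    corpus: List[Tuple[str, List[str]]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return keyphrases of the top_k training documents most similar to query.

    Each keyphrase is paired with the similarity score of the document it
    came from. Similarity is an ASSUMPTION here (Jaccard over tokens).
    """
    query_tokens = set(query.lower().split())
    scored = [
        (jaccard(query_tokens, set(doc.lower().split())), idx)
        for idx, (doc, _) in enumerate(corpus)
    ]
    # Highest similarity first; ties broken by original corpus order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    seen: Set[str] = set()
    retrieved: List[Tuple[str, float]] = []
    for score, idx in scored[:top_k]:
        for phrase in corpus[idx][1]:
            if phrase not in seen:  # keep the first (highest-scoring) occurrence
                seen.add(phrase)
                retrieved.append((phrase, score))
    return retrieved


# Toy training corpus: (document text, gold keyphrases). Purely illustrative.
toy_corpus = [
    ("neural keyphrase generation with copy mechanism",
     ["keyphrase generation", "copy mechanism"]),
    ("graph ranking for keyphrase extraction",
     ["keyphrase extraction", "graph ranking"]),
    ("image classification with convolutional networks",
     ["image classification"]),
]

retrieved = retrieve_keyphrases(
    "keyphrase generation with retrieval and extraction", toy_corpus, top_k=2
)
for phrase, score in retrieved:
    print(f"{phrase}\t{score:.3f}")
```

In the approach the abstract describes, retrieved keyphrases like these would be passed to the generative model as external knowledge and also to the merging module as candidates; how they are encoded and scored there is not covered by this sketch.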
Anthology ID:
N19-1292
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2846–2856
URL:
https://aclanthology.org/N19-1292
DOI:
10.18653/v1/N19-1292
Cite (ACL):
Wang Chen, Hou Pong Chan, Piji Li, Lidong Bing, and Irwin King. 2019. An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2846–2856, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction (Chen et al., NAACL 2019)
PDF:
https://aclanthology.org/N19-1292.pdf
Code:
Chen-Wang-CUHK/KG-KE-KR-M
Data:
KP20k