Generating a Common Question from Multiple Documents using Multi-source Encoder-Decoder Models

Woon Sang Cho, Yizhe Zhang, Sudha Rao, Chris Brockett, Sungjin Lee


Abstract
Ambiguous user queries in search engines result in the retrieval of documents that often span multiple topics. One potential solution is for the search engine to generate multiple refined queries, each of which relates to a subset of the documents spanning the same topic. A preliminary step towards this goal is to generate a question that captures the common concepts of multiple documents. We propose a new task of generating a common question from multiple documents and present a simple variant of an existing multi-source encoder-decoder framework, called the Multi-Source Question Generator (MSQG). We first train an RNN-based single encoder-decoder generator on (single document, question) pairs. At test time, given multiple documents, the Distribute step of our MSQG model uses the trained model to predict a target word distribution for each document. The Aggregate step then combines these distributions to generate a common question. This simple yet effective strategy significantly outperforms several existing baseline models applied to the new task when evaluated using automated metrics and human judgments on the MS-MARCO-QA dataset.
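The Distribute and Aggregate steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the distributions are aggregated, so simple averaging followed by greedy decoding is an assumption made here. The names `aggregate`, `generate_common_question`, `toy_step_fn`, and `TOY_MODEL` are hypothetical, and the toy lookup table stands in for the trained single-document encoder-decoder.

```python
from typing import Callable, Dict, List, Sequence

Distribution = Dict[str, float]
# Hypothetical interface: (document, words generated so far) -> next-word distribution.
StepFn = Callable[[str, Sequence[str]], Distribution]


def aggregate(distributions: List[Distribution]) -> Distribution:
    """Aggregate step (assumed rule): average the per-document distributions."""
    vocab = set().union(*distributions)
    n = len(distributions)
    return {w: sum(d.get(w, 0.0) for d in distributions) / n for w in vocab}


def generate_common_question(
    documents: Sequence[str],
    step_fn: StepFn,
    max_len: int = 20,
    eos: str = "</s>",
) -> List[str]:
    """Greedily decode one question shared by all documents."""
    if not documents:
        raise ValueError("at least one document is required")
    prefix: List[str] = []
    for _ in range(max_len):
        # Distribute step: one next-word distribution per document.
        per_doc = [step_fn(doc, prefix) for doc in documents]
        # Aggregate step: merge them into a single distribution.
        merged = aggregate(per_doc)
        # Greedy choice; sorting first makes tie-breaking deterministic.
        word = max(sorted(merged), key=merged.get)
        if word == eos:
            break
        prefix.append(word)
    return prefix


# Toy stand-in for a trained model: fixed distributions per decoding step.
TOY_MODEL: Dict[str, List[Distribution]] = {
    "doc_a": [{"what": 0.7, "who": 0.3}, {"is": 0.9, "was": 0.1}, {"</s>": 1.0}],
    "doc_b": [{"what": 0.4, "who": 0.6}, {"is": 0.6, "was": 0.4}, {"</s>": 1.0}],
}


def toy_step_fn(doc: str, prefix: Sequence[str]) -> Distribution:
    steps = TOY_MODEL[doc]
    return steps[min(len(prefix), len(steps) - 1)]


question = generate_common_question(["doc_a", "doc_b"], toy_step_fn)
print(" ".join(question))
```

In this toy run the two documents disagree on the first word ("what" versus "who"), and the averaged distribution settles the choice, which is the intuition behind decoding a single question from several sources at once.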
Anthology ID:
D19-5604
Volume:
Proceedings of the 3rd Workshop on Neural Generation and Translation
Month:
November
Year:
2019
Address:
Hong Kong
Editors:
Alexandra Birch, Andrew Finch, Hiroaki Hayashi, Ioannis Konstas, Thang Luong, Graham Neubig, Yusuke Oda, Katsuhito Sudoh
Venue:
NGT
Publisher:
Association for Computational Linguistics
Pages:
32–43
URL:
https://aclanthology.org/D19-5604
DOI:
10.18653/v1/D19-5604
Cite (ACL):
Woon Sang Cho, Yizhe Zhang, Sudha Rao, Chris Brockett, and Sungjin Lee. 2019. Generating a Common Question from Multiple Documents using Multi-source Encoder-Decoder Models. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 32–43, Hong Kong. Association for Computational Linguistics.
Cite (Informal):
Generating a Common Question from Multiple Documents using Multi-source Encoder-Decoder Models (Cho et al., NGT 2019)
PDF:
https://aclanthology.org/D19-5604.pdf
Data
MS MARCO