Difference between revisions of "Question Answering (State of the art)"

From ACL Wiki
(21 intermediate revisions by 6 users not shown)
== Answer Sentence Selection ==

The task of answer sentence selection is designed for the open-domain question answering setting. Given a question and a set of candidate sentences, the task is to choose the correct sentence that contains the exact answer and can sufficiently support the answer choice.

* [http://cs.stanford.edu/people/mengqiu/data/qg-emnlp07-data.tgz QA Answer Sentence Selection Dataset]: labeled sentences using TREC QA track data, provided by [http://cs.stanford.edu/people/mengqiu/ Mengqiu Wang] and first used in [http://www.aclweb.org/anthology/D/D07/D07-1003.pdf Wang et al. (2007)].
* Over time, the original dataset diverged into two versions because recent publications pre-processed it differently: both share the same training set, but their development and test sets differ. The Raw version has 82 questions in the development set and 100 questions in the test set. The Clean version (Wang and Ittycheriah 2015, Tan et al. 2015, dos Santos et al. 2016, Wang et al. 2016) removed questions with no answers or with only positive/negative answers, and thus has only 65 questions in the development set and 68 questions in the test set.
* Note: MAP/MRR scores on the two versions of TREC QA data (Clean vs Raw) are not comparable according to [http://www.cs.umd.edu/~jinfeng/publications/PairwiseNeuralNetwork_CIKM2016.pdf Rao et al. (2016)].
 
{| border="1" cellpadding="5" cellspacing="1"
|-
! Algorithm - Raw Version of TREC QA
! Reference
! [http://en.wikipedia.org/wiki/Mean_average_precision MAP]
! [http://en.wikipedia.org/wiki/Mean_reciprocal_rank MRR]
|-
| Punyakanok (2004)
| Wang et al. (2007)
| 0.419
| 0.494
|-
| Cui (2005)
| Wang et al. (2007)
| 0.427
| 0.526
|-
| Wang (2007)
| Wang et al. (2007)
| 0.603
| 0.685
|-
| H&S (2010)
| Heilman and Smith (2010)
| 0.609
| 0.692
|-
| W&M (2010)
| Wang and Manning (2010)
| 0.595
| 0.695
|-
| Yao (2013)
| Yao et al. (2013)
| 0.631
| 0.748
|-
| S&M (2013)
| Severyn and Moschitti (2013)
| 0.678
| 0.736
|-
| Shnarch (2013) - Backward
| Shnarch (2013)
| 0.686
| 0.754
|-
| Yih (2013) - LCLR
| Yih et al. (2013)
| 0.709
| 0.770
|-
| Yu (2014) - TRAIN-ALL bigram+count
| Yu et al. (2014)
| 0.711
| 0.785
|-
| W&N (2015) - Three-Layer BLSTM+BM25
| Wang and Nyberg (2015)
| 0.713
| 0.791
|-
| Feng (2015) - Architecture-II
| Feng et al. (2015)
| 0.711
| 0.800
|-
| S&M (2015)
| Severyn and Moschitti (2015)
| 0.746
| 0.808
|-
| H&L (2016) - Pairwise Word Interaction Modeling
| He and Lin (2016)
| 0.758
| 0.822
|-
| H&L (2015) - Multi-Perspective CNN
| He et al. (2015)
| 0.762
| 0.830
|-
| Rao (2016) - PairwiseRank + Multi-Perspective CNN
| Rao et al. (2016)
| 0.780
| 0.834
|}

{| border="1" cellpadding="5" cellspacing="1"
|-
! Algorithm - Clean Version of TREC QA
! Reference
! [http://en.wikipedia.org/wiki/Mean_average_precision MAP]
! [http://en.wikipedia.org/wiki/Mean_reciprocal_rank MRR]
|-
| W&I (2015)
| Wang and Ittycheriah (2015)
| 0.746
| 0.820
|-
| Tan (2015) - QA-LSTM/CNN+attention
| Tan et al. (2015)
| 0.728
| 0.832
|-
| dos Santos (2016) - Attentive Pooling CNN
| dos Santos et al. (2016)
| 0.753
| 0.851
|-
| Wang et al. (2016) - Lexical Decomposition and Composition
| Wang et al. (2016)
| 0.771
| 0.845
|-
| H&L (2015) - Multi-Perspective CNN
| He et al. (2015)
| 0.777
| 0.836
|-
| Rao et al. (2016) - PairwiseRank + Multi-Perspective CNN
| Rao et al. (2016)
| 0.801
| 0.877
|}
  
 
== References ==
* Vasin Punyakanok, Dan Roth, and Wen-Tau Yih. 2004. [http://cogcomp.cs.illinois.edu/papers/PunyakanokRoYi04a.pdf Mapping dependencies trees: An application to question answering]. In Proceedings of the 8th International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, FL, USA.
* Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan, and Tat-Seng Chua. 2005. [http://ws.csie.ncku.edu.tw/login/upload/2005/paper/Question%20answering%20Question%20answering%20passage%20retrieval%20using%20dependency%20relations.pdf Question answering passage retrieval using dependency relations]. In Proceedings of the 28th ACM-SIGIR International Conference on Research and Development in Information Retrieval, Salvador, Brazil.
* Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. 2007. [http://www.aclweb.org/anthology/D/D07/D07-1003.pdf What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA]. In EMNLP-CoNLL 2007.
* Michael Heilman and Noah A. Smith. 2010. [http://www.aclweb.org/anthology/N10-1145 Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions]. In NAACL-HLT 2010.
* Mengqiu Wang and Christopher Manning. 2010. [http://aclweb.org/anthology//C/C10/C10-1131.pdf Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering]. In COLING 2010.
* E. Shnarch. 2013. Probabilistic Models for Lexical Inference. Ph.D. thesis, Bar Ilan University.
* Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013. [http://www.aclweb.org/anthology/N13-1106.pdf Answer Extraction as Sequence Tagging with Tree Edit Distance]. In NAACL-HLT 2013.
* Wen-tau Yih, Ming-Wei Chang, Christopher Meek, and Andrzej Pastusiak. 2013. [http://research.microsoft.com/pubs/192357/QA-SentSel-Updated-PostACL.pdf Question Answering Using Enhanced Lexical Semantic Models]. In ACL 2013.
* Aliaksei Severyn and Alessandro Moschitti. 2013. [http://www.aclweb.org/anthology/D13-1044.pdf Automatic Feature Engineering for Answer Selection and Extraction]. In EMNLP 2013.
* Lei Yu, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. 2014. [http://arxiv.org/pdf/1412.1632v1.pdf Deep Learning for Answer Sentence Selection]. In NIPS deep learning workshop.
* Di Wang and Eric Nyberg. 2015. [http://www.aclweb.org/anthology/P15-2116 A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering]. In ACL 2015.
* Minwei Feng, Bing Xiang, Michael R. Glass, Lidan Wang, and Bowen Zhou. 2015. [http://arxiv.org/abs/1508.01585 Applying deep learning to answer selection: A study and an open task]. In ASRU 2015.
* Aliaksei Severyn and Alessandro Moschitti. 2015. [http://disi.unitn.it/~severyn/papers/sigir-2015-long.pdf Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks]. In SIGIR 2015.
* Zhiguo Wang and Abraham Ittycheriah. 2015. [http://arxiv.org/abs/1507.02628 FAQ-based Question Answering via Word Alignment]. In eprint arXiv:1507.02628.
* Ming Tan, Cicero dos Santos, Bing Xiang, and Bowen Zhou. 2015. [http://arxiv.org/abs/1511.04108 LSTM-Based Deep Learning Models for Nonfactoid Answer Selection]. In eprint arXiv:1511.04108.
* Cicero dos Santos, Ming Tan, Bing Xiang, and Bowen Zhou. 2016. [http://arxiv.org/abs/1602.03609 Attentive Pooling Networks]. In eprint arXiv:1602.03609.
* Zhiguo Wang, Haitao Mi, and Abraham Ittycheriah. 2016. [http://arxiv.org/pdf/1602.07019v1.pdf Sentence Similarity Learning by Lexical Decomposition and Composition]. In eprint arXiv:1602.07019.
* Hua He, Kevin Gimpel, and Jimmy Lin. 2015. [http://aclweb.org/anthology/D/D15/D15-1181.pdf Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks]. In EMNLP 2015.
* Hua He and Jimmy Lin. 2016. [https://cs.uwaterloo.ca/~jimmylin/publications/He_etal_NAACL-HTL2016.pdf Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement]. In NAACL 2016.
* Jinfeng Rao, Hua He, and Jimmy Lin. 2016. [http://www.cs.umd.edu/~jinfeng/publications/PairwiseNeuralNetwork_CIKM2016.pdf Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks]. In CIKM 2016.
 
[[Category:State of the art]]

Revision as of 15:25, 20 October 2016

Answer Sentence Selection

The task of answer sentence selection is designed for the open-domain question answering setting. Given a question and a set of candidate sentences, the task is to choose the correct sentence that contains the exact answer and can sufficiently support the answer choice.

  • QA Answer Sentence Selection Dataset: labeled sentences using TREC QA track data, provided by Mengqiu Wang and first used in Wang et al. (2007).
  • Over time, the original dataset diverged into two versions because recent publications pre-processed it differently: both share the same training set, but their development and test sets differ. The Raw version has 82 questions in the development set and 100 questions in the test set. The Clean version (Wang and Ittycheriah 2015, Tan et al. 2015, dos Santos et al. 2016, Wang et al. 2016) removed questions with no answers or with only positive/negative answers, and thus has only 65 questions in the development set and 68 questions in the test set.
  • Note: MAP/MRR scores on the two versions of TREC QA data (Clean vs Raw) are not comparable according to Rao et al. (2016).
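
The Clean filtering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pre-processing script used by any of the cited papers: the data layout (a dictionary from question id to a list of 0/1 candidate labels), the name `clean_filter`, and the toy questions are all hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: question id -> 0/1 labels, one per candidate sentence
# (1 = the sentence answers the question, 0 = it does not).
Labels = Dict[str, List[int]]


def clean_filter(labels_by_question: Labels) -> Labels:
    """Keep only questions with at least one positive and one negative candidate."""
    kept: Labels = {}
    for qid, labels in labels_by_question.items():
        has_positive = any(label == 1 for label in labels)
        has_negative = any(label == 0 for label in labels)
        if has_positive and has_negative:
            kept[qid] = labels
    return kept


# Toy data, invented for illustration only.
toy = {
    "q1": [0, 1, 0],  # mixed labels: kept
    "q2": [0, 0],     # no correct answer: dropped
    "q3": [1, 1],     # only correct answers: dropped
    "q4": [],         # no candidates at all: dropped
}
print(sorted(clean_filter(toy)))
```

A filter of this kind changes which questions are averaged over, which is one way to see why scores on the two versions should not be compared directly.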


Algorithm - Raw Version of TREC QA                 Reference                     MAP    MRR
Punyakanok (2004)                                  Wang et al. (2007)            0.419  0.494
Cui (2005)                                         Wang et al. (2007)            0.427  0.526
Wang (2007)                                        Wang et al. (2007)            0.603  0.685
H&S (2010)                                         Heilman and Smith (2010)      0.609  0.692
W&M (2010)                                         Wang and Manning (2010)       0.595  0.695
Yao (2013)                                         Yao et al. (2013)             0.631  0.748
S&M (2013)                                         Severyn and Moschitti (2013)  0.678  0.736
Shnarch (2013) - Backward                          Shnarch (2013)                0.686  0.754
Yih (2013) - LCLR                                  Yih et al. (2013)             0.709  0.770
Yu (2014) - TRAIN-ALL bigram+count                 Yu et al. (2014)              0.711  0.785
W&N (2015) - Three-Layer BLSTM+BM25                Wang and Nyberg (2015)        0.713  0.791
Feng (2015) - Architecture-II                      Feng et al. (2015)            0.711  0.800
S&M (2015)                                         Severyn and Moschitti (2015)  0.746  0.808
H&L (2016) - Pairwise Word Interaction Modeling    He and Lin (2016)             0.758  0.822
H&L (2015) - Multi-Perspective CNN                 He et al. (2015)              0.762  0.830
Rao (2016) - PairwiseRank + Multi-Perspective CNN  Rao et al. (2016)             0.780  0.834


Algorithm - Clean Version of TREC QA                        Reference                     MAP    MRR
W&I (2015)                                                  Wang and Ittycheriah (2015)   0.746  0.820
Tan (2015) - QA-LSTM/CNN+attention                          Tan et al. (2015)             0.728  0.832
dos Santos (2016) - Attentive Pooling CNN                   dos Santos et al. (2016)      0.753  0.851
Wang et al. (2016) - Lexical Decomposition and Composition  Wang et al. (2016)            0.771  0.845
H&L (2015) - Multi-Perspective CNN                          He et al. (2015)              0.777  0.836
Rao et al. (2016) - PairwiseRank + Multi-Perspective CNN    Rao et al. (2016)             0.801  0.877
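
For readers unfamiliar with the two metrics reported above, the following Python sketch computes MAP and MRR from ranked candidate lists using their textbook definitions. It is a minimal illustration, not the evaluation tool used to produce the published numbers; the function names and the toy rankings are hypothetical. Each ranking is assumed to be a list of 0/1 labels ordered by the system's score, best first.

```python
from typing import List, Tuple


def average_precision(ranked_labels: List[int]) -> float:
    """Mean of precision@k taken at each rank k that holds a correct sentence."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    # Assumption of this sketch: a question with no correct candidate scores 0.
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(ranked_labels: List[int]) -> float:
    """1 / rank of the first correct sentence, or 0 if there is none."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def mean_scores(rankings: List[List[int]]) -> Tuple[float, float]:
    """Average both metrics over all questions, giving (MAP, MRR)."""
    n = len(rankings)
    map_score = sum(average_precision(r) for r in rankings) / n
    mrr_score = sum(reciprocal_rank(r) for r in rankings) / n
    return map_score, mrr_score


# Toy rankings, invented for illustration only.
toy_rankings = [
    [1, 0, 1],  # AP = (1/1 + 2/3) / 2, RR = 1
    [0, 1, 0],  # AP = 1/2, RR = 1/2
]
map_score, mrr_score = mean_scores(toy_rankings)
print(f"MAP={map_score:.3f} MRR={mrr_score:.3f}")
```

In this sketch a question without any correct candidate contributes 0 to both averages, so the set of questions included in the evaluation directly affects the final scores.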

References