Paraphrase Identification (State of the art)

2011-12-15T18:20:24Z

Eyeh: Added entries for Finch et al. 2005, Socher et al. 2011

* '''source''': [http://research.microsoft.com/en-us/downloads/607D14D9-20CD-47E3-85BC-A2F65CD28042/default.aspx Microsoft Research Paraphrase Corpus] (MSRP)
* '''task''': given a pair of sentences, classify them as paraphrases or not paraphrases
* '''see''': Dolan et al. (2004)
* '''train''': 4,076 sentence pairs (2,753 positive: 67.5%)
* '''test''': 1,725 sentence pairs (1,147 positive: 66.5%)
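
The two scores reported for each system can be illustrated with a minimal sketch. It assumes that F is the balanced F-measure (F1) computed on the positive (paraphrase) class, which is the usual convention for MSRP but is not stated on this page. The function name <code>accuracy_and_f1</code> and the "always answer paraphrase" baseline are illustrative and do not correspond to any system listed here; only the test-split counts come from the figures above.

```python
def accuracy_and_f1(gold, predicted):
    """Return (accuracy, F1) for binary labels, where 1 means 'paraphrase'."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)

    accuracy = (tp + tn) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


# Test split sizes quoted above: 1,725 pairs, 1,147 of them positive.
gold = [1] * 1147 + [0] * (1725 - 1147)

# Hypothetical trivial baseline: label every pair as a paraphrase.
always_paraphrase = [1] * len(gold)

acc, f1 = accuracy_and_f1(gold, always_paraphrase)
print(f"accuracy={acc:.1%}  F={f1:.1%}")  # accuracy=66.5%  F=79.9%
```

Under these assumptions the trivial baseline already reaches about 66.5% accuracy and 79.9% F, because two thirds of the pairs are positive. This is worth keeping in mind when comparing the F scores in the table of results.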

== Sample data ==

* '''Sentence 1''': Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
* '''Sentence 2''': Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
* '''Class''': 1 (true paraphrase)
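
To make the classification task concrete, the following sketch scores the sample pair with a plain word-overlap (Jaccard) measure and thresholds the result. This is not the method of any system in the table below. The tokenizer, the function names, and the threshold value <code>0.5</code> are assumptions chosen only for illustration.

```python
import re


def tokens(sentence):
    """Lowercase the sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap between the token sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


s1 = ('Amrozi accused his brother, whom he called "the witness", '
      'of deliberately distorting his evidence.')
s2 = ('Referring to him as only "the witness", Amrozi accused his brother '
      'of deliberately distorting his evidence.')

THRESHOLD = 0.5  # hypothetical cut-off, not tuned on MSRP

score = jaccard(s1, s2)
predicted = 1 if score >= THRESHOLD else 0
print(f"overlap={score:.2f}  predicted class={predicted}")
```

For this pair the two sentences share 10 of 18 distinct tokens, so the sketch predicts class 1, in agreement with the gold label. A single hand-picked example says nothing about how such a baseline would fare on the full test set.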

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference
! Description
! Accuracy
! F-measure
|-
| FHS
| Finch et al. (2005)
| supervised combination of MT evaluation measures as features
| 75.0%
| 82.7%
|-
| KM
| Kozareva and Montoyo (2006)
| supervised combination of lexical and semantic features
| 76.6%
| 79.6%
|-
| RMLMG
| Rus et al. (2008)
| unsupervised graph subsumption
| 70.6%
| 80.5%
|-
| MCS
| Mihalcea et al. (2006)
| unsupervised combination of several word similarity measures
| 70.3%
| 81.3%
|-
| STS
| Islam and Inkpen (2007)
| unsupervised combination of semantic and string similarity
| 72.6%
| 81.3%
|-
| QKC
| Qiu et al. (2006)
| supervised sentence dissimilarity classification
| 72.0%
| 81.6%
|-
| matrixJcn
| Fernando and Stevenson (2008)
| unsupervised JCN WordNet similarity with matrix
| 74.1%
| 82.4%
|-
| SHPNM
| Socher et al. (2011)
| supervised recursive autoencoder with dynamic pooling
| 76.8%
| 83.6%
|-
| WDDP
| Wan et al. (2006)
| supervised dependency-based features
| 75.6%
| 83.0%
|-
|}

== References ==

Dolan, B., Quirk, C., and Brockett, C. (2004). [http://acl.ldc.upenn.edu/C/C04/C04-1051.pdf Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources], ''Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004)'', Geneva, Switzerland, pp. 350-356.

Fernando, S., and Stevenson, M. (2008). [http://www.dcs.shef.ac.uk/~samf/clukPaper.pdf A semantic similarity approach to paraphrase detection], ''Computational Linguistics UK (CLUK 2008) 11th Annual Research Colloquium''.

Finch, A., Hwang, Y.S., and Sumita, E. (2005). [http://aclweb.org/anthology/I/I05/I05-5003.pdf Using machine translation evaluation techniques to determine sentence-level semantic equivalence], ''Proceedings of the Third International Workshop on Paraphrasing (IWP 2005)'', Jeju Island, South Korea, pp. 17-24.

Islam, A., and Inkpen, D. (2007). [http://www.site.uottawa.ca/~diana/publications/ranlp_2007_textsim_camera_ready.pdf Semantic similarity of short texts], ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2007)'', Borovets, Bulgaria, pp. 291-297.

Kozareva, Z., and Montoyo, A. (2006). [http://www.dlsi.ua.es/~zkozareva/papers/fintalKozareva.pdf Paraphrase identification on the basis of supervised machine learning techniques], ''Advances in Natural Language Processing: 5th International Conference on NLP (FinTAL 2006)'', Turku, Finland, pp. 524-533.

Mihalcea, R., Corley, C., and Strapparava, C. (2006). [http://reference.kfupm.edu.sa/content/c/o/corpus_based_and_knowledge_based_measure_3759629.pdf Corpus-based and knowledge-based measures of text semantic similarity], ''Proceedings of the National Conference on Artificial Intelligence (AAAI 2006)'', Boston, Massachusetts, pp. 775-780.

Qiu, L., Kan, M.Y., and Chua, T.S. (2006). [http://acl.ldc.upenn.edu/W/W06/W06-1603.pdf Paraphrase recognition via dissimilarity significance classification], ''Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006)'', pp. 18-26.

Rus, V., McCarthy, P.M., Lintean, M.C., McNamara, D.S., and Graesser, A.C. (2008). [http://csep.psyc.memphis.edu/McNamara/pdf/Paraphrase_Identification.pdf Paraphrase identification with lexico-syntactic graph subsumption], ''FLAIRS 2008'', pp. 201-206.

Socher, R., Huang, E.H., Pennington, J., Ng, A.Y., and Manning, C.D. (2011). [http://www.socher.org/uploads/Main/SocherHuangPenningtonNgManning_NIPS2011.pdf Dynamic pooling and unfolding recursive autoencoders for paraphrase detection], ''Advances in Neural Information Processing Systems 24''.

Wan, S., Dras, M., Dale, R., and Paris, C. (2006). [http://www.alta.asn.au/events/altw2006/proceedings/swan-final.pdf Using dependency-based features to take the "para-farce" out of paraphrase], ''Proceedings of the Australasian Language Technology Workshop (ALTW 2006)'', pp. 131-138.

== See also ==

* [[State of the art]]

[[Category:State of the art]]

