Paraphrase Identification (State of the art)

2014-10-16T09:40:36Z

Jmcejuela: Fixed link

* '''source''': [http://research.microsoft.com/en-us/downloads/607D14D9-20CD-47E3-85BC-A2F65CD28042/default.aspx Microsoft Research Paraphrase Corpus] (MSRP)
* '''task''': given a pair of sentences, classify the pair as a paraphrase or not a paraphrase
* '''see''': Dolan et al. (2004)
* '''train''': 4,076 sentence pairs (2,753 positive: 67.5%)
* '''test''': 1,725 sentence pairs (1,147 positive: 66.5%)
* '''see also''': [[Similarity (State of the art)]]
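
Because about two thirds of the pairs are positive, a classifier that labels every pair as a paraphrase already scores fairly well on both metrics. The following is a minimal sketch of that reference point, computed only from the test-set counts listed above. It assumes that F is the balanced F1 score on the paraphrase class, which this page does not state explicitly; the function name <code>accuracy_and_f</code> is illustrative.

```python
def accuracy_and_f(tp, fp, fn, tn):
    """Return (accuracy, F1) computed from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Test-set counts from the list above.
TEST_PAIRS = 1725
TEST_POSITIVE = 1147

# Hypothetical classifier that predicts "paraphrase" for every pair:
# every positive pair is a true positive, every negative pair a false positive.
acc, f1 = accuracy_and_f(
    tp=TEST_POSITIVE,
    fp=TEST_PAIRS - TEST_POSITIVE,
    fn=0,
    tn=0,
)
print(f"accuracy = {acc:.1%}, F = {f1:.1%}")  # accuracy = 66.5%, F = 79.9%
```

Under these assumptions the all-positive labelling reaches 66.5% accuracy and an F of about 79.9%, which is useful context when reading the results table.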

== Sample data ==

* '''Sentence 1''': Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
* '''Sentence 2''': Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
* '''Class''': 1 (true paraphrase)

== Table of results ==

* '''Listed in order of increasing F score.'''

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference
! Description
! Supervision
! Accuracy
! F
|-
| Vector Based Similarity (Baseline)
| Mihalcea et al. (2006)
| cosine similarity with tf-idf weighting
| unsupervised
| 65.4%
| 75.3%
|-
| ESA
| Hassan (2011)
| explicit semantic space
| unsupervised
| 67.0%
| 79.3%
|-
| KM
| Kozareva and Montoyo (2006)
| combination of lexical and semantic features
| supervised
| 76.6%
| 79.6%
|-
| LSA
| Hassan (2011)
| latent semantic space
| unsupervised
| 68.8%
| 79.9%
|-
| RMLMG
| Rus et al. (2008)
| graph subsumption
| unsupervised
| 70.6%
| 80.5%
|-
| MCS
| Mihalcea et al. (2006)
| combination of several word similarity measures
| unsupervised
| 70.3%
| 81.3%
|-
| STS
| Islam and Inkpen (2007)
| combination of semantic and string similarity
| unsupervised
| 72.6%
| 81.3%
|-
| SSA
| Hassan (2011)
| salient semantic space
| unsupervised
| 72.5%
| 81.4%
|-
| QKC
| Qiu et al. (2006)
| sentence dissimilarity classification
| supervised
| 72.0%
| 81.6%
|-
| ParaDetect
| Zia and Wasif (2012)
| PI using semantic heuristic features
| supervised
| 74.7%
| 81.8%
|-
| SDS
| Blacoe and Lapata (2012)
| simple distributional semantic space
| supervised
| 73.0%
| 82.3%
|-
| matrixJcn
| Fernando and Stevenson (2008)
| JCN WordNet similarity with matrix
| unsupervised
| 74.1%
| 82.4%
|-
| FHS
| Finch et al. (2005)
| combination of MT evaluation measures as features
| supervised
| 75.0%
| 82.7%
|-
| PE
| Das and Smith (2009)
| product of experts
| supervised
| 76.1%
| 82.7%
|-
| WDDP
| Wan et al. (2006)
| dependency-based features
| supervised
| 75.6%
| 83.0%
|-
| SHPNM
| Socher et al. (2011)
| recursive autoencoder with dynamic pooling
| supervised
| 76.8%
| 83.6%
|-
| MTMETRICS
| Madnani et al. (2012)
| combination of eight machine translation metrics
| supervised
| 77.4%
| 84.1%
|}
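
The baseline row describes cosine similarity with tf-idf weighting. The following is a rough sketch of that idea, not a reproduction of the setup in Mihalcea et al. (2006): the tokenizer, the smoothed IDF formula, the two-sentence toy corpus, and the 0.5 decision threshold are all assumptions made for illustration. In practice the IDF statistics would come from a much larger corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(corpus):
    """Smoothed inverse document frequency over a list of token lists."""
    n = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse tf-idf vector as a dict; terms missing from idf get weight 0."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_paraphrase(s1, s2, idf, threshold=0.5):
    """Label a pair as a paraphrase when similarity reaches the assumed threshold."""
    u = tfidf_vector(tokenize(s1), idf)
    v = tfidf_vector(tokenize(s2), idf)
    return cosine(u, v) >= threshold


# The sample pair from the "Sample data" section.
s1 = 'Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.'
s2 = 'Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.'

# Toy corpus: only the two sentences themselves (an assumption for the demo).
idf = idf_weights([tokenize(s1), tokenize(s2)])
similarity = cosine(tfidf_vector(tokenize(s1), idf), tfidf_vector(tokenize(s2), idf))
print(round(similarity, 3), is_paraphrase(s1, s2, idf))
```

No training labels are used, which matches the "unsupervised" entry in the table; the threshold is the only free parameter in this sketch.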

== References ==

* '''Listed alphabetically.'''

Blacoe, W., and Lapata, M. (2012). [http://newdesign.aclweb.org/anthology/D/D12/D12-1050.pdf A comparison of vector-based representations for semantic composition], ''Proceedings of EMNLP'', Jeju Island, Korea, pp. 546-556.

Das, D., and Smith, N. (2009). [http://www.aclweb.org/anthology-new/P/P09/P09-1053.pdf Paraphrase identification as probabilistic quasi-synchronous recognition]. ''Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP'', pp. 468-476, Suntec, Singapore.

Dolan, B., Quirk, C., and Brockett, C. (2004). [http://acl.ldc.upenn.edu/C/C04/C04-1051.pdf Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources], ''Proceedings of the 20th international conference on Computational Linguistics (COLING 2004)'', Geneva, Switzerland, pp. 350-356.

Fernando, S., and Stevenson, M. (2008). [http://staffwww.dcs.shef.ac.uk/people/S.Fernando/pubs/clukPaper.pdf A semantic similarity approach to paraphrase detection], ''Computational Linguistics UK (CLUK 2008) 11th Annual Research Colloquium''.

Finch, A., Hwang, Y.S., and Sumita, E. (2005). [http://aclweb.org/anthology/I/I05/I05-5003.pdf Using machine translation evaluation techniques to determine sentence-level semantic equivalence], ''Proceedings of the Third International Workshop on Paraphrasing (IWP 2005)'', Jeju Island, South Korea, pp. 17-24.

Hassan, S. (2011). [http://samerhassan.com/images/0/01/Dissertation.pdf Measuring Semantic Relatedness Using Salient Encyclopedic Concepts]. Doctoral dissertation, August 2011.

Islam, A., and Inkpen, D. (2007). [http://www.site.uottawa.ca/~diana/publications/ranlp_2007_textsim_camera_ready.pdf Semantic similarity of short texts], ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2007)'', Borovets, Bulgaria, pp. 291-297.

Kozareva, Z., and Montoyo, A. (2006). [http://www.dlsi.ua.es/~zkozareva/papers/fintalKozareva.pdf Paraphrase identification on the basis of supervised machine learning techniques], ''Advances in Natural Language Processing: 5th International Conference on NLP (FinTAL 2006)'', Turku, Finland, pp. 524-533.

Madnani, N., Tetreault, J., and Chodorow, M. (2012). [http://www.aclweb.org/anthology-new/N/N12/N12-1019.pdf Re-examining Machine Translation Metrics for Paraphrase Identification], ''Proceedings of 2012 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2012)'', pp. 182-190.

Mihalcea, R., Corley, C., and Strapparava, C. (2006). [http://www.cse.unt.edu/~rada/papers/mihalcea.aaai06.pdf Corpus-based and knowledge-based measures of text semantic similarity], ''Proceedings of the National Conference on Artificial Intelligence (AAAI 2006)'', Boston, Massachusetts, pp. 775-780.

Qiu, L., Kan, M.Y., and Chua, T.S. (2006). [http://acl.ldc.upenn.edu/W/W06/W06-1603.pdf Paraphrase recognition via dissimilarity significance classification], ''Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006)'', pp. 18-26.

Rus, V., McCarthy, P.M., Lintean, M.C., McNamara, D.S., and Graesser, A.C. (2008). [http://csep.psyc.memphis.edu/McNamara/pdf/Paraphrase_Identification.pdf Paraphrase identification with lexico-syntactic graph subsumption], ''FLAIRS 2008'', pp. 201-206.

Socher, R., Huang, E.H., Pennington, J., Ng, A.Y., and Manning, C.D. (2011). [http://www.socher.org/uploads/Main/SocherHuangPenningtonNgManning_NIPS2011.pdf Dynamic pooling and unfolding recursive autoencoders for paraphrase detection], ''Advances in Neural Information Processing Systems 24''.

Wan, S., Dras, M., Dale, R., and Paris, C. (2006). [http://www.alta.asn.au/events/altw2006/proceedings/swan-final.pdf Using dependency-based features to take the "para-farce" out of paraphrase], ''Proceedings of the Australasian Language Technology Workshop (ALTW 2006)'', pp. 131-138.

Zia Ul-Qayyum and Wasif Altaf (2012). [http://maxwellsci.com/print/rjaset/v4-4894-4904.pdf Paraphrase Identification using Semantic Heuristic Features], ''Research Journal of Applied Sciences, Engineering and Technology'', 4(22): 4894-4904.



[[Category:State of the art]]
