<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.aclweb.org/aclwiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Jmcejuela</id>
	<title>ACL Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.aclweb.org/aclwiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Jmcejuela"/>
	<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/Special:Contributions/Jmcejuela"/>
	<updated>2026-04-09T19:45:55Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.6</generator>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Paraphrase_Identification_(State_of_the_art)&amp;diff=10862</id>
		<title>Paraphrase Identification (State of the art)</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Paraphrase_Identification_(State_of_the_art)&amp;diff=10862"/>
		<updated>2014-10-16T09:40:36Z</updated>

		<summary type="html">&lt;p&gt;Jmcejuela: Fixed link&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* &#039;&#039;&#039;source&#039;&#039;&#039;: [http://research.microsoft.com/en-us/downloads/607D14D9-20CD-47E3-85BC-A2F65CD28042/default.aspx Microsoft Research Paraphrase Corpus] (MSRP)&lt;br /&gt;
* &#039;&#039;&#039;task&#039;&#039;&#039;: given a pair of sentences, classify the pair as a paraphrase or a non-paraphrase&lt;br /&gt;
* &#039;&#039;&#039;see&#039;&#039;&#039;: Dolan et al. (2004)&lt;br /&gt;
* &#039;&#039;&#039;train&#039;&#039;&#039;: 4,076 sentence pairs (2,753 positive: 67.5%)&lt;br /&gt;
* &#039;&#039;&#039;test&#039;&#039;&#039;: 1,725 sentence pairs (1,147 positive: 66.5%)&lt;br /&gt;
* &#039;&#039;&#039;see also&#039;&#039;&#039;: [[Similarity (State of the art)]]&lt;br /&gt;
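The class proportions above, and what they imply for a trivial classifier, can be checked with a few lines of Python. This is a minimal sketch that uses only the train and test counts listed above. It assumes that the F column in the table below is the F1 score of the positive (paraphrase) class, which this page does not state explicitly.&lt;br /&gt;

```python
# Counts copied from the train/test bullets: (total pairs, positive pairs).
splits = {"train": (4076, 2753), "test": (1725, 1147)}

for name, (total, positive) in splits.items():
    print(f"{name}: {positive / total:.1%} positive")

# Hypothetical trivial classifier: label every test pair as a paraphrase.
total, positive = splits["test"]
accuracy = positive / total   # right on every positive pair, wrong on every negative one
precision = positive / total  # every prediction is "paraphrase"
recall = 1.0                  # no paraphrase is missed
f1 = 2 * precision * recall / (precision + recall)
print(f"all-positive baseline: accuracy {accuracy:.1%}, F {f1:.1%}")
```

Under that assumption, labelling every test pair as a paraphrase gives about 66.5% accuracy and 79.9% F, so the F column is best read against that floor rather than against zero.&lt;br /&gt;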
&lt;br /&gt;
&lt;br /&gt;
== Sample data ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Sentence 1&#039;&#039;&#039;: Amrozi accused his brother, whom he called &amp;quot;the witness&amp;quot;, of deliberately distorting his evidence.&lt;br /&gt;
* &#039;&#039;&#039;Sentence 2&#039;&#039;&#039;: Referring to him as only &amp;quot;the witness&amp;quot;, Amrozi accused his brother of deliberately distorting his evidence.&lt;br /&gt;
* &#039;&#039;&#039;Class&#039;&#039;&#039;: 1 (true paraphrase)&lt;br /&gt;
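To make the sample pair concrete, here is a sketch of the kind of vector-based similarity used by the baseline row of the table (cosine similarity with tf-idf weighting, Mihalcea et al. 2006). It is not the authors' implementation. The tokenizer, the empty idf table (which reduces tf-idf to plain term frequency) and the 0.5 decision threshold are illustrative assumptions; a real baseline would estimate idf weights from a background corpus.&lt;br /&gt;

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    # Lowercase and keep alphabetic runs only (a deliberately simple tokenizer).
    return re.findall(r"[a-z]+", sentence.lower())

def tfidf_vector(tokens, idf):
    # Term frequency times idf; words missing from the idf table get weight 1.0.
    return {w: count * idf.get(w, 1.0) for w, count in Counter(tokens).items()}

def cosine(u, v):
    # Cosine of the angle between two sparse vectors stored as dicts.
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def vector_similarity(s1, s2, idf):
    return cosine(tfidf_vector(tokenize(s1), idf), tfidf_vector(tokenize(s2), idf))

s1 = 'Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.'
s2 = 'Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.'

THRESHOLD = 0.5  # hypothetical decision threshold, not taken from Mihalcea et al.
sim = vector_similarity(s1, s2, idf={})  # empty idf table: plain term-frequency weights
print(round(sim, 3), sim >= THRESHOLD)  # prints: 0.766 True
```

For the sample pair the cosine is about 0.77, above the assumed threshold, so this sketch labels the pair a paraphrase, matching its gold class.&lt;br /&gt;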
&lt;br /&gt;
&lt;br /&gt;
== Table of results ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Listed in order of increasing F score.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;5&amp;quot; cellspacing=&amp;quot;1&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Algorithm&lt;br /&gt;
! Reference&lt;br /&gt;
! Description&lt;br /&gt;
! Supervision&lt;br /&gt;
! Accuracy&lt;br /&gt;
! F score&lt;br /&gt;
|-&lt;br /&gt;
| Vector-Based Similarity (Baseline)&lt;br /&gt;
| Mihalcea et al. (2006)&lt;br /&gt;
| cosine similarity with tf-idf weighting&lt;br /&gt;
| unsupervised&lt;br /&gt;
| 65.4%&lt;br /&gt;
| 75.3%&lt;br /&gt;
|-&lt;br /&gt;
| ESA&lt;br /&gt;
| Hassan (2011)&lt;br /&gt;
| explicit semantic space&lt;br /&gt;
| unsupervised&lt;br /&gt;
| 67.0%&lt;br /&gt;
| 79.3%&lt;br /&gt;
|-&lt;br /&gt;
| KM&lt;br /&gt;
| Kozareva and Montoyo (2006)&lt;br /&gt;
| combination of lexical and semantic features&lt;br /&gt;
| supervised&lt;br /&gt;
| 76.6%&lt;br /&gt;
| 79.6%&lt;br /&gt;
|-&lt;br /&gt;
| LSA&lt;br /&gt;
| Hassan (2011)&lt;br /&gt;
| latent semantic space&lt;br /&gt;
| unsupervised&lt;br /&gt;
| 68.8%&lt;br /&gt;
| 79.9%&lt;br /&gt;
|-&lt;br /&gt;
| RMLMG&lt;br /&gt;
| Rus et al. (2008)&lt;br /&gt;
| graph subsumption&lt;br /&gt;
| unsupervised&lt;br /&gt;
| 70.6%&lt;br /&gt;
| 80.5%&lt;br /&gt;
|-&lt;br /&gt;
| MCS&lt;br /&gt;
| Mihalcea et al. (2006)&lt;br /&gt;
| combination of several word similarity measures&lt;br /&gt;
| unsupervised&lt;br /&gt;
| 70.3%&lt;br /&gt;
| 81.3%&lt;br /&gt;
|-&lt;br /&gt;
| STS&lt;br /&gt;
| Islam and Inkpen (2007)&lt;br /&gt;
| combination of semantic and string similarity&lt;br /&gt;
| unsupervised&lt;br /&gt;
| 72.6%&lt;br /&gt;
| 81.3%&lt;br /&gt;
|-&lt;br /&gt;
| SSA&lt;br /&gt;
| Hassan (2011)&lt;br /&gt;
| salient semantic space&lt;br /&gt;
| unsupervised&lt;br /&gt;
| 72.5%&lt;br /&gt;
| 81.4%&lt;br /&gt;
|-&lt;br /&gt;
| QKC&lt;br /&gt;
| Qiu et al. (2006)&lt;br /&gt;
| dissimilarity significance classification&lt;br /&gt;
| supervised&lt;br /&gt;
| 72.0%&lt;br /&gt;
| 81.6%&lt;br /&gt;
|-&lt;br /&gt;
| ParaDetect&lt;br /&gt;
| Zia and Wasif (2012)&lt;br /&gt;
| paraphrase identification using semantic heuristic features&lt;br /&gt;
| supervised&lt;br /&gt;
| 74.7%&lt;br /&gt;
| 81.8%&lt;br /&gt;
|-&lt;br /&gt;
| SDS&lt;br /&gt;
| Blacoe and Lapata (2012)&lt;br /&gt;
| simple distributional semantic space&lt;br /&gt;
| supervised&lt;br /&gt;
| 73.0%&lt;br /&gt;
| 82.3%&lt;br /&gt;
|-&lt;br /&gt;
| matrixJcn&lt;br /&gt;
| Fernando and Stevenson (2008)&lt;br /&gt;
| Jiang-Conrath (JCN) WordNet similarity with a word-similarity matrix&lt;br /&gt;
| unsupervised&lt;br /&gt;
| 74.1%&lt;br /&gt;
| 82.4%&lt;br /&gt;
|-&lt;br /&gt;
| FHS&lt;br /&gt;
| Finch et al. (2005)&lt;br /&gt;
| combination of MT evaluation measures as features&lt;br /&gt;
| supervised&lt;br /&gt;
| 75.0%&lt;br /&gt;
| 82.7%&lt;br /&gt;
|-&lt;br /&gt;
| PE&lt;br /&gt;
| Das and Smith (2009)&lt;br /&gt;
| product of experts&lt;br /&gt;
| supervised&lt;br /&gt;
| 76.1%&lt;br /&gt;
| 82.7%&lt;br /&gt;
|-&lt;br /&gt;
| WDDP&lt;br /&gt;
| Wan et al. (2006)&lt;br /&gt;
| dependency-based features&lt;br /&gt;
| supervised&lt;br /&gt;
| 75.6%&lt;br /&gt;
| 83.0%&lt;br /&gt;
|-&lt;br /&gt;
| SHPNM&lt;br /&gt;
| Socher et al. (2011)&lt;br /&gt;
| recursive autoencoder with dynamic pooling&lt;br /&gt;
| supervised&lt;br /&gt;
| 76.8%&lt;br /&gt;
| 83.6%&lt;br /&gt;
|-&lt;br /&gt;
| MTMETRICS&lt;br /&gt;
| Madnani et al. (2012)&lt;br /&gt;
| combination of eight machine translation metrics&lt;br /&gt;
| supervised&lt;br /&gt;
| 77.4%&lt;br /&gt;
| 84.1%&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
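The Accuracy and F columns summarise each system's labels on the 1,725 test pairs. As a minimal sketch, assuming labels are encoded as 1 (paraphrase) and 0 (not a paraphrase) and that F is the F1 score of the paraphrase class (the page does not say which F variant each paper reports), both scores can be computed from gold and predicted labels as follows. The function name and the five toy labels are hypothetical.&lt;br /&gt;

```python
def accuracy_and_f(gold, predicted):
    """Return (accuracy, F) for binary labels, treating 1 (paraphrase) as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # paraphrases found
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # non-paraphrases labelled as paraphrases
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # paraphrases missed
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f

gold = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
print(accuracy_and_f(gold, predicted))
```

Accuracy counts both classes while F looks only at the paraphrase class, so the two columns can rank systems differently; for example, MCS and STS share an F of 81.3% but differ in accuracy.&lt;br /&gt;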
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Listed alphabetically.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Blacoe, W., and Lapata, M. (2012). [http://newdesign.aclweb.org/anthology/D/D12/D12-1050.pdf A comparison of vector-based representations for semantic composition], &#039;&#039;Proceedings of EMNLP&#039;&#039;, Jeju Island, Korea, pp. 546-556.&lt;br /&gt;
&lt;br /&gt;
Das, D., and Smith, N. (2009). [http://www.aclweb.org/anthology-new/P/P09/P09-1053.pdf Paraphrase identification as probabilistic quasi-synchronous recognition]. &#039;&#039;Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP&#039;&#039;, pp. 468-476, Suntec, Singapore.&lt;br /&gt;
&lt;br /&gt;
Dolan, B., Quirk, C., and Brockett, C. (2004). [http://acl.ldc.upenn.edu/C/C04/C04-1051.pdf Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources], &#039;&#039;Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004)&#039;&#039;, Geneva, Switzerland, pp. 350-356.&lt;br /&gt;
&lt;br /&gt;
Fernando, S., and Stevenson, M. (2008). [http://staffwww.dcs.shef.ac.uk/people/S.Fernando/pubs/clukPaper.pdf A semantic similarity approach to paraphrase detection], &#039;&#039;Computational Linguistics UK (CLUK 2008) 11th Annual Research Colloquium&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Finch, A., Hwang, Y.S., and Sumita, E. (2005). [http://aclweb.org/anthology/I/I05/I05-5003.pdf Using machine translation evaluation techniques to determine sentence-level semantic equivalence], &#039;&#039;Proceedings of the Third International Workshop on Paraphrasing (IWP 2005)&#039;&#039;, Jeju Island, South Korea, pp. 17-24.&lt;br /&gt;
&lt;br /&gt;
Hassan, S. (2011). [http://samerhassan.com/images/0/01/Dissertation.pdf Measuring Semantic Relatedness Using Salient Encyclopedic Concepts], &#039;&#039;Ph.D. dissertation&#039;&#039;, August 2011.&lt;br /&gt;
&lt;br /&gt;
Islam, A., and Inkpen, D. (2007). [http://www.site.uottawa.ca/~diana/publications/ranlp_2007_textsim_camera_ready.pdf Semantic similarity of short texts], &#039;&#039;Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2007)&#039;&#039;, Borovets, Bulgaria, pp. 291-297.&lt;br /&gt;
&lt;br /&gt;
Kozareva, Z., and Montoyo, A. (2006). [http://www.dlsi.ua.es/~zkozareva/papers/fintalKozareva.pdf Paraphrase identification on the basis of supervised machine learning techniques], &#039;&#039;Advances in Natural Language Processing: 5th International Conference on NLP (FinTAL 2006)&#039;&#039;, Turku, Finland, pp. 524-533.&lt;br /&gt;
&lt;br /&gt;
Madnani, N., Tetreault, J., and Chodorow, M. (2012). [http://www.aclweb.org/anthology-new/N/N12/N12-1019.pdf Re-examining Machine Translation Metrics for Paraphrase Identification], &#039;&#039;Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2012)&#039;&#039;, pp. 182-190.&lt;br /&gt;
&lt;br /&gt;
Mihalcea, R., Corley, C., and Strapparava, C. (2006). [http://www.cse.unt.edu/~rada/papers/mihalcea.aaai06.pdf Corpus-based and knowledge-based measures of text semantic similarity], &#039;&#039;Proceedings of the National Conference on Artificial Intelligence (AAAI 2006)&#039;&#039;, Boston, Massachusetts, pp. 775-780.&lt;br /&gt;
&lt;br /&gt;
Qiu, L., Kan, M.Y., and Chua, T.S. (2006). [http://acl.ldc.upenn.edu/W/W06/W06-1603.pdf Paraphrase recognition via dissimilarity significance classification], &#039;&#039;Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006)&#039;&#039;, pp. 18-26.&lt;br /&gt;
&lt;br /&gt;
Rus, V., McCarthy, P.M., Lintean, M.C., McNamara, D.S., and Graesser, A.C. (2008). [http://csep.psyc.memphis.edu/McNamara/pdf/Paraphrase_Identification.pdf Paraphrase identification with lexico-syntactic graph subsumption], &#039;&#039;FLAIRS 2008&#039;&#039;, pp. 201-206.&lt;br /&gt;
&lt;br /&gt;
Socher, R., Huang, E.H., Pennington, J., Ng, A.Y., and Manning, C.D. (2011). [http://www.socher.org/uploads/Main/SocherHuangPenningtonNgManning_NIPS2011.pdf Dynamic pooling and unfolding recursive autoencoders for paraphrase detection], &#039;&#039;Advances in Neural Information Processing Systems 24 (NIPS 2011)&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Wan, S., Dras, M., Dale, R., and Paris, C. (2006). [http://www.alta.asn.au/events/altw2006/proceedings/swan-final.pdf Using dependency-based features to take the &amp;quot;para-farce&amp;quot; out of paraphrase], &#039;&#039;Proceedings of the Australasian Language Technology Workshop (ALTW 2006)&#039;&#039;, pp. 131-138.&lt;br /&gt;
&lt;br /&gt;
Zia Ul-Qayyum and Wasif Altaf (2012). [http://maxwellsci.com/print/rjaset/v4-4894-4904.pdf Paraphrase Identification using Semantic Heuristic Features], &#039;&#039;Research Journal of Applied Sciences, Engineering and Technology&#039;&#039;, 4(22): 4894-4904.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Please keep this list in alphabetical order --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:State of the art]]&lt;/div&gt;</summary>
		<author><name>Jmcejuela</name></author>
	</entry>
</feed>