<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.aclweb.org/aclwiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Eyeh</id>
	<title>ACL Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.aclweb.org/aclwiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Eyeh"/>
	<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/Special:Contributions/Eyeh"/>
	<updated>2026-05-28T21:05:07Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.6</generator>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Paraphrase_Identification_(State_of_the_art)&amp;diff=9115</id>
		<title>Paraphrase Identification (State of the art)</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Paraphrase_Identification_(State_of_the_art)&amp;diff=9115"/>
		<updated>2011-12-15T18:20:24Z</updated>

		<summary type="html">&lt;p&gt;Eyeh: Added entries for Finch et al. 2005, Socher et al. 2011&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* &#039;&#039;&#039;source&#039;&#039;&#039;: [http://research.microsoft.com/en-us/downloads/607D14D9-20CD-47E3-85BC-A2F65CD28042/default.aspx Microsoft Research Paraphrase Corpus] (MSRP)&lt;br /&gt;
* &#039;&#039;&#039;task&#039;&#039;&#039;: given a pair of sentences, classify them as paraphrases or not paraphrases&lt;br /&gt;
* &#039;&#039;&#039;see&#039;&#039;&#039;: Dolan et al. (2004)&lt;br /&gt;
* &#039;&#039;&#039;train&#039;&#039;&#039;: 4,076 sentence pairs (2,753 positive: 67.5%)&lt;br /&gt;
* &#039;&#039;&#039;test&#039;&#039;&#039;: 1,725 sentence pairs (1,147 positive: 66.5%)&lt;br /&gt;
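Because roughly two thirds of the test pairs are positive, a trivial classifier that labels every pair a paraphrase already scores well, which is useful context when reading the results table. The sketch below works that baseline out from the test-split counts listed above. It assumes the F column reports the F1 score on the positive (paraphrase) class, which this page does not state explicitly; the function and constant names are illustrative only.

```python
# Test-split counts taken from the task summary above.
TEST_PAIRS = 1725
TEST_POSITIVE = 1147


def always_positive_baseline(total, positive):
    """Return (accuracy, f1) for a classifier that labels every pair a paraphrase."""
    true_pos = positive            # every real paraphrase is labeled correctly
    false_pos = total - positive   # every non-paraphrase is labeled wrongly
    false_neg = 0                  # nothing is ever labeled negative
    accuracy = true_pos / total
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    # Assumption: F is the balanced F1 measure on the positive class.
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


accuracy, f1 = always_positive_baseline(TEST_PAIRS, TEST_POSITIVE)
print(f"accuracy = {accuracy:.1%}, F = {f1:.1%}")
```

Under that assumption the baseline comes to about 66.5% accuracy and 79.9% F, so differences in the F column sit in a much narrower band above the trivial baseline than differences in accuracy.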
&lt;br /&gt;
&lt;br /&gt;
== Sample data ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Sentence 1&#039;&#039;&#039;: Amrozi accused his brother, whom he called &amp;quot;the witness&amp;quot;, of deliberately distorting his evidence.&lt;br /&gt;
* &#039;&#039;&#039;Sentence 2&#039;&#039;&#039;: Referring to him as only &amp;quot;the witness&amp;quot;, Amrozi accused his brother of deliberately distorting his evidence.&lt;br /&gt;
* &#039;&#039;&#039;Class&#039;&#039;&#039;: 1 (true paraphrase)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Table of results ==&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;5&amp;quot; cellspacing=&amp;quot;1&amp;quot; width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Algorithm&lt;br /&gt;
! Reference&lt;br /&gt;
! Description&lt;br /&gt;
! Accuracy&lt;br /&gt;
! F-measure&lt;br /&gt;
|-&lt;br /&gt;
| FHS&lt;br /&gt;
| Finch et al. (2005)&lt;br /&gt;
| supervised combination of MT evaluation measures as features&lt;br /&gt;
| 75.0%&lt;br /&gt;
| 82.7%&lt;br /&gt;
|-&lt;br /&gt;
| KM&lt;br /&gt;
| Kozareva and Montoyo (2006)&lt;br /&gt;
| supervised combination of lexical and semantic features&lt;br /&gt;
| 76.6%&lt;br /&gt;
| 79.6%&lt;br /&gt;
|-&lt;br /&gt;
| RMLMG&lt;br /&gt;
| Rus et al. (2008)&lt;br /&gt;
| unsupervised graph subsumption&lt;br /&gt;
| 70.6%&lt;br /&gt;
| 80.5%&lt;br /&gt;
|-&lt;br /&gt;
| MCS&lt;br /&gt;
| Mihalcea et al. (2006)&lt;br /&gt;
| unsupervised combination of several word similarity measures&lt;br /&gt;
| 70.3%&lt;br /&gt;
| 81.3%&lt;br /&gt;
|-&lt;br /&gt;
| STS&lt;br /&gt;
| Islam and Inkpen (2007)&lt;br /&gt;
| unsupervised combination of semantic and string similarity&lt;br /&gt;
| 72.6%&lt;br /&gt;
| 81.3%&lt;br /&gt;
|-&lt;br /&gt;
| QKC&lt;br /&gt;
| Qiu et al. (2006)&lt;br /&gt;
| supervised sentence dissimilarity classification&lt;br /&gt;
| 72.0%&lt;br /&gt;
| 81.6%&lt;br /&gt;
|-&lt;br /&gt;
| matrixJcn&lt;br /&gt;
| Fernando and Stevenson (2008)&lt;br /&gt;
| unsupervised JCN WordNet similarity with matrix&lt;br /&gt;
| 74.1%&lt;br /&gt;
| 82.4%&lt;br /&gt;
|-&lt;br /&gt;
| SHPNM&lt;br /&gt;
| Socher et al. (2011)&lt;br /&gt;
| supervised recursive autoencoder with dynamic pooling&lt;br /&gt;
| 76.8%&lt;br /&gt;
| 83.6%&lt;br /&gt;
|-&lt;br /&gt;
| WDDP&lt;br /&gt;
| Wan et al. (2006)&lt;br /&gt;
| supervised dependency-based features&lt;br /&gt;
| 75.6%&lt;br /&gt;
| 83.0%&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
Dolan, B., Quirk, C., and Brockett, C. (2004). [http://acl.ldc.upenn.edu/C/C04/C04-1051.pdf Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources], &#039;&#039;Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004)&#039;&#039;, Geneva, Switzerland, pp. 350-356.&lt;br /&gt;
&lt;br /&gt;
Fernando, S., and Stevenson, M. (2008). [http://www.dcs.shef.ac.uk/~samf/clukPaper.pdf A semantic similarity approach to paraphrase detection], &#039;&#039;Computational Linguistics UK (CLUK 2008) 11th Annual Research Colloquium&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Finch, A., Hwang, Y.S., and Sumita, E. (2005). [http://aclweb.org/anthology/I/I05/I05-5003.pdf Using machine translation evaluation techniques to determine sentence-level semantic equivalence], &#039;&#039;Proceedings of the Third International Workshop on Paraphrasing (IWP 2005)&#039;&#039;, Jeju Island, South Korea, pp. 17-24.&lt;br /&gt;
&lt;br /&gt;
Islam, A., and Inkpen, D. (2007). [http://www.site.uottawa.ca/~diana/publications/ranlp_2007_textsim_camera_ready.pdf Semantic similarity of short texts], &#039;&#039;Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2007)&#039;&#039;, Borovets, Bulgaria, pp. 291-297.&lt;br /&gt;
&lt;br /&gt;
Kozareva, Z., and Montoyo, A. (2006). [http://www.dlsi.ua.es/~zkozareva/papers/fintalKozareva.pdf Paraphrase identification on the basis of supervised machine learning techniques], &#039;&#039;Advances in Natural Language Processing: 5th International Conference on NLP (FinTAL 2006)&#039;&#039;, Turku, Finland, pp. 524-533.&lt;br /&gt;
&lt;br /&gt;
Mihalcea, R., Corley, C., and Strapparava, C. (2006). [http://reference.kfupm.edu.sa/content/c/o/corpus_based_and_knowledge_based_measure_3759629.pdf Corpus-based and knowledge-based measures of text semantic similarity], &#039;&#039;Proceedings of the National Conference on Artificial Intelligence (AAAI 2006)&#039;&#039;, Boston, Massachusetts, pp. 775-780.&lt;br /&gt;
&lt;br /&gt;
Qiu, L., Kan, M.Y., and Chua, T.S. (2006). [http://acl.ldc.upenn.edu/W/W06/W06-1603.pdf Paraphrase recognition via dissimilarity significance classification], &#039;&#039;Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006)&#039;&#039;, pp. 18-26.&lt;br /&gt;
&lt;br /&gt;
Rus, V., McCarthy, P.M., Lintean, M.C., McNamara, D.S., and Graesser, A.C. (2008). [http://csep.psyc.memphis.edu/McNamara/pdf/Paraphrase_Identification.pdf Paraphrase identification with lexico-syntactic graph subsumption], &#039;&#039;FLAIRS 2008&#039;&#039;, pp. 201-206.&lt;br /&gt;
&lt;br /&gt;
Socher, R., Huang, E.H., Pennington, J., Ng, A.Y., and Manning, C.D. (2011). [http://www.socher.org/uploads/Main/SocherHuangPenningtonNgManning_NIPS2011.pdf Dynamic pooling and unfolding recursive autoencoders for paraphrase detection], &#039;&#039;Advances in Neural Information Processing Systems 24&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Wan, S., Dras, M., Dale, R., and Paris, C. (2006). [http://www.alta.asn.au/events/altw2006/proceedings/swan-final.pdf Using dependency-based features to take the &amp;quot;para-farce&amp;quot; out of paraphrase], &#039;&#039;Proceedings of the Australasian Language Technology Workshop (ALTW 2006)&#039;&#039;, pp. 131-138.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
&lt;br /&gt;
* [[State of the art]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:State of the art]]&lt;/div&gt;</summary>
		<author><name>Eyeh</name></author>
	</entry>
</feed>