Paraphrase Identification (State of the art)
Revision as of 19:46, 24 March 2009
- Microsoft Research Paraphrase Corpus (MSRP)
- see Dolan, Quirk, and Brockett (2004)
- train: 4076 sentence pairs (2753 positive: 67.5%)
- test: 1725 sentence pairs (1147 positive: 66.5%)
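Because roughly two thirds of the pairs are positive, a trivial system that labels every pair as a paraphrase already scores well, which is useful context when reading the Accuracy and F columns in the table of results. The sketch below works this out from the test-split counts listed above. It assumes that F denotes the F1 score of the positive (paraphrase) class, which this page does not state; the function and variable names are illustrative only.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1 of the positive class) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Test-split counts taken from the list above.
TEST_PAIRS = 1725
TEST_POSITIVE = 1147

# Hypothetical baseline: predict "paraphrase" for every test pair.
# Every positive pair is a true positive, every negative pair a false positive.
baseline_accuracy, baseline_f1 = accuracy_and_f1(
    tp=TEST_POSITIVE,
    fp=TEST_PAIRS - TEST_POSITIVE,
    fn=0,
    tn=0,
)

print(f"all-positive baseline: accuracy {baseline_accuracy:.1%}, F1 {baseline_f1:.1%}")
# prints: all-positive baseline: accuracy 66.5%, F1 79.9%
```

Under that assumption, a reported accuracy or F value is best judged by its margin over these baseline figures rather than in isolation.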
Sample data
- Sentence 1: Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
- Sentence 2: Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
- Class: 1 (true paraphrase)
Table of results
| Algorithm | Reference | Type | Accuracy | F-measure |
|---|---|---|---|---|
| MCS | Mihalcea et al. (2006) | combination of several word similarity measures | 70.3% | 81.3% |
References
Dolan, B., Quirk, C., and Brockett, C. (2004). Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources, Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, pp. 350-356. http://acl.ldc.upenn.edu/C/C04/C04-1051.pdf
Mihalcea, R., Corley, C., and Strapparava, C. (2006). Corpus-based and knowledge-based measures of text semantic similarity, Proceedings of the National Conference on Artificial Intelligence (AAAI 2006), Boston, Massachusetts, pp. 775-780.