Paraphrase Identification (State of the art)

From ACL Wiki
Revision as of 13:31, 24 March 2009 by Pdturney
  • Microsoft Research Paraphrase Corpus (MSRP): http://research.microsoft.com/en-us/downloads/607D14D9-20CD-47E3-85BC-A2F65CD28042/default.aspx
  • see Dolan, Quirk, and Brockett (2004)
  • train: 4076 sentence pairs (2753 positive: 67.5%)
  • test: 1725 sentence pairs (1147 positive: 66.5%)
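
The class balance above implies a trivial reference point: a classifier that labels every pair as a paraphrase. The sketch below works out that arithmetic from the listed counts. The function name `all_positive_baseline` and the `SPLITS` dictionary are illustrative only and do not come from the corpus distribution or from any published system.

```python
# Counts copied from the corpus statistics listed above.
SPLITS = {
    "train": {"pairs": 4076, "positive": 2753},
    "test": {"pairs": 1725, "positive": 1147},
}


def all_positive_baseline(pairs: int, positive: int) -> dict:
    """Scores for a hypothetical classifier that labels every pair as a paraphrase."""
    accuracy = positive / pairs   # correct only on the true paraphrases
    precision = positive / pairs  # every pair is predicted positive
    recall = 1.0                  # no true paraphrase is missed
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


for name, counts in SPLITS.items():
    scores = all_positive_baseline(**counts)
    print(f"{name}: accuracy={scores['accuracy']:.1%}, F1={scores['f1']:.1%}")
```

On the test split this gives roughly 66.5% accuracy and 79.9% F1, so any reported result is best read against those figures rather than against 50%.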


Sample data

  • Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
  • Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
  • Class: 1 (true paraphrase)
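
To give a feel for how much surface vocabulary the sample pair shares, the sketch below computes a simple Jaccard overlap over lowercased word types. This is only an illustration under assumed choices (the tokenizer regex and the helper names `tokens` and `jaccard` are hypothetical); it is not the method used to build or label the corpus.

```python
import re


def tokens(sentence: str) -> set:
    """Lowercased alphanumeric word types; punctuation and quotes are dropped."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared word types divided by all word types in either sentence."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


s1 = 'Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.'
s2 = 'Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.'

overlap = jaccard(s1, s2)
print(f"Jaccard overlap: {overlap:.3f}")
```

The two sentences share 10 of 18 distinct word types, an overlap of about 0.556, even though they are labeled a true paraphrase; word overlap alone is therefore a rough signal at best.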


References

Dolan, B., Quirk, C., and Brockett, C. (2004). Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, pp. 350-356. http://acl.ldc.upenn.edu/C/C04/C04-1051.pdf


See also