UPPC - Urdu Paraphrase Plagiarism Corpus

Muhammad Sharjeel, Paul Rayson, Rao Muhammad Adeel Nawab


Abstract
Paraphrase plagiarism is a significant and widespread problem, and research shows that it is hard to detect. Several methods and automatic systems have been proposed to deal with it. However, evaluation and comparison of such solutions are not possible because of the unavailability of benchmark corpora with manually created examples of paraphrase plagiarism. To deal with this issue, we present the novel development of a paraphrase plagiarism corpus containing simulated (manually created) examples in the Urdu language - a language widely spoken around the world. This resource is the first of its kind developed for the Urdu language, and we believe that it will be a valuable contribution to the evaluation of paraphrase plagiarism detection systems.
Anthology ID:
L16-1289
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1832–1836
URL:
https://aclanthology.org/L16-1289
Cite (ACL):
Muhammad Sharjeel, Paul Rayson, and Rao Muhammad Adeel Nawab. 2016. UPPC - Urdu Paraphrase Plagiarism Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1832–1836, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
UPPC - Urdu Paraphrase Plagiarism Corpus (Sharjeel et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1289.pdf