SimplifyUR: Unsupervised Lexical Text Simplification for Urdu

Namoos Hayat Qasmi, Haris Bin Zia, Awais Athar, Agha Ali Raza


Abstract
This paper presents the first attempt at Automatic Text Simplification (ATS) for Urdu, the language of 170 million people worldwide. Because Urdu is a low-resource language in terms of standard linguistic resources, recent text simplification approaches that rely on manually crafted simplified corpora or lexicons such as WordNet are not applicable to it. Urdu is also a morphologically rich language that requires unique considerations such as proper handling of inflectional case and honorifics. We present an unsupervised method for lexical simplification of complex Urdu text. Our method requires only plain Urdu text and uses word embeddings together with a set of morphological features to generate simplifications. Our system achieves a BLEU score of 80.15 and a SARI score of 42.02 in automatic evaluation on manually crafted simplified corpora. We also report results of human evaluations of the correctness, grammaticality, meaning preservation, and simplicity of the output. Our code and corpus are publicly available to make our results reproducible.
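The general idea of combining word embeddings with morphological features can be illustrated with a minimal sketch. This is not the authors' implementation: the function `rank_substitutes`, the placeholder vocabulary, the toy vectors, the frequency-as-simplicity heuristic, and the `min_sim` threshold are all assumptions made for illustration. The sketch keeps only candidates that share the target word's morphological features, are more frequent than the target, and lie close to it in embedding space.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_substitutes(target, embeddings, frequency, features, min_sim=0.5, top_k=2):
    """Return up to top_k candidate substitutes for `target`, most similar first.

    Hypothetical filters (assumptions, not taken from the paper):
      - the candidate must carry the same morphological features as the target
      - the candidate must be more frequent than the target (proxy for simplicity)
      - the candidate must have cosine similarity >= min_sim to the target
    """
    scored = []
    for word, vec in embeddings.items():
        if word == target:
            continue
        if features[word] != features[target]:
            continue  # morphological mismatch, e.g. singular vs. plural
        if frequency[word] <= frequency[target]:
            continue  # not more common than the target, so not treated as simpler
        sim = cosine(embeddings[target], vec)
        if sim >= min_sim:
            scored.append((sim, word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_k]]

# Toy data with placeholder tokens; real use would load embeddings trained on plain text.
embeddings = {
    "w_complex": [0.9, 0.1, 0.0],
    "w_simple": [0.8, 0.2, 0.1],
    "w_plural": [0.85, 0.15, 0.0],
    "w_unrelated": [0.0, 0.1, 0.9],
    "w_rare": [0.9, 0.1, 0.05],
}
frequency = {"w_complex": 5, "w_simple": 500, "w_plural": 300, "w_unrelated": 800, "w_rare": 2}
features = {
    "w_complex": ("noun", "sg"),
    "w_simple": ("noun", "sg"),
    "w_plural": ("noun", "pl"),
    "w_unrelated": ("noun", "sg"),
    "w_rare": ("noun", "sg"),
}

print(rank_substitutes("w_complex", embeddings, frequency, features))
```

In this toy setup, `w_plural` is rejected for its number feature, `w_rare` for its low frequency, and `w_unrelated` for its low similarity, which leaves `w_simple` as the only candidate.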
Anthology ID:
2020.lrec-1.428
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3484–3489
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.428
Cite (ACL):
Namoos Hayat Qasmi, Haris Bin Zia, Awais Athar, and Agha Ali Raza. 2020. SimplifyUR: Unsupervised Lexical Text Simplification for Urdu. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3484–3489, Marseille, France. European Language Resources Association.
Cite (Informal):
SimplifyUR: Unsupervised Lexical Text Simplification for Urdu (Qasmi et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.428.pdf