Extending the coverage of a MWE database for Persian CPs exploiting valency alternations

Pollet Samvelian, Pegah Faghiri, Sarra El Ayari


Abstract
PersPred is a manually elaborated multilingual syntactic and semantic lexicon for Persian Complex Predicates (CPs), also referred to as “Light Verb Constructions” (LVCs) or “Compound Verbs”. CPs constitute the regular and most common way of expressing verbal concepts in Persian, which has only around 200 simplex verbs. CPs can be defined as multi-word sequences formed by a verb and a non-verbal element that function in many respects like a simplex verb. Bonami & Samvelian (2010) and Samvelian & Faghiri (to appear) argue at length that Persian CPs are MWEs and must consequently be listed. The first release of PersPred contains more than 600 combinations of the verb zadan ‘hit’ with a noun, presented in a spreadsheet. In this paper we present a semi-automatic method used to extend the coverage of PersPred 1.0, which relies on the syntactic information on valency alternations already encoded in the database. Given the importance of CPs in the verbal lexicon of Persian and the fact that lexical resources are sorely lacking for Persian, this method can be further used to achieve our goal of making PersPred an appropriate resource for NLP applications.
Anthology ID:
L14-1679
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4023–4026
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/883_Paper.pdf
Cite (ACL):
Pollet Samvelian, Pegah Faghiri, and Sarra El Ayari. 2014. Extending the coverage of a MWE database for Persian CPs exploiting valency alternations. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 4023–4026, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Extending the coverage of a MWE database for Persian CPs exploiting valency alternations (Samvelian et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/883_Paper.pdf