Detecting Multiword Expression Type Helps Lexical Complexity Assessment

Ekaterina Kochmar, Sian Gooding, Matthew Shardlow


Abstract
Multiword expressions (MWEs) represent lexemes that should be treated as single lexical units due to their idiosyncratic nature. Multiple NLP applications have been shown to benefit from MWE identification; however, the lexical complexity of MWEs remains an under-explored research area. In this work, we re-annotate the Complex Word Identification Shared Task 2018 dataset of Yimam et al. (2017), which provides complexity scores for a range of lexemes, with MWE types. We release the MWE-annotated dataset with this paper, and we believe it represents a valuable resource for the text simplification community. In addition, we investigate which types of expressions are most problematic for native and non-native readers. Finally, we show that a lexical complexity assessment system benefits from information about MWE types.
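The abstract reports that information about MWE type helps a lexical complexity assessment system. One common way to expose a categorical label like this to a feature-based model is one-hot encoding, appended to whatever other features the model already uses. The following is a minimal sketch under stated assumptions, not the authors' system: the label set `MWE_TYPES`, the helpers `one_hot_mwe_type` and `build_features`, and the two example base features are all hypothetical placeholders. The actual annotation scheme and feature set are defined in the paper and the released ekochmar/MWE-CWI code.

```python
from typing import List, Sequence

# HYPOTHETICAL MWE type inventory for illustration only; the real label set
# comes from the MWE-CWI annotation and is not reproduced here.
MWE_TYPES: List[str] = ["none", "compound_noun", "phrasal_verb", "idiom"]


def one_hot_mwe_type(mwe_type: str, types: Sequence[str] = MWE_TYPES) -> List[float]:
    """Encode an MWE type label as a one-hot vector over `types`."""
    if mwe_type not in types:
        raise ValueError(f"unknown MWE type: {mwe_type!r}")
    return [1.0 if t == mwe_type else 0.0 for t in types]


def build_features(base_features: Sequence[float], mwe_type: str) -> List[float]:
    """Append the one-hot MWE type to an existing (hypothetical) feature vector."""
    return list(base_features) + one_hot_mwe_type(mwe_type)


if __name__ == "__main__":
    # Hypothetical base features, e.g. length in tokens and a log-frequency score.
    x = build_features([2.0, 3.5], "phrasal_verb")
    print(x)  # [2.0, 3.5, 0.0, 0.0, 1.0, 0.0]
```

Keeping the encoding as a separate step means any regressor or classifier can consume the extended vector unchanged; raising on an unseen label is a deliberate choice here, and a real pipeline might instead map unknown labels to a fallback category.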
Anthology ID:
2020.lrec-1.545
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4426–4435
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.545
Cite (ACL):
Ekaterina Kochmar, Sian Gooding, and Matthew Shardlow. 2020. Detecting Multiword Expression Type Helps Lexical Complexity Assessment. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4426–4435, Marseille, France. European Language Resources Association.
Cite (Informal):
Detecting Multiword Expression Type Helps Lexical Complexity Assessment (Kochmar et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.545.pdf
Code:
ekochmar/MWE-CWI
Data:
MWE-CWI