Broad Linguistic Complexity Analysis for Greek Readability Classification

Savvas Chatzipanagiotidis, Maria Giagkou, Detmar Meurers


Abstract
This paper explores the linguistic complexity of Greek textbooks as a readability classification task. We analyze textbook corpora for different school subjects and textbooks for Greek as a Second Language, covering a wide spectrum of school age groups and proficiency levels. A broad range of quantifiable linguistic complexity features (lexical, morphological and syntactic) is extracted and calculated. Conducting experiments with different feature subsets, we show that the different linguistic dimensions contribute orthogonal information, with the highest result achieved when all linguistic feature subsets are combined. A readability classifier trained on this basis reaches a classification accuracy of 88.16% for the Greek as a Second Language corpus. To investigate the generalizability of the classification models, we also perform cross-corpus evaluations. We show that the model trained on the most varied text collection (for Greek as a school subject) generalizes best. In addition to advancing the state of the art for Greek readability analysis, the paper also contributes insights on the role of different feature sets and training setups for generalizable readability classification.
Anthology ID: 2021.bea-1.5
Volume: Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications
Month: April
Year: 2021
Address: Online
Editors: Jill Burstein, Andrea Horbach, Ekaterina Kochmar, Ronja Laarmann-Quante, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Helen Yannakoudakis, Torsten Zesch
Venue: BEA
SIG: SIGEDU
Publisher: Association for Computational Linguistics
Pages: 48–58
URL: https://aclanthology.org/2021.bea-1.5
Cite (ACL): Savvas Chatzipanagiotidis, Maria Giagkou, and Detmar Meurers. 2021. Broad Linguistic Complexity Analysis for Greek Readability Classification. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, pages 48–58, Online. Association for Computational Linguistics.
Cite (Informal): Broad Linguistic Complexity Analysis for Greek Readability Classification (Chatzipanagiotidis et al., BEA 2021)
PDF: https://aclanthology.org/2021.bea-1.5.pdf