On the Relevance of Syntactic and Discourse Features for Author Profiling and Identification

Juan Soler-Company, Leo Wanner


Abstract
The majority of approaches to author profiling and author identification focus mainly on lexical features, i.e., on the content of a text. We argue that syntactic and discourse features play a significantly more prominent role than has been attributed to them in the past. We show that they achieve state-of-the-art performance in author and gender identification on a literary corpus while keeping the feature set small: the feature set comprises only 188 features and still outperforms the winner of the PAN 2014 shared task on author verification in the literary genre.
Anthology ID: E17-2108
Volume: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Month: April
Year: 2017
Address: Valencia, Spain
Editors: Mirella Lapata, Phil Blunsom, Alexander Koller
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 681–687
URL: https://aclanthology.org/E17-2108
Cite (ACL): Juan Soler-Company and Leo Wanner. 2017. On the Relevance of Syntactic and Discourse Features for Author Profiling and Identification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 681–687, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal): On the Relevance of Syntactic and Discourse Features for Author Profiling and Identification (Soler-Company & Wanner, EACL 2017)
PDF: https://aclanthology.org/E17-2108.pdf