Private or Corporate? Predicting User Types on Twitter

Nikola Ljubešić, Darja Fišer


Abstract
In this paper we present a series of experiments on discriminating between private and corporate accounts on Twitter. We define features based on Twitter metadata, morphosyntactic tags and surface forms, showing that the simple bag-of-words model achieves the best results of any single feature type; these results can, however, be improved by building a weighted soft ensemble of classifiers, one per feature type. Investigating the time and language dependence of each feature type yields rather unexpected results: features based on metadata are neither time- nor language-insensitive, as the way the two user groups use the social network varies considerably across time and space.
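The abstract combines one classifier per feature type (metadata, morphosyntactic tags, surface forms) into a weighted soft ensemble, but it does not say which base classifiers or which weights were used. The sketch below shows generic weighted soft voting only, assuming each per-feature-type classifier outputs class probabilities. The function names `combine_soft_votes` and `predict_label`, the label strings, and all probability and weight values are hypothetical illustrations, not the authors' implementation or results.

```python
from typing import Dict, Sequence


def combine_soft_votes(
    probs_per_model: Sequence[Dict[str, float]],
    weights: Sequence[float],
) -> Dict[str, float]:
    """Weighted average of class-probability dicts (hypothetical helper).

    Each element of probs_per_model holds one classifier's probabilities,
    e.g. a classifier trained on a single feature type.
    """
    if len(probs_per_model) != len(weights):
        raise ValueError("need exactly one weight per model")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    combined: Dict[str, float] = {}
    for probs, weight in zip(probs_per_model, weights):
        for label, p in probs.items():
            # Normalise by the total weight so the scores stay in [0, 1].
            combined[label] = combined.get(label, 0.0) + weight * p / total_weight
    return combined


def predict_label(scores: Dict[str, float]) -> str:
    """Return the label with the highest combined score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical outputs of three per-feature-type classifiers for one account.
    metadata_model = {"private": 0.40, "corporate": 0.60}
    tag_model = {"private": 0.55, "corporate": 0.45}
    bow_model = {"private": 0.70, "corporate": 0.30}
    # Hypothetical weights: the bag-of-words model counts double.
    scores = combine_soft_votes([metadata_model, tag_model, bow_model], [1.0, 1.0, 2.0])
    print(predict_label(scores))  # prints "private"
```

With these made-up numbers the combined scores are 0.5875 for `private` and 0.4125 for `corporate`, so the ensemble overrides the metadata classifier's preference. In practice the weights would be tuned on held-out data; how the paper chose them is not stated here.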
Anthology ID: W16-3904
Volume: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Bo Han, Alan Ritter, Leon Derczynski, Wei Xu, Tim Baldwin
Venue: WNUT
Publisher: The COLING 2016 Organizing Committee
Pages: 4–12
URL: https://aclanthology.org/W16-3904
Cite (ACL): Nikola Ljubešić and Darja Fišer. 2016. Private or Corporate? Predicting User Types on Twitter. In Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), pages 4–12, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): Private or Corporate? Predicting User Types on Twitter (Ljubešić & Fišer, WNUT 2016)
PDF: https://aclanthology.org/W16-3904.pdf