Regularizing Text Categorization with Clusters of Words

Konstantinos Skianis, Francois Rousseau, Michalis Vazirgiannis
Ecole Polytechnique


Abstract

Regularization is a critical step in supervised learning to not only address overfitting, but also to take into account any prior knowledge we may have on the features and their dependence. In this paper, we explore state-of-the-art structured regularizers and we propose novel ones based on clusters of words from LSI topics, word2vec embeddings and graph-of-words document representation. We show that our proposed regularizers are faster than the state-of-the-art ones and still improve text classification accuracy.