Deceptive Review Spam Detection via Exploiting Task Relatedness and Unlabeled Data

Zhen Hai1, Peilin Zhao1, Peng Cheng2, Peng Yang1, Xiao-Li Li1, Guangxia Li3
1 Data Analytics Department, Institute for Infocomm Research (I2R), A*STAR, Singapore
2 School of Computer Science and Engineering, Nanyang Technological University, Singapore
3 School of Computer Science and Technology, Xidian University, China


Abstract

Existing work on detecting deceptive reviews focuses primarily on feature engineering and applies off-the-shelf supervised classification algorithms to the problem. A real challenge for this approach is that building a model requires a large amount of manually labeled ground-truth spam review data, which is difficult to obtain and often requires domain expertise in practice. In this paper, we propose to exploit the relatedness of multiple review spam detection tasks, together with readily available unlabeled data, to address the scarcity of labeled opinion spam data. We first develop a multi-task learning method based on logistic regression (MTL-LR), which boosts the learning for a task by sharing the knowledge contained in the training signals of other related tasks. To leverage the unlabeled data, we introduce a graph Laplacian regularizer into each base model. We then propose a novel semi-supervised multi-task learning method via Laplacian regularized logistic regression (SMTL-LLR) to further improve review spam detection performance. We also develop a stochastic alternating method to solve the optimization problem for SMTL-LLR. Experimental results on real-world review data demonstrate the benefit of SMTL-LLR over several well-established baseline methods.
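
To make the idea of a graph Laplacian regularizer concrete, the following is a minimal single-task sketch of Laplacian regularized logistic regression. It is not the SMTL-LLR formulation itself: the multi-task coupling and the stochastic alternating optimizer are omitted. The k-nearest-neighbor graph, the 1/n^2 scaling of the smoothness term, plain gradient descent, and all function and parameter names (`knn_graph_laplacian`, `objective`, `gradient`, `fit`, `lam`, `gamma`) are assumptions made for illustration.

```python
import numpy as np

def knn_graph_laplacian(X, k=3):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph (assumed graph choice)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                              # exclude self-neighbors
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nearest[i]] = 1.0
    W = np.maximum(W, W.T)                                    # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def objective(w, X_lab, y, X_all, L, lam=0.1, gamma=0.1):
    """Logistic loss on labeled rows + L2 penalty + Laplacian smoothness on all rows."""
    margins = y * (X_lab @ w)                                 # labels y are in {-1, +1}
    loss = np.mean(np.logaddexp(0.0, -margins))               # stable log(1 + exp(-margin))
    f = X_all @ w                                             # scores on labeled + unlabeled data
    smooth = (f @ L @ f) / len(f) ** 2                        # small when neighbors score alike
    return loss + 0.5 * lam * (w @ w) + 0.5 * gamma * smooth

def gradient(w, X_lab, y, X_all, L, lam=0.1, gamma=0.1):
    """Gradient of `objective` with respect to w (L is symmetric)."""
    margins = y * (X_lab @ w)
    sig = np.exp(-np.logaddexp(0.0, margins))                 # sigmoid(-margin), computed stably
    g_loss = -(X_lab.T @ (y * sig)) / len(y)
    g_smooth = gamma * (X_all.T @ (L @ (X_all @ w))) / X_all.shape[0] ** 2
    return g_loss + lam * w + g_smooth

def fit(X_lab, y, X_all, L, lam=0.1, gamma=0.1, lr=0.05, steps=300):
    """Plain gradient descent from w = 0 (a stand-in for the paper's optimizer)."""
    w = np.zeros(X_all.shape[1])
    for _ in range(steps):
        w -= lr * gradient(w, X_lab, y, X_all, L, lam, gamma)
    return w
```

In this sketch the unlabeled reviews enter only through `X_all` and `L`: the smoothness term penalizes weight vectors that assign very different scores to reviews that are neighbors in feature space, which is how unlabeled data can inform a model trained on few labels.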