Sentiment Classification Using Document Embeddings Trained with Cosine Similarity

Tan Thongtan, Tanasanee Phienthrakul


Abstract
In document-level sentiment classification, each document must be mapped to a fixed-length vector. Document embedding models map each document to a dense, low-dimensional vector in a continuous vector space. This paper proposes training document embeddings using cosine similarity instead of the dot product. Experiments on the IMDB dataset show that accuracy improves when cosine similarity is used in place of the dot product, and that combining these embeddings with Naive Bayes weighted bag-of-n-grams features achieves a competitive accuracy of 93.68%. Code to reproduce all experiments is available at https://github.com/tanthongtan/dv-cosine.
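The abstract's core idea, scoring document-word pairs with cosine similarity rather than the raw dot product in a paragraph-vector-style training objective, can be sketched as follows. This is an illustrative toy in NumPy with made-up vectors and a numerical gradient, not the authors' implementation (see the linked repository for the real code):

```python
import numpy as np

def cosine_similarity(d, w):
    """Cosine similarity between document vector d and word vector w."""
    return (d @ w) / (np.linalg.norm(d) * np.linalg.norm(w))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ns_loss(d, w_pos, w_neg):
    """Negative-sampling logistic loss with cosine similarity as the score:
    pull d toward an observed word w_pos, push it away from a noise word w_neg."""
    return (-np.log(sigmoid(cosine_similarity(d, w_pos)))
            - np.log(sigmoid(-cosine_similarity(d, w_neg))))

def train_step(d, w_pos, w_neg, lr=0.05, eps=1e-5):
    """One gradient-descent step on the document vector, using a
    central-difference gradient for brevity (an analytic gradient
    would be used in practice)."""
    grad = np.zeros_like(d)
    for i in range(d.size):
        e = np.zeros_like(d)
        e[i] = eps
        grad[i] = (ns_loss(d + e, w_pos, w_neg)
                   - ns_loss(d - e, w_pos, w_neg)) / (2 * eps)
    return d - lr * grad
```

Note that, unlike the unbounded dot product, cosine similarity is confined to [-1, 1], so the scores passed into the sigmoid stay in a fixed range regardless of the vector norms.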
Anthology ID:
P19-2057
Original:
P19-2057v1
Version 2:
P19-2057v2
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Fernando Alva-Manchego, Eunsol Choi, Daniel Khashabi
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
407–414
URL:
https://aclanthology.org/P19-2057
DOI:
10.18653/v1/P19-2057
Cite (ACL):
Tan Thongtan and Tanasanee Phienthrakul. 2019. Sentiment Classification Using Document Embeddings Trained with Cosine Similarity. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 407–414, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Sentiment Classification Using Document Embeddings Trained with Cosine Similarity (Thongtan & Phienthrakul, ACL 2019)
PDF:
https://aclanthology.org/P19-2057.pdf
Code
tanthongtan/dv-cosine (+ additional community code)
Data
IMDb Movie Reviews