Gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM Architecture for SENTIment Analysis of Code-MIXed Data

Sunil Gundapu, Radhika Mamidi


Abstract
Code-mixing is the phenomenon of mixing the vocabulary and syntax of multiple languages within the same utterance, and it is especially common in multilingual societies. In this paper, we describe a system developed for SemEval-2020 Task 9 on sentiment analysis of Hindi-English code-mixed social media text. Our system first generates two types of embeddings for the social media text: character-level embeddings, which encode character-level information and handle out-of-vocabulary entries, and FastText word embeddings, which capture morphology and semantics. Both embeddings are passed to an LSTM network, and the system outperformed the baseline model.
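As an illustration of the two-embedding input described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the two representations are concatenated per token, which the abstract does not state. The dimensions, the hash-based stand-in vectors, the average pooling over characters, and the example tokens are all hypothetical. The LSTM itself is omitted, and a real FastText model would also compose word vectors from subword n-grams, which this toy lookup table does not do.

```python
import hashlib

CHAR_DIM = 4  # assumed size of the character-level vector
WORD_DIM = 4  # assumed size of the word-level vector


def _hash_vector(key, dim):
    """Deterministic pseudo-embedding used as a stand-in for a learned vector."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return [digest[i] / 255.0 - 0.5 for i in range(dim)]


def char_vector(token):
    """Average the per-character vectors of a token (hypothetical pooling)."""
    if not token:
        return [0.0] * CHAR_DIM
    vecs = [_hash_vector("char:" + ch, CHAR_DIM) for ch in token]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def word_vector(token, vocab):
    """Look up a word vector; fall back to zeros for out-of-vocabulary tokens."""
    return vocab.get(token, [0.0] * WORD_DIM)


def token_features(tokens, vocab):
    """Concatenate character-level and word-level vectors for each token."""
    return [char_vector(t) + word_vector(t, vocab) for t in tokens]


# Toy vocabulary of romanized Hindi-English tokens (made up for illustration).
vocab = {w: _hash_vector("word:" + w, WORD_DIM) for w in ["movie", "bahut", "acchi"]}

# "thiii" is not in the vocabulary, so only its character-level half is non-zero.
features = token_features(["movie", "bahut", "acchi", "thiii"], vocab)
```

Each entry of `features` is one time step of the sequence that a recurrent network such as an LSTM would consume. The out-of-vocabulary token still receives a non-trivial representation through its characters, which is the role the abstract assigns to the character-level embeddings.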
Anthology ID:
2020.semeval-1.166
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Editors:
Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Venue:
SemEval
SIG:
SIGLEX
Publisher:
International Committee for Computational Linguistics
Pages:
1247–1252
URL:
https://aclanthology.org/2020.semeval-1.166
DOI:
10.18653/v1/2020.semeval-1.166
Cite (ACL):
Sunil Gundapu and Radhika Mamidi. 2020. Gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM Architecture for SENTIment Analysis of Code-MIXed Data. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1247–1252, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal):
Gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM Architecture for SENTIment Analysis of Code-MIXed Data (Gundapu & Mamidi, SemEval 2020)
PDF:
https://aclanthology.org/2020.semeval-1.166.pdf
Data
SentiMix