DA-LD-Hildesheim at SemEval-2019 Task 6: Tracking Offensive Content with Deep Learning using Shallow Representation

Sandip Modha, Prasenjit Majumder, Daksh Patel


Abstract
This paper presents the participation of team DA-LD-Hildesheim of Information Retrieval and Language Processing lab at DA-IICT, India in Semeval-19 OffenEval track. The aim of this shared task is to identify offensive content at fined-grained level granularity. The task is divided into three sub-tasks. The system is required to check whether social media posts contain any offensive or profane content or not, targeted or untargeted towards any entity and classifying targeted posts into the individual, group or other categories. Social media posts suffer from data sparsity problem, Therefore, the distributed word representation technique is chosen over the Bag-of-Words for the text representation. Since limited labeled data was available for the training, pre-trained word vectors are used and fine-tuned on this classification task. Various deep learning models based on LSTM, Bidirectional LSTM, CNN, and Stacked CNN are used for the classification. It has been observed that labeled data was highly affected with class imbalance and our technique to handle the class-balance was not effective, in fact performance was degraded in some of the runs. Macro F1 score is used as a primary evaluation metric for the performance. Our System achieves Macro F1 score = 0.7833 in sub-task A, 0.6456 in the sub-task B and 0.5533 in the sub-task C.
Anthology ID:
S19-2103
Volume:
Proceedings of the 13th International Workshop on Semantic Evaluation
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota, USA
Editors:
Jonathan May, Ekaterina Shutova, Aurelie Herbelot, Xiaodan Zhu, Marianna Apidianaki, Saif M. Mohammad
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Note:
Pages:
577–581
Language:
URL:
https://aclanthology.org/S19-2103
DOI:
10.18653/v1/S19-2103
Bibkey:
Cite (ACL):
Sandip Modha, Prasenjit Majumder, and Daksh Patel. 2019. DA-LD-Hildesheim at SemEval-2019 Task 6: Tracking Offensive Content with Deep Learning using Shallow Representation. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 577–581, Minneapolis, Minnesota, USA. Association for Computational Linguistics.
Cite (Informal):
DA-LD-Hildesheim at SemEval-2019 Task 6: Tracking Offensive Content with Deep Learning using Shallow Representation (Modha et al., SemEval 2019)
Copy Citation:
PDF:
https://aclanthology.org/S19-2103.pdf
Data
Hate Speech and Offensive Language