Understanding Language Preference for Expression of Opinion and Sentiment: What do Hindi-English Speakers do on Twitter?

Koustav Rudra¹, Shruti Rijhwani², Rafiya Begum³, Kalika Bali³, Monojit Choudhury⁴, Niloy Ganguly¹
¹Indian Institute of Technology, Kharagpur; ²Carnegie Mellon University; ³Microsoft Research India; ⁴Microsoft Research


Linguistic research on multilingual societies has indicated that there is usually a preferred language for the expression of emotion and sentiment. The paucity of data has limited such studies to participant interviews and speech transcriptions from small groups of speakers. In this paper, we report a study of 430,000 unique tweets from Indian users, specifically Hindi-English bilinguals, to understand which language, if any, is preferred for expressing opinion and sentiment. To this end, we develop classifiers for detecting opinion in these languages and for further classifying opinionated tweets into positive, negative and neutral sentiment. Our study indicates that Hindi (i.e., the native language) is preferred over English for expressing negative opinion and for swearing. As an aside, we explore some common pragmatic functions of code-switching through sentiment detection.
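The abstract describes a two-stage setup: an opinion detector, then a sentiment classifier applied only to opinionated tweets, followed by a comparison of how often each language is used within each sentiment class. A minimal Python sketch of that cascade and of the per-class language proportions might look as follows. The function names, the label strings ("non-opinion", "positive", "negative", "neutral"), the language codes and the toy classifiers are all hypothetical stand-ins for illustration, not the authors' trained models, code or data.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical label set; the paper's own label names may differ.
SENTIMENTS = ("positive", "negative", "neutral")


def two_stage_label(text: str,
                    is_opinion: Callable[[str], bool],
                    sentiment: Callable[[str], str]) -> str:
    """Stage 1 filters opinionated tweets; stage 2 assigns polarity only to those."""
    if not is_opinion(text):
        return "non-opinion"
    label = sentiment(text)
    if label not in SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return label


def language_share(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """For each label, return the fraction of tweets written in each language.

    `records` holds (language, label) pairs, e.g. ("hi", "negative").
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for lang, label in records:
        counts[label][lang] += 1
    shares: Dict[str, Dict[str, float]] = {}
    for label, per_lang in counts.items():
        total = sum(per_lang.values())
        shares[label] = {lang: n / total for lang, n in per_lang.items()}
    return shares


if __name__ == "__main__":
    # Toy stand-ins for trained classifiers, for illustration only.
    toy_is_opinion = lambda t: t.endswith("!")
    toy_sentiment = lambda t: "negative" if "bakwas" in t else "positive"
    tweets = [("hi", "kya bakwas hai!"), ("en", "great match!"),
              ("en", "match starts at 8")]
    records = [(lang, two_stage_label(t, toy_is_opinion, toy_sentiment))
               for lang, t in tweets]
    print(language_share(records))
```

Under this framing, "Hindi is preferred for negative opinion" would correspond to Hindi holding a larger share of the negative class than of the other classes; a real analysis would also need to control for each language's overall frequency in the corpus.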