The Role of Emotions in Native Language Identification

We explore the hypothesis that emotion is one of the dimensions of language that surfaces from the native language into a second language. To check the role of emotions in native language identification (NLI), we model emotion information through polarity and emotion load features, and use document representations built from these features to classify the native language of the author. The results indicate that emotion is relevant for NLI, even for high proficiency levels and across topics.


Introduction
Native Language Identification (NLI) is the task of identifying the native language (L1) of a person based on their writing in a second language (L2). NLI can inform security, marketing, and educational applications, for example by tuning pedagogical materials to L1s, and for this it is important to understand the phenomena that get transferred from L1 to L2 (native language interference). Emotion is one of these. Linguistics research (Dewaele, 2010) has focused on the way emotions are encoded in different text types and in different languages. How to express emotion appropriately depends on the origin of the speaker (country, region), the situational context, in which social norms might differ (formal vs. informal setting), the interlocutors (age, gender, social distance), and the topic.
As emotions are psychological constructions of cultural meaning, there may be a misfit between emotions and social context when individuals change cultural contexts or live two cultural models (Leersnyder et al., 2011). The use of emotions is considered both culture- and language-specific (Wierzbicka, 1994, 1999). We hypothesize that this leads to different emotion signals in writings in a second language by authors with different native languages. We test this hypothesis through multi-class classification of the L1 of the authors of essays written in L2, in different experimental set-ups that take into account the proficiency levels and the topics of the written essays. We encode emotion information using polarity and sentiment information from the NRC Word-Emotion Association Lexicon (NRC emotion lexicon) (Mohammad and Turney, 2013), taking into account not only the fine-grained (word-level) emotion information, but also general aspects of the written material (overall high or low emotion load). The results show that emotional information contributes to detecting the native language of the speaker.

Related Work
Caldwell-Harris (2014) shows that emotion usage depends on the language by focusing on differences in emotion usage in L1 and L2. The author states that the usage of emotions correlates with proficiency level and with the age at which a language is acquired.
While emotion-based features have been used in other NLP tasks, such as sentiment analysis (Sidorov et al., 2013), classification of documents into the corresponding emotion category (Wen and Wan, 2014), and deception detection (Newman et al., 2003), among others, they are an underexplored area of second language writing. Torney et al. (2012) use psycholinguistic features extracted by the Linguistic Inquiry and Word Count (LIWC) tool (Pennebaker et al., 2007) to identify the first language of an author, where emotion-based features are included as part of the feature vector, e.g., the percentage of positive/negative emotion words. The LIWC feature set used in that paper also contains other types of features, e.g., personal concern categories (work, leisure) and paralinguistic dimensions (assents, fillers, nonfluencies), which obscure the contribution of the actual emotion features. Rangel and Rosso (2013; 2016) investigate and confirm the hypothesis that the use of emotions depends on the author's age and gender. The authors used a graph-based approach, where each node and edge was represented by the corresponding part-of-speech (POS) tag; the representation was then enriched with semantic information, emoticons, and emotion information, which included the polarity of words (polarity of common nouns, adjectives, adverbs, or verbs in a sentiment lexicon) and emotionally charged words (replacing common nouns, adjectives, adverbs, or verbs with the emotion information from the Spanish Emotion Lexicon (Sidorov et al., 2013)). The representation combining all the features described above was used with an SVM classifier. Rangel and Rosso (2013; 2016) suggest that there are commonalities in the use of emotions across author age and gender. We examine the hypothesis that there are commonalities in the use of emotions in L2 across different L1s, as suggested by linguistic and psycholinguistic studies (Leersnyder et al., 2011; Wierzbicka, 1999).
We test this by evaluating the impact of emotion-based features on classifying the L1 of the authors of essays written in L2.

Emotion features for NLI
The best performing features for NLI are word and character n-grams (Jarvis et al., 2013). They cover, and obscure, a wide range of phenomena, because language usage has multiple dimensions that can reveal information such as age, gender, and cultural influences. In this study, we investigate the impact of words that have an emotion signal, since studies have shown that emotion is culture-specific (Wierzbicka, 1994, 1999), and thus could be indicative of the native language of a speaker.

Datasets
We conduct experiments on two datasets commonly used in NLI research, TOEFL11 and ICLE. In TOEFL11, whose 11 L1 groups include Turkish (TUR), the essays were written in response to eight different writing prompts, all of which appear in all 11 L1 groups. The dataset contains information regarding the proficiency level (low, medium, high) of the authors.

ICLE
The ICLEv2 dataset (Granger et al., 2009) consists of essays written by highly proficient non-native college-level students of English. We used a 7-language subset of the corpus normalized for topic and character encoding (Tetreault et al., 2012; Ionescu et al., 2014), to which we refer as ICLE. This subset contains 110 essays (avg. 747 tokens/essay after tokenization and removal of metadata) for each of the 7 languages: Bulgarian (BUL), Chinese (CHI), Czech (CZE), French (FRE), Japanese (JPN), Russian (RUS), and Spanish (SPA).

Experiment setup
We used the (pre-)tokenized version of TOEFL11 and tokenized ICLE with the Natural Language Toolkit (NLTK) tokenizer. ICLE metadata was removed in pre-processing. Each essay was represented through the sets of features described below, weighted by term frequency (tf), and classified with the liblinear scikit-learn (Pedregosa et al., 2011) implementation of Support Vector Machines (SVM) with the OvR (one vs. the rest) multi-class strategy. We report classification accuracy on 10-fold cross-validation experiments.
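The setup above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `cv_accuracy`, the whitespace splitting of pre-tokenized essays, the stratified and shuffled folds, and the fixed random seed are all assumptions; only the tf weighting, the liblinear SVM with OvR, and the 10 folds come from the text.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def cv_accuracy(essays, labels, n_folds=10):
    """Mean accuracy of a tf + linear SVM pipeline over n_folds folds.

    `essays` are strings of space-separated tokens (already tokenized).
    """
    pipeline = make_pipeline(
        # Raw term counts (tf); splitting on whitespace is an assumption.
        CountVectorizer(analyzer=str.split),
        # LinearSVC is scikit-learn's liblinear-based SVM; OvR is its default.
        LinearSVC(multi_class="ovr"),
    )
    # Stratification, shuffling and the seed are assumptions of this sketch.
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    scores = cross_val_score(pipeline, essays, labels, cv=folds,
                             scoring="accuracy")
    return scores.mean()


# Toy data, invented for illustration only.
toy_essays = ["i really love this idea"] * 10 + ["we strongly dislike that plan"] * 10
toy_labels = ["L1_a"] * 10 + ["L1_b"] * 10
toy_accuracy = cv_accuracy(toy_essays, toy_labels)
```

Any of the feature sets described below could replace the plain token counts by passing a different `analyzer` callable.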

Part-of-speech tags and function words
POS tag n-grams and function words (FWs) are considered core features in NLI research (Malmasi and Dras, 2015), not susceptible to topic bias, unlike word and character n-grams (Brooke and Hirst, 2011).
POS n-grams, n=1..3: POS features capture the morpho-syntactic patterns in a text, and are indicative of the L1, especially when used in combination with other types of features (Cimino and Dell'Orletta, 2017; Markov et al., 2017). POS tags were obtained with TreeTagger (Schmid, 1999), which uses the Penn Treebank tagset (36 tags).
Function word (FW) n-grams, n=1..3: Function words clarify the relationships between the content-carrying elements of a sentence, and introduce syntactic structures like verbal complements, relative clauses, and questions (Smith and Witten, 1993). They are considered one of the most important stylometric features (Kestemont, 2014). The FW feature set consists of 318 English FWs from the scikit-learn package (Pedregosa et al., 2011). With respect to emotion features, FWs can appear as quantifiers or intensifiers (e.g., very good), or modify the emotion expressed in other ways.
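As an illustration of the two feature types above, the sketch below counts n-grams (n = 1..3) over a POS tag sequence and over a function-word sequence. The text does not say how FW n-grams are formed; dropping all non-function words and building n-grams over what remains is an assumption here, and the tag sequence and the tiny `TOY_FUNCTION_WORDS` set are invented stand-ins for TreeTagger output and the 318-word list.

```python
from collections import Counter


def ngram_counts(sequence, n_min=1, n_max=3):
    """Count all contiguous n-grams of length n_min..n_max in a sequence."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(sequence) - n + 1):
            counts[" ".join(sequence[i:i + n])] += 1
    return counts


# Hypothetical Penn Treebank tags for "I like very good books".
pos_tags = ["PRP", "VBP", "RB", "JJ", "NNS"]
pos_features = ngram_counts(pos_tags)

# Hypothetical subset of a function-word list (the real one has 318 entries).
TOY_FUNCTION_WORDS = {"i", "very", "and", "them"}
tokens = ["I", "like", "very", "good", "books", "and", "I", "read", "them"]
# Assumption: keep only function words, preserving their order.
fw_sequence = [t.lower() for t in tokens if t.lower() in TOY_FUNCTION_WORDS]
fw_features = ngram_counts(fw_sequence)
```

The resulting counters map each n-gram to its term frequency and can be fed to a classifier as sparse feature vectors.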

Emotion words
We use the 14,182 emotion words and their associations with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive) from the NRC emotion lexicon (Mohammad and Turney, 2013).
Before committing to analyzing emotion features, we want to test whether emotion-loaded words have any impact on the NLI task. The bag-of-words (BoW) representation covers a variety of phenomena without distinguishing between them, and so gives no insight into their individual impact on the task. We represent our data using BoW variations that include or exclude words with an emotional dimension. To verify that the effect on classification is not just due to a smaller feature set, we match the BoW size by removing a selection of random words. Table 2 presents the 10-fold cross-validation results (accuracy, %) on the TOEFL11 and ICLE datasets when using only emotion words or only random words, chosen such that the BoW representations have the same size, as well as the results when excluding the emotion words or the random words. Random-word accuracy was calculated as the average over five experiments with five different sets of random words.
The results in Table 2 show that emotion words have a higher impact on classification accuracy than random words when evaluated in isolation. Moreover, the accuracy drop is higher when excluding emotion words from the BoW approach than when excluding random words, confirming that emotion is a useful dimension for L1 classification, and not just an effect of having additional features.
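The size-matched comparison can be sketched as below. This is one plausible reading rather than the authors' procedure: drawing the random words from the non-emotion part of the vocabulary is an assumption, and the function name `bow_variants`, the toy vocabulary, and the toy lexicon are hypothetical.

```python
import random


def bow_variants(vocabulary, emotion_lexicon, seed):
    """Build the four BoW vocabularies compared in the ablation."""
    vocab = sorted(vocabulary)
    emotion = [w for w in vocab if w in emotion_lexicon]
    others = [w for w in vocab if w not in emotion_lexicon]
    # Assumption: random words come from the non-emotion vocabulary and
    # match the number of emotion words, so feature-set sizes are equal.
    rng = random.Random(seed)
    rand = rng.sample(others, k=min(len(emotion), len(others)))
    return {
        "emotion_only": set(emotion),
        "random_only": set(rand),
        "without_emotion": set(vocab) - set(emotion),
        "without_random": set(vocab) - set(rand),
    }


# Toy vocabulary and lexicon, invented for illustration.
toy_vocab = {"good", "danger", "happy", "car", "the", "travel", "idea", "fact"}
toy_lexicon = {"good", "danger", "happy", "love"}

# Five different random-word sets, mirroring the five averaged runs.
runs = [bow_variants(toy_vocab, toy_lexicon, seed) for seed in range(5)]
```

Each of the four vocabularies would then be used to restrict the BoW representation before running the same classifier, with the random-word accuracies averaged over the five runs.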

Emotion features
Having confirmed that emotion words help distinguish the L1 of the author of an essay, plausibly because of the cultural identity and linguistic habits associated with the author's native language, we proceed with a deeper analysis, for which we build two types of emotion features.
Emotion polarity features (emoP): In the NRC emotion lexicon, binary associations are provided for each emotion word for eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, or disgust) and two sentiments (negative or positive), e.g., good = "0100101011". This representation is used as a categorical feature (not a 10-dimensional binary vector). It performed best compared to other ways of encoding the emotion information we tried, e.g., using a 10-dimensional binary vector or excluding the sentiment information.
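A minimal sketch of the emoP encoding follows. The order of the ten dimensions is not stated in the text; alphabetical order is assumed here because it reproduces the code quoted for "good". The entry for "danger" and the helper names are hypothetical, and counting codes per essay is one way to realise the tf weighting.

```python
from collections import Counter

# Assumed (alphabetical) order of the 8 emotions and 2 sentiments.
DIMENSIONS = ("anger", "anticipation", "disgust", "fear", "joy",
              "negative", "positive", "sadness", "surprise", "trust")

# Toy lexicon; the associations for "danger" are invented for illustration.
TOY_LEXICON = {
    "good": {"anticipation", "joy", "positive", "surprise", "trust"},
    "danger": {"fear", "negative", "sadness"},
}


def polarity_code(word, lexicon):
    """Return the 10-character association string, or None if not listed."""
    associations = lexicon.get(word)
    if associations is None:
        return None
    return "".join("1" if d in associations else "0" for d in DIMENSIONS)


def emop_features(tokens, lexicon):
    """Count each association string as one categorical feature (tf)."""
    codes = (polarity_code(t.lower(), lexicon) for t in tokens)
    return Counter(c for c in codes if c is not None)


essay_features = emop_features("Good food is good".split(), TOY_LEXICON)
```

Treating the whole string as a single symbol means that two words share a feature only if they agree on all ten dimensions, which is what distinguishes this encoding from a 10-dimensional binary vector.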
Emotion load features (emoL): Speakers of different L1s may use a higher or lower number of emotionally charged words than speakers of other L1s, reflecting cultural customs or linguistic habits of the respective cultures. We modeled this information using three types of emotion load features: (i) two binary features, emoL (binary), that capture whether an essay has a high or low emotional load: (a) we compute the average ratio of emotion words over all essays in each dataset, which was 0.236 for TOEFL11 and 0.246 for ICLE; (b) if the ratio of emotion words in an essay was higher/lower than the average, we assigned it a "highly-emotional"/"low-emotional" feature. We used this representation to examine whether the polarity as such is informative. We also used more fine-grained emoL features: (ii) the ratio of emotion words in each essay as a numeric feature (1 feature, emoL (1)), and (iii) the ratio of each emotion/sentiment in each essay (10 numeric features: 8 emotions and 2 sentiments, emoL (10)). Overall, three different types of emoL features are examined.
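The three emoL variants can be sketched as follows. Several details are assumptions: ratios are taken over all tokens of an essay, an essay whose ratio equals the corpus average is treated as low-emotional, and the toy lexicon and feature names are hypothetical. Only the 0.236 TOEFL11 average comes from the text.

```python
DIMENSIONS = ("anger", "anticipation", "disgust", "fear", "joy",
              "negative", "positive", "sadness", "surprise", "trust")

# Toy lexicon; the associations for "danger" are invented for illustration.
TOY_LEXICON = {
    "good": {"anticipation", "joy", "positive", "surprise", "trust"},
    "danger": {"fear", "negative", "sadness"},
}


def emol_features(tokens, lexicon, corpus_average):
    """Return the 13 emotion load features for one tokenized essay."""
    n = max(len(tokens), 1)  # avoid division by zero for empty essays
    hits = [lexicon[t] for t in tokens if t in lexicon]
    ratio = len(hits) / n
    features = {
        # (i) emoL (binary): two indicators; ties count as low (assumption).
        "highly_emotional": int(ratio > corpus_average),
        "low_emotional": int(ratio <= corpus_average),
        # (ii) emoL (1): share of emotion words in the essay.
        "emotion_ratio": ratio,
    }
    # (iii) emoL (10): share of tokens associated with each dimension.
    for dim in DIMENSIONS:
        features["ratio_" + dim] = sum(dim in h for h in hits) / n
    return features


TOEFL11_AVERAGE = 0.236  # average emotion-word ratio reported for TOEFL11
essay = ["good", "danger", "the", "car"]
load = emol_features(essay, TOY_LEXICON, TOEFL11_AVERAGE)
```

For this toy essay half of the tokens are emotion words, so it falls on the highly-emotional side of the TOEFL11 average.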

Results and Discussion
Following previous studies on NLI (Markov et al., 2018) and author profiling (Rangel and Rosso, 2016), we provide the results when adding emotion-based features to the POS tag feature set. We also experiment with the combined POS and FW feature sets, similarly to, e.g., Malmasi and Dras (2015).
The 10-fold cross-validation results in terms of accuracy (%) on the TOEFL11 and ICLE datasets for the POS and POS & FW n-gram (n = 1-3) representations are shown in Tables 3 and 4, respectively. The number of features (No.) is included. Statistically significant gains/drops according to McNemar's statistical significance test (McNemar, 1947) at the α = 0.05 level are marked with '*'.
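For readers who want to reproduce the significance marking, the sketch below compares two classifiers on the same essays. It uses the exact binomial form of McNemar's test rather than the chi-square approximation; the text does not say which variant was used, so this choice, the function name, and the toy predictions are assumptions.

```python
from math import comb


def mcnemar_exact(gold, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two systems on the same items."""
    # Discordant pairs: exactly one of the two systems is correct.
    only_a = sum(a == g and b != g for g, a, b in zip(gold, pred_a, pred_b))
    only_b = sum(b == g and a != g for g, a, b in zip(gold, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return 1.0  # the systems never disagree on correctness
    k = min(only_a, only_b)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy predictions, invented for illustration.
gold = ["FRE"] * 10
with_emotion = ["FRE"] * 10
baseline = ["SPA"] * 8 + ["FRE"] * 2
p_value = mcnemar_exact(gold, with_emotion, baseline)
significant = p_value < 0.05
```

Because the test only looks at essays on which the two systems disagree, it suits paired comparisons such as a baseline feature set versus the same set plus emotion features.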
The experimental results show that emotion features, in particular the emoP features, significantly contribute to the results for all the considered settings, indicating that different cultures (as defined by the authors' L1) have different emotion word usage. It is interesting to note that, despite being very general, the three types of emoL features (13 features that characterize the emotional load of a document) also improve the results in the majority of settings, including when combined with the emoP features. This supports the hypothesis that some cultures use a larger or smaller emotional vocabulary. More fine-grained emotional load features could improve the results further.
To explore whether emotion usage depends on specific topics, we conducted experiments for the topics in the TOEFL11 dataset (Table 5). The improvement brought by the emotion-based features does seem to depend on the topic, as some topics more naturally elicit emotional reactions. The highest improvements were achieved for P5 (car usage) and P7 (young vs. old people comparison). When combined with the POS & FW representation, emotion-based features are less helpful (the improvements are not statistically significant) for the topics discussing traveling (P1), ideas vs. facts (P3), and education (P4). Overall, adding emotion-based features to the POS and POS & FW representations leads to accuracy improvements for all the topics present in the dataset.
The ability to choose the proper words to express oneself increases with the proficiency level. From this perspective, identifying the L1 of authors of essays in L2 from emotion word information should yield better results at higher proficiency levels. On the other hand, we expect other linguistic characteristics to become closer to those of a native speaker of the L2, and thus to make identifying the L1 harder. We experiment with L1 classification, separating the data based on the three different proficiency levels in TOEFL11. The results are included in Table 6. With respect to the emotion features, the medium and high proficiency levels show much better performance. As postulated above, this could be explained by the different ability of the L1 speakers to choose the words that closely express the message and nuances they wish to convey.

Conclusions
We investigated the hypothesis that the use of emotions is indicative of an author's native language. We used two types of emotion-based features: one captures the types of sentiments expressed, the other captures the frequency of emotion words in documents. We expected these features to capture cultural characteristics and linguistic habits from the authors' L1. The fact that adding these features to POS and function word n-grams leads to improvements in predicting the native language of a text's author leads us to conclude that emotion characteristics from a native language are "imported" into the production of L2.
The overall goal of this paper was to understand the influence of various facets of L1 speakers' language and culture on their acquisition (and production) of L2. These influences from L1 are not under the author's conscious control, and it is very interesting to understand their nature. Emotion is one of these. The fact that we explore the use of emotions on learner corpora (a "controlled environment"), with a specific task and specific requirements, and an (implied, not specifically requested) more neutral style, should probably lower the effect of emotional influences from the L1 and its culture. From that point of view, it is even more remarkable that such an effect is detected.

Table 6: 10-fold cross-validation accuracy for each proficiency level. '*' marks statistically significant differences.