A semantic-affective compositional approach for the affective labelling of adjective-noun and noun-noun pairs

Motivated by recent advances in the area of Compositional Distributional Semantic Models (CDSMs), we propose a compositional approach for estimating continuous affective ratings for adjective-noun (AN) and noun-noun (NN) pairs. The ratings are computed for the three basic dimensions of continuous affective spaces, namely valence, arousal and dominance. We propose that, similarly to the semantic modification that underlies CDSMs, affective modification may occur within the framework of affective spaces, especially when the constituent words of the linguistic structures under investigation form modifier-head pairs (e.g., AN and NN). The affective content of the entire structure is determined by the interaction between the respective constituents, i.e., the affect conveyed by the head is altered by the modifier. In addition, we investigate the fusion of the proposed model with the semantic-affective model proposed in (Malandrakis et al., 2013), applied at both word and phrase level. The automatically computed affective ratings were evaluated against human ratings in terms of correlation. The most accurate estimates are achieved via fusion, and absolute performance improvements of up to 5% and 4% are reported for NN and AN, respectively.


Introduction
Affective analysis of text aims at eliciting emotion from linguistic information and it can be relevant for a wide range of applications such as sentiment analysis (Pang and Lee, 2008; Rosenthal et al., 2014; Rosenthal et al., 2015), news headlines analysis (Strapparava and Mihalcea, 2007) or affective analysis of social media (Quercia et al., 2011; Celli, 2012; Rosenthal et al., 2014; Rosenthal et al., 2015).
Word-level affective lexica can be created automatically with high accuracy (Turney and Littman, 2002; Strapparava and Valitutti, 2004; Esuli and Sebastiani, 2006; Malandrakis et al., 2013; Palogiannidi et al., 2015). Affective lexica are required in hierarchical models that combine the affective ratings of words in order to estimate affective ratings of larger lexical units, e.g., phrases (Turney and Littman, 2002; Wilson et al., 2005), sentences (Malandrakis et al., 2013; Rosenthal et al., 2014; Rosenthal et al., 2015) and whole documents (Pang et al., 2002; Pang and Lee, 2008). A sentiment classification approach for movie review documents was proposed in (Pang et al., 2002).
Word-level semantic representations constitute the core aspect of Distributional Semantic Models (DSMs), which are typically constructed from co-occurrence statistics of word tuples. Such representations are the building block for models of larger lexical units, e.g., phrases and sentences, following the principle of semantic compositionality (Pelletier, 1994). They are meant to address a number of properties that are relevant to the compositional aspects of meaning, namely, "linguistic creativity", "order sensitivity", "adaptive capacity", and "information scalability" (Turney, 2012). Compositional approaches have been reported for estimating the semantic similarity of compositional structures (Mitchell and Lapata, 2008; Mitchell and Lapata, 2010; Baroni and Zamparelli, 2010; Georgiladakis et al., 2015). A combination of a symbolic and a distributional compositionality approach was investigated by (Clark and Pulman, 2007), while an approach for compositional neural networks was presented in (Hammer, 2003). A recursive neural network model that learns compositional vector representations for phrases and sentences of any length and syntactic type was proposed by (Socher et al., 2012), who showed that it can be used for the prediction of sentiment as well.
In this work, we propose a compositional semantic-affective model that is applicable to continuous affective spaces with one or more dimensions. This model is applied to the affective estimation of AN and NN pairs. The semantic-affective models are motivated by the assumption that "semantic similarity implies affective similarity", while the proposed compositional model is motivated by the CDSM proposed by (Baroni and Zamparelli, 2010), which focuses on adjective-noun composition and represents adjectives as functions and nouns as vectors. The affective ratings estimated by the compositional model are compared and combined with the ones obtained by the non-compositional semantic-affective models. Fusion schemes similar to those of (Georgiladakis et al., 2015), which aim to capture the degree of compositionality of each word pair, are investigated.

Semantic-Affective Model
Semantic-affective models are employed in order to estimate affective ratings of AN and NN pairs. We focus on three continuous affective dimensions: valence (positive vs. negative), arousal (calm vs. activated) and dominance (controlled vs. controller). Affective ratings for each dimension are estimated by the semantic-affective model proposed in (Malandrakis et al., 2013). This model, which extends (Turney and Littman, 2002), is defined in (1) and is applicable to both words and n-gram tokens. The assumption of (1) is that the affective rating of a lexical token can be estimated by the linear combination of its semantic similarities to a set of seeds, weighted with the seeds' affective ratings and trainable weights:
$$\hat{\upsilon}(t_j) = \alpha_0 + \sum_{i=1}^{N} \alpha_i \, \upsilon(w_i) \, S(t_j, w_i) \qquad (1)$$
where $t_j$ is the unknown lexical token, $w_{1..N}$ are the seed words, $\upsilon(w_i)$ and $\alpha_i$ are the affective rating and the weight corresponding to the word $w_i$, $\alpha_0$ is a bias term as in (Malandrakis et al., 2013), and $S(\cdot)$ is the semantic similarity between two tokens. $S(\cdot)$ is implemented within the framework of distributional semantic models, following the assumption that tokens that occur in similar contexts tend to be semantically related (Harris, 1954). In this framework, each token is represented by a contextual feature vector built from the words that occur within a given distance of the token in a corpus. The elements of the vectors are set according to a mutual information scheme as shown in (Palogiannidi et al., 2015). Then, the semantic similarity between two tokens is computed as the cosine of their feature vectors. The manually annotated affective lexicon Affective Norms for English Words (Bradley and Lang, 1999) was used for the selection of the seed words, and Least Squares Estimation (LSE) was used for learning the weights $\alpha_i$.
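The estimation described by (1) can be illustrated with a minimal sketch. It assumes that feature vectors are dense NumPy arrays and that a bias weight `alpha_0` is fitted together with the seed weights. The function names (`cosine`, `design_row`, `fit_weights`, `rate`) are hypothetical and are not taken from the cited implementations.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0

def design_row(token_vec, seed_vecs, seed_ratings):
    """Row [1, v(w_1) S(t, w_1), ..., v(w_N) S(t, w_N)] for one token t."""
    sims = np.array([cosine(token_vec, s) for s in seed_vecs])
    return np.concatenate(([1.0], np.asarray(seed_ratings, dtype=float) * sims))

def fit_weights(train_vecs, train_ratings, seed_vecs, seed_ratings):
    """Least-squares estimate of [alpha_0, alpha_1, ..., alpha_N].

    Assumption: the bias alpha_0 is modelled by the leading column of ones.
    """
    X = np.vstack([design_row(t, seed_vecs, seed_ratings) for t in train_vecs])
    y = np.asarray(train_ratings, dtype=float)
    alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
    return alpha

def rate(token_vec, alpha, seed_vecs, seed_ratings):
    """Affective rating of an unseen token as a weighted sum of seed similarities."""
    return float(design_row(token_vec, seed_vecs, seed_ratings) @ alpha)
```

In this sketch the same `rate` function serves words and bigram tokens alike, since only the feature vector of the token changes.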

Unigram and Bigram Affective Models
The unigram affective model (U) is based on the assumption that the two words that constitute the word pair contribute equally to its affective content, and it is defined as
$$\hat{\upsilon}_U(p) = \frac{1}{2}\left(\hat{\upsilon}(w_1) + \hat{\upsilon}(w_2)\right) \qquad (2)$$
where $p = w_1 w_2$ is the word pair, and $\hat{\upsilon}(w_1)$ and $\hat{\upsilon}(w_2)$ are the affective ratings that are estimated using (1). The bigram semantic-affective model (B) handles each word pair as a single token, i.e., $\hat{\upsilon}_B(p) = \hat{\upsilon}(w_1 w_2)$, where $p$ is the word pair and $\hat{\upsilon}_B(p)$ is the affective rating of the word pair, estimated using (1) for the bigram lexical token $t = w_1 w_2$.
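The difference between the two models can be sketched as follows. The callable `rate_token` is a hypothetical stand-in for the semantic-affective model of (1), and the toy lexicon in the usage note is invented purely for illustration.

```python
def unigram_rating(rating_w1, rating_w2):
    """Unigram model (U): the two words contribute equally to the pair's rating."""
    return 0.5 * (rating_w1 + rating_w2)

def bigram_rating(pair, rate_token):
    """Bigram model (B): the pair is rated as one token by the same word-level model.

    rate_token is a hypothetical callable mapping a token string to a rating.
    """
    w1, w2 = pair
    return rate_token(f"{w1} {w2}")
```

For instance, with a toy lookup `{"happy": 0.8, "dog": 0.4, "happy dog": 0.7}`, the unigram model averages the ratings of "happy" and "dog", while the bigram model reads the rating of the token "happy dog" directly.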

Compositional Affective Models
Word semantics have been represented efficiently with vector space models, as shown in (Androutsopoulos and Malakasiotis, 2010; Malandrakis et al., 2013). However, representing the semantics of lexical structures larger than words is not trivial (Baroni and Zamparelli, 2010). The reason is that the meaning of complex structures derives from various compositional phenomena (Pelletier, 1994). Semantic compositionality allows the construction of complex meanings from simpler elements, based on the principle that the meaning of a whole is a function of the meaning of the parts (Partee, 1995). The key characteristic of compositionality is that the meanings of the constituent parts are combined into a single token (Mitchell and Lapata, 2008; Mitchell and Lapata, 2010). Compositional approaches in vector-based semantics can be modelled by applying a function f that acts on two constituents a, b in order to produce the compositional meaning p. The functions investigated by (Mitchell and Lapata, 2008; Mitchell and Lapata, 2010) are addition and multiplication. The additive compositional model takes the sum of the two vectors weighted with the weight matrices A and B, respectively (p = Aa + Bb), and the multiplicative model is a projection of the tensor product of a and b using a weight tensor C (p = Cab). These composition forms can also be simplified: the additive model can use scalars instead of matrices and, similarly, the multiplicative approach can be reduced to component-wise multiplication. Motivated by compositionality modeling, (Baroni and Zamparelli, 2010) proposed an approach that focuses on adjective-noun composition, according to which nouns are represented as vectors and adjectives as functions.
In this work we assume that composition occurs in the affective rather than the semantic space and thus we combine the affective ratings, instead of the semantic representations, of a phrase's constituent words. A compositional model that is applicable to continuous affective spaces with one or three dimensions is proposed. Results are shown for valence; however, the proposed models are applicable to the remaining affective dimensions as well.

Proposed compositional approach
In this paper, we focus on word pairs that consist of a modifier (first word) and a head (second word). A word pair (also referred to as a test pair) $p$ is defined as $p = m.h$, where $m$ is the modifier and $h$ is the head. The assumption of our affective compositional model is that the word pair's affective rating is obtained by letting the modifier alter the head's affective rating. The modifier's impact is defined by a matrix of weight coefficients or by weight scalars, depending on the dimensionality of the compositional model. The weight coefficients are learned in a distributional approach, according to which K pairs are extracted from a large corpus and serve as the training set of each modifier. The training pairs contain the same modifier as the current test pair and a different head, i.e., $p' = m.*$, where $p'$ is the training word pair, $m$ is the modifier and $*$ indicates any head except the head of the test pair $p$. For each test pair $p$, K training pairs $p'$ are extracted.
The semantic-affective model of (1) is then employed to estimate the affective ratings of the training heads and the training pairs. The estimated affective ratings are incorporated in an LSE formulation where the ratings of the training pairs form the dependent variable and the ratings of the training heads form the independent variable. The proposed compositional approach is depicted in Figure 1. Training is performed separately for each modifier and the impact of each modifier is estimated through supervised learning, e.g., LSE. Finally, the continuous affective ratings of the word pair are estimated via an additive model. This model takes directly into consideration only the head, while the modifier's impact is captured by the weight matrix. The general compositional model is shown below:
$$\hat{\upsilon}_c(p) = \beta + W \, \hat{\upsilon}(h) \qquad (4)$$
where $\hat{\upsilon}_c(p)$ is the compositional affective rating of the word pair $p$, $\beta$ is the bias vector, $W$ is the coefficients matrix and $\hat{\upsilon}(h)$ is the affective rating of the head, estimated via (1).
The compositional model may not always be the most appropriate for the estimation of a word pair's affective ratings. Thus, in order to measure the degree of compositionality of each word pair, the Mean Squared Error (MSE) of the model is measured. Specifically, the squared distance between the compositional affective ratings and the affective ratings estimated via (1) is averaged over all training pairs:
$$MSE(p) = \frac{1}{K} \sum_{j=1}^{K} \left( \hat{\upsilon}(p'_j) - \hat{\upsilon}_c(p'_j) \right)^2 \qquad (5)$$
where $MSE(p)$ is the MSE estimated for each pair $p$, $K$ is the number of the training pairs $p'$, $\hat{\upsilon}(p'_j)$ is the affective rating of the training word pair $p'_j$ estimated using (1) for bigram tokens, and $\hat{\upsilon}_c(p'_j)$ is the corresponding affective rating estimated using the compositional model described in (4).
3D compositional model (com3D): Here we assume that three affective dimensions (valence, arousal, dominance) contribute to the affective content of the word pairs.
The 3D compositional model is a special case of the general compositional model of (4), where $W \in \mathbb{R}^{3 \times 3}$ contains the weight coefficients for the three affective dimensions and $\beta \in \mathbb{R}^{3 \times 1}$ contains the bias of each affective dimension. The coefficient matrix $W$ and the bias vector $\beta$ are estimated using LSE. Specifically, for each test pair $p$ with K training pairs $p'$, K linear equations with 4 unknown variables (3 for the affective dimensions and 1 for the bias) have to be solved for each output dimension.
The linear system is shown in (3). The columns of the weight matrix correspond to valence ($w_{1*}$), arousal ($w_{2*}$) and dominance ($w_{3*}$), while the first row is the bias. With $\hat{v}(\cdot)$, $\hat{a}(\cdot)$, $\hat{d}(\cdot)$ we denote the valence, arousal and dominance affective ratings, respectively.
1D compositional model (com1D): The 1D compositional model is a special case of the 3D compositional model and follows the assumption that each affective dimension is independent of the other affective dimensions.
The dimensionality of the model is reduced, and thus the behaviour of the modifier is captured by two scalars that replace the coefficient matrix $W$ and the bias vector $\beta$. The general compositional model of (4) is transformed to the 1D compositional model using scalar variables, i.e., $\hat{\upsilon}_c(p) = \beta + w \, \hat{\upsilon}(h)$ with scalar $\beta$ and $w$. These two scalar coefficients are estimated similarly to the 3D model.
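The per-modifier training described above can be sketched with ordinary least squares. The sketch assumes that the ratings of the K training heads and training pairs are already available as arrays, and that for the 3D model the squared error of a pair is summed over the three dimensions. The names `fit_modifier`, `compose` and `training_mse` are hypothetical.

```python
import numpy as np

def _as_matrix(ratings):
    """Arrange K ratings as a K x D array (D = 1 for com1D, D = 3 for com3D)."""
    return np.asarray(ratings, dtype=float).reshape(len(ratings), -1)

def fit_modifier(head_ratings, pair_ratings):
    """Fit pair ~ beta + W @ head by least squares for a single modifier.

    Returns (beta, W) with beta of shape (D,) and W of shape (D, D).
    """
    H, P = _as_matrix(head_ratings), _as_matrix(pair_ratings)
    X = np.hstack([np.ones((H.shape[0], 1)), H])    # K x (D+1), bias column first
    theta, *_ = np.linalg.lstsq(X, P, rcond=None)   # (D+1) x D, first row is the bias
    return theta[0], theta[1:].T

def compose(beta, W, head_rating):
    """Compositional rating of a pair from the rating of its head."""
    return beta + W @ np.atleast_1d(np.asarray(head_rating, dtype=float))

def training_mse(beta, W, head_ratings, pair_ratings):
    """Mean squared error over the K training pairs.

    Assumption: for D > 1 the squared error is summed over dimensions.
    """
    H, P = _as_matrix(head_ratings), _as_matrix(pair_ratings)
    pred = beta + H @ W.T
    return float(np.mean(np.sum((P - pred) ** 2, axis=1)))
```

Passing K scalar ratings yields the two scalars of the 1D model, whereas passing K rows of (valence, arousal, dominance) yields the 3 x 3 matrix and the bias vector of the 3D model.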

Fusion of affective models
The same word may contribute differently to the affective content of a phrase depending on the context. For example, when the word "dog" appears in the word pair "happy dog" the conveyed affect is positive, whereas it is reversed when it appears in the word pair "dead dog". The accuracy of the proposed affective models depends on the degree of compositionality that characterizes the pairs of interest. By combining different models we aim to achieve more accurate affective scores.
The first fusion scheme combines the affective ratings estimated by the compositional and semantic-affective models. The underlying assumption is that all models contribute equally to the affective meaning of a word pair, that is, a word pair exhibits both compositional and non-compositional aspects. This scheme is based on the averaging of affective ratings and is defined as follows:
$$\Phi_{avg}(p) = \frac{1}{M} \sum_{i=1}^{M} \hat{\upsilon}_i(p) \qquad (6)$$
where $\Phi_{avg}(p)$ is the fused affective rating of the word pair $p$, $M$ denotes the number of fused models and $\hat{\upsilon}_i(p)$ stands for the estimated affective rating of word pair $p$, i.e., $\hat{\upsilon}_1(p) = \hat{\upsilon}_U(p)$, $\hat{\upsilon}_2(p) = \hat{\upsilon}_B(p)$, $\hat{\upsilon}_3(p) = \hat{\upsilon}_{c1}(p)$, $\hat{\upsilon}_4(p) = \hat{\upsilon}_{c3}(p)$. A weighted variant of (6) was also investigated, as follows:
$$\Phi^{w}_{avg}(p) = \sum_{i=1}^{M} w_i \, \hat{\upsilon}_i(p) \qquad (7)$$
where $\Phi^{w}_{avg}(p)$ is the fused affective rating of the word pair $p$, $\hat{\upsilon}_i(p)$ stands for the affective rating of word pair $p$ estimated by the affective models, as explained in (6), and $w_i$ are weights estimated via linear regression.
Motivated by the fusion scheme proposed in (Georgiladakis et al., 2015), we propose using the MSE to weight appropriately the compositional and the semantic-affective models. The MSE is estimated during the training phase as shown in (5) and the weight parameter is defined as
$$\lambda'(p) = \frac{0.5}{1 + e^{-MSE(p)}}$$
where $MSE(p)$ is the MSE measured for each word pair $p$ and $\lambda'(p)$ is estimated for each word pair $p$ based on the compositional model. The model of (4) is applied in both its 1D and 3D forms and the derived $\lambda'(p)$ are averaged in order to form the parameter $\lambda(p)$ that is used in the fusion scheme of (8), where $w_1, \ldots, w_4$ are the weights that correspond to each affective model, estimated through LSE with $\sum_{i=1}^{4} w_i = 1$, $\hat{\upsilon}_*(p)$ is the affective rating of each word pair derived from each affective model, and $\lambda(p)$ is meant for weighting the contribution of compositional and non-compositional models.
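The averaging schemes and the MSE-based weight parameter can be sketched in a few lines. The sketch covers only the plain average, the weighted average and the computation of the weight parameter from the two MSE values; it does not attempt to reproduce the final MSE-weighted fusion formula. All function names are hypothetical.

```python
import math

def fuse_average(ratings):
    """Unweighted fusion: mean of the ratings produced by the M models."""
    return sum(ratings) / len(ratings)

def fuse_weighted(ratings, weights):
    """Weighted fusion: weights are assumed to be learned elsewhere (e.g., by LSE)."""
    return sum(w * r for w, r in zip(weights, ratings))

def lambda_prime(mse):
    """Weight parameter of one compositional model: 0.5 / (1 + exp(-MSE))."""
    return 0.5 / (1.0 + math.exp(-mse))

def lambda_fused(mse_1d, mse_3d):
    """Average of the parameters obtained from the 1D and 3D compositional models."""
    return 0.5 * (lambda_prime(mse_1d) + lambda_prime(mse_3d))
```

Since the MSE is non-negative, `lambda_prime` ranges from 0.25 (perfect compositional fit) towards 0.5 (poor fit), so the parameter varies smoothly with how well the compositional model explains the training pairs of a modifier.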

Dataset
For evaluating the proposed semantic-affective models we use two word-pair datasets, one consisting of AN pairs and one consisting of NN pairs. The word pairs of the evaluation datasets were extracted from the movie reviews of (Socher et al., 2013) as follows. Each movie review was first split into its constituent sentences and then into its constituent phrases. The derived sentences and phrases were annotated with respect to their polarity using crowdsourcing. We kept only the word pairs whose first word is an adjective or a noun and whose second word is a noun. The resulting dataset consists of 1009 AN and 357 NN pairs.

Semantic-affective models
The proposed models estimate the affective ratings for each affective dimension on a continuous scale in [-1, 1]; however, we only report results for valence. The semantic-affective model shown in (1) was applied for the unigram (U) and bigram (B) models as defined in Section 2.1. The parameters of the semantic-affective model are set as follows: N = 600 seeds were used; for S(·), a context-based metric of semantic similarity was applied with window size equal to one, while the extracted features were weighted according to positive pointwise mutual information (Church and Hanks, 1990). LSE was applied for estimating the weights α of (1). The parameters of the model are detailed in (Palogiannidi et al., 2015). The compositional model requires a large corpus for extracting the training pairs of each modifier (Iosif et al., 2016). For each modifier, all word pairs with the same modifier are extracted, yielding hundreds of training pairs.
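A context-based representation with window size one and positive pointwise mutual information weighting can be sketched as follows. The tokenized toy input, the base-2 logarithm and the sparse dictionary layout are assumptions made for illustration; the exact weighting details are those of the cited works, not of this sketch.

```python
import math
from collections import Counter

def cooccurrence_counts(sentences, window=1):
    """Count (target, context) pairs for words within `window` positions of each other."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(target, tokens[j])] += 1
    return pair_counts

def ppmi_vectors(pair_counts):
    """Sparse feature vectors {target: {context: PPMI}}; non-positive PMI is dropped."""
    total = sum(pair_counts.values())
    target_totals, context_totals = Counter(), Counter()
    for (t, c), n in pair_counts.items():
        target_totals[t] += n
        context_totals[c] += n
    vectors = {}
    for (t, c), n in pair_counts.items():
        pmi = math.log2(n * total / (target_totals[t] * context_totals[c]))
        if pmi > 0:
            vectors.setdefault(t, {})[c] = pmi
    return vectors
```

The cosine of two such sparse vectors then plays the role of S(·) in the semantic-affective model.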

Fusion
We investigate both weighted and unweighted averaging schemes, and a compositionality criterion based on the MSE of the compositional models is also incorporated. In (6) the average of all affective models is estimated, while a weighted version of this scheme is shown in (7). LSE was adopted in order to estimate the weights that capture each model's contribution. $\Phi^{MSE}_{avg}$ was implemented in a two-fold cross-validation scenario. Moreover, we introduce a compositionality criterion based on the MSE that was measured during the compositional training process. Then the weighted average of the compositional and the non-compositional semantic-affective models was estimated as shown in (8). The weights $w_i$ were estimated through LSE and two-fold cross-validation. The parameter $\lambda(p)$ used in (8) was computed as the average of the corresponding parameters that are estimated for the 1D and the 3D compositional models.

Results
We compare the automatically estimated valence ratings against the human valence ratings that were collected via crowdsourcing. We report evaluation results based on three evaluation metrics, namely, the Pearson correlation coefficient between the estimated and the human valence ratings, binary classification accuracy (positive vs. negative valence ratings) and F-measure. The evaluation results are reported in Table 1 for several semantic-affective models. Regarding individual models, the highest performance is achieved by the unigram model for both AN and NN. This may be attributed to the very good performance of the unigram model, as reported in (Malandrakis et al., 2013; Palogiannidi et al., 2015). However, the fact that the performance drops by about 11% when moving from words to word pairs is a strong indicator of the need for compositional modeling. As expected, the accuracy of the compositional models lies between the accuracies of the two semantic-affective models. Similar results have also been obtained for compositional models in the semantic space (Georgiladakis et al., 2015). Using fusion schemes that combine the compositional with the semantic-affective models, we achieve the best performance, exceeding the performance of all individual models. Simple fusion schemes, such as the average of the affective ratings derived from the individual models, can increase the performance of the best model by up to 5% in terms of correlation. A similar performance increase is observed for the rest of the evaluation metrics as well.
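The three evaluation metrics can be sketched as follows. The sketch assumes that ratings are binarized at zero (ratings of zero or above count as positive) and that the F-measure is the F1 score of the positive class; neither choice is stated in the text, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def binary_accuracy(pred, gold, threshold=0.0):
    """Share of pairs whose predicted polarity matches the human polarity."""
    hits = sum((p >= threshold) == (g >= threshold) for p, g in zip(pred, gold))
    return hits / len(gold)

def f_measure(pred, gold, threshold=0.0):
    """F1 of the positive class (assumed definition of the F-measure)."""
    tp = sum(p >= threshold and g >= threshold for p, g in zip(pred, gold))
    fp = sum(p >= threshold and g < threshold for p, g in zip(pred, gold))
    fn = sum(p < threshold and g >= threshold for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Correlation evaluates the continuous ratings directly, while the other two metrics only evaluate the predicted polarity.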

Conclusions and Future Work
We proposed a compositional model for estimating continuous affective ratings of AN and NN structures consisting of words that form modifier-head pairs. The composition was motivated by the affective interaction of modifier and head words, and it was implemented as an affine operation in the continuous affective space. The compositional models were compared and fused with two semantic-affective models defined at the unigram and bigram level. The best performance overall was achieved by the fusion-based approach, suggesting that there is no single model that works for all word pairs, i.e., the degree and type of compositionality is different for each word pair. Currently, we are working on the improvement of the fusion schemes, with a focus on identifying the parameters that control the degree of compositionality, i.e., the λ(p) parameter. We are also investigating the generalization of the proposed model to semantically and affectively more complex structures. Our long-term goal is to formulate a generic framework for integrating the compositional and non-compositional aspects of semantic and affective spaces, bringing together theories from the areas of cognitive science, psycholinguistics and data-driven computational models.