Towards Syntactic Iberian Polarity Classification

Lexicon-based methods using syntactic rules for polarity classification rely on parsers that depend on the language and on treebank guidelines. The rules are therefore also language- and guideline-dependent and require adaptation, especially in multilingual scenarios. We tackle this challenge in the context of the Iberian Peninsula, releasing the first symbolic syntax-based Iberian system with rules shared across five official languages: Basque, Catalan, Galician, Portuguese and Spanish. The model is made available.


Introduction
Finding the scope of linguistic phenomena in natural language processing (NLP) is a core utility of parsing. In sentiment analysis (SA), it is used to address structures that play a role in polarity classification, both in supervised (Socher et al., 2013) and symbolic (Vilares et al.) models. The latter are mostly monolingual and depend on the annotation of the training treebank, so their rules are annotation-dependent too. Advances in NLP now make it possible to overcome these issues. We present a model that analyzes five official languages of the Iberian Peninsula: Basque (eu), Catalan (ca), Galician (gl), Portuguese (pt) and Spanish (es). We rely on three premises: 1. Syntactic structures can be defined in a universal way (Nivre et al., 2015).
1. A single set of syntactic rules to handle linguistic phenomena across five Iberian languages from different families. 2. The first end-to-end multilingual syntax-based SA system that analyzes five official languages of the Iberian Peninsula. This is also the first evaluation for SA that provides results for some of them.

Related work
Polarity classification has been addressed through machine learning (Mohammad et al., 2013; Socher et al., 2013; Vo and Zhang, 2016) and lexicon-based models (Turney, 2002). Most of the research involves English texts, although studies can be found for other languages such as Chinese (Chen and Chen, 2016) or Arabic (Shoukry and Rafea, 2012).
For the official languages of the Iberian Peninsula, much of the literature has focused on Spanish. Brooke et al. (2009) proposed a lexicon-based SA system that defines rules at the lexical level to handle negation, intensification or adversative subordinate clauses. They followed a cross-lingual approach, adapting their English method (Taboada et al., 2011) to obtain the semantic orientation (SO) of Spanish texts. Vilares et al. created a syntactic rule-based system by reinterpreting Brooke et al.'s system, but limited to AnCora trees (Taulé et al., 2008). Martínez-Cámara et al. (2011) were among the first to report a wide set of experiments on a number of bag-of-words supervised classifiers. The TASS workshop on sentiment analysis focused on the Spanish language (Villena-Román et al., 2013) annually proposes different challenges related to polarity classification, and a number of approaches have used its framework to build their Spanish systems, most of them based on supervised learning (Saralegi and San Vicente, 2013; Gamallo et al., 2013; Hurtado et al., 2015; Vilares et al., 2015).
Sentiment analysis for Portuguese has also attracted the interest of the research community. Silva et al. (2009)
For Basque, Catalan and Galician, literature is scarce. Cruz et al. (2014) introduce a method to create multi-layered lexicons for different languages, including co-official languages in Spain. San Vicente and Saralegi (2016) explore different ways to create lexicons and apply them to the Basque case. They report an evaluation on a Basque dataset intended for polarity classification. Bosco et al. (2016) discuss the collection of data for the Catalan Elections and design an annotation scheme to apply SA techniques, but the dataset is not yet available. With respect to Galician, this article presents the first published results for this language.

SISA: Syntactic Iberian SA

Preliminaries
Vilares et al. (2017) propose a formalism to define compositional operations. Given a dependency tree for a text, a compositional operation defines how a node in the tree modifies the semantic orientation (SO) of a branch or node, based on elements such as the word form, part-of-speech (PoS) tag or dependency type, without limitations in terms of its location inside the tree. They released an implementation in which an arbitrary number of practical compositional operations can be defined. The system queues the operations and propagates them through the tree until they must be dequeued and applied to their target. The authors showed how the same set of operations, defined to work under the Universal Treebank (UT) guidelines (McDonald et al., 2013), can be shared across languages, but they do not explore how to create a single pipeline for analyzing many languages. This paper explores that path in the context of the Iberian Peninsula, presenting a unified syntactic Iberian SA model (SISA). Below we present how to build SISA, from the bottom levels (subjectivity lexica, tagging and dependency parsing) to the top levels (application of compositional operations to compute the final SO).

Subjectivity Lexica
SISA needs multilingual polarity lexica in order to predict the sentiment of a text. We used two sets of monolingual lexica as our starting points:
1. Spanish SFU lexicon (Brooke et al., 2009): It contains SOs for subjective words, ranging from 1 to 5 in absolute value for positive and negative terms. We translated it to ca, eu, gl and pt using Apertium (Forcada et al., 2011). We removed the unknown words and obtained the numbers in Table 1.
2. ML-Senticon (Cruz et al., 2014): Multi-layered lexica (not available for pt) with SOs, where each layer contains a larger number of terms than the previous one but is less reliable. We used the seventh layer for each language. As the eu, ca and gl files use the same PoS tag for adverbs and adjectives, these words were automatically classified using monolingual tools (Agerri et al., 2014; Padró and Stanilovsky, 2012; Garcia and Gamallo, 2015) (Table 2 contains the statistics). SOs (originally from 0 to 1) were linearly transformed to the scale of the SFU lexicon.
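The text does not give the exact linear map used to bring the ML-Senticon values onto the SFU scale. A minimal sketch, assuming a sign-preserving rescaling of magnitudes from [0, 1] to [0, 5], could look as follows; the function name `rescale_so`, its parameters and the input value are hypothetical.

```python
def rescale_so(so: float, src_max: float = 1.0, dst_max: float = 5.0) -> float:
    """Linearly map a signed SO from [-src_max, src_max] to [-dst_max, dst_max].

    Assumption: the transformation is a plain scaling that keeps the sign;
    the paper only states that the map is linear.
    """
    if abs(so) > src_max:
        raise ValueError(f"SO {so} lies outside [-{src_max}, {src_max}]")
    return so * (dst_max / src_max)


# Hypothetical input value, chosen only to illustrate the scaling.
print(rescale_so(-0.375))
```

A pure scaling keeps neutral words at 0 and preserves the ordering of terms, which is why it is a natural candidate here.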
The SFU and ML-Senticon lexica for each language were combined to obtain larger monolingual resources, and these were in turn combined into a common Iberian lexicon (see Table 3). When merging lexica, we must consider that:
1. In monolingual mergings, the same word can have different SOs. E.g., the Catalan adjective 'abandonat' (abandoned) has −1.875 and −3 in ML-Senticon and SFU, respectively.
2. When combining lexica of different languages, the same word form might have different meanings (and SOs) in each language. Merging them in a multilingual resource could be problematic. For example, the adjective 'espantoso' has a value of −4.1075 in the combined es lexicon (frightening), and of −3.125 in the gl one (frightening), while the same word in the pt data (astonishing) has a positive value of 5. Note, however, that even if they could be considered very similar from a lexical or morphological perspective, many phonological false friends have different spellings in each language (such as the negative 'vessar' (to spill) in ca and the positive 'besar' (to kiss) in es), so these cases end up not being a frequent problem (only 0.36% of the words have both positive and negative polarity in the monolingual lexica).
These two problems were tackled by averaging the polarities of words with the same form. Thus, the first monolingual mergings produced a balanced SO (e.g., 'abandonat' has −2.4375 in the combined ca lexicon), while in the subsequent multilingual fusion, contradictory false friends end up with a final value close to no polarity (e.g., 'espantoso', with an SO of −0.7 in the Iberian lexicon). The impact of these mergings is analyzed in §4.
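The averaging strategy can be illustrated with a short sketch. It assumes that each lexicon is a plain word-to-SO dictionary; the helper `merge_lexica` is hypothetical and not taken from the released system. The SO values are the ones quoted in the text.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict


def merge_lexica(*lexica: Dict[str, float]) -> Dict[str, float]:
    """Merge word -> SO dictionaries, averaging the SOs of identical word forms."""
    collected = defaultdict(list)
    for lexicon in lexica:
        for word, so in lexicon.items():
            collected[word].append(so)
    return {word: mean(sos) for word, sos in collected.items()}


# Monolingual merging: ML-Senticon and SFU entries for Catalan.
ca = merge_lexica({"abandonat": -1.875}, {"abandonat": -3.0})
print(ca["abandonat"])

# Multilingual fusion: combined es, gl and pt entries for the same word form.
iberian = merge_lexica({"espantoso": -4.1075}, {"espantoso": -3.125}, {"espantoso": 5.0})
print(iberian["espantoso"])
```

The first merge reproduces the −2.4375 reported for 'abandonat'. The second averages to about −0.744, in line with the reported −0.7 for 'espantoso'.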

PoS-tagging and dependency parsing
For the compositional operations to be triggered, we first need to tag and parse each sentence. To do so, we trained an Iberian PoS-tagger and parser, i.e. single modules that can analyze Iberian languages without applying any language identification tool. Multilingual taggers and parsers can be trained following approaches based on Vilares et al. (2016) and Ammar et al. (2016). We rely on the Universal Dependencies (UD) guidelines (Nivre et al., 2015) to train these tools, since they provide corpora for all languages studied in this paper.
For the Iberian tagger we relied on Toutanova and Manning (2000), obtaining the following accuracies (%) in the monolingual UD

Compositional operations
For a detailed explanation of compositional operations, we encourage the reader to consult Vilares et al. (2017), but we include an overview here as part of SISA. Briefly, a compositional operation is a tuple o = (τ, C, δ, π, S) such that:
• τ : R → R is a transformation function to apply on the semantic orientation of nodes, where τ can be a weighting, weight_β(SO) = SO × (1 + β), or a shift, shift_α(SO), as used by the negation rule,
• C : V → {true, false} is a predicate that determines whether a node in the tree will trigger the operation, based on word forms, PoS-tags and dependency types,
• δ ∈ N is the number of levels that we need to ascend in the tree to calculate the scope of o, i.e., the nodes of the tree T whose SO is affected by the transformation function τ,
• π is a priority used to break ties when several operations coincide on a given node, and
• S is a scope function that will be used to determine the nodes affected by the operation.
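To make the tuple more concrete, the sketch below represents an operation as a small data structure and instantiates it with the intensification parameters given in this section. This is an illustration only: the class `Operation`, the dictionary-based node and the helper names are assumptions and do not reflect the released implementation. For simplicity the operation is built for a single intensifier ('muy', booster 0.25).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical node representation: word form, PoS tag and dependency type.
Node = Dict[str, str]


def weight(beta: float) -> Callable[[float], float]:
    """Build the weighting transformation SO -> SO * (1 + beta)."""
    return lambda so: so * (1 + beta)


@dataclass(frozen=True)
class Operation:
    tau: Callable[[float], float]      # transformation applied to the SO
    condition: Callable[[Node], bool]  # C: does this node trigger the operation?
    delta: int                         # levels to ascend to compute the scope
    priority: int                      # pi: breaks ties between operations
    scopes: Tuple[str, ...]            # S: candidate scopes


# Booster lexicon with the single value quoted in the text.
INTENSIFIERS = {"muy": 0.25}


def is_intensifier(node: Node) -> bool:
    return (
        node["form"] in INTENSIFIERS
        and node["pos"] in {"ADV", "ADJ"}
        and node["deprel"] in {"advmod", "amod", "nmod"}
    )


intensification = Operation(
    tau=weight(INTENSIFIERS["muy"]),
    condition=is_intensifier,
    delta=1,
    priority=3,
    scopes=("target node", "b(advmod)", "b(amod)"),
)

muy = {"form": "muy", "pos": "ADV", "deprel": "advmod"}
if intensification.condition(muy):
    # SO of 'grande' (1.87) boosted by 'muy': 1.87 * 1.25 = 2.3375
    print(intensification.tau(1.87))
```

Keeping the predicate and the transformation as plain callables mirrors the definition of C and τ as functions, so new operations can be declared without changing the structure.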
We adapt the UT operations used by Vilares et al. (2017) to the UD style. They are described below:
1. Intensification: It diminishes or amplifies the SO of a word or a phrase. It operates from adjectives or adverbs modifying the SO of the head structure they depend on: e.g., the SO of 'grande' (big, in es) increases from 1.87 to 2.34 if a word such as 'muy' (very) depends on it and is labeled with the dependency type advmod. Formally, for o_intensification, τ = weight_β(SO), C = w ∈ intensifiers ∧ t ∈ {ADV,ADJ} ∧ d ∈ {advmod,amod,nmod}, δ = 1, π = 3 and S = {target node, b(advmod), b(amod)}, where b(x) indicates that the scope is the first branch at the target level whose dependency type is x. β is extracted from a lexicon with booster values (in this work obtained from SFU, where 'muy' has a booster value of 0.25).

2. Subordinate adversative clauses: This rule is designed for dealing with structures coordinated by adversative conjunctions (such as but), which usually involve opposite polarities between the two conjoined elements (e.g., "good but expensive"). Here, the SO of the first element is multiplied by 1 − 0.25, so its polarity decreases. Formally, τ = weight_{−0.25}(SO), C = w ∈ adversatives ∧ t ∈ {CONJ,SCONJ} ∧ d ∈ {cc,advmod,mark}, δ = 1, π = 1 and S = {subjl}, where subjl indicates that the scope is the first left branch with SO ≠ 0 at the target level.
3. Negation: In most cases, negative adverbs shift the polarity of the structures they depend on ("It is nice" versus "It is not nice"). In order to handle these cases, the present rule shifts the polarity of the head structures of a negative adverb by α (where α = 4 in our experiments). In the previous example, the polarity of "nice" would drop from 3.5 to −0.5 if affected by the rule. Formally, for o_negation, τ = shift_4(SO), C = w ∈ negators ∧ d ∈ {neg,advmod}, δ = 1, π = 2 and S = {target node, b(root), b(cop), b(nsubj), subjr, all}, where subjr indicates that the scope is the first branch with SO ≠ 0 and all indicates that negation is applied at the target level as a backoff option if none of the previous scopes matched.

4. 'If' irrealis: In conditional statements, an SA system may obtain an incorrect polarity due to the presence of polarity words that do not actually reflect a real situation ("This is good" vs "If this is good"). This rule attempts to better analyze these structures by weighting the polarity of a structure with β = −1 if a conditional conjunction depends on it; since weight_β(SO) = SO × (1 + β), this cancels the polarity. Formally, for o_irrealis, τ = weight_{−1}(SO), C = w ∈ irrealis ∧ d ∈ {mark,advmod,cc}, δ = 1, π = 3 and S = {target node, subjr}.
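The effect of the four transformation functions on a single SO value can be checked with a few lines of arithmetic. The sketch below uses the parameters given in the rules above. Two points are assumptions: the behaviour of the shift on negative SOs (mirrored towards zero and beyond), which this section does not spell out, and the SO of 3.0 used for the adversative example.

```python
def weight(so: float, beta: float) -> float:
    """Weighting transformation: SO * (1 + beta)."""
    return so * (1 + beta)


def shift(so: float, alpha: float) -> float:
    """Shift transformation for negation.

    For a positive SO this matches the example in the text (3.5 -> -0.5 with
    alpha = 4). The mirrored treatment of negative SOs is an assumption.
    """
    return so - alpha if so >= 0 else so + alpha


# 1. Intensification: 'muy' (beta = 0.25) on 'grande' (SO = 1.87).
print(weight(1.87, 0.25))

# 2. Adversative: first conjunct weighted by beta = -0.25 (hypothetical SO = 3.0).
print(weight(3.0, -0.25))

# 3. Negation: "nice" (SO = 3.5) shifted by alpha = 4.
print(shift(3.5, 4))

# 4. 'If' irrealis: beta = -1 gives SO * (1 - 1), i.e. the SO is cancelled.
print(weight(3.5, -1))
```

Under the weighting definition, β = −1 yields SO × 0, so the irrealis rule neutralizes the affected structure instead of inverting it.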

Evaluation
This section presents the results of the experiments we carried out with our system using both the monolingual and the multilingual lexica, compared to the performance of a supervised classifier for three of the five analyzed languages.

Testing corpora
• Spanish SFU (Brooke et al., 2009): A set of 400 long reviews (200 positive, 200 negative) from different domains such as movies, music, computers or washing machines.
• Portuguese SentiCorpus-PT 0.1 (Carvalho et al., 2011): A collection of comments from the Portuguese newspaper Público with polarity annotation at the entity level. As our system assigns the polarity at the sentence level, we selected the SentiCorpus sentences with (a) only one SO or (b) more than one SO, provided all of them were the same, generating a corpus of 2,086 sentences (out of 2,604).
• Basque Opinion Dataset (San Vicente and Saralegi, 2016): Two small corpora in Basque containing news articles and reviews (music and movie domains). We merged them to create a larger dataset, containing a total of 224 reviews.
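The sentence selection described for SentiCorpus-PT (keep a sentence only when all of its entity-level annotations agree) can be sketched as follows. The data layout, the label names and the function names are hypothetical, since the corpus format is not described here.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def shared_polarity(entity_labels: List[str]) -> Optional[str]:
    """Return the common label if all entity-level labels agree, else None."""
    unique = set(entity_labels)
    return next(iter(unique)) if len(unique) == 1 else None


def to_sentence_level(annotated: Iterable[Tuple[str, List[str]]]) -> Dict[str, str]:
    """Keep only the sentences whose entity-level polarities are unanimous."""
    corpus = {}
    for sentence, labels in annotated:
        polarity = shared_polarity(labels)
        if polarity is not None:
            corpus[sentence] = polarity
    return corpus


# Toy input: sentence identifier plus the polarity of each annotated entity.
toy = [("s1", ["pos"]), ("s2", ["neg", "neg"]), ("s3", ["pos", "neg"])]
print(to_sentence_level(toy))  # {'s1': 'pos', 's2': 'neg'}
```

Sentences with conflicting entity labels, such as `s3`, are dropped, which corresponds to the reduction from 2,604 to 2,086 sentences reported above.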
In addition, due to the lack of available sentence- or document-level corpora for Catalan or Galician, we opted for synthetic corpora:
• Synthetic Catalan SFU: An automatically translated version to ca of the Spanish SFU, with 5% of the words from the original corpus considered as unknown by the translation tool.
• Synthetic Galician SFU: An automatically translated version to gl of the Spanish SFU (≈ 6.4% of the words not translated).

Experiments
We performed different experiments on binary polarity classification to determine (a) the accuracy of the system, (b) the impact of the merged resources, and (c) the impact of the universal rules in monolingual and multilingual settings. The performance of our system was compared to LinguaKit (LKit), an open-source toolkit which performs supervised sentiment analysis in several languages (Gamallo et al., 2013; Gamallo and Garcia, 2017). Table 4 shows the results of each of these models on the different corpora. The baseline (SL-O) obtained values between 54% (ca) and 62.95% (eu), results that are in line with those obtained by the supervised model (LinguaKit was intended for tweets, not long texts). As we are not aware of available SA tools for ca, we could not compare our results with other systems. For Basque, San Vicente and Saralegi (2016) evaluated several lexica (both automatically translated and extracted, as well as with human annotation) on the same dataset used in this paper. They used a simple average polarity ratio classifier, which is similar to our baseline. Even if the lexica are different, their results are very similar to our SL-O system (63% vs 62.95%), and they also show that manually reviewing the lexica can boost the accuracy by up to 13%.
The central columns of Table 4 show the results of using universal rules and a merged lexicon on the same datasets. In gl and pt the best values were obtained using individual lexica together with syntactic rules, while the Iberian system achieved the best results in the other languages. Table 5 summarizes the impact that the rules have in both the monolingual and the multilingual setting, as well as the differences in performance due to the fusion process. Concerning the rules (columns 2 and 3), the results show that using the same set of universal rules improves the performance of the classifier in all the languages and settings. Their impact varies between 3.5 percentage points (ca) and more than 15 (es) and, for each language, the rules provide a similar effect with monolingual and multilingual lexica (except for ca, with much higher values in the ML scenario). The fusion of the different lexica had different results (columns 4 and 5 of Table 5): in gl and pt, it had a negative impact (between −0.75% and −3.21%), while in the other three the ML setting achieved better values (between 0.75 and 15.5 points, again with large differences in ca). On average, using multilingual lexica had a positive impact of 1.3 (-O) and 2.8 points (+O). As mentioned, ca behaves differently: the gain from rules when using monolingual lexica is about 3.50 points (lower than in other languages), and the benefit of the ML lexicon without syntactic rules is 4.25 points. However, when both the universal rules and the ML lexicon are combined, performance increases by ≈ 15 points, which shows that the combination of these two factors is decisive.
In sum, the results of the experiments indicate that syntactic rules defined by means of a harmonized annotation can be used in several languages with positive results. Furthermore, the merging of monolingual lexica (some of them automatically translated) can be applied to perform multilingual SA with little impact on performance when compared to language-dependent systems.

Conclusions and current work
We built a single symbolic syntactic system for polarity classification that analyzes five official languages of the Iberian Peninsula. With little effort we obtained robust results for many languages. As current work, we are addressing texts that are harder to parse and low-resource languages: we developed a Galician corpus of manually labeled tweets, where SISA obtains between 62% and 65% accuracy for different settings, and we plan to incorporate the parser of Kong et al. (2014) to improve its performance.