Understanding Language Preference for Expression of Opinion and Sentiment: What do Hindi-English Speakers do on Twitter?

Linguistic research on multilingual societies has indicated that there is usually a preferred language for expression of emotion and sentiment (Dewaele, 2010). Paucity of data has limited such studies to participant interviews and speech transcriptions from small groups of speakers. In this paper, we report a study on 430,000 unique tweets from Indian users, specifically Hindi-English bilinguals, to understand the language of preference, if any, for expressing opinion and sentiment. To this end, we develop classifiers for opinion detection in these languages, and for further classifying opinionated tweets into positive, negative and neutral sentiments. Our study indicates that Hindi (i.e., the native language) is preferred over English for expression of negative opinion and swearing. As an aside, we explore some common pragmatic functions of code-switching through sentiment detection.


Introduction
The pattern of language use in a multilingual society is a complex interplay of socio-linguistic, discursive and pragmatic factors. Sometimes speakers have a preference for a particular language in certain conversational and discourse settings; on other occasions, there is fluid alternation between two or more languages in a single conversation, also known as Code-switching (CS) or Code-mixing. (Although some linguists differentiate between Code-switching and Code-mixing, this paper will use the two terms interchangeably.) Understanding and characterizing language preference in multilingual societies has been the subject matter of linguistic inquiry for over half a century (see Milroy and Muysken (1995) for an overview).
Conversational phenomena such as CS could previously be observed only in speech, and therefore all earlier studies are based on data collected from a small set of speakers or from interviews. With the growing popularity of social media, we now have an abundance of conversation-like data that exhibit CS and other speech phenomena hitherto unseen in text. Leveraging such data from Twitter, we conduct a large-scale study on language preference, if any, for the expression of opinion and sentiment by Hindi-English (Hi-En) bilinguals.
We first build a corpus of 430,000 unique India-specific tweets across four domains (sports, entertainment, politics and current events) and automatically classify the tweets by their language: English, Hindi and Hi-En CS. We then develop an opinion detector for each language class to categorize the tweets as opinionated or non-opinionated. Sentiment detectors further classify the opinionated tweets as positive, negative or neutral. Our study shows that there is a strong preference towards Hindi (i.e., the native language or L1) over English (L2) for expression of negative opinion. The effect is clearly visible in CS tweets, where a switch from English to Hindi is often correlated with a switch from a positive to a negative sentiment. This is referred to as the polarity-switch function of CS (Sanchez, 1983). Using the same experimental technique, we also explore other pragmatic functions of CS, such as reinforcement and narrative-evaluative switching.
Apart from being the first large-scale quantitative study of language preference in multilingual societies, this work makes several other contributions: (a) we develop one of the first opinion and sentiment classifiers for Romanized Hindi and CS Hi-En tweets, with higher accuracy than the only known previous attempt (Sharma et al., 2015b); (b) we present a novel methodology for automatically detecting pragmatic functions of code-switching through opinion and sentiment detection.
The rest of the paper is organized as follows: Sec. 2 introduces language preference, functions of CS and Hindi-English bilingualism on the web. Sec. 3 formulates the problem and presents the fundamental questions that this paper seeks to answer. Sec. 4 and 5 discuss dataset creation and opinion and sentiment detection techniques respectively. Sec. 6 evaluates the hypotheses in light of the observations on the tweet corpus. We conclude in Sec. 7, and raise some interesting sociolinguistic questions for future studies.

Background and Related Work
In order to situate the questions addressed in our work within the existing literature, we present a brief overview of past research on pragmatic and discursive analysis of code-switching, and specifically on language preference for emotional expression. A primer on Hi-En bilingualism and its presence on social media follows.

CS Functions and Language Preference
In multilingual communities, where more than one linguistic channel is available for information exchange, the choice of channel depends on a variety of factors and is usually unpredictable (Auer, 1995). Nevertheless, linguistic studies point out certain frequently observed patterns. For instance, certain speech activities might be exclusively or more commonly related to a certain language choice (e.g., Fishman (1971) reports the use of English for professional purposes and Spanish for informal chat among English-Spanish bilinguals from Puerto Rico). Apart from such associations between conversational context and language preference, language alternation is often used as a signaling device for certain pragmatic functions (Barredo, 1997; Sanchez, 1983; Nishimura, 1995; Maschler, 1991; Maschler, 1994), such as: (a) reported speech, (b) narrative-to-evaluative switch, (c) reiteration or emphasis, (d) topic shift, (e) puns and language play, and (f) topic/comment structuring. Attempts at predicting the preferred language, or even at exhaustively listing such functions, have failed. However, linguists agree that language alternation in multilingual communities is not a random process.
Of specific interest to us are the studies on language preference for the expression of emotions. Through large-scale interviews and two decades of research, Dewaele (2004; 2010) argued that for most multilinguals, L1 (the dominant language, which is often, but not always, the native or mother tongue) is the preferred language for emotions, including emotional inner speech, swearing and even emotional conversations. Dewaele argues that emotionally charged words in L1 elicit stronger emotions than those in other languages, and hence L1 is preferred for emotion expression.

Hindi-English Bilingualism
Around 125 million people in India speak English, half of whom have Hindi as their mother tongue. A large proportion of the remaining half, especially those residing in metropolitan cities, also know at least some Hindi. This makes Hi-En code-switching, commonly called Hinglish, extremely widespread in India. There is historical attestation, as well as recent evidence, of the growing use of Hinglish in general conversation and in entertainment and media (see Parshad et al. (2016) and references therein). Several recent studies (Barman et al., 2014; Solorio et al., 2014; Sequiera et al., 2015) also provide evidence of Hinglish and other instances of CS on online social media such as Twitter and Facebook. In a previously analyzed Facebook dataset, almost all sufficiently long conversation threads were found to be multilingual, and as many as 17% of the comments exhibited CS. That study also indicates that on online social media, Hindi is seldom written in the Devanagari script. Instead, loose Roman transliteration, or Romanized Hindi, is common, especially when users code-switch between Hindi and English.
While there has been some effort towards computational processing of CS text (Solorio and Liu, 2008; Solorio and Liu, 2010; Peng et al., 2014), to the best of our knowledge there has been no study on automatic identification of functional aspects of CS, nor any large-scale, data-driven study of language preference. The current study adds to the growing repertoire of work on quantitative analysis of social media data for understanding socio-linguistic and pragmatic issues, such as detection of depression (De Choudhury et al., 2013), politeness (Danescu-Niculescu-Mizil et al., 2013), speech acts (Vosoughi and Roy, 2016), and social status (Tchokni et al., 2014).

Problem Formulation
Along the lines of Dewaele (2010), we ask the following question: Is there a preferred language for expression of opinion and sentiment by Hi-En bilinguals on Twitter?
Let $T = \{t_1, t_2, \ldots, t_{|T|}\}$ be a set of tweets (or any text) generated by Hi-En bilinguals. We define $\lambda(T)$, $\sigma(T)$ and $\epsilon(T)$ as the subsets of $T$ that respectively contain all tweets in language $\lambda$, script $\sigma$ and sentiment $\epsilon$. In what follows, we write language-script pairs compactly: $e$, $h$ and $m$ denote English, Hindi and mixed, while $d$ and $r$ denote the Devanagari and Roman scripts; thus $hr$ denotes Roman-script Hindi (these classes are defined in Sec. 4).
The preference towards a language-script pair $\lambda\sigma$ for expressing a sentiment type $\epsilon$ is given by the probability

$pr(\lambda\sigma \mid \epsilon; T) \propto pr(\epsilon \mid \lambda\sigma; T)\, pr(\lambda\sigma)$   (1)

However, $pr(\lambda\sigma)$, the prior probability of choosing $\lambda\sigma$ for a tweet, depends on a large number of socio-linguistic parameters beyond sentiment. (Tweets in mixed script are rare, and hence we do not include a symbol for them, though the framework does not preclude such possibilities.) For instance, on social media English is overwhelmingly more common than any Indic language. This is because (a) English tweets come from a large number of users other than Hi-En bilinguals, and (b) English is the preferred language for tweeting even for Hi-En bilinguals, because it expands the target audience of a tweet manyfold. The preference of $\lambda\sigma$ for expressing $\epsilon$, therefore, can be quantified as

$pr(\epsilon \mid \lambda\sigma; T)$   (2)

We say $\lambda\sigma$ is the preferred language-script choice over $\lambda'\sigma'$ for expressing sentiment $\epsilon$ if and only if

$pr(\epsilon \mid \lambda\sigma; T) > pr(\epsilon \mid \lambda'\sigma'; T)$   (3)

The strength of the preference is directly proportional to the ratio of the probabilities, $pr(\epsilon \mid \lambda\sigma; T)/pr(\epsilon \mid \lambda'\sigma'; T)$. An alternative but related way of characterizing the preference is through the odds of choosing a sentiment type $\epsilon$ over its polar opposite $-\epsilon$:

$odds(\epsilon; \lambda\sigma) = pr(\epsilon \mid \lambda\sigma; T)\,/\,pr(-\epsilon \mid \lambda\sigma; T)$   (4)

We say $\lambda\sigma$ is the preferred language-script pair for expressing $\epsilon$ if

$odds(\epsilon; \lambda\sigma) > odds(\epsilon; \lambda'\sigma')$   (5)
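As a concrete illustration, all of the statistics above reduce to simple counting over labeled tweets. The sketch below is ours, not the paper's code; it assumes each tweet has already been assigned a language-script label (er, hr, hd, mr) and a sentiment label, with "x" standing in for the non-opinionated class $\otimes$.

```python
def conditional_sentiment_prob(tweets, sentiment, lang_script):
    """Estimate pr(sentiment | lang_script; T) by counting.

    `tweets` is a list of (lang_script, sentiment) pairs, with
    lang_script in {"er", "hr", "hd", "mr"} and sentiment in
    {"+", "-", "0", "x"} ("x" = non-opinionated)."""
    in_class = [s for ls, s in tweets if ls == lang_script]
    if not in_class:
        return 0.0
    return sum(1 for s in in_class if s == sentiment) / len(in_class)

def sentiment_odds(tweets, sentiment, opposite, lang_script):
    """Odds of a sentiment vs. its polar opposite within one class (Eq. 4)."""
    p = conditional_sentiment_prob(tweets, sentiment, lang_script)
    q = conditional_sentiment_prob(tweets, opposite, lang_script)
    return p / q if q > 0 else float("inf")

# Toy usage: by Eq. 5, hr is "preferred" over er for negative sentiment
# if its negative-to-positive odds are higher.
tweets = [("hr", "-"), ("hr", "-"), ("hr", "+"), ("er", "+"), ("er", "-")]
print(sentiment_odds(tweets, "-", "+", "hr"))  # 2.0
print(sentiment_odds(tweets, "-", "+", "er"))  # 1.0
```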

Hypotheses
Now we can formally define the two hypotheses we intend to test here.

Hypothesis I: For Hi-En bilinguals, Hindi is the preferred language for expression of opinion on Twitter. Therefore, we expect non-opinionated tweets to be proportionally rarer in Hindi than in English, i.e.,

$pr(\otimes \mid hd; T) < pr(\otimes \mid er; T)$   (6)

and similarly,

$pr(\otimes \mid hr; T) < pr(\otimes \mid er; T)$   (7)

Hypothesis II: For Hi-En bilinguals, Hindi is the preferred language for expression of negative sentiment. Therefore,

$pr(- \mid hd; T) \approx pr(- \mid hr; T) > pr(- \mid er; T)$   (8)

In particular, we hypothesize that the odds of choosing Hindi for negative over positive sentiment are much higher than the corresponding odds for English, i.e.,

$\dfrac{pr(- \mid hd; T)}{pr(+ \mid hd; T)} \approx \dfrac{pr(- \mid hr; T)}{pr(+ \mid hr; T)} > \dfrac{pr(- \mid er; T)}{pr(+ \mid er; T)}$   (9)

A special case of the above hypotheses arises in the context of code-mixing, i.e., for the set $mr(T)$. Since the mixed tweets certainly come from proficient bilinguals and contain both Hi and En fragments, we can reformulate our hypotheses at the tweet level. Let $m_h r(T)$ and $m_e r(T)$ respectively denote the sets of Hi and En fragments in $mr(T)$.

Hypothesis Ia: Hindi is the preferred language for expression of opinion in Hi-En code-mixed tweets. Therefore, we expect

$pr(\otimes \mid m_h r; T) < pr(\otimes \mid m_e r; T)$   (10)

Hypothesis IIa: Hindi is the preferred language for expression of negative sentiment in Hi-En code-switched tweets. Therefore,

$\dfrac{pr(- \mid m_h r; T)}{pr(+ \mid m_h r; T)} > \dfrac{pr(- \mid m_e r; T)}{pr(+ \mid m_e r; T)}$   (11)

Likewise, the above hypotheses also apply to the Devanagari script, though for technical reasons we do not test them there.

Besides comparing aggregate statistics on $mr(T)$, it is also interesting to look at the sentiments of $m_h r(t_i)$ and $m_e r(t_i)$ for each tweet $t_i$. In particular, for every pair of sentiments $\epsilon \neq \epsilon'$, we want to study the fraction of tweets in $mr(T)$ where $m_h r(t_i)$ has sentiment $\epsilon$ and $m_e r(t_i)$ has $\epsilon'$. Let this fraction be $pr(h\epsilon \leftrightarrow e\epsilon'; mr(T))$. Under the "no preference for language" (i.e., null) hypothesis, we would expect

$pr(h\epsilon \leftrightarrow e\epsilon'; mr(T)) \approx pr(h\epsilon' \leftrightarrow e\epsilon; mr(T))$   (12)

However, if $pr(h\epsilon \leftrightarrow e\epsilon'; mr(T))$ is significantly higher than $pr(h\epsilon' \leftrightarrow e\epsilon; mr(T))$, it means that speakers prefer to switch from English to Hindi when they want to express sentiment $\epsilon$ rather than $\epsilon'$, and vice versa. The strength of this preference can be measured by the ratio

$pr(h\epsilon \leftrightarrow e\epsilon'; mr(T))\,/\,pr(h\epsilon' \leftrightarrow e\epsilon; mr(T))$   (13)

Pragmatic Functions of Code-Switching: When speakers tend to switch from Hindi to English as they switch from an expression with sentiment $\epsilon$ to one with $\epsilon'$, in other words when the configuration $h\epsilon \leftrightarrow e\epsilon'$ is over-represented, the language switch can be read as signaling the sentiment switch, i.e., as a pragmatic function of CS (such as the polarity-switch and narrative-evaluative functions introduced in Sec. 2 and tested in Sec. 6).
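The switch statistics of Eqs. 12 and 13 can likewise be estimated by counting over code-mixed tweets whose Hindi and English fragments have each been assigned a sentiment. A minimal sketch, with a hypothetical data layout of our own:

```python
def switch_ratio(mixed_tweets, eps, eps_prime):
    """Estimate pr(h eps <-> e eps') / pr(h eps' <-> e eps) over mr(T).

    `mixed_tweets` is a list of (hindi_sentiment, english_sentiment)
    pairs, one per code-mixed tweet, with sentiment labels as in the
    paper. A ratio far above 1 suggests speakers prefer Hindi for `eps`
    and English for `eps_prime` when the fragments differ in sentiment.
    """
    fwd = sum(1 for h, e in mixed_tweets if h == eps and e == eps_prime)
    bwd = sum(1 for h, e in mixed_tweets if h == eps_prime and e == eps)
    return fwd / bwd if bwd > 0 else float("inf")

# Toy usage, testing the polarity-switch direction (Hindi negative,
# English positive):
pairs = [("-", "+"), ("-", "+"), ("+", "-"), ("0", "0")]
print(switch_ratio(pairs, "-", "+"))  # 2.0
```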

A Note on Statistical Significance
All the statistics defined here are likelihoods; Equations 9, 12 and 13, in particular, state our hypotheses in the form of likelihood-ratio tests. However, the true classes $\lambda$ and $\epsilon$ are unknown; we predict the class labels using automatic language and sentiment detection techniques that have non-negligible error. Under such circumstances, the likelihoods cannot be treated as true test statistics, and consequently hypothesis testing cannot be done per se. Nevertheless, we can use them as descriptive statistics and investigate the status of the aforementioned hypotheses.

Datasets
We collected tweets with certain India-specific hashtags (Table 1) using the Twitter Search API (Twi, 2015b) over three months (December 2014 to February 2015). In this paper, we use tweets in Devanagari-script Hindi (hd), and Roman-script English (er), Hindi (hr) and Hi-En mixed (mr). English and mixed tweets written in Devanagari are extremely rare, and we do not study them here. We filter out tweets labeled by the Twitter API (Twi, 2015a) as German, Spanish, French, Portuguese or Turkish, and all non-Roman-script languages except Hindi. We experiment on the following corpora:

$T_{All}$: All tweets after filtering. This corpus contains 430,000 unique tweets posted by 125,396 unique users.
$T_{BL}$: Tweets from users who are certainly Hi-En bilinguals, approximately 55% (240,000) of the tweets in $T_{All}$. We define a user to be a Hi-En bilingual if there is at least one mr tweet from the user, or if the user has tweeted at least once in Hindi (hd or hr) and once in English (er); a code sketch of this filter appears after this list.
$T_{spo}$, $T_{mov}$, $T_{pol}$, $T_{eve}$: Topic-wise corpora for sports, movies, politics and events (Table 1).
$T_{CS}$: Tweets with inter-sentential CS. We define these as tweets containing at least one sequence of 5 contiguous Hindi words and one sequence of 5 contiguous English words (also sketched after this list). The corpus has 3,357 tweets.
SAC: 1000 monolingual tweets (er, hr, hd) and 260 mixed (mr) tweets manually annotated with sentiment and opinion labels. These were annotated by two linguists, both fluent Hi-En speakers. The annotators first checked whether a tweet is opinionated or non-opinionated (⊗), and then identified the polarity of the opinionated tweets (+, − or 0); thus, each tweet is assigned one of the four classes {+, −, 0, ⊗}. If a tweet contains both opinionated and non-opinionated fragments, each fragment was annotated individually. The inter-annotator agreement is 77.5% (κ = 0.59) for opinion annotation and 68.4% (κ = 0.64) over all four classes. A third linguist independently resolved the disagreements.
$LLC_{Test}$: 141 er, 137 hr and 241 mr tweets annotated by a Hi-En bilingual; these form the test set for the language labeling system (Sec. 5.1).
SAC and $LLC_{Test}$ can be downloaded and used for research purposes.
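To make the two corpus-construction rules above concrete, here is a minimal sketch of the bilingual-user filter behind $T_{BL}$ and the contiguous-run check behind $T_{CS}$. The data layout and the word-level labels ("hi"/"en") are our own assumptions, not the paper's exact pipeline.

```python
from collections import defaultdict

def bilingual_users(user_lang_pairs):
    """T_BL rule: a user is a Hi-En bilingual if they have at least one
    mixed (mr) tweet, or at least one Hindi (hd/hr) and one English
    (er) tweet. `user_lang_pairs` holds (user_id, lang_script) tuples."""
    seen = defaultdict(set)
    for user, lang in user_lang_pairs:
        seen[user].add(lang)
    return {u for u, langs in seen.items()
            if "mr" in langs or (langs & {"hd", "hr"} and "er" in langs)}

def has_intersentential_cs(word_labels, min_run=5):
    """T_CS rule: the tweet's word-level language labels must contain a
    run of at least `min_run` contiguous Hindi words and a run of at
    least `min_run` contiguous English words."""
    longest = {"hi": 0, "en": 0}
    run_label, run_len = None, 0
    for lab in word_labels:
        run_len = run_len + 1 if lab == run_label else 1
        run_label = lab
        if lab in longest:
            longest[lab] = max(longest[lab], run_len)
    return longest["hi"] >= min_run and longest["en"] >= min_run

# e.g. an English clause followed by a Hindi clause:
print(has_intersentential_cs(["en"] * 6 + ["hi"] * 5))  # True
```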
Note that apart from SAC and $LLC_{Test}$, all corpora are subsets of $T_{All}$. For the generalizability of our observations, it is important to ensure that the tweets in $T_{All}$ come from a large number of users and that the datasets do not over-represent a small set of users. In Figure 1, we plot the minimum fraction of users required (x-axis) to cover a certain percentage of the tweets in $T_{All}$ (y-axis). Tweets from at least 10% of the users, i.e., 12.5K users, are needed to cover 50% of the corpus. As expected, we observe a power-law-like distribution, where a few users contribute a large number of tweets and a large number of users contribute a few tweets each. We believe that 12.5K users is sufficient to ensure an unbiased study.
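The coverage statistic behind Figure 1 is straightforward to compute; a brief sketch, in our own formulation:

```python
def min_user_fraction_for_coverage(tweets_per_user, coverage=0.5):
    """Smallest fraction of users, taking the most prolific first, whose
    tweets account for `coverage` of the corpus (one point on Fig. 1)."""
    counts = sorted(tweets_per_user, reverse=True)
    total, acc = sum(counts), 0
    for i, c in enumerate(counts, start=1):
        acc += c
        if acc >= coverage * total:
            return i / len(counts)
    return 1.0

# e.g. 4 users posting [50, 30, 15, 5] tweets: one user (25% of users)
# already covers 50% of the 100 tweets.
print(min_user_fraction_for_coverage([50, 30, 15, 5]))  # 0.25
```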
Further, we classify the users into three groups: (i) news channels, (ii) general users (≤ 10,000 followers), and (iii) popular users or celebrities (> 10,000 followers). Interestingly, for both the $T_{All}$ and $T_{BL}$ corpora, around 98% of all users are general users, and 96% of all tweets come from them. Hence, most observations from these corpora can be expected to represent the average online linguistic behavior of a Hi-En bilingual.

Method
Fig. 2 diagrammatically summarizes our experimental method. We identify the language used in each tweet before detecting opinion and sentiment.

Language Labeling
Tweets in Devanagari script are accurately detected by the Twitter API as Hindi tweets; we label these hd, though a small fraction of them could also be mixed (md). To classify Roman-script tweets as er, hr or mr, we use the system that performed best in the FIRE 2013 shared task on word-level language detection for Hi-En text (Gella et al., 2013). This system labels each input word as English or Hindi using character n-gram features and a Maximum Entropy model. We made minor modifications to the system to improve its performance on Twitter data; these are omitted here due to paucity of space.
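The word-level labeler can be approximated in a few lines of scikit-learn: character n-gram features feeding a maximum-entropy (logistic regression) classifier. A minimal sketch follows; the feature settings and toy training words are our own, not the shared-task system's.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Character 1-4-gram features over each word, MaxEnt classifier on top.
word_lid = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 4)),
    LogisticRegression(max_iter=1000),
)

# Toy training data; a real system is trained on thousands of labeled words.
words = ["nahi", "acha", "kyun", "hello", "match", "great"]
labels = ["hi", "hi", "hi", "en", "en", "en"]
word_lid.fit(words, labels)
print(word_lid.predict(["kya", "today"]))  # e.g. ['hi' 'en']
```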

Opinion and Sentiment Detection
There is a sizeable body of existing research on opinion detection (Qadir, 2009; Brun, 2012; Rajkumar et al.). We propose a two-step classification model: we first identify whether a tweet is opinionated or non-opinionated (⊗); if the tweet is opinionated, we further classify it according to its sentiment (+, − or 0). Fig. 2 shows the architecture of the proposed model. Two-step classification was empirically found to perform better than a single four-class classifier.
We develop individual classifiers for each language class (er, hr, hd, mr) using an SVM with an RBF kernel from scikit-learn (Pedregosa et al., 2011). We use the SAC dataset (Sec. 4) as training data, with features as described in Sec. 5.3.
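A minimal sketch of this two-step setup with scikit-learn follows; the feature extraction is abstracted away as a precomputed matrix, and the placeholder labels ("x" for non-opinionated) are our own convention.

```python
import numpy as np
from sklearn.svm import SVC

def train_two_step(X, y_opinion, y_sentiment):
    """Train the two-step model: an opinion/non-opinion SVM, then a
    polarity SVM fitted only on the opinionated rows.

    X: 2-D NumPy array of feature vectors (Sec. 5.3).
    y_opinion: 1 if the tweet is opinionated, 0 otherwise.
    y_sentiment: "+", "-" or "0" for opinionated rows (ignored elsewhere).
    """
    opinion_clf = SVC(kernel="rbf").fit(X, y_opinion)
    mask = np.asarray(y_opinion) == 1
    sentiment_clf = SVC(kernel="rbf").fit(X[mask], np.asarray(y_sentiment)[mask])
    return opinion_clf, sentiment_clf

def predict_two_step(opinion_clf, sentiment_clf, X):
    """Return "x" (non-opinionated) or a polarity label for each row."""
    labels = []
    for x, is_op in zip(X, opinion_clf.predict(X)):
        labels.append(sentiment_clf.predict([x])[0] if is_op == 1 else "x")
    return labels
```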

Classifier Features
For opinion classification (opinionated vs. ⊗), we propose a set of event-independent lexical features and Twitter-specific features:
(i) Subjective words: Expected to be present in opinionated tweets. We use lexicons from Volkova et al. (2013) for er and Bakliwal et al. (2012) for hd; we Romanize the hd lexicon for the hr classifiers.
(ii) Elongated words: Words with one character repeated more than two times, e.g., sooo, naaahhhhi.
(iii) Exclamations: Presence of contiguous exclamation marks.
(iv) Emoticons.
(v) Question marks: Queries are generally non-opinionated.
(vi) Wh-words: These are used to form questions.
(vii) Modal verbs: e.g., should, could, would, cud, shud.
(viii) Excess hashtags: Presence of more than two hashtags.
(ix) Intensifiers: Generally used to emphasize sentiment, e.g., we shouldn't get too comfortable.
(x) Swear words: Prevalent in opinionated tweets, e.g., that was a f***ing no ball!!!! #indvssa.
(xi) Hashtags: Hashtags may convey user sentiment (Barbosa et al., 2012). We manually identify hashtags in our corpus that represent explicit opinion.
(xii) Domain lexicon: For hr and hd tweets, we construct sentiment lexicons from 1000 manually annotated tweets. Each word or phrase in this lexicon represents +, −, or 0 sentiment.
(xiii) Twitter user mentions.
(xiv) Pronouns: Opinion is often expressed in the first person using pronouns like I and we.
For sentiment classification, we use the emoticon, swear word, exclamation mark and elongated word features described above, along with subjective words from various lexicons (Mohammad and Turney, 2013; Volkova et al., 2013; Bakliwal et al., 2012; Sharma et al., 2015a). Additionally, we use:
(i) Sentiment words: From the Hashtag Sentiment and Sentiment140 lexicons. We also manually annotate hashtags from our dataset that represent sentiment.
(ii) Negation: A negated context is a tweet segment that begins with a negation word and ends with a punctuation mark (Pang et al., 2002). The list of negation words is taken from Christopher Potts' sentiment tutorial.
A code sketch of a few of these feature extractors is given below.
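Several of these features are simple surface patterns. The sketch below shows extractors for elongated words, exclamation runs and negated contexts; the regexes and the negation-word list are illustrative, not the paper's exact resources.

```python
import re

NEGATIONS = {"no", "not", "never", "nahi", "mat"}  # illustrative subset

def has_elongated_word(text):
    # A character repeated more than two times, e.g. "sooo", "naaahhhhi".
    return bool(re.search(r"(\w)\1{2,}", text))

def exclamation_count(text):
    # Number of runs of contiguous exclamation marks.
    return len(re.findall(r"!+", text))

def negated_context(tokens):
    """Mark tokens inside a negated context: from a negation word up to
    the next punctuation mark (Pang et al., 2002)."""
    negated, out = False, []
    for tok in tokens:
        if tok.lower() in NEGATIONS:
            negated = True
        elif re.fullmatch(r"[.,!?;:]+", tok):
            negated = False
        out.append(negated)
    return out

print(negated_context("this is not a good ball .".split()))
# [False, False, True, True, True, True, False]
```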
The mr opinion classifier uses the outputs of the er and hr classifiers as features (Fig. 2), along with an additional feature indicating whether the majority of the words in the tweet are Hindi. A similar strategy is used for mr sentiment detection.
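In effect, the mr classifiers are simple stacked models. A sketch of how such meta-features could be assembled; the function name and label encodings are our assumptions, not the paper's implementation.

```python
def mr_meta_features(base_features, token_langs, er_clf, hr_clf):
    """Meta-features for the mixed (mr) classifiers: the predictions of
    the monolingual er and hr classifiers on the tweet (assumed to be
    numerically encoded labels), plus a Hindi-majority indicator."""
    er_pred = er_clf.predict([base_features])[0]
    hr_pred = hr_clf.predict([base_features])[0]
    hindi_majority = sum(l == "hi" for l in token_langs) > len(token_langs) / 2
    return [er_pred, hr_pred, int(hindi_majority)]
```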

Evaluation
We evaluated the language labeling system on the $LLC_{Test}$ corpus, on which the precision (recall) values were 0.93 (0.91), 0.90 (0.85) and 0.88 (0.92) for the er, hr and mr classes respectively. The tweet-level classification accuracy was 89.8%.
The opinion and sentiment classifiers were evaluated using 10-fold cross-validation on the SAC dataset. Table 2 details the class-wise accuracy. For comparison, we also reimplemented the dictionary- and dependency-based method of Qadir (2009); its accuracy on the er tweets was 65.7%, 7% lower than our system's. We also compared our mr sentiment classifier with that of Sharma et al. (2015b). As their method performs two-class sentiment detection (+ and −), we select such tweets from SAC; their system achieves an accuracy of 68.2%, which is 4% lower than ours. An analysis of the errors showed more false negatives (i.e., opinionated tweets labeled ⊗) than false positives in opinion classification, while sentiment misclassifications are uniformly distributed. Table 3 reports the accuracy of the opinion classifier in feature-ablation experiments. For all three language-script pairs, the lexicon and non-word features (emoticons, elongated words, hashtags, exclamations) are the most effective, though all features contribute positively to the final accuracy of opinion detection. For hr and hd tweets, domain knowledge is significant, as shown by the 4% accuracy drop when the domain lexicon is removed.

Experiments and Observations
In this section, we report our experiments on the 430,000 unique tweets of $T_{All}$ and its various subsets as defined in Sec. 4. First, we run the language detection system on the corpora. Table 4 shows the language-wise distribution. We see that language preference varies by topic, which is not surprising. Due to paucity of space, the correlation between language usage and topic will not be discussed at length here, but we will highlight cases where the differences are striking. We then apply the language-specific opinion and sentiment classifiers to the tweets detected as the corresponding language class. In the following subsections, we empirically investigate the hypotheses. Table 5 shows $pr(\otimes \mid \lambda\sigma; T)$, $pr(- \mid \lambda\sigma; T)$ and $pr(- \mid \lambda\sigma; T)/pr(+ \mid \lambda\sigma; T)$ for $T_{All}$, $T_{BL}$ and two randomly selected topics, movies and politics. The statistics are fairly consistent over the corpora, with slight differences but similar trends in $T_{mov}$.

Table 5: Sentiment across languages: statistics concerning Hypotheses I and II.

Status of Hypotheses I and II
We need the first of these statistics to investigate Hypothesis I (Eqs. 6 and 7), and the latter two to verify Hypothesis II (Eqs. 8 and 9).
Contrary to Eqs. 6 and 7, for all corpora except $T_{mov}$, we observe the following trend:

$pr(\otimes \mid hd; T) > pr(\otimes \mid hr; T) \geq pr(\otimes \mid er; T)$

In other words, hd is more commonly used for expressing non-opinions than hr and er. Hypothesis I is clearly untrue for these corpora, though due to the small differences between hr and er, we cannot claim that English is the preferred language for expressing opinions. A closer scrutiny of the corpora revealed that hd tweets mostly come from official sources (news channels, political parties, production houses) and celebrities, and are mostly factual. hr tweets come from general users and show trends similar to English. Thus, in general, there seems to be no preferred language for expressing opinion among Hi-En bilinguals on Twitter.
In the context of Hypothesis II, we see the general pattern (with some topic-specific variations):

$pr(- \mid hr; T) > pr(- \mid hd; T) \geq pr(- \mid er; T)$

The pattern emerges even more strongly when we look at $pr(- \mid \lambda\sigma; T)/pr(+ \mid \lambda\sigma; T)$. The odds of expressing a negative over a positive opinion in Hindi are between 1.5 and 6 ($T_{mov}$ exhibits a slightly different pattern but a similar preference; $T_{pol}$ shows a stronger preference towards Hindi for negative sentiment), whereas the same odds for English are between 0.1 and 0.6. In other words, English is preferred for expressing positive opinion, and Hindi for negative opinion. These observations provide very strong evidence in favor of Hypothesis II.

Table 6: $T_{CS}$ statistics for testing Hypotheses Ia and IIa

Statistic                                                              | $m_h r$ | $m_e r$
$pr(\otimes \mid \lambda\sigma; T_{CS})$                               | 0.39    | 0.45
$pr(- \mid \lambda\sigma; T_{CS})$                                     | 0.22    | 0.14
$pr(- \mid \lambda\sigma; T_{CS})/pr(+ \mid \lambda\sigma; T_{CS})$    | 2.2     | 0.34

Status of Hypotheses Ia and IIa
Recall that Hypotheses Ia and IIa are essentially the same as Hypotheses I and II, but applied to the $m_h r$ and $m_e r$ fragments of the $T_{CS}$ corpus. Table 6 reports the three statistics necessary for testing these hypotheses. $pr(\otimes \mid m_e r; T_{CS})$ is slightly greater than $pr(\otimes \mid m_h r; T_{CS})$, which is what we would expect if Hypothesis Ia were true. However, since the difference is small, we view it as a trend rather than proof of Hypothesis Ia.
The statistics clearly show that Hypothesis IIa holds for $T_{CS}$. The fraction of negative sentiment in $m_h r$ is over 1.5 times that in $m_e r$. Further, the odds of expressing a negative over a positive sentiment in the Hindi fragments of a code-switched tweet are 6.5 times the corresponding odds for the English fragments.

Switching Functions
Recall that using Eq. 13 (Sec. 3), we can estimate the preference, if any, for switching to a particular language while changing the sentiment. In particular, research in socio-linguistics has shown that speakers often switch between languages when they switch from non-opinion (⊗) to opinion ({+, −, 0}). This is called the narrative-evaluative function of CS (Sanchez, 1983). This function appears in 46.1% of the tweets in $T_{CS}$. We find that

$pr(h\{+,-,0\} \leftrightarrow e\otimes; T_{CS})\,/\,pr(h\otimes \leftrightarrow e\{+,-,0\}; T_{CS}) = 0.86$

which indicates that there is no preference for switching to Hindi (or English) while switching between opinion and non-opinion; this is also consistent with our observations on Hypotheses I and Ia. Thus, while switching between opinion and non-opinion in a tweet, users do switch language, but in no preferred direction. However, when the two fragments carry opposite polarities, the corresponding ratios of Eq. 13 are heavily skewed towards negative Hindi fragments paired with positive English fragments; the latter function is called polarity switch. The extremely high values of these ratios are evidence for a strong preference towards switching from English to Hindi while switching to a negative sentiment (and switching to English when the sentiment changes from negative to positive). We also observe cases where there is a language switch but no sentiment switch, for which we cannot evaluate language preference using Eq. 13 (because $\epsilon = \epsilon'$). In $T_{CS}$, 15.3% of the tweets show positive reinforcement, where both fragments are of positive sentiment; negative reinforcement is defined similarly and is seen in 8.7% of the tweets. The remaining tweets in $T_{CS}$ likely have pragmatic functions that cannot be identified based on sentiment alone.

Language Preference for Swearing
Since there is evidence that the native language (Hindi, in this case) is preferred for swearing (Dewaele, 2004), we computed the fraction of tweets that contain swear words in each language class. Fig. 3a shows the distribution across topics. The hr and mr classes have a much higher fraction of abusive tweets than er and hd. Fig. 3b shows the distribution of abusive $m_h r$ and $m_e r$ fragments for tweets in $T_{CS}$. Interestingly, over 90% of the swear words occur in $m_h r$. Both distributions strongly suggest a preference for swearing in Hindi.

Conclusion
In this paper, through a large-scale empirical study of nearly half a million tweets, we tried to answer a fundamental question regarding multilingualism, namely, whether there is a preferred language for the expression of sentiment. We also looked at some of the pragmatic functions of code-switching. Our results indicate a strong preference for using Hindi, the L1 of the users from whom these tweets come, for expressing negative sentiment, including swearing. However, we do not observe any particular preference towards Hindi for expressing opinions in general.
Previous linguistic studies (Dewaele, 2004; Dewaele, 2010) have already shown a preference for L1 for expressing emotion and swearing. However, we observe that for expressing positive emotion, English (the L2) is the language of preference. This raises some intriguing socio-linguistic questions. Is English, being the language of aspiration in India, preferred for positive expression? Or is Hindi specifically preferred for swearing, and therefore the language of preference for negative emotion? How do such preferences vary across topics, users and other multilingual communities? How representative of the wider society is this kind of social media study? We plan to explore some of these questions in the future.
Our study also indicates that inferences drawn about multilingual societies by analyzing data in just one language (usually English), which has been the norm so far, are likely to be incorrect.