Accommodation of Conversational Code-Choice

Bilingual speakers often freely mix languages. However, in such bilingual conversations, are the language choices of the speakers coordinated? How much does one speaker’s choice of language affect other speakers? In this paper, we formulate code-choice as a linguistic style, and show that speakers are indeed sensitive to and accommodating of each other’s code-choice. We find that the saliency or markedness of a language in context directly affects the degree of accommodation observed. More importantly, we discover that accommodation of code-choices persists over several conversational turns. We also propose an alternative interpretation of conversational accommodation as a retrieval problem, and show that the differences in accommodation characteristics of code-choices are based on their markedness in context.


Introduction
Code-switching (CS) refers to the fluid alternation between two or more languages within a conversation, and is a common feature of all multilingual societies (Auer, 2013). Multilingual speakers are known to code-switch in spoken conversations for a variety of reasons, motivated by information-theoretic and cognitive principles, and also as a result of numerous social, communicative and pragmatic functions (Scotton and Ury, 1977; Söderberg Arnfast and Jørgensen, 2003; Gumperz, 1982).
Code-choice refers to a speaker's decision of which code to use in a given utterance and, in the case of a CS utterance, to what extent the different codes are to be used. Depending on the sociolinguistic and conversational context, a speaker's code-choice may be unexpected and noticed by other speakers, and is likely to affect other speakers' subsequent code-choice. In other words, speakers may accommodate to each other's code-choice, positively or negatively (Genesee, 1982).
In this work, we propose a set of metrics to study the social accommodation of code-choice as a sociolinguistic style marker. We build upon the existing framework on accommodation by Danescu-Niculescu-Mizil et al. (2011) and adapt it by introducing relevant features for code-choice. We then motivate and illustrate the effect of code markedness on the degree of accommodation: the more salient code is more strongly accommodated for. We further generalize the framework to also account for delayed accommodation, instead of only next-turn or immediate accommodation.
In addition, we introduce an alternative view of accommodation as a query-response task, and employ mean reciprocal rank, a well-understood metric from the domain of Information Retrieval, as a metric for latency of accommodation. We measure how quickly a style marker (code-choice in our case) introduced by a speaker is retrieved by the other speaker during the conversation. Our approach is developed for analyzing code-choice but is applicable to other dimensions of linguistic style as well (Tausczik and Pennebaker, 2010). This presents an alternative view of conversational style accommodation and offers a simple but effective way of measuring, characterizing and even predicting elements of conversational style.
We test this formulation on two CS conversational datasets: dialog scripts of bilingual Indian movies (in English and Hindi) and a transcription of real-world conversations between Spanish-English bilinguals in Florida, US. In both corpora, we observe strong signals of interpersonal code-choice accommodation for the salient or marked code. We also observe that, on average, the marked code is accommodated within the first three to four conversational turns, beyond which the effect of accommodation on code-choice decays gradually. Contextually unmarked code is less strongly accommodated for, even when it occurs relatively infrequently within a conversation.
As far as we know, this is the first computational study of code-choice accommodation, and the first work that introduces and formalizes the concept of delayed accommodation, which can be applied to other style dimensions as well.
The rest of the paper is organized as follows. We describe the background and related work in Section 2, which motivates the first formulation of code-choice accommodation in Section 3. We improve this formulation by modifying the features in Section 4. We generalize the formulation to multiple turns and introduce the analogy to retrieval in Section 5, along with the results. We wrap up with a discussion in Section 6 and conclude in Section 7.

Related Work
CS is employed by speakers to signal a common multilingual identity (Auer, 2005), and can be effectively used to reduce (or increase) the perceived social distance between the speakers (Camilleri, 1996). As a marker of informality, it has been shown to lower interpersonal distance (Myers-Scotton, 1995; Genesee, 1982).
Common structural patterns in CS as well as the choice to switch between languages have been the focus of many linguistic studies (Poplack, 1988; Auer, 1995). As CS is typically used as a conversation strategy by bilinguals who are proficient in both languages (Auer, 2013), it is not surprising that certain pragmatic and socio-linguistic factors, such as formality of context (Fishman, 1970), age (Ervin-Tripp and Reyes, 2005), expression of emotion (Dewaele, 2010) and sentiment (Rudra et al., 2016), are found to signal language preference in CS conversations. A Twitter study of CS patterns across several geographies (Rijhwani et al., 2017) also suggests that there might be complex sociolinguistic reasons for code-choice. Thus, CS, and the choice of language or code in which one communicates during a multilingual conversation, could be considered a marker of linguistic style.
Communication accommodation theory (CAT) (Giles et al., 1973; Giles, 2007) states that speakers shift their linguistic styles towards (or away from) each other in a conversation for social effect. In the CAT framework, the interlocutors' desire for 'social approval' results in an attempt to match each other's linguistic style. Accommodation has been studied for many markers of linguistic style, such as tense, negations, articles, prepositions, pronouns and sentiment (Taylor and Thomas, 2008; Niederhoffer and Pennebaker, 2002).
Since it is possible to convey the same semantic content while widely varying the extent of CS, we also consider code-choice as a linguistic style dimension. Therefore, we expect to observe accommodation in terms of code-choice in a manner similar to that of variables for other linguistic styles. While there have been linguistic and small-scale studies (Sachdev and Giles, 2004; Bourhis, 2008; Bissoonauth and Offord, 2001; Bourhis et al., 2007) that argue for the prevalence of code-choice accommodation, there are no large-scale quantitative or computational studies that corroborate this and shed light on the various patterns of code-choice accommodation. Further, these studies rely on simple correlation-based measures.
The first computational study of linguistic style accommodation (Danescu-Niculescu-Mizil et al., 2011) shows that it is highly prevalent in Twitter conversations. The authors use binary features for the presence of various psychologically meaningful word categories, as described by the LIWC method (Tausczik and Pennebaker, 2010), to identify stylistic variations in tweets. They then define a probabilistic framework that mathematically models style accommodation in terms of the likelihood of an addressee to respond in the same style as the speaker.
Though CS is similar enough to other kinds of linguistic style to allow analysis using the same framework, it also differs from them in being a strong sociological indicator of identity (Auer, 2005) and in not being processed nonconsciously (Levelt and Kelter, 1982). We demonstrate that a model that does not account for these crucial differences fails to capture the accommodative patterns of code-choice. Because it is processed consciously, code-choice also exhibits accommodation over several conversational turns, an effect which is not observed as strongly for other style dimensions. Long-term effects in accommodation have received very little attention, and have mostly been studied based on crude conversation-level correlation values (Niederhoffer and Pennebaker, 2002).

Accommodation of Code-Choice as Linguistic Style
As a first step, we adapt an existing framework (Danescu-Niculescu-Mizil et al., 2011) that quantifies accommodation of a given linguistic style. Any linguistic feature is said to exhibit accommodation if it is more likely to be expressed in response to a dialog that also expresses it, than otherwise. In other words, an accommodative feature in a dialog begets the same feature in the next dialog. We use the term 'dialog' or 'turn' to refer to a single spoken utterance within a conversation, and the term 'speaker' to refer to conversation participants. This framework thus restricts the definition of accommodation to only single-turn effects.

Measuring Accommodation
Mathematically, let F denote some binary feature over a dialog (we describe the features themselves in Section 3.2 below). F is said to exhibit accommodation if the likelihood of a user expressing F increases when F has been expressed in the previous dialog. We define the degree of accommodation as follows:

$$\mathrm{Acm}(F) = P\left(\delta^{F}_{d_i} \mid \delta^{F}_{d_{i-1}}\right) - P\left(\delta^{F}_{d_i}\right) \qquad (1)$$

Here, dialog $d_{i-1}$ immediately precedes dialog $d_i$, and $\delta^{F}_{d}$ is the event that the dialog d exhibits F. The first term can be thought of as the reciprocity over F. The second term is the fraction of dialogs in the corpus for which F = 1, which is also the empirical probability of observing F in a dialog d.
Instead of computing these likelihoods over the entire corpus, we could also compute them individually for each speaker, and doing so yields a fairer condition for accommodation. Different speakers can have widely different base likelihoods. This metric requires an average speaker to reciprocate more than their own (individual) baseline likelihood of expressing F, rather than simply more than the population baseline. Denoting the event that a dialog d is spoken by a speaker s as $\delta_{S(d)=s}$, we redefine accommodation as follows ($E_s$ denotes an expectation over all speakers s):

$$\mathrm{Acm}^{*}(F) = E_s\left[ P\left(\delta^{F}_{d_i} \mid \delta^{F}_{d_{i-1}},\, \delta_{S(d_i)=s}\right) - P\left(\delta^{F}_{d_i} \mid \delta_{S(d_i)=s}\right) \right] \qquad (2)$$
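The corpus-level and per-speaker definitions can be illustrated with a minimal Python sketch. The input representation (a list of `(speaker, has_feature)` pairs), the function names, and the rule that a dialog only counts as a response when the previous dialog was spoken by someone else are assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict


def accommodation(dialogs):
    """Corpus-level accommodation: P(F in d_i | F in d_{i-1}) - P(F in d_i).

    dialogs: list of (speaker, has_feature) tuples in conversational order.
    Assumption: a dialog is a response only if the previous dialog was
    spoken by a different speaker.
    """
    responses = [
        cur_f
        for (prev_s, prev_f), (cur_s, cur_f) in zip(dialogs, dialogs[1:])
        if prev_f and prev_s != cur_s
    ]
    if not responses:
        return None  # no dialog expressing F was ever responded to
    reciprocity = sum(responses) / len(responses)
    base_rate = sum(f for _, f in dialogs) / len(dialogs)
    return reciprocity - base_rate


def accommodation_per_speaker(dialogs):
    """Average over speakers of (own reciprocity - own base rate)."""
    # Per speaker: [turns, turns with F, triggers received, triggers reciprocated]
    stats = defaultdict(lambda: [0, 0, 0, 0])
    for i, (speaker, has_f) in enumerate(dialogs):
        s = stats[speaker]
        s[0] += 1
        s[1] += has_f
        if i > 0 and dialogs[i - 1][1] and dialogs[i - 1][0] != speaker:
            s[2] += 1
            s[3] += has_f
    scores = [r / t - f / n for n, f, t, r in stats.values() if t > 0]
    return sum(scores) / len(scores) if scores else None


toy = [("A", True), ("B", True), ("A", False),
       ("B", False), ("A", True), ("B", True)]
print(accommodation(toy), accommodation_per_speaker(toy))
```

In this toy conversation the corpus-level value is zero, while the per-speaker value is negative because speaker A never reciprocates, which shows how the two definitions can disagree.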

Measuring Code-Choice
Our general hypothesis is that code-choice is reciprocated in a bilingual conversation. To measure this, we introduce simple binary features for the presence of each code, along the lines of the binary features in Danescu-Niculescu-Mizil et al. (2011), with individual language expression substituting for the style dimensions. For each language L, we define a feature $F_L$ indicating, for a dialog d, whether the dialog contains words in the language L. The event that dialog d is at least partially in L is denoted by $\delta^{F_L}_{d}$. In other words, $\delta^{F_L}_{d}$ is true if the language L is expressed in dialog d, and false otherwise.
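As a small illustration, the presence feature can be computed directly from word-level language tags. The sketch below assumes each dialog is available as a list of `(word, tag)` pairs; the tag names `"En"` and `"Hi"` and the function name are hypothetical.

```python
def expresses_language(tagged_words, lang):
    """True if at least one word of the dialog is tagged with `lang`.

    tagged_words: list of (word, language_tag) pairs for one dialog.
    """
    return any(tag == lang for _, tag in tagged_words)


# A mixed Hindi-English dialog expresses both languages.
dialog = [("mujhe", "Hi"), ("coffee", "En"), ("chahiye", "Hi")]
print(expresses_language(dialog, "En"), expresses_language(dialog, "Hi"))
```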

Data
We employ two datasets of bilingual conversations, each in a different conversational context and a different pair of languages, to test the occurrence of code-choice accommodation. Table 1 reports the number of dialogs and words for the two datasets, and the fraction of words that are in English.

Hindi Movies
The data comprises scripts of 32 Hindi movies released between 2012 and 2017. 17 of these scripts were collected by Pratapa and Choudhury (2017) from scripts posted online 1 . We collected 15 scripts of our own from a similar online source 2 and parsed them, replicating the methodology of Pratapa and Choudhury (2017). All the scripts have word-level language tags as created by the language identification system of Gella et al. (2013). On manual inspection, the language labels were found to have a significant amount of noise; we corrected frequently observed errors with manual supervision.
Each dialog is assumed to be in response to the immediately preceding dialog within a scene. We restrict our analysis to dialogs that are between no more than two speakers, to avoid confounding effects of multi-party conversations on accommodation. This also filters out most dialogs in the scripts which are not conversational in nature.
Movie conversations, even though imagined, are designed to sound natural, and are therefore suitable for studying style accommodation, as is argued in Danescu-Niculescu-Mizil and Lee (2011), as well as multilingualism (Bleichenbacher, 2008) and code-choice (Vaish, 2011). It is true that movie dialogs promote stereotypes that may affect characters' expression of code-choice; however, accommodative effects can still be expected to play out largely independent of such stereotypes. There have been several linguistic and quantitative studies on Hindi-English CS in Hindi movies (Parshad et al., 2016; Lösch, 2007; Pratapa and Choudhury, 2017).

Bangor Corpus
We use the Bangor Miami corpus 3 of word-level language-labeled transcripts of spoken conversations between Spanish-English bilinguals in Florida, US. The original dataset contains 56 conversations, from which we selected 40 conversations that have a non-trivial amount of English and Spanish, and sufficient dialogs from each speaker. Figure 1 shows the fraction of Spanish used by a dyad of speakers in a sample conversation from this dataset (the complement fraction being English). Intuitively, we expect our metrics to capture how coordinated two speakers are. Table 2 shows the metrics from Section 3.1 computed over the features in Section 3.2 on the two datasets.

Results
While these numbers do suggest that accommodative effects are present, they seem to be fairly weak. The rate of reciprocation is only slightly higher than the base rate, and in some cases the difference isn't statistically significant.
However, looking at individual differences in these values reveals an interesting observation. For each speaker s in the Movies dataset, we plot in Figure 2 the rate of accommodation by s, $\mathrm{Acm}_s(F)$, against the respective base rate $P(\delta^{F}_{d} \mid \delta_{S(d)=s})$, for $F \in \{F_{En}, F_{Hi}\}$.

3 http://bangortalk.org.uk/speakers.php?c=miami

Figure 1: Fraction of Spanish over time in a conversation. The x-axis denotes consecutive dialog pairs, with dialog i above aligned with dialog i + 1 below, so two aligned bars denote two consecutive dialogs.
Clearly, we see that a high base rate of expression corresponds to far less accommodation. In other words, the instances of code-choice that are uncommon, and therefore unexpected within the conversational context, are more likely to be accommodated for. In a conversation that is predominantly in Hindi, a dialog uttered in Hindi carries little salience and doesn't stand out. This code-choice is unlikely to be registered as a communicative signal or a marked expression of any linguistic style, and therefore wouldn't elicit accommodation. English and Spanish are the less common languages in Movies and Bangor respectively, and indeed their rates of accommodation are higher than the rates for the corresponding dominant languages.
Since the metrics in Section 3.1 compute likelihoods over all instances of code-choice irrespective of salience, the observed rates of accommodation are low. We borrow the notion of markedness of code-choice, as described in Myers-Scotton (2005), and incorporate it into our framework, as described in the next section.

Figure 2: Variation of accommodation rate against base rate. Observed rate (x + y) can vary between 0 and 1. The highlighted region denotes positive accommodation and a low base rate (x < 0.5 and y > 0). In contrast, all other regions, as demarcated by dashed lines, are sparser.

Code Salience
As shown earlier, measuring accommodation makes sense only over marked instances of code-choice. Thus, for every conversation in our dataset, we identify the marked language, and measure accommodation only over that language. We choose a conversation as the unit for deciding if a code is marked because the set of speakers and the conversation context typically dictate code-choice in multilingual societies.
A language is considered marked if it is the non-dominant language: we set the threshold of markedness at no more than 40% of the total words in the entire conversation. We discard highly mixed conversations where neither language meets the threshold. This consideration also makes the calculation of accommodation more robust, since for a code with a high rate of incidence, the effect of the previous turn would be harder to isolate.
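The selection of a marked language per conversation can be sketched as follows. The data layout (a conversation as a list of dialogs, each a list of `(word, tag)` pairs), the function name, and the choices to ignore words tagged with neither language and to return nothing when the minority language never occurs are illustrative assumptions.

```python
from collections import Counter


def marked_language(conversation, languages, threshold=0.4):
    """Return the marked (non-dominant) language of a conversation, or None.

    conversation: list of dialogs, each a list of (word, language_tag) pairs.
    languages: the two language tags under study, e.g. ("En", "Hi").
    A language is marked if it accounts for at most `threshold` of the words
    tagged with either language. Words with other tags are ignored (assumption).
    """
    counts = Counter(
        tag for dialog in conversation for _, tag in dialog if tag in languages
    )
    total = sum(counts.values())
    if total == 0:
        return None
    minority = min(languages, key=lambda lang: counts[lang])
    share = counts[minority] / total
    # Highly mixed conversations (share above threshold) are discarded.
    return minority if 0 < share <= threshold else None


conv = [[("yeh", "Hi"), ("bahut", "Hi"), ("accha", "Hi"), ("hai", "Hi")],
        [("really", "En"), ("kya", "Hi")]]
print(marked_language(conv, ("En", "Hi")))
```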

Threshold of Occurrence
Another limitation of the formulation in Section 3 is that it doesn't incorporate the extent of presence of each code in a dialog. Consequently, even named entities, frequently borrowed words and frozen expressions from the marked language would be considered candidates for accommodation. The Bangor corpus came with named-entity tags, and in the Movies corpus we removed all character names from the dialogs, but we were not aware of any NER system for Hindi-English CS data that we could have used to remove other named entities. Ideally, we would like to exclude all such words from the triggers expected to elicit accommodation, as their usage isn't stylistically marked (Auer, 1999). The word-level language tags also have some amount of noise, and it is desirable to use features that are resilient to it.
Besides, it is possible that a relatively high incidence of marked code in a dialog is perceived as a stronger style marker, and is perhaps accommodated for more strongly than a lower incidence. We introduce a simple fraction-based thresholding that allows us to test this.
For every dialog d, we define a feature $F_{L,\tau}$ such that $F_{L,\tau}(d) = 1$ if and only if (a) d is sufficiently long and (b) the fraction of words of d in the marked language L is more than $\tau$. We consider an utterance to be sufficiently long if it contains more than 4 words, as this is expected to filter out most frozen expressions and named entities that may be borrowed from one language to another. We show results for accommodation of $F_{\tau}$ for $\tau \in \{0, 0.2, 0.5\}$. While $F_0$ would capture the presence of even one word in a marked code, $F_{0.2}$ represents a non-trivial occurrence and $F_{0.5}$ represents majority occurrence of the marked code in context.
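The thresholded feature translates into a few lines of Python. As before, the `(word, tag)` representation and the function name are assumptions for illustration; the length cutoff of more than 4 words and the strict inequality on the fraction follow the definition in the text.

```python
def marked_feature(tagged_words, lang, tau, min_words=4):
    """F_{L,tau}: True iff the dialog has more than `min_words` words and
    the fraction of its words tagged `lang` is strictly greater than `tau`.

    tagged_words: list of (word, language_tag) pairs for one dialog.
    """
    if len(tagged_words) <= min_words:
        return False  # too short: likely a frozen expression or a name
    fraction = sum(tag == lang for _, tag in tagged_words) / len(tagged_words)
    return fraction > tau


# One English word out of five: fires for tau = 0 only.
dialog = [("main", "Hi"), ("kal", "Hi"), ("office", "En"),
          ("nahi", "Hi"), ("jaunga", "Hi")]
print([marked_feature(dialog, "En", tau) for tau in (0, 0.2, 0.5)])
```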

Beyond Immediate Accommodation
The metrics in Section 3 and those in Danescu-Niculescu-Mizil et al. (2011) only consider the immediate next turn as a candidate for reciprocation. However, it is possible for accommodative effects to span a few conversation turns. Consider the following snippet from one of the conversations in Bangor (Spanish code is in bold and its translation is in italics).
In cases like this, the content of the conversation precludes immediate accommodation, but the speaker Sarah still reciprocates Paige's code-choice at the first instance possible. We can test if such cases of delayed accommodation are indeed common in the data, by extending our formulation to an arbitrary number of turns. We extend Equation (2) below, and Equation (1) can be extended analogously.

Generalization of Immediate Accommodation
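The baseline rate of a speaker s using a feature F across n (consecutive) turns is the likelihood that at least one of the n turns expresses F. Assuming that s expresses F independently at every turn with probability $p_s = P(\delta^{F}_{d} \mid \delta_{S(d)=s})$, it is given by $1 - (1 - p_s)^n$. For a speaker s, the rate of n-turn accommodation is the increase in likelihood of occurrence of F in any of the n dialogs $d_{s,1}$ to $d_{s,n}$, conditioned on the event that the preceding dialog $d_0$ expresses F:

$$\mathrm{Acm}^{*}_{n}(F) = E_s\left[ P\left(\bigvee_{j=1}^{n} \delta^{F}_{d_{s,j}} \,\middle|\, \delta^{F}_{d_0}\right) - \left(1 - (1 - p_s)^n\right) \right] \qquad (3)$$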
When n = 1, this resolves to Equation (2). Note that $d_{s,1}$ to $d_{s,n}$ are the first n dialogs spoken by s immediately after the dialog $d_0$. As before, $E_s$ denotes the expected value over all speakers.
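A minimal sketch of the n-turn measure is given below. It assumes the same `(speaker, has_feature)` representation used for illustration elsewhere, takes the baseline to be $1 - (1 - p_s)^n$ with $p_s$ the speaker's own rate of expressing F (the independence baseline), and counts a trigger even when fewer than n responses remain in the conversation; the last choice in particular is an assumption.

```python
def n_turn_accommodation(dialogs, n):
    """Average over speakers of
    P(F in any of the next n dialogs by s | trigger with F) - (1 - (1 - p_s)^n).

    dialogs: list of (speaker, has_feature) tuples in conversational order.
    A trigger is a dialog expressing F that was spoken by someone other than s.
    """
    scores = []
    for s in sorted({speaker for speaker, _ in dialogs}):
        own = [f for speaker, f in dialogs if speaker == s]
        p_s = sum(own) / len(own)
        hits = trials = 0
        for i, (speaker, has_f) in enumerate(dialogs):
            if speaker == s or not has_f:
                continue
            # First n dialogs spoken by s after the trigger
            window = [f for sp, f in dialogs[i + 1:] if sp == s][:n]
            if window:
                trials += 1
                hits += any(window)
        if trials:
            scores.append(hits / trials - (1 - (1 - p_s) ** n))
    return sum(scores) / len(scores) if scores else None


toy = [("A", True), ("B", True), ("A", False),
       ("B", False), ("A", True), ("B", True)]
print(n_turn_accommodation(toy, 1), n_turn_accommodation(toy, 2))
```

With n = 1 and strictly alternating turns, this reduces to the per-speaker single-turn measure.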

Accommodation as Retrieval
Responding to marked code-choice with marked code-choice can be reformulated as a retrieval task. For a speaker s, each instance of a dialog addressed to s with a feature F would be a query posed to s. The next n dialogs spoken by s would be the top-n retrieved responses to the query. We are interested in the retrieval of responses that also have feature F, so we call a response with feature F relevant, and irrelevant otherwise, in keeping with the standard terminology in information retrieval. We consider s to have retrieved a relevant response in n turns if at least one of the first n responses is relevant.
When formulated this way, the recall of s, the probability of retrieving a relevant response, is precisely equal to the first term in Equation (3), the probability of s using F in responding to a dialog that expresses F. The second term in Equation (3) is the expected value of recall under the independence assumption, i.e., if s randomly introduces marked code at every turn with probability $p_s$. Therefore, a speaker is accommodative if their recall is higher than that of this random baseline.
A popular metric to evaluate retrieval systems is the mean reciprocal rank (MRR). The reciprocal rank of a query response is the multiplicative inverse of the rank of the first relevant response. The MRR of a system is simply the mean of the reciprocal ranks over all its queries. Since we expect the accommodative speaker to have a higher recall than the random baseline, we also expect the accommodative speaker to have a higher MRR, with the difference from the baseline MRR reflecting the speaker's accommodativeness.
This not only presents an alternative view of accommodation and exposes well-studied formalisms and concepts from information retrieval; the ability to capture speakers' styles as response characteristics also facilitates predictive conversational modelling.
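Under this retrieval view, the observed MRR of a speaker can be sketched as below. Each query is represented by the ordered list of relevance judgments (booleans) of the speaker's subsequent dialogs; this representation, the function names, and the convention of scoring a query with no relevant response as 0 are assumptions for illustration.

```python
def reciprocal_rank(responses):
    """1 / rank of the first relevant response, or 0.0 if none is relevant.

    responses: ordered list of booleans (True = response expresses F).
    """
    for rank, relevant in enumerate(responses, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Mean of the reciprocal ranks over all queries posed to a speaker."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


# Three queries: reciprocated on the 2nd turn, on the 1st turn, and never.
queries = [[False, True], [True], [False, False, False]]
print(mean_reciprocal_rank(queries))  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```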
Mean reciprocal ranks for the random baselines can be computed analytically as follows. We first compute the expected reciprocal rank r for any given query as a function of the correctness probability $p_s$. For the first relevant response to be at rank i, all previous responses must be irrelevant. Since each response is relevant with a probability $p_s$, the probability of the i-th response being the first relevant response is given by:

$$P(\text{rank} = i) = p_s (1 - p_s)^{i-1}$$

The baseline MRR of a speaker s, denoted by $\mathrm{Base}_s$, is then the expected value of r, also as a function of $p_s$:

$$\mathrm{Base}_s = E[r] = \sum_{i \geq 1} \frac{1}{i}\, p_s (1 - p_s)^{i-1}$$

The overall baseline MRR, Base, is then simply $E_s(\mathrm{Base}_s)$. We compare the observed MRR on the data (denoted by Obs) with the expected MRR of the random baselines (Base), with their difference being indicative of the degree and immediacy of accommodation.
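The random baseline can also be evaluated numerically. The sketch below assumes that the first relevant response falls at rank i with probability p(1 - p)^(i-1) and that the expectation runs over all ranks i ≥ 1, in which case the series has the closed form -p ln(p) / (1 - p); whether the original analysis truncates the sum at some maximum rank is not stated, so the optional `max_rank` argument is an illustrative addition.

```python
import math


def baseline_mrr(p, max_rank=None):
    """Expected reciprocal rank when each response is independently
    relevant with probability p.

    With max_rank=None the infinite series sum_{i>=1} p (1-p)^(i-1) / i
    is evaluated in closed form; otherwise the sum is truncated.
    """
    if max_rank is None:
        if p <= 0.0:
            return 0.0  # a relevant response never occurs
        if p >= 1.0:
            return 1.0  # the first response is always relevant
        return -p * math.log(p) / (1.0 - p)
    return sum(p * (1.0 - p) ** (i - 1) / i for i in range(1, max_rank + 1))


# A rarer marked code (smaller p) yields a lower baseline MRR.
for p in (0.1, 0.3, 0.5):
    print(p, round(baseline_mrr(p), 4), round(baseline_mrr(p, max_rank=5), 4))
```

Comparing a speaker's observed MRR against `baseline_mrr(p_s)` then gives the per-speaker difference described in the text.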
Figure 3: Accommodation rates ($\mathrm{Acm}^{*}_{n}(F_{L,\tau})$) versus n. Red, green and blue lines indicate τ = 0, 0.2 and 0.5 respectively. Accommodation of Hindi is not significant.
(a) Bangor; L = Es for solid lines (significant for n < 6) and L = En for dashed lines (significant for n < 4).
(b) Movies; L = En. Significant for n < 4.

Figure 3 shows the trends in $\mathrm{Acm}^{*}_{n}(F_{L,\tau})$ for different values of n, L and τ. Significance scores are computed in the same way as for Table 2. Table 3 shows the real and baseline MRR values for each corpus over different values of τ.

Results and Observations
It is evident that accommodation of code-choice is a prevalent and robust phenomenon. The values of accommodation are consistently positive for all the different marked-code features, languages and datasets, and for low values of n.
In Table 3, the less common codes in each dataset, Es in Bangor and En in Movies, have a lower baseline MRR while having observed MRRs comparable to or even higher than those of their more common counterparts. This reiterates that accommodation is more pronounced for more marked codes.
From Figure 3, a higher fraction of marked code (τ = 0.5) does not seem to elicit stronger accommodation than τ = 0. However, it is important to note that the base rate for $F_0$ is much higher than that of $F_{0.5}$, so in relative terms, the latter exhibits a stronger tendency to accommodate (since the increase over the respective base rate is identical). The difference between the retrieval characteristics for the different thresholds is more salient in Table 3: higher thresholds correspond to a smaller average likelihood, and lower baseline MRRs. The difference between observed and baseline MRR does slightly increase with τ, making a higher fraction of marked code somewhat more accommodated for. In contrast to English, the accommodation for Hindi code-choice in conversations dominated by English is not significant. This suggests that Hindi code isn't marked even when it is the minority code in a scene, an inference that aligns with the claim from Myers-Scotton (2005) that Hindi is not marked in Hindi movies, even when it is the non-dominant language in context.
Hindi in Movies and English in Bangor have a lower strength of accommodation than their respective counterparts, even when measured over conversations where they are uncommon. Not only is accommodation stronger for Spanish, it also persists for more turns than it does for English. This suggests that the context of markedness is larger than the immediate conversation, and that being the dominant language of the corpus as a whole reduces markedness.
In most cases, accommodation is salient and significant even after a few turns. Delayed accommodation is as prevalent as immediate accommodation, and the likelihood of a given speaker reciprocating code-choice in kind remains significant for several turns in a conversation.

Discussion
Accommodation is prevalent and robust, but not universal. While it is observed across conversations spanning different media and language pairs, there is significant variation among speakers within a dataset. As many as 18% of the speakers exhibit what may be considered negative accommodation, or non-accommodation. Half of these do so with a value of $\mathrm{Acm}^{*}_{1}(F_0)$ less than −0.10. It is in fact known that accommodation or convergence is neither a universal nor a positive interpersonal strategy (Genesee and Bourhis, 1988; Giles et al., 1991; Burt, 1994). In-group/out-group identity as well as attitudes towards CS and the languages involved can cause negative accommodation as well as a negative perception of accommodation. Burt (1994) shows that while convergence is largely viewed positively, some multilingual speakers may oppose it as either misplaced solidarity with an in-group, or a slur on the language capability of an interlocutor.
While we work under the assumption that code-choice is a style dimension, largely independent of content, it is in fact influenced by factors like topic (Sert, 2005) and sentiment (Rudra et al., 2016). These influences could either align or compete with the socially accommodative code-choice, which would explain accommodation over several turns: it is not always possible to accommodate immediately. The difference between code-choice and other linguistic style markers is also indicated by the poor results of Section 3, which naively applies the style accommodation framework to code-choice.
It is worth noting that the baselines throughout the paper assume that speakers do not adjust their overall rate of employing a particular code in order to accommodate. This is a fairly strict assumption: the same speaker typically has widely varying base rates in conversations with multiple other speakers. The extent of marked code to be used is itself often negotiated within a conversation, and adjusting one's base rate can itself be construed as accommodation, albeit one that is harder to analyze. Nevertheless, this assumption gives us a strong and realistic baseline to judge the observations against.
One limitation of our formulation is that we do not look at individual words. Word or code saliency in context is actually more complex than just language saliency in the current conversation. Some words are more marked than others, with borrowed words carrying very little salience. More complex features that are aware of the syntactic structure of dialogs would help capture this. It would also be worthwhile to apply this formulation to study conversation-wide accommodation effects and convergence of code-choice at scale.

Conclusion
We demonstrate that code-choice is a marker of linguistic style, and when it is marked in context, it is interpersonally accommodated for. We extend the probabilistic formulation to multiple conversation turns, and show equivalence with a retrieval task, both facilitating better conversational analysis of code-choice in particular and style interactions in general.
In the future, we would like to use richer and linguistically motivated features for code-choice, including parts-of-speech, and indicators of borrowing across languages. Another generalization would be to also study LIWC words and markers of sociolinguistic style in this framework. Finally, longer-term accommodation effects, like convergence being succeeded by divergence, or topical effects on convergence, remain to be explored using a quantitative method like ours.