Not that much power: Linguistic alignment is influenced more by low-level linguistic features rather than social power

Linguistic alignment between dialogue partners has been claimed to be affected by their relative social power. A common finding has been that interlocutors of higher power tend to receive more alignment than those of lower power. However, these studies overlook some low-level linguistic features that can also affect alignment, which casts doubt on those findings. This work characterizes the effect of power on alignment with logistic regression models in two datasets, finding that the effect vanishes or is reversed after controlling for low-level features such as utterance length. Thus, linguistic alignment is explained better by low-level features than by social power. We argue that a wider range of factors, especially cognitive factors, need to be taken into account in future studies on observational data when social factors of language use are in question.


Introduction
The effect of social power on language use in conversations has been widely studied. Communication Accommodation Theory (Giles, 2008) states that the social power of speakers influences the extent to which conversation partners accommodate (or align, coordinate) their communication styles towards them. This theory is supported by findings from qualitative studies on employment interviews (Willemyns et al., 1997) and classroom talk (Jones et al., 1999), and by more recent data-driven studies on large online communities and court conversations (Danescu-Niculescu-Mizil et al., 2012; Jones et al., 2014; Noble and Fernández, 2015). In particular, Danescu-Niculescu-Mizil et al. (2012) use a probability-based measure of linguistic alignment to demonstrate that people align more towards conversation partners of higher power, i.e., the admin users on Wikipedia talk pages and the Justices in U.S. Supreme Court conversations, than towards those of lower power, i.e., the non-admin users and the lawyers.
However, while these results find sound explanations in sociolinguistic theories, they are still somewhat surprising from the perspective of cognitive mechanisms of language production, because the mutual alignment between interlocutors in natural dialogue can be explained by an automatic and low-level priming process (Pickering and Garrod, 2004). It is known that the strength of alignment is sensitive to low-level linguistic features (e.g., words, syntactic structures, etc.), such as temporal clustering properties (Myslín and Levy, 2016), syntactic surprisal measured by prediction error (Jaeger and Snider, 2013), and lexical information density (Xu and Reitter, 2018).
Then why, or under what mechanisms, can alignment be affected by the relatively high-level social perception of power, as reported? Could it be that the effect of power on alignment is actually due to other low-level features of language, such as those mentioned above? Is the effect of power still observable if we control for other factors? How large is the effect? Is it significant enough to be captured by computational measures of alignment? Answering these questions will help clarify the role of social factors in linguistic alignment, and improve our understanding of language production in general.
In this study, we conduct a two-step model analysis. First, we use a basic model that has two predictors, count (the number of a certain linguistic marker in the preceding utterance) and power (the power status of the preceding speaker), to predict the occurrence of the same marker in the following utterance. Here, the linguistic markers are derived from 11 Linguistic Inquiry and Word Count (LIWC; Pennebaker et al., 2001) categories (e.g., article, adverb, etc.). With the basic model, the main effect of count characterizes the strength of alignment, and the interaction between count and power characterizes the effect of power on alignment (Section 3). Second, we use an extended model that includes a third predictor, utterance length (chosen as a typical low-level linguistic feature; see Section 2.3), on top of the basic model. With the extended model, we examine whether the inclusion of utterance length influences the interaction between count and power (Section 4). We can thereby gauge the extent to which the effect of power on alignment is confounded by low-level linguistic features.
To clarify, our goal is not to disprove the existence of social accommodation in dialogue. Nonetheless, it is important to distinguish between what is caused by automatic priming-based alignment and what is caused by high-level, intentional accommodation. As we will discuss, these are different processes with different predictions. Throughout this paper we use the term alignment to refer to the priming-based process, and accommodation to refer to the intentional process.
Related Work

Social power and linguistic alignment
The social factors of language use have been widely studied. Communication Accommodation Theory (Giles, 2008) posits that individuals adapt their communication styles to increase or decrease the social distance from their interlocutors. One factor that affects the adaptation of linguistic styles is social power. Typically, people of lower power converge their linguistic styles to those of higher power; for example, interviewees towards interviewers (Willemyns et al., 1997), or students towards teachers (Jones et al., 1999).
More recently, sensitive quantitative methods have been applied to this line of inquiry. Danescu-Niculescu-Mizil et al. (2012) computed a probability-based linguistic coordination measure among Wikipedia editors and participants in the U.S. Supreme Court, and showed that people with low power (e.g., lawyers, non-admins) exhibit greater coordination than people with high power (Justices, admins). Using the same data, Noble and Fernández (2015) found that linguistic coordination is positively correlated with social network centrality, and that this effect is even greater than the effect of power status.
The aforementioned studies do not include low-level language features in their analyses and thus overlook the possibility that cognitive mechanisms may explain the data more readily. Importantly, as we will discuss later, these studies use a measurement of alignment that, we believe, more appropriately measures the automatic process rather than the intentional one.

Quantifying linguistic alignment
A variety of computational measures of linguistic alignment have been developed. Some quantify the increase in conditional probability of certain elements (words or word types) given that they have appeared earlier (Church, 2000; Danescu-Niculescu-Mizil et al., 2012). Some compute the proportion of repeated lexical entries or syntactic rules between two pieces of text (Fusaroli et al., 2012; Wang et al., 2014; Xu and Reitter, 2015). Some use the coefficients returned by generalized linear models (McCullagh, 1984; Breslow and Clayton, 1993; Lindstrom and Bates, 1990) as an index of alignment (Reitter and Moore, 2014). A large body of the existing computational measures intensively uses LIWC (Pennebaker et al., 2001) to construct representations of language users' styles, which can be used to measure alignment with distance-like metrics (Niederhoffer and Pennebaker, 2002; Jones et al., 2014). Many of these approaches do not distinguish between different levels of linguistic analysis and different psycholinguistic processes (phonological, lexical, syntactic, etc.), and neither do we. Alignment is consistently present across these levels and processes, although it is not as clear in naturalistic language as it is in the constrained utterances of experiments, particularly at the syntactic level (Healey et al., 2014). We are concerned with the question of whether alignment is a socially linked, intentional adaptation process, as opposed to addressing any particular cognitive model.
More recently, it has been pointed out that most existing measures are difficult to compare, which motivates the need for a universal measure. The Hierarchical Alignment Model (HAM) and the Word-Based HAM (WHAM) use statistical inference techniques, which outperform other measures in the robustness with which they capture linguistic alignment in social media conversations. In this study, we choose generalized linear models to quantify linguistic alignment, avoiding issues with more complex and less inspectable models. For instance, the commonly used probability-based methods and their more advanced variants (HAM and WHAM) lack the flexibility to jointly examine multiple factors (e.g., speaker groups, utterance length, etc.) that influence alignment. Another issue is that they do not take into account the number of occurrences of linguistic markers, which is known to affect alignment (see Section 2.3). Conversely, though linear models do not give an accurate per-speaker estimate of alignment (which we do not need for the purposes of this study), they allow multiple factors that influence alignment to be examined simply by including multiple predictors in the model. A generalized linear model also already takes baseline usage into account through a fitted intercept. Given these considerations, we use generalized linear models for our quantitative analysis. The formulation of our models is described in Sections 3.2 and 4.1.

Cognitive constraints on linguistic alignment: why utterance length matters
There are many, at times competing, cognitive explanations of linguistic alignment in both comprehension and production. Jaeger and Snider (2013) explained alignment as a consequence of expectation adaptation, and found that stronger alignment is associated with syntactic structures that have higher surprisal (roughly speaking, less common ones). Alignment in language production can also be modeled as a general memory phenomenon (Reitter et al., 2011), which explains a number of known interaction effects. Myslín and Levy (2016) found that sentence comprehension is faster when the same syntactic structure clusters in time in prior experience than when it is evenly spaced in time, casting comprehension priming as the rational expectation of repeated stimuli. Though this result is not directly about comprehension-to-production priming, it makes sense to anticipate that production could also be sensitive to the clustering patterns of linguistic elements, because comprehension and production are closely coupled processes (Pickering and Garrod, 2007).
Utterance length, i.e., the number of words in an utterance, is a feature closely related to both surprisal and clustering properties. Longer utterances tend to have higher syntactic surprisal (Xu and Reitter, 2016a), and it is reasonable to assume they tend to contain more evenly distributed stimuli. Thus, utterance length is a low-level linguistic feature that correlates with many of the causes of alignment. We therefore use utterance length as a stand-in for low-level linguistic features as a whole when comparing them with social power, a much higher-level feature. Examining alignment (in social science research and elsewhere) therefore calls for controlling for utterance length.

Experiment 1: Basic model
In Experiment 1, we justify the practice of using generalized linear models to quantify linguistic alignment. We compare two ways of characterizing the occurrence of LIWC-derived markers in a preceding utterance, binary presence and numeric count, to determine which results in a better model. We use an interaction term in the model to quantify the effect of the power status of speakers on linguistic alignment, which serves as the basis for the following sections.

Corpus data
We use two datasets compiled by Danescu-Niculescu-Mizil et al. (2012): a Wikipedia talk-page corpus (Wiki) and a corpus of United States Supreme Court conversations (SC). Wiki is a collection of conversations from Wikipedia editors' talk pages, containing 125,292 conversations contributed by 30,732 editors. SC is a collection of conversations from the U.S. Supreme Court Oral Arguments, with 51,498 utterances making up 50,389 conversational exchanges, from 204 cases involving 11 Justices and 311 other participants (lawyers or amici curiae).
A conversation consists of a sequence of utterances, {u_i} (i = 1, 2, ..., N), where N is the total number of utterances in the conversation. Because people take turns talking in conversation, u_i and u_{i+1} are always from different speakers. Since our interest here is the alignment between different speakers (as opposed to within the same speaker), we use a sliding window of size 2 over the whole conversation, generating a sequence of adjacent utterance pairs, {(prime_i, target_i)} (i = 1, 2, ..., N − 1).
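The pair construction above can be sketched in a few lines of Python (with placeholder strings standing in for real utterances):

```python
# A minimal sketch of the sliding-window pair construction described above.
def make_pairs(utterances):
    """Slide a window of size 2 over a conversation: each adjacent pair
    (u_i, u_{i+1}) becomes a (prime, target) pair. Because speakers
    alternate turns, prime and target always come from different speakers."""
    return [(utterances[i], utterances[i + 1])
            for i in range(len(utterances) - 1)]

conversation = ["u1", "u2", "u3", "u4"]  # N = 4 utterances
pairs = make_pairs(conversation)         # N - 1 = 3 (prime, target) pairs
```

Note that every utterance except the first and last serves once as a target and once as a prime.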

Statistical models
We formulate alignment as the impact of using certain linguistic elements in the preceding utterance on their chance of appearing again in the following utterance. In the language of generalized linear models, the occurrence of linguistic markers in target is the response variable, and their occurrence in prime is the predictor. This occurrence can be represented either as a boolean or as a count. Alignment is then characterized by the β coefficient of the predictor, which lets the model distinguish the prevalence of the marker in primed contexts from its prior prevalence in the corpus. Factors that may influence alignment (e.g., social power) can then be examined by adding a corresponding interaction term to the model.

Our first step is to replicate the previous studies' finding of an effect of social power on alignment. Two models were fitted, predicting the presence of the linguistic marker m in the target utterance over its absence: one with a binary predictor (C_presence) and one with a count-based predictor (C_count). Both models include a second binary predictor, C_power, indicating the power status of the prime speaker (high vs. low), and its interaction with C_presence or C_count, respectively. Additionally, random intercepts on linguistic marker and target speaker are fitted, based on the consideration that individuals might align to different degrees towards different markers. C_count is log-transformed to maximize model fit according to the Bayesian Information Criterion; this is commensurate with standard psycholinguistic practice and with known cumulative priming and memory effects. Equation (1) shows the count-based model:

logit(m) = ln [p(m in target) / p(m not in target)]
         = β_0 + β_1 C_count + β_2 C_power + β_3 C_count * C_power    (1)

To reiterate, the interaction term C_count * C_power characterizes the effect of power on alignment.
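As an illustration (not our exact pipeline), a count-based model of this form can be fitted with a plain logistic regression. The sketch below uses synthetic data with a built-in alignment effect, and omits the random intercepts for marker and speaker, which would require a mixed-effects implementation:

```python
import math
import random

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Plain logistic regression via batch gradient descent.
    X: rows [1, log_count, power, log_count * power]; y: 0/1 outcomes.
    (The models in the paper add random intercepts, omitted here.)"""
    k, n = len(X[0]), len(X)
    beta = [0.0] * k
    for _ in range(epochs):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            z = sum(b * x for b, x in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(k):
                grad[j] += (p - yi) * xi[j]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

# Synthetic data with a genuine alignment effect (coefficient on the
# log-count term is 1.0 by construction; all values are hypothetical).
random.seed(0)
rows, outcomes = [], []
for _ in range(600):
    count = random.randint(0, 5)
    power = random.randint(0, 1)
    c = math.log(count + 1)  # log-transform, as in the text
    p_true = 1.0 / (1.0 + math.exp(-(-1.0 + 1.0 * c + 0.2 * power)))
    rows.append([1.0, c, power, c * power])
    outcomes.append(1 if random.random() < p_true else 0)

beta = fit_logistic(rows, outcomes)
# beta[1] estimates the alignment strength (main effect of the count);
# beta[3] estimates the count-by-power interaction.
```

On data generated with a positive count effect, the recovered beta[1] comes out positive, mirroring the positive main effects reported below.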

Model coefficients
The main effects of C_presence and C_count are significant (p < 0.001) and positive in both corpora (SC: β_presence = 0.439, β_count = 0.291; Wiki: β_presence = 0.440, β_count = 0.395), which captures the linguistic alignment from prime to target. However, the two corpora differ in how alignment is influenced by power: in SC, C_count * C_power is significant (β = 0.078, p < 0.001), but C_presence * C_power is not; in Wiki, on the contrary, C_presence * C_power is marginally significant (β = 0.014, p = .055), but C_count * C_power is not. No collinearity is found between C_count (or C_presence) and C_power (Pearson correlation r < 0.2).
To explore why using C_presence vs. C_count results in different significance levels for SC and Wiki, we fit an individual linear model for each linguistic marker, using 14 disjoint subsets of each corpus. The z-scores and significance levels of the two interaction terms are reported in Table 1. First, in SC the interaction term C_presence * C_power is significant for 9 out of 14 markers. In Wiki, C_count * C_power is significant for 5 out of 14 markers. This suggests that the interaction between the occurrence of linguistic markers and the power status of speakers exists within a subset of the linguistic categories, but not across all of them. Thus, we consider this first experiment a replication of past findings on the effect of social power on alignment: social power has a significant effect for certain markers, but its overall effect is neutralized in the full model because the remaining markers are not significant. This analysis also revealed that C_count * C_power captures the effect more reliably, so it is what we use in the following experiment.

Table 1: Summary of the 14 models that fit individual markers on disjoint data subsets. Wald's z-score and significance level (*** for p < 0.001, ** for p < 0.01, * for p < 0.05, and † for 0.05 < p < 0.1) of the interaction terms (C_presence * C_power or C_count * C_power) are reported.

Visualizing the effect of power
To better understand the interaction term C_count * C_power, we divide the data into two groups by whether C_power is high or low, and fit a model on each group. These models include only one predictor, C_count:

logit(m) = ln [p(m in target) / p(m not in target)] = β_0 + β_1 C_count    (2)

We then compare the main effects (β_1 coefficients) across the two groups.
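A quick way to see what the per-group models measure: on hypothetical (prime count, power, target outcome) triples, the empirical probability of the marker appearing in target, per count level within each power group, is the quantity each group's β_1 slope summarizes.

```python
from collections import defaultdict

def group_rates(rows):
    """rows: (count_in_prime, power, target_has_marker) triples.
    Returns {power: {count: empirical P(marker in target)}}."""
    tally = defaultdict(lambda: [0, 0])  # (power, count) -> [hits, n]
    for count, power, has_marker in rows:
        tally[(power, count)][0] += has_marker
        tally[(power, count)][1] += 1
    rates = defaultdict(dict)
    for (power, count), (hits, n) in tally.items():
        rates[power][count] = hits / n
    return dict(rates)

# Hypothetical data in which the high-power group shows a steeper
# count effect than the low-power group.
data = [(0, "high", 0), (0, "high", 0), (2, "high", 1), (2, "high", 1),
        (0, "low", 0),  (0, "low", 1),  (2, "low", 1),  (2, "low", 0)]
rates = group_rates(data)
# rates["high"] == {0: 0.0, 2: 1.0}; rates["low"] == {0: 0.5, 2: 0.5}
```

A steeper rise of the rate with count in one group corresponds to a larger fitted β_1 for that group.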
Unsurprisingly, the main effects of C_count are significant for both groups (p < 0.001). More importantly, the β_1 coefficient of the high-power group is larger than that of the low-power group. For SC, the difference is very salient; for Wiki, it is smaller, in line with the non-significant coefficient of C_count * C_power in Wiki. In fact, the Wiki models are fitted on a subset of the data containing the 5 (out of 14) markers with significant coefficients for C_count * C_power in the individual models of Table 1 (certain, excl, incl, ipron, negate), so that the difference in slopes is presented at its maximal degree.
In Figure 1 we illustrate the β_high and β_low coefficients of C_count by plotting the predicted probability (the inverse logit transformation of the left-hand side of Equation (2)) against C_count (log-transformed). The slope of β_high is visibly larger than that of β_low (more salient in SC), indicating the significant interaction between C_count and C_power.

Figure 1: Predicted probability against the number of markers in prime, i.e., C_count (log-transformed), grouped by the power of the prime speaker, i.e., high vs. low. Divergent slopes indicate significant interactions. Colored hexagons indicate the number of data points within each region.

Discussion
The occurrence of linguistic markers in prime is a strong predictor of whether the same marker will appear again in target. The coefficients of C_count can be viewed as indicators of the linguistic alignment between interlocutors: larger positive βs indicate stronger alignment, while smaller or negative βs indicate weaker or reversed alignment, respectively (the latter is not found in our data).
Our results confirm the previously reported effect of power on linguistic alignment. The significant coefficient of C_count * C_power means that the β of C_count is dependent on C_power. In other words, the strength of alignment varies significantly with the power level (high vs. low) of the prime speaker, as reflected by the different slopes in Figure 1. However, we need to keep in mind that this affirmative finding is not secure, because it is based on a simple model with only one key predictor, C_power. According to our hypothesis, the strength of alignment can be influenced by many low-level linguistic features, and it is not yet clear whether the effect of power remains visible once we include predictors representing those features. This is the aim of the next experiment.
Additionally, the results suggest that the influence of power on linguistic alignment is better characterized by the fine-grained cumulative effect of linguistic markers than by the mere difference between their absence and presence. Thus, we discard C_presence and proceed with C_count.

Experiment 2: Extended model
In our first experiment, we replicated the effect of prime speakers' power status on the linguistic alignment from target speakers, from the significant interaction term C count * C power . Now, we want to determine if the effect of power remains significant after taking into account utterance length. As discussed, our hypothesis is that alignment (as measured by changes in probability of using LIWC categories) is best explained by low-level linguistic features that would be taken into account by an automatic priming process.

Statistical models
We add a new predictor to Equation (1), C_pLen, the number of words in prime, resulting in the extended model shown in Equation (3). We are interested in whether β_4 remains significant when the other interaction terms (with coefficients β_5, β_6, and β_7) are added.

logit(m) = ln [p(m in target) / p(m not in target)]
         = β_0 + β_1 C_count + β_2 C_power + β_3 C_pLen
           + β_4 C_count * C_power + β_5 C_count * C_pLen
           + β_6 C_power * C_pLen + β_7 C_count * C_power * C_pLen    (3)

Note that we use the same subset of Wiki as in Section 3.4 (the five most significant LIWC categories), so that the strongest effect of C_count * C_power is considered.
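For concreteness, a design-matrix row for Equation (3) can be constructed as follows (a sketch; the function and argument names are ours):

```python
import math

def eq3_features(count, power, plen):
    """One design-matrix row for Equation (3): intercept, the three main
    effects, and all two- and three-way interactions. The count is
    log-transformed as in the basic model."""
    c = math.log(count + 1)
    return [1.0, c, power, plen,
            c * power,           # C_count * C_power  (coefficient beta_4)
            c * plen,            # C_count * C_pLen
            power * plen,        # C_power * C_pLen
            c * power * plen]    # three-way interaction

row = eq3_features(count=0, power=1, plen=2.0)
# log(0 + 1) = 0, so every term involving C_count vanishes:
# row == [1.0, 0.0, 1, 2.0, 0.0, 0.0, 2.0, 0.0]
```

Fitting then proceeds exactly as for the basic model, only with this longer feature vector.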

Model coefficients
The coefficients of the full model are shown in Table 2. Surprisingly, the coefficient of C_count * C_power is significantly negative in SC and non-significant in Wiki (see highlighted rows), in contrast to the positive coefficients of the same term in Table 1. This indicates that the observed effect of power on alignment depends on the presence of C_pLen in the model. No collinearity is found between C_power and the other predictors: Pearson correlation r < 0.2; variance inflation factors (VIF) are low (< 2.0) (O'Brien, 2007).

Table 2: Summary of the model described in Equation (3): β coefficients, Wald's z-score, and significance level (*** for p < 0.001, ** for p < 0.01, * for p < 0.05) for all predictors and interactions.

To further demonstrate how the coefficient of C_power * C_count depends on C_pLen, we remove C_count * C_pLen, C_power * C_pLen, and C_count * C_power * C_pLen from Equation (3) in a stepwise fashion, and examine C_count * C_power in each remaining model. z-scores, significance levels, and the Akaike information criterion (AIC) scores (Akaike, 1998) of the remainder models are reported in Table 3. In the full model, and when C_count * C_power * C_pLen or C_count * C_pLen is removed, the coefficient of C_power * C_count is significantly negative in SC and non-significant in Wiki. Only when C_power * C_pLen is removed do the coefficients of C_count * C_power become significantly positive (the last two rows in Table 3). However, the models with a negative or non-significant coefficient for C_power * C_count have lower AIC scores than those with a positive coefficient (the full model has the lowest AIC), indicating that the former fit the data better. Altogether, the stepwise analysis not only indicates that the positive interaction between C_power and C_count found in our basic model (Section 3) is unreliable, but also suggests that a negative (SC) or non-significant (Wiki) interaction is preferable.
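The model comparison logic reduces to computing AIC = 2k − 2 ln L for each remainder model and ranking them (lower is better). A sketch with hypothetical log-likelihood values:

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical nested models: dropping an interaction term saves one
# parameter; whether AIC improves depends on how much likelihood is lost.
candidates = {
    "full":               (-1000.0, 8),   # (ln L, number of parameters)
    "minus three-way":    (-1000.9, 7),
    "minus power:length": (-1012.0, 7),
}
ranked = sorted(candidates, key=lambda name: aic(*candidates[name]))
# aic: full = 2016.0, minus three-way = 2015.8, minus power:length = 2038.0
```

In this made-up example, removing the three-way term barely costs any likelihood, so its AIC edges out the full model, whereas removing the power-by-length term costs far more than the one parameter it saves.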

Visualizing interaction effect
To illustrate how the interaction C_power * C_count diminishes after adding C_pLen to the extended model, we bin C_pLen into ranges and examine how the amount of priming changes with C_count under different combinations of C_power and C_pLen. This is a common practice for interpreting linear models with three-way interactions (Houslay, 2014).
To bin the lengths, we first compute the mean of C_pLen (i.e., the average utterance length), M_pLen. We divide the data by whether C_pLen is above or below M_pLen, and then compute the mean of C_pLen within the upper and lower parts, resulting in M_L_pLen and M_S_pLen respectively (L for long and S for short). Now we can replace the continuous variable C_pLen with a categorical, ordinal one that takes two values, {M_S_pLen, M_L_pLen}, representing relatively short and long utterances respectively. Together with the other categorical variable, C_power, which takes the values high and low, we have four combinations: C_pLen = M_S_pLen and C_power = high (SH); C_pLen = M_L_pLen and C_power = high (LH); C_pLen = M_S_pLen and C_power = low (SL); C_pLen = M_L_pLen and C_power = low (LL).

In Figure 2 we plot the smoothed regression lines of predicted probability against C_count for the above four groups. Here C_count is not log-transformed, because this better demonstrates the trend of the fitted regression lines. Figure 2 intuitively shows that C_pLen is a more determinant predictor than C_power. Division by power, i.e., high (SH and LH) vs. low (SL and LL), does not result in a salient difference in slopes: the slopes of the high-power (solid) and low-power (dashed) lines differ little within the same prime utterance length group (indicated by color). However, division by prime utterance length, i.e., short (SH and SL) vs. long (LH and LL), results in very salient differences in slopes: in Figure 2a, the short C_pLen group (orange) has larger slopes than the long group (blue), while in Figure 2b, the short group has smaller slopes than the long group.
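The binning procedure above, sketched in Python:

```python
def length_groups(lengths):
    """Map continuous prime-utterance lengths onto the two-level variable
    described above: each length becomes M_S (the mean of the below-average
    lengths) or M_L (the mean of the above-average lengths)."""
    m = sum(lengths) / len(lengths)           # grand mean M_pLen
    short = [x for x in lengths if x <= m]
    long_ = [x for x in lengths if x > m]
    m_s = sum(short) / len(short)
    m_l = sum(long_) / len(long_)
    return [m_s if x <= m else m_l for x in lengths]

# e.g. lengths 2, 4, 6, 8: grand mean 5, so M_S = 3.0 and M_L = 7.0
groups = length_groups([2, 4, 6, 8])
# groups == [3.0, 3.0, 7.0, 7.0]
```

Crossing these two length levels with the two power levels yields the four groups (SH, LH, SL, LL) plotted in Figure 2.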

Discussion
Adding C_pLen to the model has a strong impact on the previous conclusion about the effect of power on alignment. First, we find a negative interaction between C_count and C_power in SC and a non-significant one in Wiki, contrary to the previous findings reported by Danescu-Niculescu-Mizil et al. (2012). Moreover, we doubt the reliability of a positive interaction, because the sign of its β varies as other interaction terms (those involving C_pLen) are removed or added, and a negative or non-significant interaction is preferred by a simple model selection criterion.

Table 3: Wald's z-score and significance level (*** for p < 0.001) of the C_count * C_power term, and the AIC scores of the remainder models after stepwise removal of other interaction terms from the full model. The full model is described in Equation (3).

Remainder model | SC z-score | SC AIC | Wiki z-score | Wiki AIC
Full | -9.95*** | 697588 | 0.14 | 890685
Full − C_count * C_power * C_pLen | -8.75*** | 697609 | -0.62 | 890686
Full − C_count * C_power * C_pLen − C_count * C_pLen | -5.61*** | 697838.9 | -0.74 | 890723.5
Full − C_count * C_power * C_pLen − C_power * C_pLen | 10.90*** | 698254.7 | 3.85*** | 890726.7
Full − C_count * C_power * C_pLen − C_count * C_pLen − C_power * C_pLen | 15.02*** | 698461.8 | 3.72*** | 890763.8

Second, there is a significant interaction between C_count and C_pLen, though in different directions for the two corpora: negative β in SC and positive β in Wiki. Both observations have theoretical justification from previous studies. Myslín and Levy (2016)'s work favors the negative β: language comprehension is facilitated by the clustering of linguistic stimuli in time. In our case, the linguistic markers in the utterance of speaker A function as stimuli to speaker B. A longer utterance means that the stimuli span wider in time and thus cluster less, which makes them less salient features for speaker B to adapt to. This in turn makes speaker B less likely to reuse those stimuli in the near future. Meanwhile, evidence from the line of work on surprisal and syntactic priming supports the positive β. In syntactic alignment, structures with higher surprisal (less common ones) are associated with stronger alignment (Jaeger and Snider, 2013; Reitter and Moore, 2014). Since surprisal has been found to be closely related to utterance length in dialogue (Genzel and Charniak, 2003; Xu and Reitter, 2016b,a), it is reasonable to expect that longer utterances receive stronger alignment because they contain content of higher surprisal.
The discrepancy between Wiki and SC in terms of the direction of C count * C pLen is an interesting phenomenon to explore, because it can tell us something about how the form of dialogue (Wiki consists of online conversations and SC consists of face-to-face ones) affects the underlying cognitive mechanism of language production.
Regardless, our main finding is that low-level linguistic features, such as utterance length, have a strong effect on linguistic alignment. These effects are an important confound to take into account when examining higher-level features, such as social power. In particular, the effect of social power can no longer be reliably detected by linear models once utterance length is introduced.
Another interesting result is the significant interaction term C_power * C_pLen, which implies that the power status of speakers and how long they tend to speak are not unrelated. Significant but weak correlations are found between C_power and C_pLen (using Pearson's correlation): r = −0.059 in SC; r = −0.018 in Wiki. This correlation may reflect a linguistic manifestation of social power, but since it is not directly related to the alignment process, we do not discuss it further in this paper.
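The correlation check can be reproduced with a plain Pearson coefficient (the power/length values below are hypothetical, with power coded 0/1):

```python
def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical example in which high-power speakers (1) produce slightly
# shorter utterances, giving a negative r as in the corpora above.
r = pearson([1, 1, 0, 0, 1, 0], [10, 12, 13, 11, 9, 14])
```

On real data, xs would hold the power codes of prime speakers and ys the corresponding utterance lengths.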
In summary, we conjecture that the previously reported effect of power (Danescu-Niculescu-Mizil et al., 2012) is likely caused by the correlation between power status and utterance length, though further investigation is needed to confirm this. Moreover, utterance length is just one simple factor; many other linguistic features can correlate with social power, e.g., surprisal-based measures of lexical information density.

Conclusion
To sum up, our findings suggest that the previously reported effect of power on linguistic alignment is not reliable. Instead, we consistently align towards language that shares certain low-level features. We call for the inclusion of a wider range of factors in future studies of social influences on language use, especially low-level but interpretable cognitive factors. Perhaps in most scenarios, alignment is primarily influenced by linguistic features themselves, rather than social power.
We are not denying the existence of accommodation caused by the social distance between interlocutors. However, we want to stress the difference between priming-induced alignment at lower linguistic levels and the intentional accommodation caused by higher-level perception of social power. The latter should be a relatively stable effect that is independent of low-level linguistic features. In particular, our findings suggest that the probability change of LIWC categories is more likely a case of automatic alignment than of intentional accommodation, because it is better explained by a lower-level linguistic feature (utterance length). Therefore, we suggest that future work on social power and language use consider other (perhaps higher-level) linguistic elements.