Characterizing Interactions and Relationships between People

This paper presents a set of dimensions to characterize the association between two people. We distinguish between interactions (when somebody refers to somebody in a conversation) and relationships (a sequence of interactions). We work with dialogue scripts from the TV show Friends, and do not impose any restrictions on the interactions and relationships. We introduce and analyze a new corpus, and present experimental results showing that the task can be automated.


Introduction
People interact with each other and as a result form relationships. These relationships range from weak (e.g., John talking to a waiter to order a drink) to strong (e.g., John and his best friend discussing career options). Traditionally, information extraction systems target, among others, relationships between people, e.g., PARENT, SIBLING, OTHER-PERSONAL, OTHER-PROFESSIONAL (Doddington et al., 2004).
Extracting a label describing the general relationship between two entities (often called relation type) is useful for tasks such as summarization (Jijkoun et al., 2004) and question answering (White et al., 2001). Only assigning a relation type, however, does not account for nuances in the relationship between two individuals. First, a relationship can be characterized beyond a relation type. For example, people having a PROFESSIONAL relationship may be spatially near or distant (working at the same or different offices), and have an equal or hierarchical relationship (two software developers or a developer and the CEO). Second, relationships are defined by multiple interactions, and the fine-grained characteristics of interactions do not necessarily mirror the characteristics of the corresponding relationship.
For example, software developers having a cooperative PROFESSIONAL relationship may have a heated interaction in a meeting that does not affect the long-term PROFESSIONAL relationship. Similarly, the same software developers having a task oriented PROFESSIONAL relationship may occasionally have pleasure oriented interactions (e.g., when they go out for drinks on Fridays).
These fine-grained characteristics of interactions and relationships are called dimensions in social science (Wish et al., 1976). Social scientists have also studied language usage and how people interact with each other depending on their relationship. For example, Gibbs (2000) studies irony (sarcasm, hyperbole, understatement, etc.) in communications among friends, and Snyder and Stukas Jr (1999) analyze the expectations in social interactions (e.g., interactions between strangers tend to be more formal) as well as the consequences of breaking expectations. In the social sciences, however, researchers mostly focus on how people act (e.g., how they talk to each other and about each other) and how people perceive interactions and relationships. In general, they do not attempt to automatically characterize interactions or relationships from language usage.
In this paper, we characterize the interactions between people and the resulting relationships. The main contributions of this paper are: (a) a set of dimensions to characterize interactions and relationships, including dimensions previously defined in the social sciences and two novel dimensions; (b) annotations of these dimensions for all interactions and relationships in 24 episodes of the TV show Friends (Season 1); (c) corpus analysis including agreements, label distributions and correlations; and (d) experimental results showing that classifiers grounded on language usage (and discarding the names of people being considered) are successful at automating the task.

Previous Work
Within natural language processing, there have been several recent efforts working with relationships between people. Voskarides et al. (2015) extract human-readable descriptions of relations in a knowledge graph. Unlike the work presented here, they experiment with a proprietary knowledge graph and rely heavily on features extracted from the graph. Iyyer et al. (2016) propose an unsupervised algorithm to extract relationship trajectories of fictional characters. Bracewell et al. (2012) introduce social acts (e.g., agreement, undermining) designed to characterize relationships exhibiting adversarial and collegial behavior (similar to our cooperative vs. competitive dimension). None of these works distinguish between interactions and relationships, characterize interactions and relationships with dimensions, or consider all interactions between two people.
In our previous work, we characterize interpersonal relationships with dimensions (Rashid and Blanco, 2017). In this paper, we improve upon our previous effort as follows. First, we distinguish between interpersonal interactions and relationships. Second, we work with dialogues, so the same people interact with each other many times.
There have been a few studies on analyzing language usage when people communicate. For example, Danescu-Niculescu-Mizil et al. (2012) study how power differences affect language style in online communities, and Prabhakaran and Rambow (2014) present a classifier to detect power relationships in email threads. Similarly, Gilbert (2012) explores how people in hierarchical relationships communicate through email, and Bramsen et al. (2011) focus on identifying power relationships in social networks. Politeness in online forums has also been studied (Danescu-Niculescu-Mizil et al., 2013). While power (similar to our equal vs. hierarchical dimension, Section 3) and politeness could be considered dimensions, these works exploit structural and linguistic features derived from communications between two individuals. Unlike all of them, we distinguish between and characterize interactions and relationships, and automate the task using only information derived from language usage.
Information extraction systems target, among others, relationships between people. There have been many evaluations (Grishman and Sundheim, 1996; Doddington et al., 2004; Kulick et al., 2014; Surdeanu and Heng, 2014), and there are two main approaches. Traditionally, relationships are defined before training takes place (e.g., PARENT, FRIENDS), and systems are trained using supervised machine learning (Yu and Lam, 2010; Nguyen et al., 2016; West et al., 2014). On the other hand, open information extraction (Wu and Weld, 2010; Angeli et al., 2015) has emerged as an unsupervised domain-independent approach to extract relations. Regardless of details, these previous works extract explicit relationships and do not attempt to characterize instances of relationships with dimensions. Additionally, they do not distinguish between interactions and relationships.

Interpersonal Interactions and Relationships
In this paper, we work with transcripts of conversations and define interaction and relationship as follows. An interaction between two people x and y exists for each conversation turn by either x or y referring to the other person. A relationship between two people x and y exists if there is at least one interaction between them. One could understand a relationship between x and y as the association defined by a sequence of interactions between x and y. Beyond these definitions, we do not impose any restriction on what constitutes an interaction or relationship: interactions occur each time two people refer to each other in their speech (even if they are not talking to each other), and one or more interactions constitute a relationship. Interactions and relationships between people have been extensively studied in psychology and social sciences in general. The right set of dimensions is not agreed upon (Wish et al., 1976; Deutsch, 2011; Adamopoulos, 2012), and we argue that it depends on the domain of interest (e.g., personal diaries vs. news articles covering politics). We note that dimensions apply to interactions and relationships between specific people (i.e., instances of interactions and relationships), not relation types. For example, a KINSHIP relationship between x and y could be intense or superficial (depending on x and y) and a particular interaction of that relationship may be spatially near or distant (even for the same x and y).
The dimensions we work with are briefly summarized in Table 1 and described below. All but two dimensions are defined in previous work in the social sciences (see references in Table 1).

Annotating Dimensions of Interactions and Relationships
Annotating dimensions of interactions and relationships requires a corpus in which the same people interact several times. We augment an existing corpus of scripts from the TV show Friends (Chen and Choi, 2016). More specifically, we work with the 24 episodes from Season 1 because they:
• contain a large number of conversation turns (9,168, see counts per episode in Table 2);
• involve many characters (42 characters speak at least 100 conversation turns, see the characters that interact the most in Table 2, and the full list in the supplementary materials);
• include speaker information (i.e., we have access to who says what); and
• include annotations linking each mention of people in each conversation turn to the actual person (the name of the person).
Beyond size, the main motivation to use this corpus is the last item above: starting from scratch with another corpus of dialogues would require a substantially larger annotation effort. We refer the reader to the afore-cited paper for details, but the original corpus clusters mentions to people such as guy, my brother and he together with other mentions of the same person. The original corpus is publicly available, and we release our annotations as stand-alone annotations.

Selecting Pairs of People
The corpus we start with makes it straightforward to select pairs of people whose interactions and corresponding relationships will be annotated. We consider as interactions all instances of somebody mentioning (or referring to) somebody else in a conversation turn. We consider as relationships pairs of individuals who interact at least once. Note that we do not (a) distinguish between x mentioning y and y mentioning x, and (b) consider as an interaction x talking to y unless the conversation turn contains a mention of y (the mention need not be the actual name, it could be a pronoun or any nominal mention). Our rationale is as follows. First, all dimensions of interactions are symmetric, and all dimensions of relationships are symmetric except equal vs. hierarchical. Second, the characters of Friends refer to each other explicitly at least once in most conversations and scenes, either using first names or the pronoun you. Thus, we consider most verbal exchanges to be interactions. Table 2 shows basic counts per episode. We show the number of interactions, unique relationships (i.e., interactions between unique pairs of people), and the pair of people who interact the most. The supplementary materials include an extended table listing the number of times each pair of people interact per episode.
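The selection criteria above can be illustrated with a minimal sketch. It assumes each conversation turn comes with its speaker and the list of people it mentions; the `Turn` type, its field names, and the toy turns are hypothetical and do not reflect the actual corpus format.

```python
from collections import Counter
from typing import NamedTuple


class Turn(NamedTuple):
    """Hypothetical container for one conversation turn."""
    speaker: str
    text: str
    mentions: tuple  # names of the people referred to in the turn


def interactions(turns):
    """Yield (turn_index, pair) for each speaker/mentioned-person pair."""
    for index, turn in enumerate(turns):
        # set(): several mentions of the same person in one turn count once
        for person in set(turn.mentions):
            if person != turn.speaker:
                # frozenset makes the pair unordered, so x mentioning y
                # and y mentioning x map to the same pair
                yield index, frozenset((turn.speaker, person))


def relationships(turns):
    """Count interactions per unordered pair; each key is one relationship."""
    return Counter(pair for _, pair in interactions(turns))


# Toy turns (invented for illustration)
turns = [
    Turn("Rachel", "How are you doing?", ("Monica",)),
    Turn("Monica", "I'm fine, Rachel.", ("Rachel",)),
    Turn("Monica", "He is just a guy I am dating!", ("Paul", "Paul")),
    Turn("Joey", "Okay.", ()),  # no mention, so no interaction
]
counts = relationships(turns)
```

In this sketch the last turn produces no interaction because it mentions nobody, which mirrors criterion (b) above.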

Annotation Process
The annotations were done one episode at a time. Annotators were presented with the full transcript of the episode including speaker information (who speaks what?) and the names of the individuals mentioned in each conversation turn (who do speakers talk about?). Annotators read each episode from the beginning, and annotated dimensions of interactions and relationships after each interaction. Regarding interactions, they were instructed to annotate dimensions taking into account the language of the current conversation turn. Regarding relationships, they were instructed to annotate dimensions taking into account all previous conversation turns within the same episode. For example, if previous turns state that Rachel and Monica are best friends, the relationship will continue to be annotated intense even if an interaction does not indicate so (until a turn indicates that they are not friends, if applicable).
Table 2: Basic corpus counts. We show the number of conversation turns, interactions (i.e., one person referring to another one), unique relationships (i.e., unique pairs of people who interact with each other), and the pairs of people with most interactions.
We discovered during pilot annotations that the value for a dimension sometimes cannot be determined. For example, if the first interaction between Rachel and Monica is Rachel: How are [you] Monica doing?, we cannot tell if the relationship is temporary or enduring. We note, however, that all interactions after we find out that they are best friends (as long as they remain best friends) will be annotated enduring. Hereafter, we refer to dimensions by the first descriptor in Table 1, and use 1 if the first descriptor of a dimension is true, -1 if the second descriptor is true, and 0 if neither the first nor the second descriptor can be chosen.
Annotation Quality. The annotations were done by two graduate students in computational linguistics. First, they did pilot annotations to better define the dimensions (Section 3). After several iterations, both of them annotated 3 episodes (15% of all interactions). Table 4 presents the interannotator agreements. Cohen's κ ranges between 0.77 and 0.89, and most coefficients (7 out of 9) are above 0.80, which is considered perfect agreement. Values between 0.60 and 0.80 are considered substantial (Artstein and Poesio, 2008). The remaining episodes were annotated once.
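For readers unfamiliar with the agreement measure, the following is a minimal sketch of Cohen's κ for two annotators who label the same items with 1, -1 or 0. The label sequences are invented for illustration and are not taken from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of items on which both annotators chose the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators always use the same single label
    return (observed - expected) / (1.0 - expected)


# Toy annotations for one dimension (invented for illustration)
annotator_1 = [1, 1, 1, -1, -1, 0, 1, 1, -1, 1]
annotator_2 = [1, 1, 1, -1, 1, 0, 1, 1, -1, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators disagree on one item out of ten; κ is lower than the raw 0.9 agreement because part of that agreement is expected by chance given how often both use label 1.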

Annotation Examples
We present annotation examples in Table 3. The interactions in conversation turns (1) and (2) are competitive: Phoebe is ridiculing Paul by asking Monica if he eats chalk, and Ross is confronting an unnamed woman. In turn (3), Monica refers to Phoebe with affection (as the latter sleeps), thus the interaction is cooperative. Turns (4-6) exemplify concurrent vs. nonconcurrent. In (4), Rachel is inquiring whether she can meet Alan (Monica's boyfriend), and Rachel and Alan are not involved in the same event (at this point, the meeting may or may not happen). In (5-6), however, the speaker and second party are involved directly in a communication event. In examples (4-6), the values for active are the same as for concurrent.
Turns 7  an adult and Monica's father is indeed Mr. Geller. Example 8 is annotated equal, as Chandler and Ross are friends based on previous interactions (the use of pal also helps). Examples 10-12 require more explanation, as additional information beyond the current conversation turn is required (recall that dimensions of relationships are annotated taking into account the previous turns within the same episode, Section 4.2). Dr. Franzblau is the doctor of a friend's ex-wife, so Monica and he have a superficial relationship (Turn 10). At the point Turn (11) is spoken by Chandler, he and Aurora are strangers, so they have a superficial relationship. In (12), previous conversations reveal that Monica and Rachel are close friends, and they interact often (intense).

Corpus Analysis
The pie charts in Figure 1 present the label distributions per dimension. Regarding interactions, we note that (a) values for all dimensions can be determined almost always (the percentages of 0 (unknown) are almost zero), and (b) the first descriptor is much more common in all dimensions. These percentages do not represent the distribution of interactions between people in general: the scripts of the TV show Friends mostly contain conversations between friends. Regarding relationships, we observe a larger percentage of 0 (unknown) although values of all dimensions can be determined most of the time (labels 1 and -1, indicating that the first or second descriptor apply  enduring), especially pleasure oriented and equal (91.2% and 84.8%). Again, these distributions would be different if we worked with other sources of dialogue than the TV show Friends.
Many of the dimensions we consider in this work are intuitively correlated. For example, one would expect concurrent interactions to be active, and pleasure oriented interactions to be equal. We note, however, that interactions can be passive and concurrent, e.g., in (Monica talking to Joey) [He] Paul is just [a guy] Paul I am dating!, Monica and Paul have a passive and concurrent interaction (they are dating, but they are not talking to each other). Table 5  of 36), although some pairs do have high correlations. In particular, active interactions tend to be both concurrent (0.81) and spatially near (0.71), and spatially near interactions tend to be concurrent (0.89). Regarding relationships, intimate correlates with intense (0.75), pleasure oriented with equal (0.50), and temporary with both superficial (0.52) and intimate (0.54).
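As an illustration of how pairwise correlations between dimensions can be obtained from the numeric labels, the sketch below computes Pearson correlations for every pair of dimensions. Using Pearson's coefficient is an assumption made for this example, and the label vectors are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric label lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 items)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one dimension never varies
    return covariance / sqrt(var_x * var_y)


# Toy labels per dimension, one value per interaction (invented)
annotations = {
    "active": [1, 1, -1, -1, 1, -1],
    "concurrent": [1, 1, -1, 1, 1, -1],
    "spatially_near": [1, -1, -1, 1, 1, -1],
}
correlations = {
    (a, b): pearson(annotations[a], annotations[b])
    for a, b in combinations(annotations, 2)
}
```

With n dimensions this yields n(n-1)/2 pairs, so the table of correlations grows quadratically with the number of dimensions.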
Finally, Figure 2 shows the most salient words of dimensions spatially near and intense. We calculated salience using tf-idf (Schütze et al., 2008). Interactions containing derogatory words (e.g., pig, bugs, pretending, cheating) tend to be distant, and near interactions contain mostly neutral and nicer words such as friends, sweetheart and please. We also note that cognitive verbs and nouns (e.g., thinking, figured (out), looking (into), cause), as well as important events (birthday, thanksgiving) and slang usage (e.g., whaddya) signal intense relationships.
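One way to compute tf-idf salience per label is sketched below. It assumes that all turns sharing a label are pooled into a single document and that idf is the unsmoothed log(N/df); both choices, as well as the toy turns, are assumptions of this example rather than details given in the text.

```python
import math
from collections import Counter


def salient_words(turns_by_label, top_k=3):
    """Rank words per label by tf-idf, pooling each label's turns."""
    docs = {
        label: Counter(word for turn in turns for word in turn.lower().split())
        for label, turns in turns_by_label.items()
    }
    n_docs = len(docs)
    # Document frequency: in how many pooled documents each word occurs
    doc_freq = Counter(word for counts in docs.values() for word in counts)
    ranked = {}
    for label, counts in docs.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism
        ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        ranked[label] = ordered[:top_k]
    return ranked


# Toy turns grouped by label (invented for illustration)
ranked = salient_words({
    "near": ["thank you sweetheart", "please stay friends please"],
    "distant": ["you are a pig", "stop cheating you pig"],
})
```

Words occurring under every label (here, you) receive an idf of zero and drop out of the ranking, which is the behavior that makes tf-idf useful for contrasting labels.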

Experiments and Results
Table 6: Results obtained with the test set with several systems (average of all dimensions). Previous refers to the previous conversation in which the same pair of people interacted (not the immediately previous turn).
We experimented with SVM classifiers with RBF kernel to predict dimensions of interactions and relationships. We divided the 24 episodes into train (episodes 1-20) and test (21-24), and trained one classifier per dimension using scikit-learn (Pedregosa et al., 2011). Each classifier is trained with three labels: 1 (1st descriptor), -1 (2nd descriptor) and 0 (unknown). The SVM parameters (C and γ) were tuned using 10-fold cross-validation with the train split, and results are reported using the test split.
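The training setup can be sketched with scikit-learn as follows. This is a simplified illustration for a single dimension, not the system evaluated in the paper: it uses only tf-idf bag-of-words features, the grid values for C and γ and the weighted F1 scoring are assumptions, and the toy turns are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def split_by_episode(examples, last_train_episode=20):
    """Split (episode, turn_text, label) triples into train and test."""
    train = [(t, y) for ep, t, y in examples if ep <= last_train_episode]
    test = [(t, y) for ep, t, y in examples if ep > last_train_episode]
    return train, test


def tune_svm(turns, labels):
    """Tune C and gamma of an RBF SVM with 10-fold cross-validation."""
    model = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf"))
    # Hypothetical search grid; the values actually explored are not given
    grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}
    search = GridSearchCV(model, grid, cv=10, scoring="f1_weighted")
    return search.fit(turns, labels)


# Toy data: one turn per label and episode (invented for illustration)
templates = {
    1: ["thanks for helping me", "you are a great friend",
        "happy birthday sweetheart"],
    -1: ["stop lying to me", "you are such a pig", "quit cheating right now"],
    0: ["okay", "hmm", "well"],
}
examples = [
    (episode, texts[episode % len(texts)], label)
    for episode in range(1, 25)
    for label, texts in templates.items()
]
train, test = split_by_episode(examples)
search = tune_svm([t for t, _ in train], [y for _, y in train])
predictions = search.predict([t for t, _ in test])
```

In the setting described above, one such classifier would be trained per dimension, each with the three labels 1, -1 and 0.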
Note that different pairs of people interact more or less in each episode (Table 2). Thus, the classifiers are grounded on general language usage and do not model who talks and who is talked about. We also experimented with LSTMs taking as input the current conversation turn and previous turns, but do not report results because SVM classifiers yielded better results.
Table 7: Results obtained per dimension with the best system (all features, Table 6). The results under All are the weighted averages for all labels; recall that the label distribution is biased (Figure 1).
Feature Set. We use a combination of features extracted directly from the conversation turn, sentiment lexica and context. Specifically, we extract (a) the first word in the conversation turn, (b) bag-of-words features (binary flags and tf-idf scores), and (c) the root verb, and flags indicating the presence of exclamation, question marks and negation cues from Morante and Daelemans (2012) (other). Regarding sentiment, we extract flags indicating whether the turn has a positive, negative or neutral word in the list by Hamilton et al. (2016), the sentiment score of the turn (summation of sentiment scores per token over number of tokens in the turn), and a flag indicating whether the turn contains a negative word from the list by Hu and Liu (2004). Regarding context, we extract bag-of-words features from the previous conversation turn in which the same people interact (not necessarily the preceding turn).
Table 6 shows the overall results (average of all dimensions) obtained with the majority baseline and several feature combinations. All feature combinations outperform the baseline. Sentiment features are not beneficial, leading to the conclusion that sentiment does not correlate with dimensions of interactions and relationships between people. This may look surprising at first sight, but recall that our dimensions capture much more than if two people get along (Table 1). Finally, the bag-of-words features from the previous turn in which the same people interacted bring a small improvement (F: 0.70 vs. 0.72). We show results per dimension for the best feature combination (all) in Table 7. Although the label distributions are biased (Figure 1), the system predicts most labels for most dimensions except the very biased ones (cooperative, equal and pleasure oriented). Note that 0 (unknown) does not allow us to determine the value of a dimension, and the low results with this label are not a concern.
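To make the sentiment and surface features of the feature set concrete, here is a minimal sketch. The two tiny lexica are hypothetical stand-ins for the lists by Hamilton et al. (2016) and Hu and Liu (2004), the tokenizer is a simplification, and treating a lexicon score of zero as neutral is an assumption.

```python
import re

# Hypothetical stand-ins for the two sentiment lexica
TOY_LEXICON = {"great": 1.5, "love": 2.0, "hate": -2.0, "pig": -1.0, "guy": 0.0}
TOY_NEGATIVE_WORDS = {"hate", "pig"}


def tokenize(turn):
    """Lowercase the turn and keep alphabetic tokens (simplified)."""
    return re.findall(r"[a-z']+", turn.lower())


def turn_features(turn, lexicon=TOY_LEXICON, negative_words=TOY_NEGATIVE_WORDS):
    """Extract sentiment flags, the sentiment score and surface flags."""
    tokens = tokenize(turn)
    scores = [lexicon[token] for token in tokens if token in lexicon]
    return {
        "first_word": tokens[0] if tokens else "",
        "has_exclamation": "!" in turn,
        "has_question": "?" in turn,
        "has_positive": any(score > 0 for score in scores),
        "has_negative": any(score < 0 for score in scores),
        "has_neutral": any(score == 0 for score in scores),
        # Sum of per-token scores divided by the number of tokens in the turn
        "sentiment_score": sum(scores) / len(tokens) if tokens else 0.0,
        "has_negative_word": any(token in negative_words for token in tokens),
    }


features = turn_features("I love you, you great guy!")
```

The resulting dictionary could be combined with bag-of-words features before training; how the features are encoded for the classifier is left open here.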

Conclusions
This paper presents the task of characterizing interactions and relationships between people. We work with dialogue transcripts, and define an interaction as a speaker referring to somebody else, and a relationship as a sequence of one or more interactions. Unlike previous work (Section 2), we target all interactions and relationships, and use dimensions that are applicable to any interaction or relationship regardless of the underlying type (e.g., SIBLINGS, FRIENDS, DOCTOR-PATIENT).
We have presented an annotation effort on 24 episodes of the popular TV show Friends (Season 1). The total number of conversation turns is 9,168, and the total number of interactions is 2,331. The label distribution per dimension shows that the labels are unbalanced, but a relatively straightforward SVM is able to outperform the majority baseline (F: 0.62 vs. 0.72, Table 6). Features extracted using well-known sentiment lexica yield no improvements, leading to the conclusion that the dimensions we work with capture information beyond whether two people get along.
Crucially, values for the dimensions we work with can be determined most of the time (Figure 1, labels 1 and -1). Since we do not impose any restriction on the interactions or relationships we work with, we conclude that these dimensions may be universally applicable.