Dimensions of Interpersonal Relationships: Corpus and Experiments

This paper presents a corpus and experiments to determine dimensions of interpersonal relationships. We define a set of dimensions heavily inspired by work in social science. We create a corpus by retrieving pairs of people, and then annotating dimensions for their relationships. A corpus analysis shows that dimensions can be annotated reliably. Experimental results show that, given a pair of people, values can be assigned to dimensions automatically.


Introduction
The task of information extraction (IE) consists of creating structured representations from unstructured text. These representations usually consist of relations explicitly stated in text, and involve two or more arguments. For example, IE systems would extract SPOUSE(John, Mary) or MARRIED(John, Mary, 1994) from "John and Mary have been married since 1994." IE systems have a long history, and became popular after evaluations such as MUC (Grishman and Sundheim, 1996) and ACE (Doddington et al., 2004).
Traditional IE systems are supervised and extract relations defined before training takes place (Peng and McCallum, 2004). More recently, open IE systems have been proposed to extract, in an unsupervised manner, all relations explicitly stated in text without defining relations a priori (Mausam et al., 2012). Regarding interpersonal relations (relations that take two people as arguments), both IE approaches extract relations such as RELATIVE, FRIEND and COMMUNICATES WITH. Open IE systems are domain independent and would, in principle, extract relations such as CLASSMATES and ADVISOR from students' diaries or biographies of scientists.
While useful for applications such as question answering (Yao and Van Durme, 2014), these dyadic relations only provide a generic understanding of the relationship between two people. For example, COMMUNICATES WITH may relate people who have an intense or superficial relationship (e.g., engaged couples talking about wedding plans vs. homeowners discussing remodels with contractors), pleasure- or task-oriented relationships (e.g., friends planning a backpacking trip vs. software developers discussing the next delivery), and may be spatially near or distant (e.g., individuals having an in-person meeting vs. those exchanging emails or talking on the phone).
These elemental properties of interpersonal relationships are called dimensions in social science, and have been studied for decades (Wish et al., 1976). In those studies, the goal is to understand how relationships are perceived by people, not to extract them. Note that unlike interpersonal relationships, their dimensions are usually implicitly stated in text, thus extracting them is challenging. Also, extracting dimensions of interpersonal relationships requires text understanding beyond the event in which two people participate. As shown above, two people who communicate may have different dimension values depending on what they talk about or the communication device.
In this paper, we target dimensions of interpersonal relationships that characterize the nature of relationships beyond a name per relationship.

Table 1: […] (Adamopoulos, 2012), and [4] for (Deutsch, 2011). New indicates a dimension discovered after analyzing several examples and pilot annotations.

Related Work
Extracting relations between entities such as people, organizations and locations is the core goal of information extraction. A few competitions have served as evaluation benchmarks (Grishman and Sundheim, 1996; Doddington et al., 2004; Kulick et al., 2014; Surdeanu and Heng, 2014), and include interpersonal relationships such as BUSINESS, SPOUSE and CHILDREN. Aguilar et al. (2014) compare several evaluations, and automated approaches to relation extraction (also referred to as link prediction and knowledge base completion) include those of Yu and Lam (2010), Nguyen et al. (2016) and West et al. (2014). Open information extraction (Wu and Weld, 2010; Angeli et al., 2015) has emerged as an unsupervised, domain-independent approach to extracting relations. Regardless of details, all these previous efforts extract explicit relations and do not attempt to characterize instances of relations with dimensions. Besides extracting relations per se, there have been efforts within computational linguistics involving interpersonal relationships. Voskarides et al. (2015) extract human-readable descriptions of relations in a knowledge graph by ranking sentences that justify the relations. Iyyer et al. (2016) propose an unsupervised algorithm to extract relationship trajectories of fictional characters, i.e., how interpersonal relationships evolve over time in fictional stories. Bracewell et al. (2012) introduce 9 social acts (e.g., agreement, undermining) designed to characterize relationships between individuals exhibiting adversarial and collegial behavior (similar to our cooperative vs. competitive dimension).
Researchers have also studied, from a computational perspective, how people communicate with each other. For example, Danescu-Niculescu-Mizil et al. (2012) study how power differences affect language style in online communities, and Prabhakaran and Rambow (2014) present a classifier to detect power relationships in email threads. Similarly, Gilbert (2012) explores how people in hierarchical relationships communicate through email, and Bramsen et al. (2011) focus on identifying power relationships in social networks. Politeness in online forums has also been studied (Danescu-Niculescu-Mizil et al., 2013). While power (similar to our equal vs. hierarchical dimension, Section 3) and politeness could be considered dimensions, these works exploit structural and linguistic features derived from communications between two individuals. Unlike all of these works, we extract 9 dimensions of interpersonal relationships from sentences describing an event involving two people, without needing language samples from them.

Dimensions of Interpersonal Relationships
Dimensions of interpersonal relationships have been studied for decades outside of computational linguistics, mostly in psychology and social science in general (Wish et al., 1976). The set of dimensions is by no means agreed upon, and neither is the terminology to refer to what apparently is the same dimension. For example, the terms dominance, submission, potency, autonomy and control are used to describe the distribution of power in a relationship (Deutsch, 2011). The dimensions we work with in this paper are primarily borrowed from previous works in social science, although we add two new dimensions. Interestingly, the previous works which define these dimensions do so from a theoretical point of view or after conducting experiments with subjects to reveal how they perceive interpersonal relationships. The latter was done using multidimensional scaling analysis after subjects compared 25 relationships, e.g., between a parent and child, between business partners (Wish et al., 1976). Table 1 presents the nine dimensions targeted in this paper along with the original references and aliases found in the literature. Social scientists have proposed additional dimensions, e.g., voluntary vs. involuntary, public vs. private and licit vs. illicit (Deutsch, 2011), or self-benefiting vs. service-oriented (Adamopoulos, 2012). We discarded these additional dimensions because we discovered that they are not applicable to most pairs of people we work with (Section 4).
We provide below brief descriptions of the nine dimensions of interpersonal relationships we work with. Note that these dimensions are not completely independent; for example, enduring relationships are usually intense and intimate, and intense and pleasure-oriented relationships are almost always intimate. Section 4.3 presents examples, and Section 4.4 discusses inter-dimensional correlations and inter-annotator agreement. We point the reader to the references in Table 1 for further details.

Building a Corpus of Dimensions of Interpersonal Relationships
Existing corpora annotating relations (Section 2) only consider selected interpersonal relationships and do not target dimensions. Our goal is to target dimensions of interpersonal relationships between any two individuals, from weak links (e.g., journalists interviewing celebrities) to strong ties (e.g., close friends). Thus, we create a corpus by first retrieving pairs of people, and then annotating dimensions for their relationships. We decided to add our annotations to OntoNotes (Hovy et al., 2006). Doing so has several advantages. First, OntoNotes contains texts from several domains and genres (e.g., conversational telephone speech, weblogs, broadcast), so we do not work only with newspaper articles. Second, OntoNotes includes part-of-speech tags, named entities and coreference chains, three annotation layers that allow us to streamline the corpus creation process.

Retrieving Pairs of People
Figure 1: Frequencies of the top 20 most frequent verbs after retrieving pairs of people (Section 4.1). We discard verbs with frequency <4, and randomly select up to 26 pairs per verb for a total of 1,048 pairs.

We retrieve pairs of people within each sentence in OntoNotes following four steps:
1. Collect all instances of the personal pronouns (part-of-speech tag PRP) I, he and she.
2. Collect all PERSON named entities.
3. Keep one mention per coreference chain, giving priority to named entities over pronouns.
4. Generate all combinations of 2 elements from the union of the pronouns and named entities, subject to the following constraints: at least (a) one element is a PERSON named entity, and (b) one element is the nsubj (nominal subject syntactic dependency) of a verb. The elements of the pair that satisfy restrictions (4a) and (4b) need not be the same.

Note that removing duplicate mentions (Step 3) does not reduce the number of relationships targeted; it simply avoids duplicate pairs. Also, the only syntactic constraint is that one person in the pair must be the nominal subject of a verb. Thus, we also work with relationships between individuals in different clauses, connected by a variety of syntactic paths (see examples in Table 2).
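The four retrieval steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code used to build the corpus: the `Mention` record, its fields (`ne_type`, `chain_id`, `nsubj_of`) and the function names are hypothetical, and we assume each sentence's mentions have already been read from the OntoNotes part-of-speech, named-entity and coreference layers and from a dependency parse.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional, Tuple

TARGET_PRONOUNS = {"i", "he", "she"}  # Step 1 pronouns, compared lowercased


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record built from OntoNotes annotation layers."""
    text: str
    pos: str                 # part-of-speech tag, e.g. "PRP" or "NNP"
    ne_type: Optional[str]   # named-entity label, e.g. "PERSON", else None
    chain_id: Optional[int]  # coreference chain id, None if not in a chain
    nsubj_of: Optional[str]  # verb this mention is the nsubj of, else None


def is_person(m: Mention) -> bool:
    return m.ne_type == "PERSON"


def is_target_pronoun(m: Mention) -> bool:
    return m.pos == "PRP" and m.text.lower() in TARGET_PRONOUNS


def retrieve_pairs(mentions: List[Mention]) -> List[Tuple[Mention, Mention]]:
    """Return candidate pairs of people for one sentence."""
    # Steps 1-2: target pronouns and PERSON named entities.
    candidates = [m for m in mentions if is_target_pronoun(m) or is_person(m)]

    # Step 3: one mention per coreference chain, named entities first.
    representative: Dict[int, Mention] = {}
    for m in candidates:
        if m.chain_id is None:
            continue
        current = representative.get(m.chain_id)
        if current is None or (is_person(m) and not is_person(current)):
            representative[m.chain_id] = m
    unique = [m for m in candidates
              if m.chain_id is None or representative[m.chain_id] is m]

    # Step 4: pairs with (a) at least one PERSON entity and (b) at least one
    # nominal subject of a verb; (a) and (b) may hold for different elements.
    return [(x, y) for x, y in combinations(unique, 2)
            if (is_person(x) or is_person(y))
            and (x.nsubj_of is not None or y.nsubj_of is not None)]
```

In this sketch, Step 3 runs before the nsubj check, so a chain represented by a named entity that is not itself a subject can lose the subject information its pronoun carried; whether the original pipeline behaves this way is not stated in the text.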
The total number of pairs generated when skipping Steps 3 and 4 would be 4,886. After removing duplicate mentions (Step 3), the number is reduced to 3,481; restrictions (4a) and (4b) further reduce the number to 3,143 and 2,696, respectively. Executing all steps yields 2,364 pairs. Figure 1 presents verb frequencies for the top 20 most frequent verbs in the 2,364 pairs. In order to reduce the annotation effort while covering a variety of verbs, we set out to annotate about 1,000 pairs. After trying several thresholds, we retrieved 1,048 pairs by selecting pairs from verbs that occur at least 4 times, and randomly selecting up to 26 pairs per verb (most verbs occur fewer than 26 times).
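The verb-based selection could be approximated as follows. The sketch assumes each retrieved pair is stored together with the verb of which one person is the nominal subject; `sample_pairs`, its parameters and the fixed random seed are illustrative choices, and the actual sampling procedure used for the corpus may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Item = Tuple[str, object]  # (verb, pair); the pair representation is up to the caller


def sample_pairs(items: Sequence[Item], min_freq: int = 4,
                 max_per_verb: int = 26, seed: int = 0) -> List[Item]:
    """Keep verbs seen at least min_freq times; sample up to max_per_verb pairs each."""
    by_verb: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        by_verb[item[0]].append(item)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected: List[Item] = []
    for verb in sorted(by_verb):  # sorted so the sampling order is deterministic
        group = by_verb[verb]
        if len(group) < min_freq:
            continue  # discard infrequent verbs
        selected.extend(rng.sample(group, min(max_per_verb, len(group))))
    return selected
```

Capping each verb at 26 pairs keeps frequent verbs from dominating the sample, while the frequency threshold drops verbs too rare to learn from.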

Annotating Dimensions of Interpersonal Relationships
After generating pairs, annotators determine values for each dimension of interpersonal relationships. The annotation interface shows the sentence from which the pair was generated, as well as the previous and next sentences to provide some context. The pair of people of interest is highlighted, but no additional information is shown (e.g., the verb of which one person is the subject).
Annotators assign a value to each dimension based on the relationship between the two individuals at the time the verbal event of which one of the individuals is the subject takes place. They were trained to take into account context (previous and next sentences), and to interpret the text as they normally would. Therefore, they assign values using world knowledge that may not be explicitly stated in the text. For example, two people talking on the phone would have a spatially distant relationship because (most likely) they are not next to each other while talking. Annotating the changes over time of the dimensions is outside the scope of this paper.
During the first batch of annotations, we discovered that, for a given pair of people, dimensions sometimes cannot be determined because (a) there is not enough evidence in the text provided (i.e., the sentence from which the pair was generated and the previous and next sentences) or (b) the pair is invalid and assigning dimensions is nonsensical. We use label 0 in the former case, and inv in the latter. For example, in the sentence [He]_y [criticized]_verb [Ken Starr]_x, the value for the dimension spatially near (vs. distant) was marked 0, as there is not enough information to determine whether He and Ken Starr were at the same location when the criticizing took place. The most frequent example of inv is when God is marked as a PERSON named entity in the gold annotations in OntoNotes.
In the rest of the paper, we refer to dimensions by the first descriptor in Table 1, and use 1 if the first descriptor of a dimension is true, and -1 if the second descriptor is true. For example, label -1 applied to dimension temporary means that the relationship is enduring.

Figure 2: Label distribution per dimension of interpersonal relationships. The missing portion of each pie chart corresponds to labels 0 and inv, which always amount to less than 5% each.
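The labeling scheme can be written down as a small lookup. The descriptor pairs below are the nine dimensions as named throughout this paper; the `encode` helper and the way an annotator's choice is represented (a descriptor string, `None` for insufficient evidence, or `"inv"`) are hypothetical conveniences for illustration.

```python
from typing import Dict, Optional, Tuple, Union

# Key: name used for the dimension (its first descriptor); value: (label 1, label -1).
DIMENSIONS: Dict[str, Tuple[str, str]] = {
    "cooperative": ("cooperative", "competitive"),
    "equal": ("equal", "hierarchical"),
    "intense": ("intense", "superficial"),
    "pleasure oriented": ("pleasure oriented", "work oriented"),
    "active": ("active", "passive"),
    "intimate": ("intimate", "unintimate"),
    "temporary": ("temporary", "enduring"),
    "concurrent": ("concurrent", "nonconcurrent"),
    "spatially near": ("spatially near", "spatially distant"),
}
INV = "inv"  # invalid pair; assigning dimensions is nonsensical


def encode(dimension: str, choice: Optional[str]) -> Union[int, str]:
    """Map an annotator's choice to 1, -1, 0 (not enough evidence) or 'inv'."""
    first, second = DIMENSIONS[dimension]
    if choice == INV:
        return INV
    if choice is None:
        return 0
    if choice == first:
        return 1
    if choice == second:
        return -1
    raise ValueError(f"{choice!r} is not a descriptor of {dimension!r}")
```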

Annotation Examples
We present annotation examples from our corpus in Table 2, including context when it is relevant. We acknowledge that some annotations are ambiguous, and discuss label distributions and inter-annotator agreement in Section 4.4.
Sentences (1) and (2) encode a COMMUNICATION relationship between two individuals, and both relationships are cooperative, superficial, work oriented, active, unintimate, temporary, and concurrent. The values for two dimensions, however, are different. The two counterparts in Sentence (1) are at the same level in the power structure (equal), but the interviewer and interviewee in Sentence (2) are not. Similarly, talking on the phone entails that the individuals are spatially distant (Sentence 1), but interviewing (most likely) means that they were spatially near (Sentence 2). One could argue that 0 would be a better label for spatially near in Sentence (2), but annotators interpreted interviewed as referring to an in-person interview.
Sentence (3) in context describes one person (Yu Youren) encouraging another one (I). Annotators indicate that this relationship, unlike the ones in Sentences (1) and (2), is intense (frequent interaction), intimate (emotionally close), and enduring (lasting over a month). These values are not explicitly stated, but they are understood given the long-lasting impact Yu Youren had on I.
Finally, Sentence (4) exemplifies a competitive relationship. The context describes a strained relationship between Dingxiang and Zhang Sheng. When the latter threw the former out, the relationship was superficial and unintimate, but had (most likely) existed for longer than a month (enduring).

Corpus Analysis
Label Distribution. Figure 2 shows the percentage of label 1 (first descriptor) and -1 (second descriptor) per dimension in our corpus. The percentage of label 0 ranges from 0.86% (temporary) to 4.3% (cooperative) depending on the dimension, and the percentage of inv is 4.6% (not shown in Figure 2). Importantly, annotators assigned a useful value (either 1 or -1) to most pairs of people (>90%) for all dimensions. The distributions of 1 and -1 clearly show that most dimensions are biased towards one label. For example, few relationships are pleasure oriented (14.4%) or enduring (14.5%). The exceptions are concurrent vs. nonconcurrent, with roughly the same percentages (46.4% and 47.1%), spatially near vs. distant (40.4% and 51.4%), and active vs. passive (58.4% and 35.3%). These distributions are not representative of all interpersonal relationships; we would expect many more pleasure-oriented and intimate relationships if we worked with personal diaries instead of OntoNotes.

Inter-Dimension Correlations. While the dimensions we work with have a long tradition in social science (Section 3), to the best of our knowledge they have not been extensively annotated in text before. Table 3 shows inter-dimensional correlations for all pairs of dimensions in our corpus. Not surprisingly, some dimensions correlate with each other. For example, enduring relationships tend to also be intense (0.56) and intimate (0.61), and concurrent relationships tend to be active (0.74). The highest correlation is between concurrent and spatially near (0.89), indicating that if two people participate in a common event at the same time, they are usually at the same location (see the counterexample in Table 2, Sentence 1). Note, however, that most correlations are low, and some dimensions (e.g., cooperative, equal) have low correlations (<0.30) with all other dimensions.

Inter-Annotator Agreement. The annotations were done by two graduate students. They started by annotating small batches of pairs of people and discussing disagreements with each other. After several iterations, they independently annotated 10% of all pairs of people generated. Table 4 depicts the inter-annotator agreement obtained. The overall Cohen's kappa coefficient is 0.68, and the coefficients range from 0.59 to 0.74 depending on the dimension. Note that kappa coefficients in the range 0.61-0.80 are considered substantial, and those above 0.80 almost perfect (Landis and Koch, 1977). Given this substantial agreement, the remaining pairs were annotated once.
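For readers who want to reproduce the corpus statistics, Cohen's kappa and a pairwise label correlation can be computed as sketched below. Both functions are illustrative: the text does not state which correlation coefficient underlies Table 3, so the sketch assumes Pearson correlation computed only over pairs labeled 1 or -1 in both dimensions.

```python
import math
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence, b: Sequence) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two annotators' marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined: both annotators used one identical label
    return (observed - expected) / (1 - expected)


def label_correlation(d1: Sequence, d2: Sequence) -> float:
    """Pearson correlation between two dimensions' labels over the same pairs,
    ignoring pairs labeled 0 or inv in either dimension (an assumption)."""
    pairs = [(x, y) for x, y in zip(d1, d2) if x in (1, -1) and y in (1, -1)]
    n = len(pairs)
    if n == 0:
        return float("nan")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return float("nan")  # a constant label vector has no correlation
    return cov / math.sqrt(vx * vy)
```

For example, two annotators who agree on three of four items, with marginals of 50/50 and 25/75, have observed agreement 0.75, chance agreement 0.5, and therefore kappa 0.5.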

Experiments and Results
We conduct experiments using standard supervised machine learning. Each pair of people becomes an instance, and we split instances into training (80%) and test (20%) sets. As a learning algorithm, we use an SVM with RBF kernel as implemented in scikit-learn (Pedregosa et al., 2011).
We report results on the test set after tuning the SVM parameters (C and γ) using 10-fold cross-validation on the training set. More specifically, we train one classifier per dimension, and experiment with all instances except those annotated inv. Thus, each classifier predicts 3 labels: 1 (the first descriptor applies), -1 (the second descriptor applies), and 0 (there is not enough evidence to choose either descriptor).

Group | Feature | Description
Verb | … | Outgoing syntactic dependency from verb
Verb | deps in | Flags indicating incoming syntactic dependencies to the verb
Verb | lex name | Name of the WordNet lexical file of the verb
Verb | token before | Word form and part-of-speech tag of the token before the verb
Verb | token after | Word form and part-of-speech tag of the token after the verb
Person | words, tags | Concatenation of word forms and part-of-speech tags
Person | type | Flag indicating whether the person is a pronoun or named entity
Person | dep out | Outgoing syntactic dependency
Person | distance verb | Number of tokens between the person and the verb
Person | first token | Word form and part-of-speech tag of the first token in the person
Person | last token | Word form and part-of-speech tag of the last token in the person
Person | token before | Word form and part-of-speech tag of the token before the person
Person | token after | Word form and part-of-speech tag of the token after the person
Person x Person y | direction | Flag indicating whether x occurs before or after y
Person x Person y | type | Flag indicating whether x and y are PERSON NEs, or either one is a pronoun

Table 5: Feature set used to determine dimensions of interpersonal relationships between pairs of people (x, y). Verb features are extracted from the verb of which either x or y is the subject, Person features are extracted from x and y independently, and Person x Person y features are extracted from x and y jointly.
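A minimal sketch of the per-dimension training setup with scikit-learn follows: an 80/20 split, an RBF-kernel `SVC`, and a 10-fold grid search over C and γ on the training portion. The feature dictionaries, the `DictVectorizer` encoding, the grid values, the weighted-F1 scoring criterion and the function name `train_dimension_classifier` are assumptions; the text does not report them.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def train_dimension_classifier(features, labels, seed=0):
    """Train one classifier for a single dimension.

    features: list of feature dicts, one per pair of people (inv pairs removed).
    labels: 1, -1 or 0 for that dimension.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed)
    pipeline = make_pipeline(DictVectorizer(), SVC(kernel="rbf"))
    grid = {  # assumed search space; the text only says C and gamma were tuned
        "svc__C": [0.1, 1, 10, 100],
        "svc__gamma": ["scale", 0.01, 0.1, 1],
    }
    # 10-fold cross-validation on the training portion only.
    search = GridSearchCV(pipeline, grid, cv=10, scoring="f1_weighted")
    search.fit(X_train, y_train)
    return search, X_test, y_test
```

Because `GridSearchCV` refits the best configuration on the full training portion, the returned object can be scored directly on the held-out 20%, e.g. `search.score(X_test, y_test)`.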

Feature Set
The features we work with are summarized in Table 5. Most features are standard and have been used before to extract relations from text (Section 2). Following the notation in Table 2, we refer to the pair of people as x and y.
Verb features capture information about the verb to which x or y attach. We include words and part-of-speech tags (verb, and tokens before and after), the name of the WordNet lexical file to which the verb belongs, and dependencies.
Person features are extracted from x and y independently, and consist mostly of words and part-of-speech tags. We also include a flag indicating whether the person is a pronoun or named entity (type feature), and the number of tokens between the person and the verb (distance verb).
Person x Person y features capture information about both x and y. They capture (a) whether x occurs before or after y in the sentence, and (b) whether both are named entities or one is a pronoun and the other a named entity (type feature).
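The lexical and positional features in Table 5 could be extracted roughly as follows. This is a hypothetical sketch: the `Token` type, the span convention (start inclusive, end exclusive), the feature names and the padding symbols are assumptions, and the dependency features (dep out, deps in) and the WordNet lexical file are omitted because they require a parser and WordNet.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of a person mention


@dataclass(frozen=True)
class Token:
    word: str
    pos: str


def _tok(tokens: List[Token], i: int) -> str:
    """Word form and POS tag of token i, padded outside the sentence."""
    if i < 0:
        return "<s>"
    if i >= len(tokens):
        return "</s>"
    return f"{tokens[i].word}/{tokens[i].pos}"


def _between(span: Span, idx: int) -> int:
    """Number of tokens strictly between a mention span and token idx."""
    start, end = span
    if idx >= end:
        return idx - end
    if idx < start:
        return start - idx - 1
    return 0


def extract_features(tokens: List[Token], verb_idx: int,
                     x_span: Span, x_is_pronoun: bool,
                     y_span: Span, y_is_pronoun: bool) -> Dict[str, object]:
    feats: Dict[str, object] = {
        # Verb features (dependencies and WordNet lexical file omitted).
        "verb": _tok(tokens, verb_idx),
        "verb_before": _tok(tokens, verb_idx - 1),
        "verb_after": _tok(tokens, verb_idx + 1),
    }
    # Person features, extracted from x and y independently.
    for name, (start, end), is_pronoun in (("x", x_span, x_is_pronoun),
                                           ("y", y_span, y_is_pronoun)):
        mention = tokens[start:end]
        feats[f"{name}_words"] = " ".join(t.word for t in mention)
        feats[f"{name}_tags"] = " ".join(t.pos for t in mention)
        feats[f"{name}_type"] = "pronoun" if is_pronoun else "named_entity"
        feats[f"{name}_distance_verb"] = _between((start, end), verb_idx)
        feats[f"{name}_first_token"] = _tok(tokens, start)
        feats[f"{name}_last_token"] = _tok(tokens, end - 1)
        feats[f"{name}_token_before"] = _tok(tokens, start - 1)
        feats[f"{name}_token_after"] = _tok(tokens, end)
    # Person x Person y features.
    feats["direction"] = "x_before_y" if x_span[0] < y_span[0] else "x_after_y"
    feats["pair_type"] = ("both_named_entities"
                          if not (x_is_pronoun or y_is_pronoun) else "one_pronoun")
    return feats
```

The resulting dictionaries can be vectorized with scikit-learn's `DictVectorizer`, which one-hot encodes string values and keeps numeric values such as the distance as-is.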

Results
We present overall results (averages of the classifiers for each dimension) using the majority baseline and several feature combinations in Table 6. Then, we present detailed results per dimension with the best feature combination in Table 7. We only present results obtained on the test set.

Overall Results and Feature Ablation (Table 6). The majority baseline obtains 0.53 average F-measure. Recall that we build a classifier per dimension; since the majority label differs across dimensions, the nine majority-baseline classifiers together predict two labels: 1 (0.55 F-measure) and -1 (0.64 F-measure).
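The majority baseline and the per-label F-measures can be sketched with scikit-learn's `DummyClassifier` and `f1_score`. The paper does not specify how label scores are averaged, so the weighted average (by label support) used here is an assumption, as is the helper name `majority_baseline_scores`.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score


def majority_baseline_scores(X_train, y_train, X_test, y_test):
    """F-measures of a baseline that always predicts the most frequent training label."""
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    y_pred = baseline.predict(X_test)
    # Per-label F-measure for 1 and -1; a label never predicted scores 0.
    per_label = f1_score(y_test, y_pred, labels=[1, -1], average=None, zero_division=0)
    # Support-weighted average over the labels present (assumed averaging scheme).
    weighted = f1_score(y_test, y_pred, average="weighted", zero_division=0)
    return {1: per_label[0], -1: per_label[1], "weighted": weighted}
```

Applied to one dimension at a time, this reproduces the behavior described above: each baseline classifier only ever outputs its dimension's majority label.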
Models trained with any combination of features outperform the majority baseline, but they never learn to predict label 0. Since label 0 accounts for only 0.86% to 4.3% of pairs depending on the dimension (Section 4.4), this limitation does not affect overall performance substantially.

Verb features alone yield a 0.65 average F-measure (1: 0.64, -1: 0.65). Adding features derived from x (Verb + Person x) improves performance (0.71 average F-measure), and adding features derived from y (Verb + Person y) slightly improves performance (0.67 average F-measure). In both cases, -1 is predicted more accurately than 1 (0.78 vs. 0.69 and 0.74 vs. 0.67).
Finally, adding all features (Verb + Person x + Person y + Person x Person y) yields the best results (average F-measure: 0.72), although by a minimal margin with respect to Verb + Person x.

Detailed Results per Dimension. Table 7 presents results per dimension with the best overall combination of features (Verb + Person x + Person y + Person x Person y).
All dimensions obtain overall F-measures between 0.65 and 0.83 (Table 7, last column). Results per label are heavily biased towards the most frequent label per dimension (Figure 2), although the models do predict both 1 and -1 for all dimensions. As stated above, none of them predict 0, but this limitation does not substantially penalize overall performance because of the low frequency of this label.
The model obtains the same F-measure for 1 and -1 with the concurrent dimension (0.76), and the labels of this dimension are distributed almost uniformly (46.4% vs. 47.1%, Figure 2). Similarly, the F-measures for 1 and -1 are relatively close for the spatially near and active dimensions (0.67 vs. 0.76 and 0.76 vs. 0.58), whose labels are distributed relatively evenly in our corpus (40.4% vs. 51.4% and 58.4% vs. 35.3%).
Finally, F-measures per label for the other dimensions are biased towards the most frequent label. For example, only 14.5% of all pairs of people have an enduring relationship (Figure 2), and the F-measure for 1 with the temporary dimension is much higher (0.91) than for -1 (0.16).

Conclusions
We have presented a set of nine dimensions of interpersonal relationships, including dimensions with a long tradition in social science and new ones. These dimensions allow us to differentiate core characteristics of the relationship between two individuals. For example, people who communicate may be spatially near or spatially distant (asking questions in class vs. chatting online), and have a pleasure-oriented or work-oriented relationship (somebody wishing good luck to a friend vs. interviewer and interviewee). Our annotations show that assigning values to dimensions can be done reliably (Cohen's kappa: 0.68), and that useful values (1 and -1 labels) are assigned to dimensions in most pairs of people (>90%). Experimental results following a standard supervised machine learning approach show that assigning values to dimensions can be automated (0.72 overall F-measure), and that results per label and dimension are biased towards the most frequent label.
We believe that extracting dimensions of interpersonal relationships complements previous efforts that extract relationships. Our future plans include studying values of dimensions for selected relationships (e.g., COWORKER), and investigating how the dimensions of a relationship change over time. The latter would allow us, for example, to analyze how the relationship between two individuals evolves, and to determine which events make a relationship go from superficial to intense and vice versa.