Better Together: Combining Language and Social Interactions into a Shared Representation

Despite the clear interdependency between analyzing the interactions in social networks and analyzing the natural language content of those interactions, the two aspects are typically studied independently. In this paper we present a first step towards a joint representation, embedding both aspects into a single vector space. We show that the new representation helps improve performance on two social relation prediction tasks.


Introduction
The interactions, social bonds and relationships between people have been studied extensively in recent years. Broadly speaking, these works fall into two, almost completely disconnected, camps. The first, focusing on social network analysis, looks at the network structure and the information flow over it as a means of inferring knowledge about the network. For example, Leskovec et al. (2008) and Kumar et al. (2010) model the evolution of network structure over time, and Xiang et al. (2010) and Leskovec et al. (2010) use the network structure to predict properties of links (e.g., strength, sign).
The second camp, focusing on natural language analysis, looks into tasks such as extracting social relationships from narrative text (Elson et al., 2010; Van De Camp and van den Bosch, 2011; Agarwal et al., 2012) and analyzing the contents of the information flowing through the network. For example, several works (Danescu-Niculescu-Mizil et al., 2012; Hassan et al., 2012; Filippova, 2012; Volkova et al., 2014; West et al., 2014; Rahimi et al., 2015; Volkova et al., 2015) extract attributes of, and social relationships between, nodes by analyzing the textual communication between them. Other works (Krishnan and Eisenstein, 2014; Sap et al., 2014) use the social network to inform language analysis.
Both perspectives on social network analysis have resulted in a wide range of successful applications; however, they neglect to model the interplay between the social and linguistic representations and how they complement one another. One of the few exceptions is West et al. (2014), who infer sentiment links between nodes in a social network by jointly modeling the local output probabilities of a sentiment analyzer, applied to the textual interactions between the nodes, and the global network structure. While this results in better performance, inference is still done over two independent representations: one capturing the linguistic information, and the other the network structure.
Instead, in this paper we take a first step towards a joint representation over both linguistic and network information, rather than treating the two independently. We follow the intuition that interactions in a social network can be fully captured only by taking both types of information into account together. To achieve this goal, we embed the input social graph into a dense, continuous, low-dimensional vector space, capturing both network and linguistic similarities between nodes. Recently proposed word (Mikolov et al., 2013; Pennington et al., 2014) and network (Perozzi et al., 2014; Tang et al., 2015) embedding approaches aim to combat a similar problem in their respective domains: data sparsity. Both follow a similar recipe: embed discrete objects (words, or nodes in a graph) into a continuous vector representation, based on the context they appear in. Our approach maps both social and linguistic information into the same vector space, rather than embedding the two aspects into two independent spaces. The social graph, which originally contains only quantitative properties of the interaction between nodes (e.g., the number of messages exchanged), is extended to capture the contents of these interactions by computing the textual similarity between the messages generated by each of the nodes. The computed similarity is used to weight the edges between adjacent nodes. We then embed the nodes of the modified graph into a vector space, using the embedding technique described by Tang et al. (2015).
We evaluate the joint representation by using it in two social relationship prediction tasks and comparing it to several word-based and network-based representations. Our experiments show the advantage of the joint representation.

Problem Formulation
Our primary assumption is that there is a latent space that influences the interactions we observe among people. The goal of our work is thus to learn this latent representation from the observed data. We describe the data and the problem more specifically below.

Data
We assume that the data comprise a graph G = (V, E), where the nodes V correspond to entities (e.g., users in a social network) and the edges E correspond to textual interactions among the entities (e.g., emails, messages). Each edge e^t_ij ∈ E, which refers to a message sent from node v_i to node v_j at time t, has an associated document representation d^t_ij. We refer to the set of messages (documents) between nodes v_i and v_j as E_ij := {e^t_ij}_t (D_ij, respectively). Moreover, we refer to the set of messages (documents) sent by node v_i to any other node as E_i := ∪_j E_ij (D_i, respectively).
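To make the data model concrete, here is a minimal sketch of the multigraph described above. The class and method names are illustrative only, not from any actual implementation; the message sets D_ij and D_i follow the definitions in the text.

```python
from collections import defaultdict

# Minimal sketch of the interaction multigraph: nodes are entity ids,
# and each directed pair (i, j) carries a time-stamped list of documents.
class InteractionGraph:
    def __init__(self):
        # E_ij: messages sent from node i to node j, keyed by (i, j)
        self.messages = defaultdict(list)

    def add_message(self, i, j, t, doc):
        self.messages[(i, j)].append((t, doc))

    def D_ij(self, i, j):
        """Documents sent from i to j (the set D_ij)."""
        return [doc for _, doc in self.messages[(i, j)]]

    def D_i(self, i):
        """All documents sent by i to any other node (the set D_i)."""
        return [doc for (s, _), msgs in self.messages.items()
                if s == i for _, doc in msgs]

g = InteractionGraph()
g.add_message("alice", "bob", 1, "lunch tomorrow?")
g.add_message("alice", "carol", 2, "draft attached")
g.add_message("bob", "alice", 3, "sounds good")
```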

Motivation
Given this type of network data, the goal is to discover the underlying latent representation of the nodes. Our assumption is that the entities are embedded in a latent space that influences the frequency and nature of their communication. We assume that each node has a location in space (e.g., in R^2, the location of v_i is v_i := (x_i, y_i)), and that pairwise node distances (e.g., d(v_i, v_j)) affect the likelihood of communication and the content of that communication. More specifically, we assume that nearby nodes are more likely to communicate, and to talk about similar things. Thus, we assume the latent space embedding represents the entities' interests, and that pairs of entities with similar interests are more likely to interact. These assumptions are motivated by online communities where users exhibit homophily (McPherson et al., 2001), i.e., users with common interests are more likely to form relationships.

Problem Definition
Given the framework and assumptions described above, we can now state the problem definition for the work in this paper. Assume as input a multigraph G = (V, E), where the messages between nodes in the graph are modeled as a set of documents.
The goal is to learn an embedding of the nodes V in R^k such that the representation reflects both the frequency and the content of the messages. To achieve this, we consider several different ways to compute the embedding, based on optimizing for (1) network connectivity, (2) message content, and (3) both connectivity and content. Our conjecture is that jointly considering connectivity and content will produce an embedding that is more robust to noisy interaction data: strong (but introverted) friends may talk less frequently but share more common interests, compared to gregarious users who talk more frequently but with many (weak) friends.
Since there is no ground truth for quantitative evaluation, it is difficult to directly evaluate the quality of a learned embedding. Thus, we evaluate our methods indirectly via related classification tasks. In this work, we will use the learned embeddings in two link-based prediction tasks, where we differentiate (1) strong vs. weak(er) friendships, and (2) employees working in the same vs. different groups.

Method
The input for our task is the text-enriched network graph G. The goal is to compute a node embedding from G and then use the embedding to generate features for pairs of nodes, which can then be used for a prediction task. The process follows these steps.
• Textual-Similarity (TS) Infused Social Graph: Construct graph weights w_ij based on the text in G, according to (1) a Node or Edge view of the documents, and (2) using a Topic Model or Word Embedding to represent the content.
• Node Embedding: Construct an embedding function V → R^k, mapping the (weighted) graph nodes into a k-dimensional space. We use the LINE method (Tang et al., 2015); we omit the details due to space restrictions.
• Feature Extraction: Construct a feature set for each node pair, using 9 similarity measures between the nodes' k-dimensional vector representations from the embedding. We also experiment with additional features extracted directly from the network and text.

Creating the TS-Infused Social Graph
The TS-Infused social graph captures the interaction between node pairs by modifying the strength of the edge connecting them according to the similarity of the text generated by each one of the nodes. We identify several design decisions for the process.
Node vs. Edge Each edge e_ij ∈ E is associated with textual content d_ij. We can characterize the textual content from the point of view of the node, by aggregating the text over all of its outgoing edges (i.e., D_i); alternatively, we can characterize the textual content from the point of view of the edge, by looking only at the text contained in the relevant outgoing edge (i.e., D_ij).
Representing Textual Content using Topic Models vs. Word Embedding Before we compute the similarity between the content of two parties, we need a vector space model to represent the textual information (the set of documents D_i or D_ij). One obvious method is topic modeling, in which the textual content is represented as a topic distribution. In this approach, we learn a topic model over the set of documents and then represent each document via a set of topic weights (T_i or T_ij). An alternative approach uses word embeddings, which have proven effective as word representations. In this approach, we represent each document as the average of the embeddings of the words in the document (WE_i or WE_ij). Given the distributional representation of the text associated with a node/edge, we assign a weight w_ij to each edge e_ij as the cosine similarity between the vector representations of the content from the neighboring nodes (e.g., d(T_i, T_j) or d(T_ij, T_ji), where d is the cosine similarity).
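The word-embedding variant of the weighting scheme can be sketched as follows: average the word vectors of each party's documents, then weight the edge by the cosine similarity of the two averages. The tiny 3-dimensional "embeddings" below are toy values for illustration, not real word vectors.

```python
import numpy as np

# Toy word vectors (illustrative only; in practice these would come
# from a pretrained word embedding model).
word_vec = {
    "deadline": np.array([1.0, 0.1, 0.0]),
    "report":   np.array([0.9, 0.2, 0.1]),
    "lunch":    np.array([0.0, 1.0, 0.2]),
}

def doc_set_vector(docs):
    """WE_i (or WE_ij): average word vector over all tokens in the documents."""
    tokens = [word_vec[w] for d in docs for w in d.split() if w in word_vec]
    return np.mean(tokens, axis=0)

def edge_weight(docs_i, docs_j):
    """w_ij: cosine similarity between the two content vectors."""
    a, b = doc_set_vector(docs_i), doc_set_vector(docs_j)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two parties talking about similar topics get an edge weight near 1.
w = edge_weight(["deadline report"], ["report deadline deadline"])
```

The topic-model variant is analogous: replace `doc_set_vector` with the inferred topic-weight vector (T_i or T_ij) and keep the cosine similarity unchanged.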

Node Embedding
We utilize the LINE embedding technique (Tang et al., 2015), which aims to preserve network structure when generating node embeddings for social and information networks. LINE uses edge weights corresponding to the number of interactions between each pair of nodes; this makes use only of the network structure, without taking advantage of the text in the network. We modify the embedding procedure by instead using the edge weights w_ij described above (i.e., based on the cosine similarity of the text between nodes v_i and v_j), and use the LINE algorithm to compute a k-dimensional embedding of the nodes in G.
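For reference, the first-order proximity objective from the LINE paper (notation adapted from Tang et al. (2015); σ denotes the logistic sigmoid and u_i the embedding of node v_i) can be written as:

```latex
O_1 = - \sum_{(i,j) \in E} w_{ij} \, \log \sigma\!\left(\mathbf{u}_i^{\top} \mathbf{u}_j\right)
```

Our modification leaves the objective itself intact and only substitutes the textual-similarity weights w_ij for the original interaction-count weights.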

Feature Extraction
Distance-based Features Given a node pair represented by their k-dimensional node embeddings, we generate features for the pair according to nine similarity measures: Bray-Curtis distance, Canberra distance, Chebyshev distance, City Block (Manhattan) distance, correlation distance, cosine distance, Minkowski distance, Euclidean distance, and squared Euclidean distance.
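All nine measures are available in `scipy.spatial.distance`, so the feature vector for a node pair can be sketched as below. The toy 4-dimensional embeddings and the Minkowski order p=3 are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.spatial import distance

# Toy k-dimensional embeddings for a node pair (k=4 here for brevity).
u = np.array([0.2, 0.5, 0.1, 0.7])
v = np.array([0.3, 0.4, 0.2, 0.6])

def pair_features(u, v):
    """The nine distance-based features for a pair of node embeddings."""
    return [
        distance.braycurtis(u, v),
        distance.canberra(u, v),
        distance.chebyshev(u, v),
        distance.cityblock(u, v),
        distance.correlation(u, v),
        distance.cosine(u, v),
        distance.minkowski(u, v, p=3),  # p is a free parameter; 3 is illustrative
        distance.euclidean(u, v),
        distance.sqeuclidean(u, v),
    ]

feats = pair_features(u, v)
```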
Additional Features Besides the distance-based features, we can also add one or more basic features related to the nodes in the network: (1) Network: the number of interactions between the two nodes, e.g., the number of emails sent and received. (2) Unigram: the unigram feature vector over the text sent by each node. (3) Word embedding: the word embedding vector for the text sent by each node; again, we use the average of the word embeddings to represent documents.

Experiments
Purdue Facebook Network We analyzed the public Purdue Facebook network data from March 2007 to March 2008, which includes 3 million posting activities. Members can mark friends as top (close) friends to receive timely notifications, without requiring confirmation from the other party. We collected 945 mutual top-friend pairs, where two users set each other as a top friend, and 34,633 one-way top-friend pairs, where only one of them set the other as a top friend. We refer to this dataset as "Facebook" in this paper. We evaluate our method on the task of classifying these two types of social relationships.

Table 1: Prediction results over the two datasets. We report the F1 score.
Avocado Email Collection This collection consists of 279 e-mail accounts, from which we extracted the job titles and departments of 136 accounts. We divided these accounts into three groups according to their positions in the company: executives, the engineering department, and the business department. We refer to this dataset as "Avocado" in this paper. The task is to predict whether two accounts belong to the same group. In order to make use of the text signal, we only consider account pairs that have correspondence with each other. There are 2232 positive and 1409 negative examples in this dataset.

Results
Using the features defined in the previous section, we train a logistic regression classifier via scikit-learn in Python. We show the ten-fold cross-validation performance of our features on the Facebook and Avocado datasets in Table 1; the reported numbers are the average scores over ten different random downsamplings. On the Facebook dataset, all embeddings constructed from the TS-Infused social graph outperform the original embedding GE. This shows that the joint representation over linguistic information and network structure is more effective than considering either one independently. The results on the Avocado dataset also confirm the advantage of the shared representation: GE_NTM significantly outperforms the other text-based and network-based methods. Here, aggregating the text sent by a node performs better than looking only at the text on a single outgoing edge, which is the opposite of the results on the Facebook dataset. This could result from the difference between the two prediction tasks. In the Facebook dataset, we try to distinguish strong from weak(er) friendships, in which case the messages the two users send to each other are most indicative. In contrast, when predicting whether two people belong to the same group inside a company, the interactions they have with their colleagues tell us more about the community they are from.
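The evaluation protocol can be sketched with scikit-learn as follows. The synthetic feature matrix below stands in for the actual Facebook/Avocado pairwise features, which are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the pairwise feature matrix: 200 node pairs,
# 9 distance-based features each, with labels loosely tied to feature 0.
rng = np.random.RandomState(0)
X = rng.rand(200, 9)
y = (X[:, 0] + 0.1 * rng.randn(200) > 0.5).astype(int)

# Ten-fold cross-validated F1, as reported in Table 1.
clf = LogisticRegression()
scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
mean_f1 = scores.mean()
```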