Detecting Deceptive Groups Using Conversations and Network Analysis

Deception detection has been formulated as a supervised binary classification problem on single documents. However, in daily life, millions of fraud cases involve detailed conversations between deceivers and victims. Deceivers may dynamically adjust their deceptive statements according to the reactions of victims. In addition, people may form groups and collaborate to deceive others. In this paper, we seek to identify deceptive groups from their conversations. We propose a novel subgroup detection method that combines linguistic signals and signed network analysis for dynamic clustering. A social-elimination game called Killer Game is introduced as a case study¹. Experimental results demonstrate that our approach significantly outperforms human voting and state-of-the-art subgroup detection methods at dynamically differentiating the deceptive groups from truth-tellers.


Introduction
Deception generally entails messages and information intentionally transmitted to create a false conclusion (Buller et al., 1994). Deception detection is an important task for a wide range of applications, including law enforcement, intelligence gathering, and financial fraud. Most previous work (e.g., (Ott et al., 2011; Feng et al., 2012)) focused on content analysis of a single document in isolation (e.g., a product review). For instance, the promoters of a product may post fake complimentary reviews, while their competitors may hire people to write fake negative reviews (Ott et al., 2011). However, when we want to detect deception in text or voice conversations, deceptive behavior may be affected by the following factors beyond textual statements.

1. Dynamic. Recent research in social science suggests that deceptive communication is dynamic and involves interactions among people (e.g., (Buller and Burgoon, 1996)). This research also postulates that a human's capacity to learn by observation enables him to acquire large, integrated units of behavior by example (Bandura, 1971). Therefore, a person's deceptive or truth-telling behavior can change constantly as he learns from others' statements during conversations.
2. Global. People may form groups for the purpose of deception. Research in social psychology has shown that an individual's object-related behavior may be affected by the attitudes of other people due to group dynamics (Friedkin, 2010).

¹ The data set is publicly available for research purposes at: http://nlp.cs.rpi.edu/data/killer.zip
Recent studies have typically been conducted over "static" written or oral deceptive statements, for which there is no obligatory communication between the author and the readers (Yancheva and Rudzicz, 2013). As a result, a victim of deception tends to trust the story mainly based on the statement he reads (Ott et al., 2011). However, in daily life, millions of fraud cases involve detailed conversations between deceivers and victims. A deceiver may make a statement that is partially true in order to deceive or mislead victims, and then adjust his deceptive strategies based on the victims' reactions (Zhou et al., 2004). Therefore, it is more challenging to identify a deceiver in an interactive process of deception.
Most deception detection research has addressed individual deceivers, but deceivers often act in pairs or larger groups (Vrij et al., 2010), and the interactions within a deceptive group have been ignored. For example, a product review from a deceiver may be supported by his teammates so that his deceptive comments are read by more potential buyers. In such cases, we can identify a deceptive group based on its members' collaborations and common characteristics, which is more promising than the typical approach of classifying individual statements as deceptive or trustworthy.

Subgroups
To identify deceptive groups from conversations, by analyzing both how a person's deception strategy evolves during his interactions with victims and how the members of a deceptive group interact with each other, we use a social-elimination game called Killer Game, which provides the ground truth of the subgroups.
The killer game has many variants that involve different roles and skills. We choose a classical version played by three roles/teams: detectives, citizens, and killers. The role of each player (game participant) is randomly assigned by a third-party game judge. Every killer/detective is given the identities of his teammates. There are two alternating phases of the game: "night", when killers may covertly "murder" a player and detectives may learn one player's role; and "day", when surviving players are informed of who was killed last "night" and then asked to speculate about the roles of other surviving players. Before a "day" ends, every surviving player should vote for a suspect. The candidate with the most votes is eliminated. A player's identity is not exposed after his "death". The game continues until all killers have been eliminated or all detectives have been killed. The killers are treated as deceivers, and citizens and detectives as truth-tellers.
In this paper, we present an unsupervised approach for differentiating the deceptive groups from truth-tellers in a game. During each round, we use Natural Language Processing (NLP) techniques to identify a player's attitude toward other players (Section 2); these attitudes are used to construct an attitude vector for each surviving player (Section 3.1) and a signed social network representation of the discussions (Section 3.2). We then apply a clustering algorithm to the attitude vector space to obtain results for each round (Section 3.1). We also implement a greedy optimization algorithm to partition the signed network, starting from the attitude clustering result (Section 3.2). Finally, we apply a pairwise-similarity approach that makes use of the predicted co-occurrence relations between players to combine the results from all rounds (Section 3.3). Figure 1 provides an overview of our system pipeline.
The major novel contributions of this paper are as follows.
• This is the first study to investigate conversations and deceptive groups for computerized deception detection.
• The proposed clustering technique is shown to be successful in separating deceptive groups from truth-tellers.
• The method can be applied to dynamically detect subgroups in a network whose discussants tend to change their opinions.

Attitude Identification
In this section, we describe how we take a player's statement in a single round as input, extract his attitudes toward other players, and represent them as a list of attitude 3-tuples (speaker, target, polarity). In this work, the polarity of an attitude (Balahur et al., 2009) can be positive (1), negative (−1), or neutral (0). A game log from a single round, shown in Figure 2, serves as our illustrative example.

For targets that the speaker did not mention, or that he mentioned without using any positive or negative attitude word, the attitude polarity score is set to 0. For instance, given Player 16's statement in Figure

Target and Attitude Word Identification
We start by identifying targets and attitude words from conversations. In the killer game, a target is represented by his unique ID, and game terms are regarded as attitude words. We collected 41 terms in total from the game's website and related discussion forum posts. ICTCLAS (Zhang et al., 2003) is used for word segmentation and part-of-speech (POS) tagging. There are two kinds of game terms: positive and negative. Positive terms include "citizen", "good person", "good person certified by the detectives", and "detective". Negative terms include "killer", "killer verified by the detectives", and "a killer who claimed himself/herself to be a detective". We assign polarity scores of +1 and −1 to positive and negative terms, respectively.
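The lexicon step can be sketched as a longest-match lookup over a segmented sentence. This is a minimal sketch, not the authors' code: it assumes English glosses of the example terms quoted above (the real lexicon holds 41 Chinese terms) and input that is already split into tokens, which ICTCLAS does in the actual pipeline. The names `POSITIVE_TERMS`, `term_polarity` and `find_attitude_words` are hypothetical.

```python
# Hypothetical English glosses of the example terms quoted above; the real
# lexicon holds 41 Chinese game terms, and matching runs on ICTCLAS tokens.
POSITIVE_TERMS = {
    "citizen",
    "good person",
    "good person certified by the detectives",
    "detective",
}
NEGATIVE_TERMS = {
    "killer",
    "killer verified by the detectives",
    "a killer who claimed himself/herself to be a detective",
}
# Try longer (multi-token) terms first so that "killer verified by the
# detectives" is not reported as the shorter term "killer".
TERMS_LONGEST_FIRST = sorted(POSITIVE_TERMS | NEGATIVE_TERMS, key=lambda t: -len(t.split()))


def term_polarity(term: str) -> int:
    """+1 for a positive game term, -1 for a negative one, 0 if not in the lexicon."""
    if term in POSITIVE_TERMS:
        return 1
    if term in NEGATIVE_TERMS:
        return -1
    return 0


def find_attitude_words(tokens: list[str]) -> list[tuple[str, int]]:
    """Greedy longest-match scan of a segmented sentence for lexicon terms."""
    found = []
    i = 0
    while i < len(tokens):
        for term in TERMS_LONGEST_FIRST:
            parts = term.split()
            if tokens[i:i + len(parts)] == parts:
                found.append((term, term_polarity(term)))
                i += len(parts)
                break
        else:  # no lexicon term starts at position i
            i += 1
    return found
```

Matching longer terms first keeps a compound such as "killer verified by the detectives" from being counted as the bare term "killer".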

Attitude-Target Pairing
Next, we associate each attitude word with its corresponding target. From each player's statement in a round, we remove interrogative and exclamatory sentences and keep only the sentences that include at least one attitude word. We develop a rule-based approach for attitude-target pairing: if there is at least one ID in the sentence, we associate all attitude words in that sentence with it. Otherwise, if "I" is the only subject or there is no subject at all, we associate the attitude words with the ID of the speaker. We reverse the polarity of an attitude word if it appears in a negation context.
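The pairing rules above can be sketched as follows. The negation cues and the three-token negation window are hypothetical choices, since the text does not define a "negation context", and the list of sentence subjects is assumed to come from an upstream parser. When a sentence mentions several IDs, this sketch attaches the attitude words to every one of them, which is one reading of the rule.

```python
NEGATION_CUES = {"not", "no", "never"}  # hypothetical English negation cues
NEGATION_WINDOW = 3  # hypothetical: how many tokens back to look for a cue


def pair_attitudes(tokens, speaker, subjects, attitude_words):
    """Return (speaker, target, polarity) tuples for one sentence.

    tokens: the segmented sentence; player IDs appear as digit tokens.
    subjects: sentence subjects from an assumed upstream parser.
    attitude_words: (token_index, polarity) for each attitude word found.
    """
    # Interrogative and exclamatory sentences are dropped entirely.
    if not tokens or tokens[-1] in {"?", "!"} or not attitude_words:
        return []
    ids = list(dict.fromkeys(int(t) for t in tokens if t.isdigit()))
    if ids:
        targets = ids  # assumption: every mentioned ID receives the attitude words
    elif not subjects or subjects == ["I"]:
        targets = [speaker]  # self-description, e.g. "I am a citizen"
    else:
        return []  # some other subject: no rule applies
    result = []
    for idx, polarity in attitude_words:
        window = tokens[max(0, idx - NEGATION_WINDOW):idx]
        if any(t.lower() in NEGATION_CUES for t in window):
            polarity = -polarity  # negation context reverses the polarity
        result.extend((speaker, target, polarity) for target in targets)
    return result
```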
Previous methods pair a target and an attitude word if they satisfy at least one dependency rule (e.g., (Somasundaran and Wiebe, 2009)). Instead, we check the POS tag sequence between them: for each attitude-target pair, if the sequence contains another attitude word, a belief-oriented verb such as "think", "believe", or "feel", or more than two verbs, we discard the pair. The assumption is that POS tag sequences can summarize dependency rules when statements are relatively short.
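A sketch of the POS-sequence filter follows. It assumes that verb tags begin with "v", as in PKU-style Chinese tagsets; the labels of the tagger actually used should be checked. `keep_pair` and its arguments are hypothetical names.

```python
# Belief-oriented verbs named in the text (English glosses). Verb detection
# assumes a tagset in which verb tags start with "v".
BELIEF_VERBS = {"think", "believe", "feel"}


def keep_pair(between: list[tuple[str, str]], attitude_lexicon: set[str]) -> bool:
    """Decide whether an attitude-target pair survives the POS-sequence filter.

    `between` holds the (word, POS tag) tokens strictly between the target
    and the attitude word.
    """
    verb_count = 0
    for word, tag in between:
        if word in attitude_lexicon or word in BELIEF_VERBS:
            return False  # another attitude word or a belief verb intervenes
        if tag.startswith("v"):
            verb_count += 1
    return verb_count <= 2  # more than two verbs: discard the pair
```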

Clustering
Since the statements in conversations are relatively short and concise, it is difficult to identify which one is deceptive, even using deep linguistic features such as the language style.
In this section, we introduce a method to construct an attitude profile for each player and a signed network based on the attitude tuple list in Section 2, and combine them to analyze a dynamic network with discussants telling lies and truths.

Clustering based on Attitude Profile
We use a vector containing numerical values to represent each player's attitude toward identified targets in each round. The values correspond to the polarity scores in a player's attitude tuple list. For example, the polarity score of player 16's attitude toward target 11 is −1 as shown in Figure 2.
We call this vector the discussant attitude profile (DAP), following Abu-Jbara et al. (2012a).
Suppose that n players participate in a single game. Since a player's identity is not exposed to the public after his death⁴, people can still speculate about the identity of a "dead" player. Therefore, the number of possibly mentioned targets in each round equals n. Given all the statements from the m surviving players in a single round, each player's DAP has n + 1 dimensions, including his vote, so we obtain an m × (n + 1) attitude matrix A, where $A_{ij}$ is the attitude polarity of player i toward player j obtained in Section 2 and $A_{i(n+1)}$ represents i's vote.
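Building the attitude matrix from the tuple list might look like the sketch below. It rests on assumptions the text does not fix: players are numbered 1 to n, a later tuple for the same (speaker, target) overwrites an earlier one, and the vote column stores the ID of the voted suspect (0 if the player did not vote). Unmentioned targets keep the neutral score 0. `build_attitude_matrix` is a hypothetical name.

```python
def build_attitude_matrix(n, surviving, tuples, votes):
    """Return the m x (n + 1) attitude matrix as a list of lists.

    n: number of players in the game (IDs 1..n, dead players included).
    surviving: IDs of the m surviving players; row r belongs to surviving[r].
    tuples: (speaker, target, polarity) triples from attitude identification.
    votes: {voter ID: suspect ID} for this round.
    """
    row_of = {player: r for r, player in enumerate(surviving)}
    A = [[0] * (n + 1) for _ in surviving]  # unmentioned targets stay neutral (0)
    for speaker, target, polarity in tuples:
        if speaker in row_of and 1 <= target <= n:
            # Column target-1 holds the attitude toward player `target`;
            # a later tuple overwrites an earlier one (assumption).
            A[row_of[speaker]][target - 1] = polarity
    for voter, suspect in votes.items():
        if voter in row_of:
            A[row_of[voter]][n] = suspect  # last column: ID of the voted suspect
    return A
```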
In a given round, given a set of m surviving players $X = \{x_1, x_2, \ldots, x_m\}$ to be clustered and their respective DAPs, we modify the Euclidean metric to compute the differences in attitudes and obtain an m × m distance matrix M, in which the votes of two players are compared with the Kronecker delta function δ:
$$\delta(a, b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases}$$
We compare the votes of two players separately because a player's vote can be inconsistent with his previous statements. We assume that the distance between two players is larger when they vote for different suspects.
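The equation for M is not reproduced here, so the following is only one plausible instantiation of a "modified Euclidean metric", not the paper's formula: Euclidean distance over the n attitude dimensions plus a penalty of `vote_weight` (a hypothetical parameter) whenever the Kronecker delta of the two votes is 0.

```python
import math


def kronecker_delta(a, b):
    """1 if the two values are equal, 0 otherwise."""
    return 1 if a == b else 0


def distance_matrix(A, vote_weight=1.0):
    """Symmetric m x m distance matrix from an m x (n + 1) attitude matrix.

    The first n columns are attitude polarities; the last column is the vote.
    Assumed form: sqrt(sum_k (A_ik - A_jk)^2 + vote_weight * (1 - delta(vote_i, vote_j))).
    """
    m = len(A)
    n = len(A[0]) - 1
    M = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            sq = sum((A[i][k] - A[j][k]) ** 2 for k in range(n))
            # Votes are compared separately: different suspects add a penalty.
            sq += vote_weight * (1 - kronecker_delta(A[i][n], A[j][n]))
            M[i][j] = M[j][i] = math.sqrt(sq)
    return M
```

Keeping the vote out of the Euclidean sum means voting for suspect 3 instead of suspect 4 is not treated as "closer" than voting for suspect 9; only agreement or disagreement matters.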
A common assumption in previous research was that a member is more likely to show a positive attitude toward other members in the same group, and a negative attitude toward the opposing groups (Abu-Jbara et al., 2012a). However, a deceiver may pretend to be innocent by supporting those truth-tellers and attacking his teammates, whose identities have already been exposed. Therefore, it is not enough to judge the relationship between two players by simply measuring the distance between their DAPs.
In addition to comparing the DAPs of players i and j, we also consider the attitudes of other players toward i and j, as well as the attitudes of i and j toward each other. We modify $M_{ij}$ accordingly, as illustrated in Figure 3, using a function h that detects negative attitudes: $h(x) = 0$ if $x \ge 0$ and $h(x) = -1$ otherwise.

⁴ Each round, the player killed by the killers and the player with the most votes are out.
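To make the extension concrete, here is a hypothetical way to fold in the two extra signals; the exact modification of $M_{ij}$ is not reproduced here, so this combination is an assumption and not the EDAPC formula. Other players' views of i and j are compared through the columns of A for i and j, and each negative attitude between i and j (detected with h) adds 1 to the distance.

```python
import math


def h(x):
    """Negative-attitude detector from the text: 0 if x >= 0, -1 otherwise."""
    return 0 if x >= 0 else -1


def extended_distance(A, ids, base):
    """Hypothetical extended distance matrix.

    A: m x (n + 1) attitude matrix; row r belongs to player ids[r], and
       column t - 1 holds the attitude toward player t.
    base: m x m distance matrix computed from the DAPs alone.
    """
    m = len(A)
    M = [row[:] for row in base]
    for i in range(m):
        for j in range(i + 1, m):
            ci, cj = ids[i] - 1, ids[j] - 1  # columns for players i and j
            # How differently the *other* surviving players judge i and j.
            others = sum((A[k][ci] - A[k][cj]) ** 2 for k in range(m) if k not in (i, j))
            # 0, -1 or -2: count of negative attitudes between i and j.
            mutual = h(A[i][cj]) + h(A[j][ci])
            M[i][j] = M[j][i] = math.sqrt(base[i][j] ** 2 + others) - mutual
    return M
```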
We perform hierarchical clustering on the condensed distance matrix of M, using the complete linkage method to compute the distance between two clusters (Voorhees, 1986). We set the number of clusters to 3, since there are three natural groups in the game, but we focus on separating deceivers (killers) from truth-tellers (citizens and detectives).
Figure 3: Computation of the distance between players i and j based on the attitude matrix (comparing i's and j's DAPs).
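The clustering step can be sketched without external libraries as agglomerative clustering with complete linkage, where the distance between two clusters is the largest pairwise distance between their members. With SciPy, the same step would typically be `linkage(squareform(M), method="complete")` followed by `fcluster(Z, t=3, criterion="maxclust")`; the pure-Python version below is an illustrative sketch, and `complete_linkage` is a hypothetical name.

```python
def complete_linkage(M, k=3):
    """Merge clusters until k remain; return one cluster label per row of M."""
    clusters = [[i] for i in range(len(M))]

    def dist(a, b):
        # Complete linkage: the farthest pair of members defines the distance.
        return max(M[x][y] for x in a for y in b)

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(M)
    for label, members in enumerate(clusters):
        for x in members:
            labels[x] = label
    return labels
```

On six points on a line at 0, 1, 10, 11, 20 and 21 (with absolute differences as distances), the sketch returns three clusters made of the adjacent pairs.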

Signed Network Partition
When computing the distance between two players in Section 3.1, we did not consider the network structure among all the players. For example, if A supports C, B supports D, and C and D dislike each other, then A and B may belong to different groups. We therefore propose to capture the interactions in the social network to further improve the attitude-profile-based clustering result.
We can easily convert the attitude matrix A into a signed network by adding a directed edge i → j whenever $A_{ij} \neq 0$. We denote the directed graph corresponding to a signed network as $G = (V, S, N, W)$, where V is the set of nodes, S is the set of positive edges, N is the set of negative edges, and $W : (V \times V) \to \{-1, 1\}$ is a function that maps every directed edge to its sign, $W(i, j) = A_{ij}$.
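The conversion can be sketched as follows. Two choices here are assumptions not stated in the text: self-attitudes (a player describing himself) are dropped, and edges toward eliminated players are dropped, so the node set V is the set of surviving players. `signed_edges` is a hypothetical name.

```python
def signed_edges(A, ids):
    """Build (V, S, N, W) from an attitude matrix.

    A: m x (n + 1) attitude matrix whose rows follow `ids` and whose column
       t - 1 holds the attitude toward player t; the last (vote) column is skipped.
    """
    n = len(A[0]) - 1
    alive = set(ids)  # assumption: V = surviving players
    W = {}
    for r, speaker in enumerate(ids):
        for col in range(n):
            target = col + 1
            if A[r][col] != 0 and target != speaker and target in alive:
                W[(speaker, target)] = A[r][col]  # W(i, j) = A_ij, +1 or -1
    S = {e for e, w in W.items() if w > 0}  # positive edges
    N = {e for e, w in W.items() if w < 0}  # negative edges
    return alive, S, N, W
```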
We use a greedy optimization algorithm (Doreian and Mrvar, 1996) to find partitions. The criterion function for the optimal partitioning procedure is constructed so that positive links are dense within groups and negative links are dense between groups. For any potential partition C, we seek to minimize an error function in which γ ∈ [0, 1] balances the penalty for placing a positive edge across groups against the penalty for placing a negative edge within a group. We regard these two types of errors as equally important and set γ = 0.5 in our experiments.
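The error function is not reproduced here, so the sketch below assumes the usual Doreian-Mrvar form: a weighted sum of negative edges inside groups and positive edges between groups, with γ on the first term and 1 − γ on the second. Which term receives γ is an assumption; with γ = 0.5 it makes no difference. Edge weights are ±1, so each misplaced edge counts once.

```python
def partition_error(W, labels, gamma=0.5):
    """Assumed Doreian-Mrvar error of a partition.

    W: {(i, j): +1 or -1} signed directed edges.
    labels: {player ID: group label}.
    """
    neg_within = sum(1 for (i, j), w in W.items() if w < 0 and labels[i] == labels[j])
    pos_between = sum(1 for (i, j), w in W.items() if w > 0 and labels[i] != labels[j])
    return gamma * neg_within + (1 - gamma) * pos_between
```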
Initially, we use the clustering result from Section 3.1 to partition the nodes into three groups and evaluate the error function E for that partition. Every partition has a set of neighboring partitions, each obtained by moving a node from one group to another or by exchanging two nodes in two different groups. E is evaluated for all neighbors of the current partition, and the one with the lowest value becomes the new partition. The procedure is repeated until it finds a minimal solution, i.e., until no neighboring partition has a lower E. We set the upper limit for the number of subgroups to 3.
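The local search above can be sketched as a steepest-descent loop over the two kinds of moves. This is a minimal sketch, not the authors' implementation; it reuses the assumed error function from the previous step, and `neighbors` and `greedy_partition` are hypothetical names. Group labels are restricted to 0, 1 and 2, which enforces the upper limit of three subgroups (a group may become empty).

```python
import itertools


def partition_error(W, labels, gamma=0.5):
    """Assumed Doreian-Mrvar error: misplaced negative and positive edges."""
    neg_within = sum(1 for (i, j), w in W.items() if w < 0 and labels[i] == labels[j])
    pos_between = sum(1 for (i, j), w in W.items() if w > 0 and labels[i] != labels[j])
    return gamma * neg_within + (1 - gamma) * pos_between


def neighbors(labels, k=3):
    """Yield partitions reachable by moving one node or swapping two nodes."""
    nodes = sorted(labels)
    for v in nodes:  # move v to another group
        for g in range(k):
            if g != labels[v]:
                nb = dict(labels)
                nb[v] = g
                yield nb
    for u, v in itertools.combinations(nodes, 2):  # swap u and v
        if labels[u] != labels[v]:
            nb = dict(labels)
            nb[u], nb[v] = labels[v], labels[u]
            yield nb


def greedy_partition(W, labels, k=3, gamma=0.5):
    """Start from the attitude-profile clustering and descend until no neighbor improves E."""
    current = dict(labels)
    current_err = partition_error(W, current, gamma)
    while True:
        best = min(neighbors(current, k), key=lambda nb: partition_error(W, nb, gamma), default=None)
        if best is None:
            return current, current_err
        best_err = partition_error(W, best, gamma)
        if best_err >= current_err:  # local minimum reached
            return current, current_err
        current, current_err = best, best_err
```

Because E strictly decreases at every accepted step and the number of partitions is finite, the loop always terminates, though only at a local minimum.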

Cluster Ensembles
The relationships between players are dynamic throughout the game. For example, a killer tends to hide his identity and pretends to be friendly to others at later stages in order to survive. Thus, it is insufficient to rely on a single round's discussion to cluster players. In addition, for each single round, we also need to combine the clustering results from the attitude profiles of the players and the signed network.
In a game with information gathered from up to r rounds, let $P = \{P_1, P_2, \ldots, P_r\}$ be the set of r clusterings (partitionings) based on attitude profiles and $P' = \{P'_1, P'_2, \ldots, P'_r\}$ be the set of r clusterings based on the signed network.
Using the co-occurrence relations between players, we generate an n × n pairwise similarity matrix T based on the information of all r rounds, where $vote_{ij}$ and $vote'_{ij}$ are the numbers of times that players i and j are assigned to the same cluster in P and P′, respectively, and $r_{ij}$ denotes the number of rounds in which both of them survived ($r_{ij} \le r$), so that $T_{ij} \in [0, 1]$. We assign a higher weight to the results in P and set λ = 2/3 in our experiments.
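Because the equation for T is not reproduced here, the sketch assumes the natural form consistent with the definitions above, $T_{ij} = (\lambda \cdot vote_{ij} + (1 - \lambda) \cdot vote'_{ij}) / r_{ij}$, which stays in [0, 1] and favors P when λ = 2/3. Each round is given as a pair of dicts mapping surviving player IDs to cluster labels (one from P, one from P′); `similarity_matrix` is a hypothetical name, and how T is then turned into final groups is not covered here.

```python
from itertools import combinations


def similarity_matrix(players, rounds, lam=2/3):
    """Pairwise similarity T[(i, j)] for i < j under the assumed weighting.

    rounds: list of (P_t, P'_t) pairs, each a {player ID: cluster label}
            dict over the players who survived round t.
    """
    T = {}
    for i, j in combinations(sorted(players), 2):
        vote = vote_p = r_ij = 0
        for P, P_prime in rounds:
            if i in P and j in P:  # both players survived this round
                r_ij += 1
                vote += P[i] == P[j]  # co-clustered by attitude profiles
                vote_p += P_prime[i] == P_prime[j]  # co-clustered by the signed network
        T[(i, j)] = (lam * vote + (1 - lam) * vote_p) / r_ij if r_ij else 0.0
    return T
```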

Dataset Construction
We recorded 10 games from 3J3F, one of the most popular Chinese online killer game websites. A screenshot of the game system interface is shown in Figure 5. Each game has 16 participating players: 4 detectives, 4 killers, and 8 citizens. Each player occupies a position in area 1 of the interface. All surviving players can express their attitudes via a voice channel (area 2), while detectives and killers can also communicate with their teammates via text in their respective private team channels (area 3). The system provides real-time updates on the game progress, voting results, and so on in the public channel (area 4). We manually transcribed the speech and stored the text information in the public channel, which contains the voting and death information. The average game lasts about 76.3 minutes, with on average 5 rounds and 411 sentences per game. Note that our method is language-independent and could easily be adapted to other languages.

Evaluation Metrics
We use two metrics to evaluate the clustering accuracy: Purity and Entropy. Purity (Manning et al., 2008) assigns each cluster to the class that holds the majority in that cluster and then measures the accuracy of this assignment by dividing the number of correctly assigned instances by the total number of instances N. More formally:
$$\mathrm{purity}(\Omega, C) = \frac{1}{N} \sum_{k} \max_{j} |\omega_k \cap c_j|$$
where $\Omega = \{\omega_1, \omega_2, \ldots, \omega_K\}$ is the set of clusters and $C = \{c_1, c_2, \ldots, c_J\}$ is the set of classes; $\omega_k$ is interpreted as the set of instances in cluster $\omega_k$ and $c_j$ as the set of instances in class $c_j$. Purity increases as the quality of clustering improves.
Entropy (Steinbach et al., 2000) measures the uniformity of a cluster. The entropy over all clusters is defined as the weighted sum of the entropies of the individual clusters:
$$E = \sum_{j} \frac{n_j}{n} E_j, \qquad E_j = -\sum_{i} P(i, j) \log P(i, j)$$
where P(i, j) is the probability of finding an element of category i in cluster j, $n_j$ is the number of items in cluster j, and n is the total number of items in the distribution. Entropy decreases as the quality of clustering improves.
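Both metrics can be computed from two parallel label lists, one giving each player's predicted cluster and one giving his true class (e.g., deceiver or truth-teller). The sketch below follows the definitions above; the logarithm base is not specified in the text, so base 2 is an assumption here (exposed as the `base` parameter).

```python
import math
from collections import Counter


def purity(clusters, classes):
    """Fraction of instances that belong to the majority class of their cluster."""
    N = len(classes)
    total = 0
    for c in set(clusters):
        members = [classes[i] for i in range(N) if clusters[i] == c]
        total += Counter(members).most_common(1)[0][1]  # size of the majority class
    return total / N


def entropy(clusters, classes, base=2):
    """Size-weighted sum of per-cluster class entropies (lower is better)."""
    n = len(classes)
    E = 0.0
    for c in set(clusters):
        members = [classes[i] for i in range(n) if clusters[i] == c]
        n_j = len(members)
        # P(i, j): share of class i within cluster j.
        e_j = -sum((cnt / n_j) * math.log(cnt / n_j, base) for cnt in Counter(members).values())
        E += n_j / n * e_j
    return E
```

A perfect clustering has purity 1 and entropy 0; mixing one truth-teller into a cluster of two killers lowers purity and raises entropy for that cluster only, weighted by its size.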

Overall Performance
We compare our approach with two state-of-the-art subgroup detection methods and with human performance as follows:
1. DAPC: In Section 3.1, we introduced our implementation of the discussant attitude profile clustering (DAPC) method proposed in (Abu-Jbara et al., 2012a). In the original DAPC method, each opinion target contributes 3 dimensions to the feature vector: (1) the number of positive expressions and (2) the number of negative expressions toward the target in the online posts, and (3) the number of times the discussant mentioned the target. In our experiment, we keep only one dimension, representing the discussant's attitude (positive, negative, or neutral) toward the target, since a discussant's attitude remains the same throughout his statement within a single round.
2. Network: We also implemented the signed network partition method for subgroup detection proposed by . To determine the number of subgroups t, we set an upper limit of t = 3 when minimizing the optimization function.
3. Human Voting: We also compare our methods with human voting results, which yield two subgroups: the players with the most votes in each round belong to one subgroup, and the remaining players belong to the other.

Table 1 shows the overall performance of the various methods on subgroup detection, and Figure 6 depicts the average performance. Our method significantly outperforms the two baseline methods and human voting. Human performance is unsatisfactory, which indicates that it is very challenging even for a human to identify a deceiver whose deceptive statements are mixed with plenty of truthful opinions (Xu and Zhao, 2012).
By extending the DAPC method (EDAPC), we can estimate the distance between two players more accurately by considering the attitudes of other players toward them and their attitudes toward each other. Given the log in Figure 2 as input, players 5 (detective) and 7 (killer) are clustered into one group when DAPC is applied, since they have no conflicting views on the identities of other players. However, 5 voted for 7 and is supported by more players than 7, which indicates that they are less likely to be teammates. We successfully separate them after re-computing the distance between them.
Adding network information yields a further 5.7% gain in Purity. In some cases, the performance remains the same when the EDAPC clustering result is already optimal, i.e., it already attains the minimum value of the criterion function.

Dynamic Subgroup Detection
As shown in Figure 7, the performance of our approach improves as the game proceeds. Players seldom maintain their opinions throughout a game. Figure 2 shows that all killers except 7 (i.e., 16, 1, and 12) insisted that citizen 11 was a killer. In response to the group pressure (Asch, 1951), 7 changed his opinion and stated in the following round that 11 could be a killer.
In reality, a discussant who participates in an online discussion tends to change his opinions about a target as he learns more information, which shows both the necessity and importance of the dynamic detection of subgroups. Our method can be applied to detect subgroups dynamically by grouping posts into multiple discussion "rounds" based on their timestamps.

Opinion Analysis
Our work on mining a player's attitude toward other players is related to opinion mining. Attitudes and opinions are closely related and can be regarded as the same in our task. Compared with previous work (e.g., (Qiu et al., 2011; Kim and Hovy, 2006)), the opinion words and targets in our task are easier to recognize because the statements are simple. Some recent work (e.g., (Somasundaran and Wiebe, 2009; Abu-Jbara et al., 2012a)) developed syntactic rules to pair an opinion word and a target if they satisfy at least one specific dependency rule. We instead use POS tag sequences to efficiently filter out irrelevant pairs.

Deception Detection
Most previous computational work on deception detection used supervised or semi-supervised classification methods (Li et al., 2013b). Besides lexical and syntactic features (Ott et al., 2011; Feng et al., 2012; Yancheva and Rudzicz, 2013), Feng and Hirst (2013) proposed using profile compatibility to distinguish fake and genuine reviews. Xu and Zhao (2012) used deep linguistic features such as text genre to detect deceptive opinion spam. Banerjee et al. (2014) used extended linguistic signals such as keystroke patterns. Li et al. (2013a) used topic models to detect the difference between deceptive and truthful topic-word distributions. Researchers have begun to realize the importance of analyzing computer-mediated communication in deception detection. Zhou and Sung (2008) conducted an empirical study on deception cues using the killer game as a task scenario and obtained many interesting findings (e.g., deceivers send fewer messages than truth-tellers). Our work is most related to the work of Chittaranjan and Hung (2010) on detecting deceptive roles in the Werewolf Game, which is another variant of the killer game. They created a Werewolf data set from audio-visual recordings of 8 games played face-to-face by 2 groups of people, and extracted audio features and interaction features for their experiments. Note, however, that non-face-to-face deception detection emphasizes verbal and linguistic cues over less controllable nonverbal communication cues (Walther, 1996).

Subgroup Detection
In online discussions, people usually split into subgroups based on various topics. Members of a subgroup are more likely to show positive attitudes toward members of the same subgroup and negative attitudes toward members of opposing subgroups (Abu-Jbara et al., 2012a). Previous work has also studied subgroup detection in social media sites. Abu-Jbara et al. (2012a) constructed a discussant attitude profile (DAP) for each discussant and then used clustering techniques to cluster their attitudes. 2012b; proposed various methods to automatically construct a signed social network representation of discussions and then identify subgroups by partitioning their signed networks. Qiu et al. (2013) applied collaborative filtering through Probabilistic Matrix Factorization (PMF) to generalize and improve extracted opinion matrices.
An underlying assumption of the previous work was that a participant will neither tell lies nor hide his own stance. Moreover, this work did not take into account that a person's attitude or stance may change as he learns more by reading others' comments and acquiring more background knowledge (Bandura, 1971). Our contribution is to extend the DAP method and combine it with signed network partitioning in order to cluster the hidden group members. We also develop a novel cluster ensemble approach to analyze the dynamic network.

Conclusions and Future Work
Using the killer game as a case study, we present an effective clustering method to detect subgroups from dynamic conversations with lies and truths. This is the first work to utilize the dynamics of group conversations for deception detection. Experiments demonstrated that truth-tellers and deceptive groups are separable and the proposed method significantly outperforms baseline approaches and human voting.
Our work paves the way for future work on deception detection in content-rich dynamic environments, such as electronic commerce and repeated interrogation, which will require sophisticated content and network analysis. In real life, suspects may be interrogated about particular events on numerous occasions. Our method can potentially be modified to find criminals who act in groups based on their statements. Other applications of this research include law enforcement, financial fraud, fraudulent ad campaigns, and social engineering.
This study focuses on analyzing the verbal content of conversations. It will be interesting to study non-verbal features such as blink rate, gaze aversion, and pauses (Granhag and Strömwall, 2002) when people play this game face-to-face, and to combine non-verbal and verbal features for deception detection. In addition, it is worth exploring the impact of cross-cultural analysis on detecting deception. When attempting to detect deceit in people of an ethnic origin other than their own, people perform even worse in terms of lie detection accuracy than when judging people of their own ethnic origin (Vrij, 2000). In future work, we aim to use automatic prediction of deceivers to help truth-tellers win games more easily.