Are You for Real? Detecting Identity Fraud via Dialogue Interactions

Identity fraud detection is of great importance in many real-world scenarios such as the financial industry. However, few studies have addressed this problem. In this paper, we focus on identity fraud detection in loan applications and propose to solve this problem with a novel interactive dialogue system that consists of two modules. One is the knowledge graph (KG) constructor, which organizes the personal information of each loan applicant. The other is structured dialogue management, which dynamically generates a series of questions based on the personal KG to ask the applicant and determines the applicant's identity state. We also present a heuristic user simulator based on problem analysis to evaluate our method. Experiments show that the trainable dialogue system can effectively detect fraudsters and achieves higher recognition accuracy than rule-based systems. Furthermore, the learned dialogue strategies are interpretable and flexible, which can help promote real-world applications.


Introduction
Identity fraud occurs when one person uses another person's personal information, or combines a few pieces of real data with bogus information, to deceive a third party. Identity fraud is becoming an increasingly prevalent issue and has left many financial firms nursing huge losses. Moreover, people whose identities have been stolen may receive unexpected bills, and their credit may also be affected. Although identity fraud is a serious problem in modern society, there are currently no effective detection methods and little attention has been paid to the problem.
Intuitively, a simple way to detect identity fraud in loan applications is to ask applicants directly about their personal information. However, as shown in Fig. 1, this method is error-prone because fraudsters may well know the fake information. Fortunately, we find that fraudsters are generally unsure of the answers to questions that are related to the fake information. We refer to these questions as derived questions; they can be constructed from triplets whose head entity is a personal information entity. For example, the first derived question about "Nanjing University" is based on (Nanjing University, FoundedDate, 1902). In Fig. 1, the applicant claims to have graduated from "Nanjing University" but cannot answer derived questions about this school, which indicates that the applicant is likely to be a fraudster.
Figure 1: Dialogue examples of two possible fraud detection methods. The first one directly asks applicants about their personal information. The second one asks applicants questions that are related to their personal information.
Based on the above finding, we aim to design a dialogue system that detects identity fraud by asking derived questions. However, achieving this goal raises three major challenges.
First, designing derived questions requires a high-quality KG. However, owing to the sparseness of KGs (Ji et al., 2016; Trouillon et al., 2017), many entities have no triplets from which derived questions can be generated. Second, randomly selecting triplets to generate questions is feasible, but it is not the optimal questioning strategy for detecting fraudsters. Third, because of privacy issues, evaluating anti-fraud systems with real applicants is impractical, and existing user simulation methods (Li et al., 2016; Georgila et al., 2006; Pietquin and Dutoit, 2006) do not apply to our task. Hence, how to evaluate our systems efficiently is an open problem.
To address these problems, we first complete an existing KG with geographic information from an electronic map (Section 2). In the new KG, nearly all personal information entities have triplets for derived question generation. Then, based on the KG, we present structured dialogue management (Section 3) that explores the optimal dialogue strategy with reinforcement learning. Specifically, our dialogue management consists of (1) a KG-based dialogue state tracker (KG-DST) that treats embeddings of nodes in the KG as dialogue states and (2) a hierarchical dialogue policy (HDP) in which high-level and low-level agents unfold the dialogue together. Finally, based on intuitive analysis, we find that the applicants' behavior depends on several factors (Section 5.1). We therefore introduce hypotheses that formalize the effect of these factors on the applicants' behavior and propose a heuristic user simulator to evaluate our systems.
Experiments show that the data-driven system significantly outperforms rule-based systems in the fraud detection task. In addition, the ablation study shows that the proposed dialogue management improves both recognition accuracy and learning efficiency because it models structured information. We also analyze the behavior of our system and find that the learned anti-fraud policy is interpretable and flexible.
To summarize, our main contributions are threefold: (1) To the best of our knowledge, this is the first work to detect identity fraud through dialogue interactions. (2) We point out three major challenges of identity fraud detection and propose corresponding solutions. (3) Experiments show that our approach detects identity fraud effectively.

Knowledge Graph Constructor
There are four types of personal information in a Chinese loan application form: "School", "Company", "Residence" and "BirthPlace". To generate derived questions, we link all personal information entities to nodes in an existing Chinese KG and crawl the triplets that are directly related to them. However, because the KG is largely sparse, nearly half of the entities cannot be linked. We therefore use the rich geographic information about organizations and locations in electronic maps (e.g., Amap) to complete the KG. Specifically, for each personal information entity, we first crawl the points of interest (POIs) within one kilometer of it, together with their POI types, from Amap. If there are multiple POIs of the same type, we keep only the nearest one. We then generate triplets of the form ($Personal Information Entity$, $POI type$, $POI$) to indicate that the nearest $POI type$ to the $Personal Information Entity$ is $POI$. Besides, for any two entities less than 100 meters apart, we generate two triplets that represent the bi-directional adjacency relation between them. Finally, as shown in Fig. 2, we combine triplets from the two information sources (the Chinese KG and the electronic map) to construct a new KG, in which nearly all personal information entities can be linked. For each relation, we design a language template for question generation.
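A minimal sketch of the map-based completion step follows, assuming the POIs returned by the map service are already available as (name, type, (lat, lon)) tuples; crawling Amap itself is not shown. The helper names and the "Adjacent" relation label are hypothetical, and the haversine formula stands in for whatever distance the map service reports.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def complete_kg(entity, entity_loc, pois, radius_m=1000.0, adjacent_m=100.0):
    """Map-derived triplets for one personal information entity.

    pois: iterable of (name, poi_type, (lat, lon)) tuples (hypothetical format).
    """
    nearest = {}  # poi_type -> (distance, name, loc); only the nearest POI per type is kept
    for name, poi_type, loc in pois:
        d = haversine_m(entity_loc, loc)
        if d <= radius_m and (poi_type not in nearest or d < nearest[poi_type][0]):
            nearest[poi_type] = (d, name, loc)
    # ($Personal Information Entity$, $POI type$, $POI$) triplets
    triplets = [(entity, poi_type, name) for poi_type, (_, name, _) in sorted(nearest.items())]
    # Two triplets (one per direction) for any two entities closer than 100 m
    located = [(entity, entity_loc)] + [(name, loc) for _, name, loc in nearest.values()]
    for (n1, l1), (n2, l2) in combinations(located, 2):
        if haversine_m(l1, l2) < adjacent_m:
            triplets += [(n1, "Adjacent", n2), (n2, "Adjacent", n1)]
    return triplets
```

With a park about 220 m away and a subway station about 55 m from that park, the sketch keeps the nearest POI of each type within 1 km and links the park and the station in both directions.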

Dialogue System Design
The overview of our system is shown in Fig. 3. The core of the system is dialogue management, which is organized as a directed graph $G(V, E)$. In each turn, our system first infers dialogue states with the KG-based dialogue state tracker by computing embeddings of nodes. In this graph, the embedding of the "User" node is the dialogue state of the high-level agent (manager), and the embeddings of nodes adjacent to "User" (called personal information nodes) are the dialogue states of the low-level agents (workers). Then our system unfolds the dialogue according to the hierarchical dialogue policy. Concretely, the manager first selects a personal information node (e.g., "Nanjing University") as the worker, and then the worker selects a node (e.g., "Gulou Subway Station") from its predecessors (called answer nodes). The nodes sampled by the two agents then form the final system action (a triplet). Next, based on the triplet and a predefined template, the natural language generation module asks the applicant a multiple-choice question. After the applicant responds, the embeddings of all nodes are updated to generate new dialogue states for the next turn.
Figure 3: Overview of our approach. To build the directed graph for dialogue management, we reverse the directions of all edges in the original personal KG so that each head entity reads information from its tail entities. Besides, we add a special node "User" and new edges to represent the applicant's personal information. In this graph, the direction of each edge is the direction of message passing in KG-DST. The blue and green edges indicate that two agents select nodes to unfold the dialogue according to HDP.
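The graph construction described above (reversed KG edges plus the "User" node) can be sketched as a predecessor map. This is an illustrative sketch that assumes one relation per (head, tail) pair; `build_dialogue_graph` and `system_action` are hypothetical names.

```python
from collections import defaultdict


def build_dialogue_graph(triplets, personal_entities):
    """Return (preds, relation): preds[v] is the set of nodes that send messages to v."""
    preds = defaultdict(set)
    relation = {}
    for head, rel, tail in triplets:
        preds[head].add(tail)          # reversed edge tail -> head: the head reads from its tail
        relation[(head, tail)] = rel   # assumes one relation per (head, tail) pair
    for p in personal_entities:
        preds["User"].add(p)           # personal information nodes feed the "User" node
    return preds, relation


def system_action(worker, answer_node, relation):
    """Combine the manager's choice (worker) and the worker's choice (answer node) into a triplet."""
    return (worker, relation[(worker, answer_node)], answer_node)
```

The answer nodes of a worker are simply `preds[worker]`, and the triplet returned by `system_action` is what the question template is filled with.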

KG-based Dialogue State Tracker
There are three types of nodes in $G(V, E)$: the "User" node $v_u$, the personal information nodes $v_p \in V_p$ and the answer nodes $v_a \in V_a$. In the $t$-th turn, KG-DST first gives each $v \in V_p \cup V_a$ an initial embedding, which is the concatenation of static features and dialogue features. Then, $v$ gathers information from its predecessors $N(v)$. After multiple rounds of message passing, we get its final embedding $E_t(v)$. Next, $v_u$ aggregates information from $V_p$ to generate its embedding $E_t(v_u)$. Finally, $E_t(v_u)$ and $E_t(v_p)$ are the dialogue states of the manager and the worker, respectively.
Static Features. For $v \in V_p \cup V_a$, the static features include the degree and the type. In addition, for $v_a \in V_a$, we use the "spread degree on the internet" to distinguish different answer nodes, because in our human experiments (Section 5.1) we find an obvious correlation between this "spread degree" feature and the applicants' behavior. To get the "spread degree" feature, we use the answer node $v_a$ and its adjacent personal information node $v_p$ as the search keywords and submit them to a search engine. The number of retrieved results is the "spread degree" feature of $v_a$. Finally, each static feature is encoded as a one-hot vector, and these vectors are concatenated to form $S_t(v)$.
Dialogue Features. The dialogue features record the dynamic information of $v \in V_p \cup V_a$ during the dialogue. Specifically, they include whether the node has been explored by the manager or workers and whether the node appeared in the system action of the last turn. In addition, for $v_p \in V_p$, the dialogue features include the number of interaction turns of the corresponding worker and the numbers of correctly and incorrectly answered questions about $v_p$. For $v_a \in V_a$, the dialogue features include whether the applicant knows that $v_a$ is the answer to a derived question. Similarly, the dialogue features are encoded as a one-hot vector $D_t(v)$.
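One way to assemble the initial embedding $[S_t(v); D_t(v)]$ is sketched below. The bucket sizes, the node-type vocabulary, the log10 bucketing of search-result counts and the zero padding that gives personal information nodes and answer nodes vectors of equal length are all assumptions made for illustration.

```python
import math

NODE_TYPES = ["personal", "answer"]  # hypothetical type vocabulary
MAX_DEGREE = 16                      # degree buckets (assumed)
SPREAD_BUCKETS = 8                   # order-of-magnitude buckets for search hits (assumed)
MAX_COUNT = 11                       # counters 0..10 (a worker asks at most 10 questions)


def one_hot(index, size):
    """One-hot vector of length `size`; out-of-range indices are clipped to the ends."""
    vec = [0.0] * size
    vec[min(max(index, 0), size - 1)] = 1.0
    return vec


def spread_bucket(num_results):
    """Bucket a search-engine hit count by its order of magnitude."""
    return min(int(math.log10(num_results + 1)), SPREAD_BUCKETS - 1)


def static_features(degree, node_type, num_results=None):
    """S_t(v): degree and type, plus the "spread degree" for answer nodes only."""
    s = one_hot(degree, MAX_DEGREE) + one_hot(NODE_TYPES.index(node_type), len(NODE_TYPES))
    if node_type == "answer":
        s += one_hot(spread_bucket(num_results), SPREAD_BUCKETS)
    else:
        s += [0.0] * SPREAD_BUCKETS  # zero padding keeps all vectors the same length
    return s


def dialogue_features(node_type, explored, in_last_action,
                      turns=0, n_correct=0, n_wrong=0, known=False):
    """D_t(v): shared flags, then personal-node counters or the answer-node "known" flag."""
    d = one_hot(int(explored), 2) + one_hot(int(in_last_action), 2)
    if node_type == "personal":
        d += one_hot(turns, MAX_COUNT) + one_hot(n_correct, MAX_COUNT) + one_hot(n_wrong, MAX_COUNT)
        d += [0.0, 0.0]
    else:
        d += [0.0] * (3 * MAX_COUNT) + one_hot(int(known), 2)
    return d


def initial_embedding(static, dialogue):
    """E^0_t(v) = [S_t(v); D_t(v)]."""
    return static + dialogue
```

Every one-hot group contributes exactly one active entry, so an answer node yields six active entries and a personal information node seven, in vectors of the same length.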
Message Passing. In Fig. 3, the applicant does not know that "Gulou Subway Station" is the nearest subway station to "Nanjing University". In such a case, the personal information about "School" may be fake. Moreover, for another question, "What's the nearest park to Nanjing University?", the applicant may not know the answer either, because "Gulou Park" is less than 100 meters from "Gulou Subway Station". Thus, we want what is known about "Gulou Subway Station" to be sent to its successors.
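Specifically, for $v \in V_p \cup V_a$, we compute its embedding recursively over depths $k = 1, \ldots, K$: $E^k_t(v)$, the depth-$k$ node embedding in the $t$-th turn, is computed from the depth-$(k-1)$ embeddings of the nodes in $N(v)$, the set of nodes adjacent to $v$, using the parameter $W^k$ of the $k$-th iteration, and the aggregate function is the element-wise max operation. The final node embedding is the concatenation of the embeddings at each depth:
$$E_t(v) = \left[E^0_t(v); E^1_t(v); \ldots; E^K_t(v)\right] \tag{2}$$
where $E^0_t(v) = [S_t(v); D_t(v)]$ is the initial embedding and $K$ is a hyperparameter.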
After getting the embeddings of $v_p \in V_p$, we compute the embedding $E_t(v_u)$ of $v_u$ by aggregating information from $V_p$ with the parameter $W_p$.
In the end, $E_t(v_p)$ is the worker's dialogue state, which contains information about part of the graph, and $E_t(v_u)$ is the manager's dialogue state, which contains information about the whole graph.
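The sketch below illustrates message passing with element-wise max aggregation and the depth-wise concatenation of Eq. 2. The exact update is not spelled out above, so this assumes a GraphSAGE-style form, $E^k_t(v) = \mathrm{ReLU}(W^k [E^{k-1}_t(v); \max_{u \in N(v)} E^{k-1}_t(u)])$, and assumes the "User" embedding is $W_p$ applied to the element-wise max over $V_p$. Plain Python lists stand in for tensors.

```python
def matvec(W, x):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, xi) for xi in x]


def elementwise_max(vectors, dim):
    """Element-wise max over a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [max(col) for col in zip(*vectors)]


def message_passing(preds, init, weights):
    """K = len(weights) rounds; returns E_t(v) = [E^0_t(v); ...; E^K_t(v)] for each node."""
    layers = [init]  # layers[k][v] holds E^k_t(v)
    for W in weights:
        prev = layers[-1]
        dim = len(next(iter(prev.values())))
        layers.append({
            v: relu(matvec(W, prev[v] + elementwise_max([prev[u] for u in preds.get(v, ())], dim)))
            for v in prev
        })
    return {v: [x for layer in layers for x in layer[v]] for v in init}


def user_embedding(final, personal_nodes, W_p):
    """E_t(v_u): element-wise max over the personal information nodes, then W_p (assumed)."""
    dim = len(final[personal_nodes[0]])
    return matvec(W_p, elementwise_max([final[p] for p in personal_nodes], dim))
```

With an empty weight list (K = 0) the final embedding equals the initial one, which corresponds to the HP-S ablation. With K = 1, a "known" bit on a subway-station node propagates to an adjacent park node, mirroring the example in Fig. 3.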

Hierarchical Dialogue Policy
After obtaining the dialogue states and node embeddings, our system will unfold the dialogue according to a hierarchical policy.
Specifically, the manager first selects $v_p \in V_p$ as a worker to verify the identity state of $v_p$ according to a high-level policy $\pi_m$. Then, the worker chooses answer nodes from its predecessors $N(v_p)$ to generate questions about $v_p$ according to a low-level policy $\pi_w$. Once the worker gives a decision $d_w \in \{\text{Fraud}, \text{Non-Fraud}\}$ about the identity state of $v_p$, $\pi_w$ ends and the manager either selects a new worker or gives the final decision. Once the manager gives the final decision $d_m \in \{\text{Fraud}, \text{Non-Fraud}\}$ about the applicant's identity state, $\pi_m$ ends. Formally, $\pi_m$ is a distribution over the personal information nodes and the terminal decisions $d_m$, conditioned on the manager's dialogue state $E_t(v_u)$, and $\pi_w$ is a distribution over the answer nodes in $N(v_p)$ and the terminal decisions $d_w$, conditioned on the worker's dialogue state $E_t(v_p)$. Besides, to prevent the two agents from making decisions in haste, domain rules are applied to their dialogue policies by "Action Mask" (Williams et al., 2017). The domain rules are as follows. First, a worker can make its decision only after all, or at least three, of its related answer nodes have been explored. Second, the manager can make the final decision only after all workers have made their decisions or at least one worker's decision is "Fraud".
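The action masks that implement the two domain rules, together with a masked softmax over candidate actions, might look like the following. The dot-product scoring of candidate embeddings against the dialogue state, and the masking of already explored answer nodes and already decided workers, are assumptions rather than details stated in the text.

```python
import math


def masked_softmax(scores, mask):
    """Softmax restricted to actions whose mask entry is True; masked actions get 0."""
    best = max(s for s, ok in zip(scores, mask) if ok)
    exps = [math.exp(s - best) if ok else 0.0 for s, ok in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(state, candidates, mask):
    """Score candidate embeddings by a dot product with the state (assumed form)."""
    scores = [sum(c * s for c, s in zip(cand, state)) for cand in candidates]
    return masked_softmax(scores, mask)


def worker_mask(explored, min_explored=3):
    """Mask over [answer nodes..., d_w = Fraud, d_w = Non-Fraud] for one worker.

    The worker may decide only after all, or at least three, answer nodes are explored.
    Already explored answer nodes are not asked again (assumption).
    """
    can_decide = all(explored) or sum(explored) >= min_explored
    return [not e for e in explored] + [can_decide, can_decide]


def manager_mask(worker_decisions):
    """Mask over [workers..., d_m = Fraud, d_m = Non-Fraud].

    worker_decisions[i] is None (undecided), "Fraud" or "Non-Fraud".
    The manager may decide only after all workers decided or one of them said "Fraud".
    Workers that have already decided are not selected again (assumption).
    """
    can_decide = all(d is not None for d in worker_decisions) or "Fraud" in worker_decisions
    return [d is None for d in worker_decisions] + [can_decide, can_decide]
```

Masking before the softmax, rather than penalizing forbidden actions through the reward, guarantees that hasty decisions are never sampled.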

Reward Function
We expect the system to give correct decisions about applicants in as few turns as possible. Thus, at the end of each dialogue, the manager receives a positive reward $r^m_{crt}$ for a correct decision, or a negative reward $-r^m_{wrg}$ for a wrong decision. If the manager selects a worker to unfold the dialogue in the $t$-th turn and the worker asks the applicant $n^w_t$ questions, the manager receives a negative reward $-n^w_t \cdot r_{turn}$. In addition, we provide an internal reward to optimize the low-level policy. Specifically, if the worker gives a correct decision about the corresponding personal information, it receives a positive reward $r^w_{crt}$; otherwise, it receives a negative reward $-r^w_{wrg}$. In each turn, the worker also receives a negative reward $-r_{turn}$ to encourage shorter interactions.
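The reward scheme can be written out as below, using the reward values listed in the implementation details. Treating the manager's and the worker's decisions as separate steps after their last question is an assumption of this sketch, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rewards:
    r_m_crt: float = 3.0   # manager, correct final decision
    r_m_wrg: float = 3.0   # manager, wrong final decision (applied as -r_m_wrg)
    r_w_crt: float = 1.0   # worker, correct decision
    r_w_wrg: float = 1.0   # worker, wrong decision (applied as -r_w_wrg)
    r_turn: float = 0.1    # per-question cost


def manager_rewards(questions_per_worker, decision, truth, cfg=Rewards()):
    """One reward per manager step: each worker selection costs -n^w_t * r_turn,
    and the final decision (assumed to be its own step) earns +r_m_crt or -r_m_wrg."""
    rewards = [-n * cfg.r_turn for n in questions_per_worker]
    rewards.append(cfg.r_m_crt if decision == truth else -cfg.r_m_wrg)
    return rewards


def worker_rewards(num_questions, decision, truth, cfg=Rewards()):
    """-r_turn for each question asked, then +r_w_crt or -r_w_wrg for the decision."""
    rewards = [-cfg.r_turn] * num_questions
    rewards.append(cfg.r_w_crt if decision == truth else -cfg.r_w_wrg)
    return rewards
```

For example, a manager that runs two workers asking 3 and 4 questions and then decides correctly collects a total reward of about 3 - 0.7 = 2.3.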

Reinforcement Learning
The two agents can be trained with the policy gradient approach (Williams, 1992), where $R^m_t$ and $R^w_t$ are the discounted returns of the two agents, $a^m_t$ and $a^w_t$ are their sampled actions, and $V^m(E_t(v_u))$ and $V^w(E_t(v_p))$ are value networks that are optimized by minimizing the mean-square errors to $R^m_t$ and $R^w_t$, respectively.
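A sketch of the per-agent quantities in this update follows: discounted returns, a policy-gradient loss that uses the value network as a baseline, and the mean-square value loss. The baseline form is an assumption consistent with the description above. The code only computes scalar loss values; a real implementation would differentiate them with an autodiff framework.

```python
def discounted_returns(rewards, gamma):
    """R_t = r_t + gamma * R_{t+1}, computed backwards over one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def actor_critic_losses(log_probs, values, rewards, gamma):
    """REINFORCE-with-baseline style losses for one agent (assumed form).

    log_probs: log pi(a_t | s_t) of the sampled actions; values: V(s_t).
    Policy loss: -sum_t (R_t - V(s_t)) * log pi(a_t | s_t), advantage held constant.
    Value loss: mean of (R_t - V(s_t))^2.
    """
    returns = discounted_returns(rewards, gamma)
    advantages = [R - v for R, v in zip(returns, values)]
    policy_loss = -sum(a * lp for a, lp in zip(advantages, log_probs))
    value_loss = sum(a * a for a in advantages) / len(advantages)
    return policy_loss, value_loss
```

The manager and the worker would each call this on their own trajectories, with gamma = 0.999 for the manager and 0.99 for the worker as reported in the implementation details.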

Pre-Training
Before reinforcement learning (RL), supervised learning (SL) is applied to mimic dialogues generated by a rule-based system. The rules are defined as follows. First, the manager selects a worker randomly. Then, the worker selects answer nodes randomly to generate questions. Let $n_{crt}$ and $n_{wrg}$ denote the numbers of correctly and incorrectly answered questions in this worker's decision process. If $|n_{crt} - n_{wrg}| \geq 3$ or all answer nodes related to this worker have been explored, the worker gives its decision. If $n_{crt} < n_{wrg}$, the worker's decision is "Fraud" and the manager's decision is "Fraud" as well. Otherwise, the worker's decision is "Non-Fraud" and the manager chooses a new worker to continue the dialogue. Finally, if all workers' decisions are "Non-Fraud", the manager's decision is "Non-Fraud".
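The hand-crafted Hierarchical Rule used for pre-training can be sketched as follows. `ask` is a hypothetical callback into the user simulator that returns whether the applicant answered correctly; the random choices follow the rules above.

```python
import random


def hierarchical_rule(answer_nodes, ask, rng=random):
    """Rule-based dialogue used for pre-training.

    answer_nodes: dict worker -> list of answer nodes.
    ask(worker, node) -> True if the applicant answers correctly (hypothetical interface).
    Returns (manager decision, list of (worker, worker decision)).
    """
    workers = list(answer_nodes)
    rng.shuffle(workers)                 # the manager selects workers randomly
    history = []
    for worker in workers:
        nodes = list(answer_nodes[worker])
        rng.shuffle(nodes)               # the worker selects answer nodes randomly
        n_crt = n_wrg = 0
        for node in nodes:
            if ask(worker, node):
                n_crt += 1
            else:
                n_wrg += 1
            if abs(n_crt - n_wrg) >= 3:
                break                    # otherwise stop once all answer nodes are explored
        decision = "Fraud" if n_crt < n_wrg else "Non-Fraud"
        history.append((worker, decision))
        if decision == "Fraud":
            return "Fraud", history      # one "Fraud" worker ends the dialogue
    return "Non-Fraud", history          # every worker said "Non-Fraud"
```

The (state, action) pairs produced by runs of this rule are what the neural policies are trained to imitate during the supervised pre-training stage.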
5 Experiments and Results

User Simulator and Human Experiments
Simulating users' behavior is an efficient way to evaluate dialogue systems. In our task, the applicants' behavior is answering derived questions. Thus, the key to the user simulator is to estimate the probability $p(k_i)$, where $k_i$ is a binary random variable that denotes whether the applicant knows the triplet fact $t_i$ behind a question $q_i$. Intuitively, $p(k_i)$ depends on three factors. First, if the applicant's identity state is "Non-Fraud", $p(k_i = 1)$ will be greater than $p(k_i = 0)$. Second, the more widely a triplet fact $t_i$ spreads on the internet, the more likely applicants are to know it. For example, almost all applicants know (Baidu, Founder, Robin Li) because many web pages on the internet contain this fact. Third, if applicants know other triplets related to $t_i$, they may well know $t_i$ because it is easy to deduce $t_i$ from what they know. For example, if applicants know (Nanjing University, Park, Gulou Park) and (Gulou Park, SubwayStation, Gulou Subway Station), they may well know (Nanjing University, SubwayStation, Gulou Subway Station).
To formalize the effect of the three factors on applicants' behavior, we introduce three hypotheses: (1) For both fraudsters and normal applicants, $p(k_i = 1)$ is proportional to the "spread degree" of $t_i$. (2) The "spread degree" of $t_i$ can be approximated by the number of results retrieved by a search engine (denoted as $\mathrm{Freq}(e^h_i, e^t_i)$) when the keywords are the head entity $e^h_i$ and the tail entity $e^t_i$ of $t_i$. (3) For any three triplets that form a closed loop (regardless of directions), if applicants know two of them, they must know all of them.
To generate simulated loan applicants, we first estimate the functional relations between $p(k_i = 1)$ and $\mathrm{Freq}(e^h_i, e^t_i)$ via human experiments. Specifically, we ask 31 volunteers to answer derived questions about their own and others' personal information. Then, each question $q_i$ is placed into a discrete bin according to the logarithm of $\mathrm{Freq}(e^h_i, e^t_i)$. In each bin, we use the ratio of correctly answered questions to approximate $p(k_i = 1)$. The resulting relations are shown in Fig. 4. The statistical distributions of the real behavior patterns of normal applicants and fraudsters are distinguishable, and the results agree with our first two intuitions.
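The binned estimate of $p(k_i = 1)$ can be sketched as below. It would presumably be run separately on the answers about the volunteers' own information (normal applicants) and about others' information (fraudsters). The natural logarithm of $\mathrm{Freq} + 1$ and unit-width bins are assumptions.

```python
import math
from collections import defaultdict


def estimate_know_prob(records, bin_width=1.0):
    """records: (freq, answered_correctly) pairs from the human experiments.

    Each question is binned by log(freq + 1); the ratio of correct answers in a bin
    approximates p(k_i = 1). Returns {bin_index: probability}.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [correct, total]
    for freq, correct in records:
        b = int(math.log(freq + 1) // bin_width)
        counts[b][0] += int(correct)
        counts[b][1] += 1
    return {b: c / n for b, (c, n) in sorted(counts.items())}
```

Bins with few questions give noisy ratios, so in practice one would probably merge sparse bins before reading off the curves.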

Figure 4: Relations between $p(k_i = 1)$ and $\log(\mathrm{Freq}(e^h_i, e^t_i) + 1)$. $\mathrm{Freq}(e^h_i, e^t_i)$ is used to approximate the "spread degree" of $t_i$, and $p(k_i = 1)$ indicates the probability that applicants know $t_i$.
Then, we generate simulated loan applicants in a "sampling and calibration" manner. Specifically, given an applicant's personal information, we first sample the identity state randomly. If the result is "Fraud", we randomly sample 1 to 4 information items to be the fake information. Generally, forging information about "School" and "Company" may result in a larger loan. Thus, when sampling the fake information, the sampling probability of "School" and "Company" is twice that of "Residence" and "BirthPlace". Then, for each piece of personal information and each related triplet $t_i$, we sample $k_i$ based on (1) whether the personal information is fake, (2) $\mathrm{Freq}(e^h_i, e^t_i)$, and (3) the corresponding functional relation in Fig. 4. Because the samples $\{k_1, \ldots, k_n\}$ are drawn independently, some of them may violate the rule defined in our third hypothesis. If that happens, we calibrate the samples until all of them agree with the hypothesis. Finally, if $k_i = 1$, the applicant gives the correct answer to the question $q_i$; otherwise, the applicant responds with "D. I am not quite clear."
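The "sampling and calibration" procedure can be sketched as follows. Several details are assumptions: the identity state is drawn with probability 0.5, the number of forged items is uniform on 1 to 4, the hypothetical callable `p_know` stands in for the curves in Fig. 4, and calibration resolves a violation of hypothesis (3) by marking the third triplet of a closed loop as known.

```python
import random
from itertools import combinations

INFO_WEIGHTS = {"School": 2, "Company": 2, "Residence": 1, "BirthPlace": 1}


def sample_fake_items(rng):
    """Sample the identity state; for "Fraud", forge 1-4 distinct items (weighted)."""
    if rng.random() < 0.5:    # assumed 50/50 identity state
        return set()          # "Non-Fraud": nothing is forged
    remaining = dict(INFO_WEIGHTS)
    fake = set()
    for _ in range(rng.randint(1, 4)):
        item = rng.choices(list(remaining), weights=list(remaining.values()))[0]
        fake.add(item)
        del remaining[item]   # sample without replacement
    return fake


def calibrate(triplets, known):
    """Hypothesis (3): if two triplets of a closed loop are known, the third is known too."""
    known = list(known)
    edges = [frozenset((h, t)) for h, _, t in triplets]  # directions are ignored
    changed = True
    while changed:
        changed = False
        for i, j, k in combinations(range(len(triplets)), 3):
            loop = (len(edges[i] | edges[j] | edges[k]) == 3
                    and len({edges[i], edges[j], edges[k]}) == 3)
            if loop and known[i] + known[j] + known[k] == 2:
                known[i] = known[j] = known[k] = True
                changed = True
    return known


def simulate_applicant(questions, p_know, rng):
    """questions: list of ((head, rel, tail), info_type, freq).

    p_know(is_fake, freq) -> probability of knowing the triplet (hypothetical callable).
    """
    fake = sample_fake_items(rng)
    known = [rng.random() < p_know(info in fake, freq) for _, info, freq in questions]
    known = calibrate([t for t, _, _ in questions], known)
    return ("Fraud" if fake else "Non-Fraud"), known
```

Calibration repeats until no closed loop has exactly two known triplets, so the returned samples always satisfy the third hypothesis.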

Baselines
We compare our model (denoted as Full-S) with two rule-based baselines.In addition, to study the effect of message passing and hierarchical policy on the model training, we compare Full-S with two neural baselines for the ablation study.
• Flat Rule: The system randomly selects 10 questions to ask the applicant. If the number of correctly answered questions is smaller than the number of incorrectly answered questions, the system's decision is "Fraud"; otherwise, it is "Non-Fraud".
• Hierarchical Rule: A rule-based system that uses a hand-crafted hierarchical policy to unfold dialogues. As described in Section 4.3, we use this system to pre-train Full-S.
• MP-S: A neural dialogue system that uses message passing to infer dialogue states but a flat policy to unfold dialogues. That is, the manager selects answer nodes directly to generate derived questions.
• HP-S: A neural dialogue system that uses the hierarchical policy to unfold dialogues but does not use message passing to infer dialogue states. That is, $K$ is 0 in Eq. 2.
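For comparison, the Flat Rule baseline reduces to a few lines. `ask` is a hypothetical simulator callback, and ties count as "Non-Fraud", matching the rule described above.

```python
import random


def flat_rule(questions, ask, rng=random, num_questions=10):
    """Ask up to 10 randomly chosen questions; "Fraud" if correct answers < incorrect ones."""
    chosen = rng.sample(questions, min(num_questions, len(questions)))
    n_crt = sum(1 for q in chosen if ask(q))
    n_wrg = len(chosen) - n_crt
    return "Fraud" if n_crt < n_wrg else "Non-Fraud"
```

Unlike the hierarchical rule, this baseline never focuses on one information item, so a single forged item can be drowned out by correct answers about the genuine ones.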

Implementation Details
We collect 906 applicants' personal information and randomly select 706 for training, 100 for development, and 100 for testing. In each batch, we sample 32 applicants' information for simulation. The maximum numbers of interaction turns for the system and for a worker are 40 and 10, respectively. The iteration depth $K$ in message passing is 2. In the reward function, $r^m_{crt} = 3$, $r^m_{wrg} = 3$, $r^w_{crt} = 1$, $r^w_{wrg} = 1$ and $r_{turn} = 0.1$. The discount factors are 0.999 and 0.99 for the manager and the worker, respectively. All neural dialogue systems are pre-trained with rule-based systems for 20 epochs: MP-S is pre-trained with Flat Rule because both use the flat policy, while HP-S and Full-S are pre-trained with Hierarchical Rule because they use the hierarchical policy. In the RL stage, all neural dialogue systems are trained for 300 epochs. At test time, we repeat the evaluation for 10 epochs and report the average.
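The reported settings can be collected into a single configuration object, sketched below with hypothetical field names, together with a seeded random 706/100/100 split.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    num_train: int = 706
    num_dev: int = 100
    num_test: int = 100
    batch_size: int = 32          # applicants simulated per batch
    max_system_turns: int = 40
    max_worker_turns: int = 10
    mp_depth: int = 2             # K in message passing
    r_m_crt: float = 3.0
    r_m_wrg: float = 3.0
    r_w_crt: float = 1.0
    r_w_wrg: float = 1.0
    r_turn: float = 0.1
    gamma_manager: float = 0.999
    gamma_worker: float = 0.99
    pretrain_epochs: int = 20
    rl_epochs: int = 300
    test_repeats: int = 10


def split_applicants(applicants, cfg, seed=0):
    """Random train/dev/test split of the collected applicants."""
    shuffled = list(applicants)
    random.Random(seed).shuffle(shuffled)
    a, b = cfg.num_train, cfg.num_train + cfg.num_dev
    return shuffled[:a], shuffled[a:b], shuffled[b:b + cfg.num_test]
```

Freezing the dataclass keeps the settings immutable across the pre-training, RL and test stages.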

Test Performance
We compare Full-S with baselines in terms of two metrics: recognition accuracy and average turns.
Fig. 5 shows the test performance. The accuracy of Flat Rule is lower than that of Hierarchical Rule, and the accuracy of the data-driven counterpart of Flat Rule (MP-S) is only slightly higher than random guessing. This indicates that using the hierarchical policy to unfold dialogues is necessary for our task. Besides, HP-S achieves higher accuracy than its rule-based counterpart (Hierarchical Rule) in far fewer turns, which shows that the data-driven system is more efficient than the rule-based system. Finally, equipped with both message passing and the hierarchical policy, Full-S achieves the best accuracy. Interestingly, Full-S requires more turns than HP-S but achieves much higher accuracy. One possible reason is that, without message passing to infer dialogue states, HP-S may easily get trapped in a local optimum.

Ablation Study
To study the effect of message passing and the hierarchical policy, we show the learning curves of the three neural dialogue systems in Fig. 6. Each learning curve is averaged over 10 epochs. Compared with Full-S and HP-S, MP-S fails to learn any useful dialogue policy during training, for two reasons. First, the action space of the flat policy is too large, so MP-S suffers from sparse rewards and a long horizon. Second, without explicitly modeling the logical relation between the manager and workers, MP-S is prone to errors. Besides, Full-S converges faster than HP-S in both the pre-training and RL stages. This is because message passing models the structured information of the KG, which makes Full-S more efficient in policy learning.

Manager's Policy Analysis
To better understand the high-level dialogue policy, we analyze the manager's behavior in Full-S. First, we show the manager's action probability curves in Fig. 7. At the first decision step, the manager gives priority to verifying "School" and "Company" over "Residence" and "BirthPlace". In the following two decision steps, the probabilities of selecting "Residence" and "BirthPlace" increase. This is because simulated applicants tend to forge personal information about "School" and "Company" for a larger loan. Consequently, to discover fake information faster, the manager learns to prioritize different information items.
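Second, intuitively, the manager's policy should follow two logical rules in our task:
Rule1: If a worker's decision is "Fraud" (Cond1), the dialogue should end immediately and the manager's decision should be "Fraud" (RS1).
Rule2: If all workers' decisions are "Non-Fraud" (Cond2), the manager's decision should be "Non-Fraud" (RS2).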
To test whether the manager follows the two rules, we calculate the probabilities of RS1 and RS2 under Cond1 and Cond2, respectively. On the test data, $p(\text{RS1} \mid \text{Cond1}) = 0.95$ and $p(\text{RS2} \mid \text{Cond2}) = 0.96$, which shows that the manager adopts the workers' decisions in most situations.
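The conditional probabilities $p(\text{RS1} \mid \text{Cond1})$ and $p(\text{RS2} \mid \text{Cond2})$ can be estimated from logged test dialogues as sketched below. The log format (a list of manager-level events per dialogue) and the four-worker setting are assumptions.

```python
def rule_adherence(dialogues, num_workers=4):
    """Estimate p(RS1 | Cond1) and p(RS2 | Cond2).

    dialogues: lists of manager-level events, each ("worker", decision) or ("final", decision).
    """
    c1 = rs1 = c2 = rs2 = 0
    for events in dialogues:
        worker_decs = [d for kind, d in events if kind == "worker"]
        for (kind, dec), nxt in zip(events, events[1:] + [None]):
            if kind == "worker" and dec == "Fraud":           # Cond1
                c1 += 1
                rs1 += nxt == ("final", "Fraud")              # RS1: ends at once with "Fraud"
        if len(worker_decs) == num_workers and all(d == "Non-Fraud" for d in worker_decs):
            c2 += 1                                           # Cond2
            rs2 += events[-1] == ("final", "Non-Fraud")       # RS2
    return (rs1 / c1 if c1 else float("nan"),
            rs2 / c2 if c2 else float("nan"))
```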
Meanwhile, we study the cases where the manager does not follow the two rules and find some interesting phenomena. Specifically, if only one worker's decision is "Fraud" and the applicant can answer a few of the questions asked by this worker, the manager's decision may be "Non-Fraud". Conversely, if all workers' decisions are "Non-Fraud" but the applicant cannot answer most of the questions asked by one worker, the manager's decision may still be "Fraud". In fact, when these two cases happen, the worker may have made a wrong decision, yet the manager can still give the correct one. This suggests that the manager is robust to workers' mistakes.

Worker's Policy Analysis
To better understand the low-level dialogue policy and the effect of message passing on it, we compare the workers' behaviors in HP-S and Full-S. Table 1 shows an example of verifying personal information about "School" in HP-S and Full-S, where "Shanghai Sports University" is replaced with $School$ for short and all triplets related to it are listed. The two systems ask the same two questions in the first two turns, because the triplets behind these questions are rarely known to fraudsters. This indicates that the low-level policies learn to give priority to such triplets to better distinguish fraudsters from normal applicants. In the third turn, HP-S asks a question that is easy for fraudsters to answer and then makes the wrong decision. In contrast, Full-S notices that the applicant has correctly answered a question that is hard for fraudsters to answer. Thus, Full-S does not make a decision in haste but continues the dialogue. It is also worth noting that Full-S does not choose ($School$, ConvenienceStore, HaoDe) to generate a derived question. This is because the message passing mechanism models the relation between "HaoDe" and "Xiao Liu Fruit": the two entities are closely related, so if applicants know "Xiao Liu Fruit", they may well know "HaoDe". Thus, there is no need to select this triplet.

Related work
As far as we know, there is no published work on detecting identity fraud via interactions. We describe the two most related directions below.
Deception Detection. Detecting deception is a longstanding research goal in many artificial intelligence topics. Existing work has mainly focused on extracting useful features from non-verbal behaviors (Meservy et al., 2005; Lu et al., 2005; Bhaskaran et al., 2011), speech cues (Levitan et al., 2018; Graciarena et al., 2006) or both (Krishnamurthy et al., 2018; Pérez-Rosas et al., 2015) to train a classification model. These works define deception as telling a lie. Moreover, they require labeled data, which is often hard to obtain. In contrast, we focus on detecting identity fraud through multi-turn interactions and use reinforcement learning to explore the anti-fraud policy without any labeled data.
Dialogue System. Our work is also related to task-oriented dialogue systems (Young et al., 2013; Wen et al., 2017; Li et al., 2017; Gašić et al., 2011; Wang et al., 2018, 2019). Existing systems mainly focus on slot-filling tasks (e.g., booking a hotel), in which a set of system actions can be pre-defined based on the business logic and slots. In contrast, the system actions in our task are selecting nodes in the KG to generate questions, so structured information is important. Some works also try to model structured information in dialogue systems. For example, Peng et al. (2017) used hierarchical reinforcement learning (Vezhnevets et al., 2017; Kulkarni et al., 2016; Florensa et al., 2017) to design multi-domain dialogue management. Chen et al. (2018) used graph neural networks (Battaglia et al., 2018; Li et al., 2015; Scarselli et al., 2009; Niepert et al., 2016) to improve the sample efficiency of reinforcement learning. He et al. (2017) used DynoNet to incorporate structured information in the collaborative dialogue setting. Compared with them, our method combines graph neural networks and hierarchical reinforcement learning, and experiments show that both components are effective in this novel dialogue task.

Conclusion
This paper proposes to detect identity fraud automatically via dialogue interactions. To achieve this goal, we present structured dialogue management that explores anti-fraud dialogue strategies over a KG with reinforcement learning, and a heuristic user simulator to evaluate our systems. Experiments show that end-to-end systems outperform rule-based systems and that the proposed dialogue management learns interpretable and flexible dialogue strategies that detect identity fraud more efficiently. We believe that this work is a first step in this promising research direction and will help promote many real-world applications.

Figure 2: An example of the KG for Nanjing University.The green edge represents the triplet crawled from the existing KG and the blue edges represent the triplets generated based on a navigation electronic map.

Figure 5: Performance of different systems.Tested on 10 epochs using the best model during training.

Figure 6: Accuracy curves of different neural models on the dev set. The first 20 epochs are the pre-training stage; the last 300 epochs are the RL stage.

Figure 7: Manager's action probability curves.Each curve indicates the probability of selecting a piece of personal information to verify.For each curve, we take the average of all dialogues during testing.

In the definitions of $\pi_m$ and $\pi_w$, $E_t(v_u)$ and $E_t(v_p)$ are the dialogue states of the manager and the worker in the $t$-th turn, $E(d_m)$ is the encoding of the manager's terminal action, which has the same dimension as $E_t(v_p)$, and $E(d_w)$ is the encoding of the worker's terminal action, which has the same dimension as $E_t(v_a)$.

Table 1: Examples of the low-level policies in two systems. Note that the information about "School" is not fake.