Robust Multi-Relational Clustering via \ell_1-Norm Symmetric Nonnegative Matrix Factorization

In this paper, we propose an \ell_1-norm Symmetric Nonnegative Matrix Tri-Factorization (\ell_1 S-NMTF) framework to cluster multi-type relational data by utilizing their interrelatedness. By introducing \ell_1-norm distances into our new objective function, the proposed approach is robust against noise and outliers, which are inherent in multi-relational data. We also derive the solution algorithm and rigorously analyze its correctness and convergence. Promising experimental results on text clustering with the IMDB dataset validate the proposed approach.


Introduction
Traditional clustering aims to partition data points into several groups, such that the data points in the same group share some commonalities while those from different groups are dissimilar. With recent progress in Internet and computational technologies, data have started to appear in much richer structures. More specifically, in many real-world problems a pair of objects can be related in several different ways, which inevitably complicates the problem and calls for new clustering algorithms to better understand the data. To address this new challenge, Wang et al. (Wang et al., 2011c; Wang et al., 2011d) proposed computational algorithms based on nonnegative matrix factorization (NMF) (Lee and Seung, 1999) that have successfully solved these problems.
Due to its mathematical elegance and its equivalence to K-means clustering and spectral clustering (Ding et al., 2005), NMF (Lee and Seung, 1999) has been broadly studied in recent years and has successfully solved a variety of practical problems in data mining and machine learning, such as those in computer vision (Wang et al., 2011b), bioinformatics (Wang et al., 2013) and natural language understanding (Wang et al., 2011a), to name a few. Compared to many traditional clustering methods, such as K-means clustering, NMF has a better mathematical interpretation, which usually leads to improved clustering accuracy (Ding et al., 2010). Traditional clustering algorithms concentrate on homogeneous data, in which all the data belong to one single type (Wang et al., 2011d). To deal with the richer data structures in modern real-world applications, symmetric Nonnegative Matrix Tri-Factorization (NMTF) (Wang et al., 2011c) has demonstrated its effectiveness in simultaneously clustering multi-type relational data by utilizing the interrelatedness among different data types.
Traditional NMF algorithms routinely use least square error functions, which are known to be sensitive to outliers (Kong et al., 2011). However, in the era of big data, outliers are inevitable due to ever-increasing data sizes. As a result, developing a more robust NMF model for multi-relational data clustering has become more and more important. In this paper, we further develop the symmetric NMF clustering model proposed in (Wang et al., 2011c) by using \ell_1-norm distances, such that our new clustering model is more robust against outliers, which is of particular importance in multi-relational data.
Robust Multi-Relational Clustering via \ell_1-Norm Symmetric NMTF (S-NMTF)
In this section, we first introduce the background on using symmetric NMF to cluster multi-relational data. Then we develop our new \ell_1-norm symmetric NMF model for better robustness against outlying data. The solution algorithm for our new model will be proposed and analyzed in the next section.
Notations. In this paper, we use upper case letters to denote matrices. Given a matrix $M$, its entry at the i-th row and j-th column is denoted as $M_{(ij)}$. The Frobenius norm of a matrix $M$ is denoted as $\|M\|_F = \left(\sum_i \sum_j M_{(ij)}^2\right)^{1/2}$, and its \ell_1-norm is denoted as $\|M\|_1 = \sum_i \sum_j |M_{(ij)}|$.
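As a quick numerical check of the two norms in the Notations paragraph, the following minimal sketch computes both for a small matrix. The helper names `frobenius_norm` and `l1_norm` are illustrative and do not come from the paper. Note that the \ell_1-norm used here is the entrywise sum of absolute values, not the induced (maximum column sum) matrix 1-norm.

```python
import math

def frobenius_norm(M):
    """Square root of the sum of squared entries of M (a list of rows)."""
    return math.sqrt(sum(x * x for row in M for x in row))

def l1_norm(M):
    """Entrywise l1-norm: sum of the absolute values of all entries of M."""
    return sum(abs(x) for row in M for x in row)

# Toy matrix chosen so both norms are easy to verify by hand.
M = [[3.0, -4.0],
     [0.0, 0.0]]

fro = frobenius_norm(M)  # sqrt(9 + 16)
l1 = l1_norm(M)          # |3| + |-4|
```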

Problem Formalization
A K-type relational data set can be denoted as $\chi = \{\chi_1, \chi_2, \ldots, \chi_K\}$, where $\chi_k = \{x^k_1, x^k_2, \ldots, x^k_{n_k}\}$ represents the data set of the k-th type. Suppose we are given a set of relationship matrices $\{R_{kl} \in \mathbb{R}^{n_k \times n_l}\}_{1 \le k \le K, 1 \le l \le K}$ between different types of data objects; then we have $R_{kl} = R_{lk}^T$. Our goal is to simultaneously partition the data objects in $\chi_1, \chi_2, \ldots, \chi_K$ into $c_1, c_2, \ldots, c_K$ disjoint clusters, respectively.
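The setup above can be sketched with plain Python lists. The toy sizes and entries are made up, and the helper names `transpose` and `assemble_symmetric` are hypothetical. Stacking the blocks into one large symmetric matrix with zero diagonal blocks is an assumption about how the problem becomes a symmetric one; the text here does not spell out that construction. Indices are zero-based in the code, whereas the paper counts types from 1.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def assemble_symmetric(R, sizes):
    """Stack the blocks R[(k, l)] into one square matrix.

    Blocks that are not supplied (here, the diagonal ones) stay zero.
    This layout is an illustrative assumption.
    """
    offsets = [sum(sizes[:k]) for k in range(len(sizes))]
    n = sum(sizes)
    big = [[0.0] * n for _ in range(n)]
    for (k, l), block in R.items():
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                big[offsets[k] + i][offsets[l] + j] = value
    return big

sizes = [2, 3]  # toy n_1 = 2 objects of type 1, n_2 = 3 objects of type 2
R = {(0, 1): [[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]]}

# Enforce R_kl = R_lk^T by filling in the mirrored block.
for (k, l) in list(R):
    R[(l, k)] = transpose(R[(k, l)])

big = assemble_symmetric(R, sizes)
is_symmetric = big == transpose(big)
```

Storing only one block per pair and deriving its mirror avoids the two copies drifting apart.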

Our objective
To cluster multi-relational data, symmetric NMF has been used to solve the following optimization problem (Wang et al., 2008):
It has also been shown that solving the above problem is equivalent to solving (Long et al., 2006):
where
Despite the success of the method proposed in (Wang et al., 2011c) in multi-relational data clustering, the objectives in Equations (1-2) use squared \ell_2-norm distances to measure the matrix approximation errors, which are sensitive to outliers. As a result, the clustering results could be heavily dominated by outlying data points with large approximation errors (Kong et al., 2011; Nie et al., 2011; Wang et al., 2014). To improve the robustness of the clustering model, following prior works (Kong et al., 2011; Nie et al., 2011; Wang et al., 2014), we propose to use the following \ell_1-norm symmetric NMTF model for multi-relational data clustering:
In this new formulation, the approximation errors are measured by \ell_1-norm distances, which are expected to be less sensitive to outlying data points. As shown in Figure 1, when there are outliers in the input data, traditional squared Frobenius-norm NMF is inclined to cluster incorrectly, while the \ell_1-norm NMF is more robust and can cluster more accurately.
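The robustness argument can be illustrated with a small numerical sketch. It compares how much of the total approximation error a single outlying entry accounts for under the squared Frobenius measure and under the \ell_1 measure. The matrices and the helper name `residuals` are made up for illustration. This is not the paper's objective function, only the two error measures applied to a fixed approximation.

```python
def residuals(R, A):
    """Flatten the entrywise differences between R and its approximation A."""
    return [r - a for row_r, row_a in zip(R, A) for r, a in zip(row_r, row_a)]

# Toy 3x3 example: every entry is off by 1, except one outlier off by 10.
approx = [[2.0] * 3 for _ in range(3)]
observed = [[3.0] * 3 for _ in range(3)]
observed[2][2] = 12.0  # the single outlying entry

res = residuals(observed, approx)
sq_error = sum(e * e for e in res)   # squared Frobenius-norm error
l1_error = sum(abs(e) for e in res)  # l1-norm error

outlier = res[-1]
share_sq = outlier ** 2 / sq_error   # fraction of squared error due to the outlier
share_l1 = abs(outlier) / l1_error   # fraction of l1 error due to the outlier
```

In this toy case the outlier accounts for 100 of 108 units of squared error but only 10 of 18 units of \ell_1 error, which is the sense in which the squared measure lets outliers dominate.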
Algorithm to Solve \ell_1-Norm S-NMTF and its analysis
The computational algorithm for the proposed \ell_1-norm S-NMTF approach is summarized in Algorithm 1 (due to space limits, the derivation of the algorithm is omitted and will be provided in the journal version of this paper). Upon solution, the final cluster labels are obtained from the resulting G_k.
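The text states that the final cluster labels are obtained from G_k but does not spell out the rule. A minimal sketch, assuming the common convention that each object is assigned to the cluster whose column holds the largest entry in its row of G_k, could look as follows. The function name `labels_from_indicator` and the toy matrix are hypothetical.

```python
def labels_from_indicator(G):
    """Assign each row (object) to the column (cluster) with the largest entry.

    Ties are resolved in favour of the lowest column index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in G]

# Toy nonnegative factor for 3 objects and 2 clusters (made-up values).
G_k = [[0.9, 0.1],
       [0.2, 0.7],
       [0.6, 0.4]]

labels = labels_from_indicator(G_k)
```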
The following theorem guarantees the correctness of Algorithm 1.
Theorem 3.1 If the updating rules of G and S in Algorithm 1 converge, the final solution satisfies the KKT optimality conditions. These are the fixed-point relationships that the solution must satisfy.
The following lemmas and theorem guarantee the convergence of Algorithm 1.
Lemma 3.2 (Lee and Seung, 1999)
Lemma 3.3 (Lee and Seung, 1999) If $Z$ is an auxiliary function for $F$, then $F$ is non-increasing under the update $h^{(t+1)} = \arg\min_h Z(h, h^{(t)})$.
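The reasoning behind the non-increasing property can be written as a one-line chain of inequalities. The sketch below assumes the standard definition of an auxiliary function from the NMF literature, which is not reproduced in this text: $Z(h, h')$ is an auxiliary function of $F(h)$ if $Z(h, h') \ge F(h)$ and $Z(h, h) = F(h)$.

```latex
% Assumed definition: Z(h, h') >= F(h) for all h, h', and Z(h, h) = F(h).
% First inequality:  the upper-bound property of Z.
% Second inequality: h^{(t+1)} minimizes Z(., h^{(t)}).
% Final equality:    Z touches F on the diagonal.
\[
F\bigl(h^{(t+1)}\bigr)
  \;\le\; Z\bigl(h^{(t+1)}, h^{(t)}\bigr)
  \;\le\; Z\bigl(h^{(t)}, h^{(t)}\bigr)
  \;=\; F\bigl(h^{(t)}\bigr)
\]
```

Under this definition, each update can only keep the objective the same or lower it, which is what the convergence argument relies on.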

Theorem 3.4 Let
then the following function is an auxiliary function of J(G). Furthermore, it is a convex function in G and its global minimum is
Based on the properties of auxiliary functions and convex functions, by updating G we can always obtain the optimal solution to the objective function, which determines the final cluster labels.

Experimental Results
In this section, we test our proposed algorithm on the IMDB dataset by using its inter-type relationship information.

Data set
We use the ACL-IMDB dataset provided by (Maas et al., 2011). This dataset contains a training subset of 25,000 highly polar movie reviews, split evenly between positive and negative comments (12,500 each). The dataset also includes two files that are important here: the content of each comment and the corresponding URL from which each comment comes. The dataset contains other files as well, but they are not related to the experiments we conduct, so we skip them.

Experimental settings
In our experiments, we use three data types: author, comment and word. As discussed in the third part, there are three relationships we need to find, which correspond to the three matrices needed to construct the multi-type data matrix: comment-author, comment-word and author-word. By making use of the URL of every comment, we can find the author who posted the corresponding comment, and thus we can build the author-comment matrix. Since the content of each comment is given in the dataset files, we can construct the comment-word matrix, and the author-word matrix is the product of the author-comment matrix and the comment-word matrix.
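The construction of the author-word matrix as a matrix product can be sketched on toy data. The counts below are invented, and the helper name `matmul` is illustrative. The sketch assumes the author-comment matrix is a 0/1 incidence matrix and the comment-word matrix holds word counts, which the text does not state explicitly.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B]
            for row in A]

# Toy data: 2 authors, 3 comments, 4 vocabulary words.
# author_comment[a][c] = 1 if author a wrote comment c (assumed encoding).
author_comment = [[1, 1, 0],
                  [0, 0, 1]]
# comment_word[c][w] = number of times word w occurs in comment c (assumed).
comment_word = [[2, 0, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 0, 3]]

# Each author's row is the sum of the word counts over that author's comments.
author_word = matmul(author_comment, comment_word)
```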
We select the 1500 authors who posted the most comments, since comments from the same person are more likely to be correlated, for example through similar sentence structures and shared words. We also remove stop words, since they may disturb the clustering and carry no information about the properties of the comments. To make our experiments more persuasive, we also add noise to the three relationship matrices with a ratio of 25 percent (1/5 in amplitude). By randomly choosing 500 authors from the 1500, we can generate many sub-datasets on which to conduct our experiments.
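The sub-dataset generation and noise injection can be sketched as follows. The paper does not specify the noise model, so this sketch rests on explicit assumptions: "ratio of 25 percent" is read as the fraction of matrix entries that are perturbed, "1/5 in amplitude" as noise drawn uniformly from zero up to one fifth of the largest entry, and the noise as additive and nonnegative so the matrices stay valid NMF inputs. The function names, seeds and toy matrix are all hypothetical.

```python
import random

def sample_authors(n_authors=1500, n_keep=500, seed=0):
    """Pick n_keep distinct author indices out of n_authors at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_authors), n_keep))

def add_noise(M, ratio=0.25, amplitude=0.2, seed=0):
    """Perturb a fraction `ratio` of the entries of M (assumed noise model).

    Each chosen entry gets uniform noise in [0, amplitude * max entry].
    Returns a new matrix; M itself is left unchanged.
    """
    rng = random.Random(seed)
    peak = max(max(row) for row in M)
    cells = [(i, j) for i, row in enumerate(M) for j in range(len(row))]
    picked = rng.sample(cells, round(ratio * len(cells)))
    noisy = [list(row) for row in M]
    for i, j in picked:
        noisy[i][j] += rng.uniform(0.0, amplitude * peak)
    return noisy

chosen = sample_authors()
toy = [[1.0, 0.0, 2.0, 0.0],
       [0.0, 4.0, 0.0, 1.0]]
noisy = add_noise(toy)
```

Varying the seed passed to `sample_authors` yields different sub-datasets from the same pool of 1500 authors.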

Experimental results
We compare the performance of our proposed \ell_1-norm S-NMTF algorithm with other methods: P-NMF, Frobenius-norm S-NMTF, traditional NMF and K-means clustering. For simplicity, we only compare the clustering accuracy on the comment-word matrix, since the label of each comment (positive or negative) is fixed (the ground-truth label) and can thus be compared with the results of the clustering algorithms. Table 1 shows that when the data is clean, in many cases (more than those listed), the \ell_1-norm S-NMTF approach performs better than the others. Table 2 illustrates the situation when noise is added to the data: the \ell_1-norm S-NMTF algorithm is the best in terms of clustering accuracy. This agrees with the analysis in our motivation. Table 3 shows that the \ell_1-norm S-NMTF algorithm is more robust than the other algorithms. Although the clustering accuracy of \ell_1-norm S-NMTF decreases when noise is present, its decrease is the smallest among the five algorithms. This result demonstrates the robustness of our proposed \ell_1-norm S-NMTF method.
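Comparing cluster assignments against ground-truth labels requires matching arbitrary cluster ids to class labels first. The paper does not state how its clustering accuracy is computed, so the sketch below assumes one common definition: the best accuracy over all one-to-one mappings from cluster ids to class labels. The function name `clustering_accuracy` and the toy labels are hypothetical. Brute-force enumeration is only practical for a handful of clusters, as in the two-class setting here.

```python
from itertools import permutations

def clustering_accuracy(true_labels, cluster_ids):
    """Best accuracy over one-to-one mappings of cluster ids to class labels.

    Assumes there are no more distinct clusters than distinct classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    best = 0.0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best = max(best, hits / len(true_labels))
    return best

# Toy example: five comments, two sentiment classes, two clusters.
truth = ["pos", "pos", "neg", "neg", "pos"]
found = [1, 1, 0, 0, 0]
acc = clustering_accuracy(truth, found)
```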

Conclusion
In this paper, we presented an \ell_1-norm Symmetric Nonnegative Matrix Tri-Factorization framework to cluster multi-type relational data simultaneously. Our proposed approach clusters different types of data using their inter-type relationships by transforming the original problem into a symmetric NMTF problem. We also presented an auxiliary function and a high-order matrix inequality to derive the solution algorithm. The proposed algorithm not only makes use of the rich data struc-