SOLAR: Scalable Online Learning Algorithms for Ranking

Traditional learning to rank methods learn ranking models from training data in a batch and offline learning mode, which suffers from critical limitations, e.g., poor scalability, as the model has to be re-trained from scratch whenever new training data arrives. This is clearly non-scalable for many real applications where training data often arrives sequentially and frequently. To overcome these limitations, this paper presents SOLAR, a new framework of Scalable Online Learning Algorithms for Ranking, to tackle the challenge of scalable learning to rank. Specifically, we propose two novel SOLAR algorithms and analyze their IR measure bounds theoretically. We conduct extensive empirical studies comparing our SOLAR algorithms with conventional learning to rank algorithms on benchmark testbeds, in which promising results validate the efficacy and scalability of the proposed algorithms.


Introduction
Learning to rank [27,8,29,31,7] aims to learn a ranking model from training data using machine learning methods, and has been actively studied in information retrieval (IR). Specifically, consider a document retrieval task: given a query, a ranking model assigns a relevance score to each document in a collection, and then ranks the documents in decreasing order of relevance scores. The goal of learning to rank is to build a ranking model from training data of a set of queries by optimizing some IR performance measures using machine learning techniques. In the literature, various learning to rank techniques have been proposed, ranging from early pointwise approaches [15,30,28], to popular pairwise approaches [26,18,3], and recent listwise approaches [5,38]. Learning to rank has many applications, including document retrieval, collaborative filtering, online advertising, answer ranking for online QA in NLP [33], etc.

* The corresponding author. This work was done when the first two authors visited Dr Hoi's group.
Most existing learning to rank techniques follow the batch and offline machine learning methodology, which typically assumes all training data are available prior to the learning task and trains the ranking model by applying some batch learning method, e.g., neural networks [3] or SVM [4]. Despite being studied extensively, the batch learning to rank methodology has some critical limitations. Perhaps the most serious is its poor scalability for real-world web applications, where the ranking model has to be re-trained from scratch whenever new training data arrives. This is apparently inefficient and non-scalable, since training data often arrives sequentially and frequently in many real applications [33,7]. Besides, the batch methodology also suffers from slow adaptation to the fast-changing environments of web applications, due to static ranking models pre-trained from historical batch training data.
To overcome the above limitations, this paper investigates SOLAR, a new framework of Scalable Online Learning Algorithms for Ranking, which aims to learn a ranking model from a sequence of training data in an online learning fashion. Specifically, following the pairwise learning to rank framework, we formally formulate the learning problem, and then present two different SOLAR algorithms to solve this challenging task, together with an analysis of their theoretical properties. We conduct an extensive set of experiments evaluating the performance of the proposed algorithms under different settings, comparing them with both online and batch algorithms on benchmark testbeds from the literature.
In summary, the key contributions of this paper include: (i) we present a new framework of Scalable Online Learning Algorithms for Ranking, which tackles the pairwise learning to rank problem via a scalable online learning approach; (ii) we present two SOLAR algorithms: a first-order learning algorithm (SOLAR-I) and a second-order learning algorithm (SOLAR-II); (iii) we analyze the theoretical bounds of the proposed algorithms in terms of standard IR performance measures; and (iv) finally, we examine the efficacy of the proposed algorithms in an extensive set of empirical studies on benchmark datasets. The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 formulates the problem of the proposed framework and presents our algorithms, followed by theoretical analysis in Section 4. Section 5 presents our experimental results, and Section 6 concludes this work and indicates future directions.

Related Work
In general, our work is related to two topics in information retrieval and machine learning: learning to rank and online learning. Both have been extensively studied in the literature. Below we briefly review important related work in each area.

Learning to Rank
Most of the existing approaches to learning to rank can be generally grouped into three major categories: (i) pointwise approaches, (ii) pairwise approaches, and (iii) listwise approaches.
The pointwise approaches treat ranking as a classification or regression problem for predicting the rank of individual objects. For example, [12,19] formulated ranking as a regression problem in diverse forms. [30] formulated ranking as a binary classification of relevance on document objects, and solved it with discriminative models (e.g., SVM). Perceptron [32] ranking (known as "Prank") [15] formulated it as online binary classification. [28] cast ranking as multiple classification or multiple ordinal classification tasks.
The pairwise approaches treat document pairs as training instances and formulate ranking as a classification or regression problem over a collection of pairwise document instances. Examples of pairwise learning to rank algorithms include: neural network approaches such as RankNet [3] and LambdaRank [2], SVM approaches such as RankSVM [26], boosting approaches such as RankBoost [18], regression algorithms such as GBRank [43], and probabilistic ranking algorithms such as FRank [35]. The pairwise approaches are among the most widely and successfully applied; our work generally belongs to this group.
The listwise approaches treat a list of documents for a query as a training instance and attempt to learn a ranking model by optimizing some loss defined on the predicted list and the ground-truth list. In general, there are two types of approaches. The first is to directly optimize some IR metrics, such as Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG) [25]. Examples include AdaRank by boosting [39], SVM-MAP by optimizing MAP [42], PermuRank [40], SoftRank [34] based on a smoothed approximation to NDCG, and NDCG-Boost by optimizing NDCG [37]. The other is to indirectly optimize the IR metrics by defining some listwise loss function, as in ListNet [5] and ListMLE [38].
Despite being studied actively, most existing works belong to batch learning methods, with only a few online learning studies. For example, Prank [15] is probably the first online pointwise learning to rank algorithm. Unlike Prank, our work focuses on an online pairwise learning to rank technique, which significantly outperforms Prank as observed in our empirical studies. Besides, our work is also related to the existing work in [10], but differs considerably in several aspects: (i) they assume the similarity function is defined in a bi-linear form, which is inappropriate for document retrieval applications; (ii) their training data is given in the form of triplet image instances (p_1, p_2, p_3), while our training data is given as pairwise query-document instances (q_t, d_t^1, d_t^2); (iii) they only apply first-order online learning algorithms, while we explore both first-order and second-order online algorithms. Finally, we note that our work differs from another series of online learning to rank studies [21,22,23,36,41], which explore reinforcement learning or multi-armed bandit techniques for learning to rank from implicit/partial feedback, and whose formulations and settings are very different.

Online Learning
Our work is closely related to studies of online learning [24], a family of efficient and scalable machine learning algorithms. In the literature, a variety of online algorithms have been proposed, mainly in two major categories: first-order algorithms and second-order algorithms. Notable examples of first-order online learning methods include the classical Perceptron [32] and Passive-Aggressive (PA) learning algorithms [13]. Unlike first-order algorithms, second-order online learning [6], e.g., Confidence-Weighted (CW) learning [16], usually assumes the weight vector follows a Gaussian distribution and attempts to update the mean and covariance for each received instance. In addition, Adaptive Regularization of Weights (AROW) [14] was proposed to improve the robustness of CW. More online learning methods can be found in [24]. In this work, we apply both first-order and second-order online learning methods to online learning to rank.

SOLAR: Online Learning to Rank
We now present SOLAR, a framework of Scalable Online Learning Algorithms for Ranking, which applies online learning to build ranking models from sequential training instances.

Problem Formulation
Without loss of generality, consider an online learning to rank problem for document retrieval, where training data instances arrive sequentially. Let us denote by Q a query space and by D a document space. Each instance received at time step t is represented by a triplet (q_t^i, d_t^1, d_t^2), where q_t^i ∈ Q is a query and (d_t^1, d_t^2) ∈ D × D denotes a pair of documents whose ranking order is to be predicted w.r.t. the query. Assume that we have a total of Q queries, each of which is associated with a total of D_i documents and a total of T_i training triplet instances. In a practical document retrieval task, the online learning to rank framework operates by the following procedure: (i) given a query q_1, an initial model w_1 is first applied to rank the set of documents for the query, which are then returned to users; (ii) we then collect the user's feedback (e.g., clickthrough data) as ground-truth labels for the ranking orders of a collection of T_1 triplet training instances; (iii) we then apply an online learning algorithm to update the ranking model from the sequence of T_1 triplet training instances; (iv) we repeat the above by applying the updated ranking model to process the next query.
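As a minimal illustration, the four-step procedure above can be sketched as follows. The function names, the feature layout, and the generic `update_fn` hook are our own simplified placeholders, not the paper's exact algorithms; any of the per-instance updates derived in the following sections could be plugged in.

```python
import numpy as np

def rank_documents(w, doc_features):
    """Step (i): score documents with a linear model w and return
    document indices sorted by descending relevance score."""
    scores = doc_features @ w
    return np.argsort(-scores)

def online_learn_to_rank(queries, update_fn, dim):
    """Generic online loop over queries (steps (i)-(iv)).

    queries: list of (doc_features, triplets); each triplet (i, j, y)
      says document i should rank above (y=+1) or below (y=-1) document j.
    update_fn: rule mapping (w, pairwise difference feature, label) -> new w.
    """
    w = np.zeros(dim)
    served_rankings = []
    for doc_features, triplets in queries:
        # step (i): serve the ranking produced by the current model
        served_rankings.append(rank_documents(w, doc_features))
        # steps (ii)-(iii): learn from this query's triplet feedback
        for i, j, y in triplets:
            x = doc_features[i] - doc_features[j]
            w = update_fn(w, x, y)
    return w, served_rankings
```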
For a sequence of T triplet training instances, the goal of online learning to rank is to optimize the sequence of ranking models w_1, . . . , w_T during the entire online learning process. In general, the proposed online learning to rank scheme is evaluated by measuring the online cumulative MAP [1] or online cumulative NDCG [25]. Let us denote by NDCG_i and MAP_i the NDCG and MAP values for query q_i, respectively, defined as

MAP_i = (1/m) Σ_r I{l(r) = 1} (Σ_{s ≤ r} I{l(s) = 1}) / r,   NDCG_i = Z_i^{-1} Σ_r G(l(π_f(r))) D(r),

where I{·} is an indicator function that outputs 1 when the statement is true and 0 otherwise, G(·) is a gain function and D(·) a position discount function, l(r) is the corresponding label as a K-level rating, π_f denotes the rank list produced by the ranking function f, Z_i is a normalization constant, and m is the number of relevant documents. The online cumulative IR measure is defined as the average of the measure over the sequence of Q queries, e.g., NDCG = (1/Q) Σ_{i=1}^{Q} NDCG_i.
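As a concrete illustration of these measures, the following sketch computes NDCG and (binary-relevance) average precision for one query, assuming the common LETOR conventions G(l) = 2^l − 1 and D(r) = 1/log2(1 + r); the exact gain and discount choices in the paper may differ.

```python
import numpy as np

def ndcg(labels_in_rank_order, k=None):
    """NDCG for one query. labels_in_rank_order: K-level ratings l(pi(r)),
    listed in the order the model ranked the documents."""
    l = np.asarray(labels_in_rank_order, dtype=float)[:k]
    gains = 2.0 ** l - 1.0                                # G(l) = 2^l - 1
    discounts = 1.0 / np.log2(np.arange(2, l.size + 2))   # D(r) = 1/log2(1+r)
    dcg = np.sum(gains * discounts)
    # ideal DCG: same labels sorted from most to least relevant
    ideal = np.sort(np.asarray(labels_in_rank_order, dtype=float))[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) * discounts[:ideal.size])
    return dcg / idcg if idcg > 0 else 0.0

def average_precision(labels_in_rank_order):
    """Per-query MAP contribution with binary labels (1 = relevant)."""
    l = np.asarray(labels_in_rank_order)
    m = l.sum()                       # number of relevant documents
    if m == 0:
        return 0.0
    hits = np.cumsum(l)               # relevant docs seen up to rank r
    ranks = np.arange(1, l.size + 1)
    return float(np.sum((hits / ranks) * l) / m)
```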

First-order SOLAR Algorithm
The key challenge of online learning to rank is how to update the ranking model w_t upon receiving a training instance (q_t^i, d_t^1, d_t^2) and its true label y_t at each time step t. In the following, we apply the passive-aggressive online learning technique [13] to address this challenge. First of all, we formulate the problem as an optimization:
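The display equation is not reproduced in this copy. Under our reading of the passive-aggressive framework [13], a PA-II-style pairwise update would look like the following sketch (variable names are ours; the 1/(2C) slack term matches the squared hinge loss used in the later analysis):

```python
import numpy as np

def solar1_update(w, x, y, C=1e-5):
    """One passive-aggressive pairwise update (our SOLAR-I-style sketch).

    x = phi(q, d1) - phi(q, d2): pairwise difference feature vector.
    y in {+1, -1}: ground-truth order of the document pair.
    C: aggressiveness parameter trading off stability vs. correction.
    """
    loss = max(0.0, 1.0 - y * (w @ x))          # hinge loss on the pair
    if loss > 0:
        # PA-II closed-form step size; the 1/(2C) term keeps tau bounded
        tau = loss / (x @ x + 1.0 / (2.0 * C))
        w = w + tau * y * x
    return w
```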

Second-order SOLAR Algorithm
The previous algorithm only exploits first-order information of the ranking model w_t. Inspired by recent studies in second-order online learning [6,16,14], we explore second-order algorithms for online learning to rank. Specifically, we cast the online learning to rank problem into a probabilistic framework, in which we model feature confidence for a linear ranking model with a Gaussian distribution with mean w ∈ R^d and covariance Σ ∈ R^{d×d}. The mean vector w is used as the model of the ranking function, and the covariance matrix Σ represents our confidence in the model: the smaller the value of Σ_{p,p}, the more confidence the learner has in the p-th feature w_p of the ranking model w.
Following a similar intuition to the previous section, we want to optimize our ranking model N(w, Σ) by achieving the following trade-off: (i) avoid deviating too much from the previous model N(w_t, Σ_t); (ii) ensure that it suffers a small loss on the current triplet instance; and (iii) attain a large confidence on the current instance. Similar to [16], we employ the Kullback-Leibler divergence to measure the distance between the current model to be optimized and the previous model, and the regularization terms include both the loss suffered on the current triplet instance and the confidence on the current triplet instance.
Specifically, we formulate the optimization of second-order online learning to rank accordingly; the resulting closed-form updates can be derived by following [14], and we omit the details. We denote the resulting algorithm "SOLAR-II" for short.
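Since the closed-form solution is not reproduced in this copy, we sketch an AROW-style update [14] with a diagonal covariance (a simplification we make for illustration; a full-matrix Σ update is analogous). All names are ours.

```python
import numpy as np

def solar2_update(w, sigma, x, y, gamma=1e4):
    """One second-order (AROW-like) pairwise update with diagonal covariance.

    w: mean vector of the Gaussian over ranking models.
    sigma: per-feature variances (diagonal of Sigma); smaller = more confident.
    x = phi(q, d1) - phi(q, d2); y in {+1, -1}; gamma: regularization parameter.
    """
    margin = y * (w @ x)
    if margin < 1.0:
        v = np.sum(sigma * x * x)                # x^T Sigma x for diagonal Sigma
        beta = 1.0 / (v + gamma)
        alpha = (1.0 - margin) * beta            # step size scaled by confidence
        w = w + alpha * y * sigma * x            # confidence-weighted mean update
        sigma = sigma - beta * (sigma * x) ** 2  # shrink variance along x
    return w, sigma
```

Features that appear often in updates get small variances, so later updates move them less: the second-order model becomes conservative exactly where it has already accumulated evidence.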

Theoretical Analysis
In this section, we theoretically analyze the two proposed algorithms by proving some online cumulative IR measure bounds for both of them.
In order to prove the IR measure bounds for the proposed algorithms, we first need to draw the relationship between the cumulative IR measures and the sum of pairwise squared hinge losses. To this purpose, we introduce the following lemma.

Lemma 1. For one query q_i and its related documents, the NDCG and MAP values are lower bounded in terms of the sum of pairwise squared hinge losses suffered by the ranking model w.
Here G(l(π(r))) and D(r) are the gain and discount terms, l(r) is the corresponding label as a K-level rating, π is the rank list, and m is the number of relevant documents, as defined above.
Sketch of proof. Using the essential loss idea defined in [11], from Theorem 1 of [11] we can see that the essential loss is an upper bound of measure-based ranking errors; moreover, the essential loss is a lower bound of the sum of pairwise squared hinge losses, using the properties of the squared hinge loss ℓ², which is non-negative, non-increasing, and satisfies ℓ²(0) = 1.
The above lemma indicates that if we can bound the online cumulative squared hinge loss relative to the best ranking model given all data beforehand, we obtain the cumulative IR measure bounds. Fortunately, there are strong theoretical loss bounds for the proposed online learning to rank algorithms, stated in the following theorems.

Theorem 1. For the SOLAR-I algorithm with Q queries and any ranking model u, suppose R = max_{i,t} ‖φ(q_t^i, d_t^1) − φ(q_t^i, d_t^2)‖; then the cumulative squared hinge loss is bounded.

The proof of Theorem 1 can be found in Appendix A. By combining the results of Lemma 1 and Theorem 1, we can easily derive the cumulative IR measure bound of the SOLAR-I algorithm.

Theorem 2. For the SOLAR-I algorithm with Q queries and any ranking model u, the NDCG and MAP performances are respectively bounded.

The analysis of the SOLAR-II algorithm is considerably more complex. Let us denote by M (M = |M|) the set of example indices for which the algorithm makes a mistake, and by U (U = |U|) the set of example indices for which there is an update but not a mistake. Let X_A denote the corresponding data matrix, let χ_max and χ_min denote the maximum and minimum values of χ_t, respectively, and let Σ_T and u_T be the final covariance matrix and mean vector.

Theorem 3. For any ranking model u, the cumulative squared hinge loss is bounded in terms of a = γ‖u‖² + uᵀX_A u, the quantity log det(I + (1/γ)X_A), and U.

The proof of Theorem 3 can be found in Appendix B. Now, by combining Lemma 1 and Theorem 3, we can derive the cumulative IR measure bound achieved by the proposed SOLAR-II algorithm.

Theorem 4. For the SOLAR-II algorithm with Q queries and any ranking model u, the NDCG and MAP performances are respectively bounded.

The above theorems show that our online algorithms are not much worse than the best ranking model u given all data beforehand.
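For reference, the displayed bound for SOLAR-I is not reproduced in this copy; the standard squared-hinge loss bound for PA-II in [13], which a bound of the stated shape would instantiate under our notation (u any fixed model, ℓ_t its online loss, ℓ*_t the loss of u, R as above), reads:

```latex
\sum_{t=1}^{T} \ell_t^2 \;\le\; \left(R^2 + \frac{1}{2C}\right)\left(\|u\|^2 + 2C \sum_{t=1}^{T} \left(\ell_t^{*}\right)^2\right)
```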

Experiments
We conduct extensive experiments to evaluate the efficacy of our algorithms in two major aspects: (i) to examine the learning efficacy of the proposed SOLAR algorithms for online learning to rank tasks; (ii) to directly compare the proposed SOLAR algorithms with state-of-the-art batch learning to rank algorithms. Besides, we also show an application of our algorithms to transfer learning to rank tasks, to demonstrate the importance of capturing changing search intentions in a timely manner in real web applications; these results are in the supplemental file due to space limitations.

Experimental Testbed and Metrics
We adopt the popular benchmark testbed for learning to rank: LETOR [31]. To make a comprehensive comparison, we perform experiments on all the available datasets in LETOR 3.0 and LETOR 4.0; the statistics are shown in Table 1. For performance evaluation metrics, we adopt the standard IR measures, including "MAP", "NDCG@1", "NDCG@5", and "NDCG@10".

Evaluation of Online Rank Performance
This experiment evaluates the online learning performance of the proposed algorithms for online learning to rank tasks by comparing them with the existing "Prank" algorithm [15], a Perceptron-based pointwise online learning to rank algorithm, and the recently proposed "Committee Perceptron (Com-P)" algorithm [17], which explores ensemble learning for the Perceptron. We evaluate the performance in terms of both online cumulative NDCG and MAP measures. As this is an online learning task, the parameter C of SOLAR-I is fixed to 10^−5 and the parameter γ of SOLAR-II is fixed to 10^4 for all the datasets; as suggested by [17], we set the number of experts in "Com-P" to 20. All experiments were conducted over 10 random permutations of each dataset, and all results were averaged over the 10 runs. Table 2 gives the NDCG results on all the datasets, where the best results are bolded. Several observations can be drawn as follows.
First of all, among all the algorithms, we found that both SOLAR-I and SOLAR-II achieve significantly better performance than Prank, which validates the efficacy of the proposed pairwise algorithms. Second, we found that Prank performs extremely poorly on several datasets (HP2003, HP2004, NP2003, NP2004, TD2003, TD2004). Looking into the details, this is likely because Prank, as a pointwise algorithm, is highly sensitive to the imbalance of training data, and the above datasets are indeed highly imbalanced, with very few documents labeled as relevant among about 1000 documents per query. By contrast, the pairwise algorithms perform much better. This observation further validates the importance of the proposed pairwise SOLAR algorithms, which are insensitive to the imbalance issue. Last, comparing the two SOLAR algorithms, we found that SOLAR-II outperforms SOLAR-I in most cases, validating the efficacy of exploiting second-order information.

Comparison of ranking performance
This experiment aims to directly compare the proposed algorithms with state-of-the-art batch algorithms in a standard learning to rank setting. We choose four of the most popular and cutting-edge batch algorithms covering both pairwise and listwise approaches: RankSVM [20], AdaRank [39], RankBoost [18], and ListNet [5]. For comparison, we follow the standard setting: each dataset is divided into three parts, 60% for training, 20% for validation to select the best parameters, and 20% for testing. We use the training data to learn the ranking model with the proposed SOLAR algorithms, the validation data to select the best parameters, and the test data to evaluate performance. For SOLAR-I, we choose the best parameter C from [10^−6.5, 10^−3.5] via grid search on the validation set; similarly for SOLAR-II, we choose the best parameter γ from [10^3, 10^6]. Following [31], we adopt the five division versions of all the datasets and report the average performance. The results are shown in Table 3, where the best performances are bolded. Several observations can be drawn from the results. First of all, no single algorithm beats all the others on all the datasets. Second, on all the datasets, the SOLAR algorithms generally achieve performance comparable to the state-of-the-art batch algorithms; on some datasets, e.g., "MQ2008", "MQ2007", "HP2003", and "TD2003", the proposed online algorithms even achieve the best performance in terms of MAP. This encouraging result demonstrates the efficacy of the proposed algorithms as an efficient and scalable online solution for training ranking models. Finally, among the two proposed online algorithms, SOLAR-II still outperforms SOLAR-I in most cases, which again shows the importance of exploiting second-order information.

Scalability Evaluation
This experiment examines the scalability of the proposed SOLAR algorithms. We compare them with RankSVM [20], a widely used and efficient batch algorithm; for its implementation, we adopt the code from [9], which is known to be the fastest implementation. Figure 3 illustrates the scalability evaluation on the "MQ2008" dataset. From the results, we observe that SOLAR is much faster (e.g., 100+ times faster on this dataset) and significantly more scalable than RankSVM.

Conclusions and Future Work
This paper presented SOLAR, a new framework of Scalable Online Learning Algorithms for Ranking, which overcomes the limitations of traditional batch learning to rank for real-world online applications. Our empirical results showed that the SOLAR algorithms achieve efficacy competitive with state-of-the-art batch algorithms, while enjoying salient scalability properties that are critical to many applications. Our future work includes (i) extending our techniques to the listwise learning to rank framework; (ii) modifying the framework to handle learning to rank with ties; and (iii) conducting more in-depth analysis and comparisons with other types of online learning to rank algorithms in diverse settings, e.g., partial feedback [41,22].