SVM_pAUC-tight

A New Support Vector Method for Optimizing Partial AUC Based on a Tight Convex Upper Bound

Harikrishna Narasimhan
Shivani Agarwal

Machine Learning and Learning Theory Group
Department of Computer Science and Automation
Indian Insititute of Science, Bangalore

A paper on this method was presented at KDD 2013: [paper].
This method builds upon an earlier method that appeared in ICML 2013: [paper]

Upload Date: 19/09/13

Abstract

The area under the ROC curve (AUC) is a well known performance measure in machine learning and data mining. In an increasing number of applications, however, ranging from ranking applications to a variety of important bioinformatics applications, performance is measured in terms of the partial area under the ROC curve between two specified false positive rates. In recent work, we proposed a structural SVM based approach for optimizing this performance measure (Narasimhan and Agarwal, 2013). In this paper, we develop a new support vector method, SVM_pAUC-tight, that optimizes a tighter convex upper bound on the partial AUC loss, which leads to both improved accuracy and reduced computational complexity. We demonstrate on a wide variety of bioinformatics tasks, ranging from protein-protein interaction prediction to drug discovery tasks, that the proposed method does, in many cases, perform significantly better on the partial AUC measure than the previous structural SVM approach.

Implementation

A MATLAB implementation of the proposed method (termed SVMpAUC-tight) for optimizing partial AUC in an FPR interval [alpha, beta] can be downloaded here. This implementation works with a linear kernel. We will soon provide an implementation that can work with non-linear kernels. A MATLAB implementation of our earlier method (SVMpAUC) can be downloaded from here.

Usage: The svmpauc_tight function provided contains the implementation of the proposed method and can be called from the MATLAB terminal as follows:

w = svmpauc_tight(X, y, alpha, beta, C, eps, verbosity);

where

Input:

X is a n x d matrix containing 'n' training examples of dimension 'd'
y is a n x 1 matrix containing labels (-1/+1) for the 'n' training examples
[alpha, beta] is a user-specified false positive rate (FPR) interval over which the algorithm optimizes partial AUC.
C is the regularization paramter that controls the trade-off between the model complexity and training error.
eps is a tolerance parameter that determines convergence of the algorithm. (default: 0.0001)
verbosity can be used to turn on output display (0 or 1). (default: 0)

Output:

w is the d x 1 weight vector learnt by the algorithm.

Examples

To be uploaded soon.

Disclaimer

This code is free to use for scientific purpose, but should not be distributed without permission from the authors. The authors are not responsible for any implications arising out of the usage of this code.

Contact

For any queries or suggestions, please feel free to contact us at Harikrishna Narasimhan .

References

Joachims, T. A support vector method for multivariate performancemeasures. In Proceedings of the 22nd International Conference on Machine Learning (ICML), pp. 377-384, 2005.
Joachims, T. Training linear svms in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 217-226, 2006.
Narasimhan, H. and Agarwal, S. A Structural SVM Based Approach for Optimizing Partial AUC. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.
Narasimhan, H. and Agarwal, S. SVM_pAUC^tight: A new support vector method for optimizing partial AUC based on a tight convex upper bound. In Proceedings of the ACM SIGKDD Conference on Knowledge, Discovery and Data Mining (KDD), 2013.
Tsochantaridis, I., Joachims, T., Hofmann, T., and Altun, Y. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453-1484, 2005.