Abstract
Recently there has been a steep growth in the development of kernel-based learning algorithms. The intrinsic problem in such algorithms is the selection of the optimal kernel for the learning task of interest. In this paper, we propose an unsupervised approach to learn a linear combination of kernel functions, such that the resulting kernel best serves the objectives of the learning task. This is achieved through measuring the influence of each point on the structure of the dataset. This measure is calculated by constructing a weighted graph on which a random walk is performed. The measure of influence in the feature space is probabilistically related to the input space that yields an optimization problem to be solved. The optimization problem is formulated in two different convex settings, namely linear and semidefinite programming, dependent on the type of kernel combination considered. The contributions of this paper are twofold: first, a novel unsupervised approach to learn the kernel function, and second, a method to infer the local similarity represented by the kernel function by measuring the global influence of each point toward the structure of the dataset. The proposed approach focuses on the kernel selection which is independent of the kernel-based learning algorithm. The empirical evaluation of the proposed approach with various datasets shows the effectiveness of the algorithm in practice.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
Available at: http://www.people.csail.mit.edu/jrennie/20Newsgroups/.
References
Amari S, Wu S (1999) Improving support vector machine classifiers by modifying kernal functions. Neural Netw 12(6):783–789. doi:10.1016/S0893-6080(99)00032-5
Bach FR, Lanckriet GRG, Jordan MI (2004) Multiple kernel learning, conic duality, and the smo algorithm. In: ICML ’04: proceedings of the twenty-first international conference on Machine learning. ACM, New York, NY, USA, p 6. doi:10.1145/1015330.1015424
Ben-Hur A, Horn D, Siegelmann HT, Vapnik V (2002) Support vector clustering. J Mach Learn Res 2:125–137
Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2:121–167
Burges CJC (1999) Geometry and invariance in kernel based methods. pp 89–116
Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. Software available at: http://www.csie.ntu.edu.tw/cjlin/libsvm
Coifman RR, Lafon S (2006) Diffusion maps. Appl Comput Harmon Anal 21(1):5–30. doi:10.1016/j.acha.2006.04.006. http://www.dx.doi.org/10.1016/j.acha.2006.04.006
Cristianini N, Shawe-taylor J, Elissee A, Kandola J (2002) On kernel-target alignment. In: Advances in neural information processing systems 14. MIT Press, pp 367–373
Duan K, Keerthi SS, Poo AN (2003) Evaluation of simple performance measures for tuning svm hyperparameters. Neurocomputing 51:41–59. doi:10.1016/S0925-2312(02)00601-X. http://www.sciencedirect.com/science/article/B6V10-4625PVR-1/2/cc81e4581eca9413c3784c75c1ba0ee1
Graf A, Smola A, Borer S (2003) Classification in a normalized feature space using support vector machines. Neural Netw, IEEE Trans 14(3):597–605. doi:10.1109/TNN.2003.811708
Grant M, Boyd S (2008) Graph implementations for nonsmooth convex programs. http://www.dx.doi.org/10.1007/978-1-84800-155-8_7
Grant M, Boyd S (2009) Cvx: Matlab software for disciplined convex programming web page and software. http://www.stanford.edu/boyd/cvx
Guo Y, Gao J, Kwan P (2008) Twin kernel embedding. Pattern Anal Mach Intell, IEEE Trans 30(8):1490–1495. doi:10.1109/TPAMI.2008.74
Herbster M, Pontil M, Wainer L (2005) Online learning over graphs. In: ICML ’05: proceedings of the 22nd international conference on Machine learning. ACM, New York, NY, USA, pp 305–312. http://www.doi.acm.org/10.1145/1102351.1102390
Kandola J, Shawe-Taylor J, Cristianini N (2002) On the extensions of kernel alignment. NC-TR-2002-120. http://www.eprints.ecs.soton.ac.uk/9745
Kandola J, Shawe-Taylor J, Cristianini N (2002) Optimizing kernel alignment over combinations of kernel. http://www.eprints.ecs.soton.ac.uk/9746/
Kim SJ, Zymnis A, Magnani A, Koh K, Boyd S (2008) Learning the kernel via convex optimization. pp 1997–2000. doi:10.1109/ICASSP.2008.4518030
Kondor RI, Lafferty J (2002) Diffusion kernels on graphs and other discrete structures. In: Proceedings of the ICML, pp 315–322
Kulis B, Sustik M, Dhillon I (2006) Learning low-rank kernel matrices. In: ICML ’06: Proceedings of the 23rd international conference on Machine learning. ACM, New York, NY, USA, pp 505–512. http://www.doi.acm.org/10.1145/1143844.1143908
Vandenberghe L, Vandenberghe SB, Boyd S (1996) Semidefinite programming. SIAM Rev 38(1):49–95. http://www.jstor.org/stable/2132974
Lafon S, Lee A (2006) Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. Pattern Anal Mach Intell, IEEE Trans 28(9):1393–1403. doi:10.1109/TPAMI.2006.184
Lanckriet GRG, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5:27–72
Marie Szafranski YG, Rakotomamonjy A (2008) Composite kernel learning. In: ICML
Meila M, Shi J (2001) A random walks view of spectral segmentation
Micchelli CA, Pontil M (2005) Learning the kernel function via regularization. J Mach Learn Res 6:1099–1125
Mika S, Ratsch G, Weston J, Scholkopf B, Mullers K (1999) Fisher discriminant analysis with kernels, pp 41–48. doi:10.1109/NNSP.1999.788121
Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B (2001) An introduction to kernel-based learning algorithms. Neural Netw, IEEE Trans 12(2):181–201. doi:10.1109/72.914517
Nadler B, Lafon S, Coifman R, Kevrekidis I (2007) Diffusion maps-a probabilistic interpretation for spectral embedding and clustering algorithms. http://www.dx.doi.org/10.1007/978-3-540-73750-6_10
Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2007) More efficiency in multiple kernel learning. In: ICML ’07: Proceedings of the 24th international conference on Machine learning. ACM, New York, NY, USA, pp 775–782. http://www.doi.acm.org/10.1145/1273496.1273594
Scholkopf B, Smola A, Muller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comp 10(5):1299–1319. http://neco.mitpress.org/cgi/content/abstract/10/5/1299
Shaw B, Jebara T (2009) Structure preserving embedding. In: ICML ’09: Proceedings of the 26th annual international conference on machine learning. ACM, New York, NY, USA, pp 937–944.doi:10.1145/1553374.1553494
Smola A, Hofmann T, Scholkopf B (2007) Kernel methods in machine learning. Ann Stat 36:1171–1220. http://eprints.pascal-network.org/archive/00003984
Society FRKCAM (1997) Spectral graph theory. Regional conference series in mathmatics. American Mathematical Society, Providence, RI
Terence Sim SB, Bsat M (2001) The cmu pose, illumination, and expression (pie) database of human faces. Technical Reports CMU-RI-TR-01-02, Robotics Institute, Pittsburgh, PA
Vapnik V (1999) An overview of statistical learning theory. Neural Netw, IEEE Trans 10(5):988–999. doi:10.1109/72.788640
Weinberger KQ, Packer BD, Saul LK (2005) Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization. In: Proceedings of the tenth international workshop on artificial intelligence and statistics, pp 381–388
Weinberger KQ, Saul LK (2006) An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In: Unfolding, Proceedings of the 21st national conference on artificial intelligence. AAAI
Weinberger KQ, Sha F, Saul LK (2004) Learning a kernel matrix for nonlinear dimensionality reduction. In: ICML ’04: Proceedings of the twenty-first international conference on Machine learning. ACM, New York, NY, USA, p 106. doi:10.1145/1015330.1015345
Xiong H, Swamy M, Ahmad M (2005) Optimizing the kernel in the empirical feature space. Neural Netw, IEEE Trans 16(2):460–474. doi:10.1109/TNN.2004.841784
Yang MH, Ahuj N, Kriegman D (2000) Face recognition using kernel eigenfaces. In: Image Processing, 2000. Proceedings. 2000 International Conference on, vol 1, pp 37–40. doi:10.1109/ICIP.2000.900886
Zhang X, Lee WS (2006) Hyperparameter learning for graph based semi-supervised learning algorithms. In: NIPS. http://books.nips.cc/papers/files/nips19/NIPS2006_0148.pdf
Zhengdong Lu, Dhillon PJI (2009) Geometry-aware metric learning. In: ICML ’09: Proceedings of the 26nd international conference on Machine learning. ACM
Acknowledgments
This research has been made possible through the Science Fund Grant “Delineation and 3D Visualization of Tumor and Risk Structures” (DVTRS), No: 1001/PKOMP/817001 by the Ministry of Science, Technology and Innovation of Malaysia.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Abbasnejad, M.E., Ramachandram, D. & Mandava, R. An unsupervised approach to learn the kernel functions: from global influence to local similarity. Neural Comput & Applic 20, 703–715 (2011). https://doi.org/10.1007/s00521-010-0411-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-010-0411-7