Abstract
Distance metric learning is a fundamental problem in data mining and knowledge discovery. Many representative data mining algorithms, such as \(k\)-nearest neighbor classifier, hierarchical clustering and spectral clustering, heavily rely on the underlying distance metric for correctly measuring relations among input data. In recent years, many studies have demonstrated, either theoretically or empirically, that learning a good distance metric can greatly improve the performance of classification, clustering and retrieval tasks. In this survey, we overview existing distance metric learning approaches according to a common framework. Specifically, depending on the available supervision information during the distance metric learning process, we categorize each distance metric learning algorithm as supervised, unsupervised or semi-supervised. We compare those different types of metric learning methods, point out their strength and limitations. Finally, we summarize open challenges in distance metric learning and propose future directions for distance metric learning.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
In this paper two data vectors are considered to be similar if the Euclidean distance between them is small, two data tensors are considered to be similar if the Frobenius norm of their difference tensor is small.
References
Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a mahalanobis metric from equivalence constraints. J Mach Learn Res 6(6):937–965
Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst 14:585–591
Bengio Y, Paiement J-F, Vincent P, Delalleau O, Le Roux N, Ouimet M (2004) Out-of-sample extensions for LLE, isomap, MDS, eigenmaps, and spectral clustering. In: Advances in neural information processing systems, vol 16, pp 177–184
Bilenko M, Basu S, Mooney RJ (2004) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the twenty-first international conference on Machine learning. ACM, Berlin, pp 11–18
Censor Y (1997) Parallel optimization: theory, algorithms, and applications. Oxford University Press, New York
Cox TF, Cox MAA (2000) Multidimensional scaling, 2nd edn. Chapman and Hall/CRC, Boca Raton
Crammer K, Singer Y (2001) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2:265–292
Dasgupta S, Langford J (2009) A tutorial on active learning. In: International conference on machine learning
Davidson I, Wagstaff KL, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases, pp 115–126
Davis JV, Kulis B, Jain P, Sra Suvrit, Dhillon IS (2007) Information-theoretic metric learning. In: International conference on machine learning (ICML), pp 209–216
Domeniconi C, Gunopulos D, Ma S, Yan B, Al-Razgan M, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Discov 14(1):63–97
Duda RO, Hart PE, Stork DG (2001) Pattern classification, vol 2, 2nd edn. Wiley, New York
Elkan C (2011) Bilinear models of affinity. Personal note
Fukunaga K (1990) Introduction to statistical pattern recognition, second edition (computer science and scientific computing series), 2nd edn. Academic Press, Boston
Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2004) Neighborhood component analysis. In: Advances in neural information processing systems (NIPS)
Guo Y, Li S, Yang J, Shu T, Wu L (2003) A generalized foleysammon transform based on generalized fisher discriminant criterion and its application to face recognition. Pattern Recognit Lett 24(1–3):147–158
He J, Li M, Zhang HJ, Tong H, Zhang C (2006) Generalized manifold-ranking-based image retrieval. IEEE Trans Image Process 15(10):3170–3177
He X, Niyogi P (2004) Locality preserving projections. In: Advances in neural information processing systems (NIPS), vol 16, pp 234–241
Hinton GE, Roweis ST (2002) Stochastic neighbor embedding. In: Advances in neural information processing systems (NIPS), pp 833–840
Hoi Steven CH, Liu W, Chang S-F (2008) Semi-supervised distance metric learning for collaborative image retrieval. In: Proceedings of IEEE Computer Society conference on computer vision and pattern recognition
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall Inc., Upper Saddle River
Jia Y, Nie F, Zhang C (2009) Trace ratio problem revisited. IEEE Trans Neural Netw 20(4):729–735
Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York
Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK (1999) An introduction to variational methods for graphical models. Mach Learn 37(2):183–233
Kocsor A, Kovács K, Szepesvári C (2004) Margin maximizing discriminant analysis. In: Proceedings of European conference on machine learning, vol 3201 of Lecture notes in computer science. Springer, Berlin, pp 227–238
Kulis B (2010) Metric learning. In: Tutorial at International conference on machine learning
Kulis Brian (2012) Metric learning: a survey. Found Trends Mach Learn 5(4):287–364
Li Z, Cao L, Chang S, Smith JR, Huang TS (2012) Beyond mahalanobis distance: Learning second-order discriminant function for people verification. In: Prcoeedings of computer vision and pattern recognition workshops (CVPRW), 2012 IEEE computer society conference on workshops, pp 45–50
Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Mika S, Ratsch G, Weston J, Schölkopf B, Müllers KR (1999) Fisher discriminant analysis with kernels. In: Neural networks for signal processing IX, 1999. proceedings of the 1999 IEEE signal processing society workshop, pp 41–48
Modi JJ (1989) Parallel algorithms and matrix computation. Oxford University Press, Inc
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22:1345–1359
Pele O, Werman M (2010) The quadratic-chi histogram distance family. In: Computer vision ECCV 2010, volume 6312 of lecture notes in computer science, chapt 54. Springer, Berlin, pp 749–762
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Schölkopf B, Smola AJ (2002) Learning with kernels : support vector machines, regularization, optimization, and beyond. The MIT Press, Cambridge
Schultz M, Joachims T (2004) Learning a distance metric from relative comparisons. In: Advances in neural information processing systems (NIPS), vol 16, pp 41–48
Shalev-Shwartz S (2007, July) Online learning: theory, algorithms, and applications. The Hebrew University of Jerusalem. Ph.D. Thesis
Shalev-Shwartz S, Singer Y, Ng AY (2004) Online and batch learning of pseudo-metrics. In: Proceedings of international conference on machine learning, pp 94–101
Shental N, Hertz T, Weinshall D, Pavel M (2002) Adjustment learning and relevant component analysis. In: Proceedings of European conference on computer vision, pp 776–790
Singh A, Nowak RD, Zhu X (2008) Unlabeled data: now it helps, now it doesn’t. In: Advances in neural information processing systems, pp 1513–1520
Sun J, Sow D, Hu J, Ebadollahi S (2010) Localized supervised metric learning on temporal physiological data. In: International conference on pattern recognition (ICPR)
Tenenbaum JB, Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Tsang IW, Cheung PM, Kwok JT (2005) Kernel relevant component analysis for distance metric learning. In: In IEEE International joint conference on neural networks (IJCNN), pp 954–959
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Wang F, Chen S, Zhang C, Li T (2008) Semi-supervised metric learning by maximizing constraint margin. In: Proceedings of the 17th ACM conference on information and knowledge management, pp 1457–1458
Wang F, Sun J, Ebadollahi S (2011) Integrating distance metrics learned from multiple experts and its application in patient similarity assessment. In: SIAM data mining conference (SDM), pp 59–70
Wang F, Sun J, Hu J, Ebadollahi S (2011) Imet: interactive metric learning in healthcare applications. In: SIAM data mining conference (SDM), pp 944–955
Wang F, Zhang C (2007) Feature extraction by maximizing the average neighborhood margin. In: IEEE Computer Society conference on computer vision and pattern recognition (CVPR)
Wang F, Zhao B, Zhang C (2011) Unsupervised large margin discriminative projection. IEEE Trans Neural Netw 22(9):1446–1456
Weinberger KQ, Blitzer J, Saul LK (2005) Distance metric learning for large margin nearest neighbor classification. In: Advances in neural information processing systems
Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10:207–244
Werman M, Pele O, Kulis B (2010) Distance functions and metric learning. In: Tutorial at European conference on computer vision
Xing EP, Ng AY, Jordan MI, Russell S (2002) Distance metric learning, with application to clustering with side-information. In: Advances in neural information processing systems (NIPS), vol 15, pp 505–512
Yang L, Jin R (2006) Distance metric learning: a comprehensive survey. Technical report, Department of Computer Science and Engineering, Michigan State University
Yang L, Jin R, Sukthankar R (2007) Bayesian active distance metric learning. In: Proceedings of uncertainties in artificial intelligence, AUAI Press, Corvallis, pp 442–449
Yang X, Fu H, Zha H, Barlow J (2006) Semi-supervised nonlinear dimensionality reduction. In: 23rd International conference on machine learning, pp 1065–1072
Zhang Y, Yeung D-Y (2010) Transfer metric learning by learning task relationships. In: Proceedings of the 18th ACM SIGKDD conference on knowledge discovery and data mining, pp 1199–1208
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Ian Davidson.
Rights and permissions
About this article
Cite this article
Wang, F., Sun, J. Survey on distance metric learning and dimensionality reduction in data mining. Data Min Knowl Disc 29, 534–564 (2015). https://doi.org/10.1007/s10618-014-0356-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-014-0356-z