Survey on distance metric learning and dimensionality reduction in data mining

Fei Wang¹ &
Jimeng Sun²

5473 Accesses
123 Citations
1 Altmetric
Explore all metrics

Abstract

Distance metric learning is a fundamental problem in data mining and knowledge discovery. Many representative data mining algorithms, such as $k$-nearest neighbor classifier, hierarchical clustering and spectral clustering, heavily rely on the underlying distance metric for correctly measuring relations among input data. In recent years, many studies have demonstrated, either theoretically or empirically, that learning a good distance metric can greatly improve the performance of classification, clustering and retrieval tasks. In this survey, we overview existing distance metric learning approaches according to a common framework. Specifically, depending on the available supervision information during the distance metric learning process, we categorize each distance metric learning algorithm as supervised, unsupervised or semi-supervised. We compare those different types of metric learning methods, point out their strength and limitations. Finally, we summarize open challenges in distance metric learning and propose future directions for distance metric learning.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Artificial Intelligence

Notes

http://en.wikipedia.org/wiki/Metric_(mathematics).
http://en.wikipedia.org/wiki/Mahalanobis_distance
http://en.wikipedia.org/wiki/Covariance_matrix
http://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix.
In this paper two data vectors are considered to be similar if the Euclidean distance between them is small, two data tensors are considered to be similar if the Frobenius norm of their difference tensor is small.
http://en.wikipedia.org/wiki/Hinge_loss

References

Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a mahalanobis metric from equivalence constraints. J Mach Learn Res 6(6):937–965
MATH MathSciNet Google Scholar
Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst 14:585–591
Google Scholar
Bengio Y, Paiement J-F, Vincent P, Delalleau O, Le Roux N, Ouimet M (2004) Out-of-sample extensions for LLE, isomap, MDS, eigenmaps, and spectral clustering. In: Advances in neural information processing systems, vol 16, pp 177–184
Bilenko M, Basu S, Mooney RJ (2004) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the twenty-first international conference on Machine learning. ACM, Berlin, pp 11–18
Censor Y (1997) Parallel optimization: theory, algorithms, and applications. Oxford University Press, New York
MATH Google Scholar
Cox TF, Cox MAA (2000) Multidimensional scaling, 2nd edn. Chapman and Hall/CRC, Boca Raton
Google Scholar
Crammer K, Singer Y (2001) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2:265–292
Google Scholar
Dasgupta S, Langford J (2009) A tutorial on active learning. In: International conference on machine learning
Davidson I, Wagstaff KL, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases, pp 115–126
Davis JV, Kulis B, Jain P, Sra Suvrit, Dhillon IS (2007) Information-theoretic metric learning. In: International conference on machine learning (ICML), pp 209–216
Domeniconi C, Gunopulos D, Ma S, Yan B, Al-Razgan M, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Discov 14(1):63–97
Article MathSciNet Google Scholar
Duda RO, Hart PE, Stork DG (2001) Pattern classification, vol 2, 2nd edn. Wiley, New York
MATH Google Scholar
Elkan C (2011) Bilinear models of affinity. Personal note
Fukunaga K (1990) Introduction to statistical pattern recognition, second edition (computer science and scientific computing series), 2nd edn. Academic Press, Boston
Google Scholar
Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2004) Neighborhood component analysis. In: Advances in neural information processing systems (NIPS)
Guo Y, Li S, Yang J, Shu T, Wu L (2003) A generalized foleysammon transform based on generalized fisher discriminant criterion and its application to face recognition. Pattern Recognit Lett 24(1–3):147–158
Article MATH Google Scholar
He J, Li M, Zhang HJ, Tong H, Zhang C (2006) Generalized manifold-ranking-based image retrieval. IEEE Trans Image Process 15(10):3170–3177
Article Google Scholar
He X, Niyogi P (2004) Locality preserving projections. In: Advances in neural information processing systems (NIPS), vol 16, pp 234–241
Hinton GE, Roweis ST (2002) Stochastic neighbor embedding. In: Advances in neural information processing systems (NIPS), pp 833–840
Hoi Steven CH, Liu W, Chang S-F (2008) Semi-supervised distance metric learning for collaborative image retrieval. In: Proceedings of IEEE Computer Society conference on computer vision and pattern recognition
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall Inc., Upper Saddle River
MATH Google Scholar
Jia Y, Nie F, Zhang C (2009) Trace ratio problem revisited. IEEE Trans Neural Netw 20(4):729–735
Article Google Scholar
Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York
MATH Google Scholar
Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK (1999) An introduction to variational methods for graphical models. Mach Learn 37(2):183–233
Article MATH Google Scholar
Kocsor A, Kovács K, Szepesvári C (2004) Margin maximizing discriminant analysis. In: Proceedings of European conference on machine learning, vol 3201 of Lecture notes in computer science. Springer, Berlin, pp 227–238
Kulis B (2010) Metric learning. In: Tutorial at International conference on machine learning
Kulis Brian (2012) Metric learning: a survey. Found Trends Mach Learn 5(4):287–364
Article MATH MathSciNet Google Scholar
Li Z, Cao L, Chang S, Smith JR, Huang TS (2012) Beyond mahalanobis distance: Learning second-order discriminant function for people verification. In: Prcoeedings of computer vision and pattern recognition workshops (CVPRW), 2012 IEEE computer society conference on workshops, pp 45–50
Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Article MathSciNet Google Scholar
Mika S, Ratsch G, Weston J, Schölkopf B, Müllers KR (1999) Fisher discriminant analysis with kernels. In: Neural networks for signal processing IX, 1999. proceedings of the 1999 IEEE signal processing society workshop, pp 41–48
Modi JJ (1989) Parallel algorithms and matrix computation. Oxford University Press, Inc
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22:1345–1359
Article Google Scholar
Pele O, Werman M (2010) The quadratic-chi histogram distance family. In: Computer vision ECCV 2010, volume 6312 of lecture notes in computer science, chapt 54. Springer, Berlin, pp 749–762
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Article Google Scholar
Schölkopf B, Smola AJ (2002) Learning with kernels : support vector machines, regularization, optimization, and beyond. The MIT Press, Cambridge
Google Scholar
Schultz M, Joachims T (2004) Learning a distance metric from relative comparisons. In: Advances in neural information processing systems (NIPS), vol 16, pp 41–48
Shalev-Shwartz S (2007, July) Online learning: theory, algorithms, and applications. The Hebrew University of Jerusalem. Ph.D. Thesis
Shalev-Shwartz S, Singer Y, Ng AY (2004) Online and batch learning of pseudo-metrics. In: Proceedings of international conference on machine learning, pp 94–101
Shental N, Hertz T, Weinshall D, Pavel M (2002) Adjustment learning and relevant component analysis. In: Proceedings of European conference on computer vision, pp 776–790
Singh A, Nowak RD, Zhu X (2008) Unlabeled data: now it helps, now it doesn’t. In: Advances in neural information processing systems, pp 1513–1520
Sun J, Sow D, Hu J, Ebadollahi S (2010) Localized supervised metric learning on temporal physiological data. In: International conference on pattern recognition (ICPR)
Tenenbaum JB, Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Article Google Scholar
Tsang IW, Cheung PM, Kwok JT (2005) Kernel relevant component analysis for distance metric learning. In: In IEEE International joint conference on neural networks (IJCNN), pp 954–959
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Book MATH Google Scholar
Wang F, Chen S, Zhang C, Li T (2008) Semi-supervised metric learning by maximizing constraint margin. In: Proceedings of the 17th ACM conference on information and knowledge management, pp 1457–1458
Wang F, Sun J, Ebadollahi S (2011) Integrating distance metrics learned from multiple experts and its application in patient similarity assessment. In: SIAM data mining conference (SDM), pp 59–70
Wang F, Sun J, Hu J, Ebadollahi S (2011) Imet: interactive metric learning in healthcare applications. In: SIAM data mining conference (SDM), pp 944–955
Wang F, Zhang C (2007) Feature extraction by maximizing the average neighborhood margin. In: IEEE Computer Society conference on computer vision and pattern recognition (CVPR)
Wang F, Zhao B, Zhang C (2011) Unsupervised large margin discriminative projection. IEEE Trans Neural Netw 22(9):1446–1456
Article Google Scholar
Weinberger KQ, Blitzer J, Saul LK (2005) Distance metric learning for large margin nearest neighbor classification. In: Advances in neural information processing systems
Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10:207–244
MATH Google Scholar
Werman M, Pele O, Kulis B (2010) Distance functions and metric learning. In: Tutorial at European conference on computer vision
Xing EP, Ng AY, Jordan MI, Russell S (2002) Distance metric learning, with application to clustering with side-information. In: Advances in neural information processing systems (NIPS), vol 15, pp 505–512
Yang L, Jin R (2006) Distance metric learning: a comprehensive survey. Technical report, Department of Computer Science and Engineering, Michigan State University
Yang L, Jin R, Sukthankar R (2007) Bayesian active distance metric learning. In: Proceedings of uncertainties in artificial intelligence, AUAI Press, Corvallis, pp 442–449
Yang X, Fu H, Zha H, Barlow J (2006) Semi-supervised nonlinear dimensionality reduction. In: 23rd International conference on machine learning, pp 1065–1072
Zhang Y, Yeung D-Y (2010) Transfer metric learning by learning task relationships. In: Proceedings of the 18th ACM SIGKDD conference on knowledge discovery and data mining, pp 1199–1208

Download references

Author information

Authors and Affiliations

Healthcare Analytics Research Group, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Fei Wang
School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
Jimeng Sun

Authors

Fei Wang
View author publications
You can also search for this author in PubMed Google Scholar
Jimeng Sun
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Fei Wang.

Additional information

Responsible editor: Ian Davidson.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wang, F., Sun, J. Survey on distance metric learning and dimensionality reduction in data mining. Data Min Knowl Disc 29, 534–564 (2015). https://doi.org/10.1007/s10618-014-0356-z

Download citation

Received: 06 November 2012
Accepted: 24 May 2014
Published: 27 June 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s10618-014-0356-z

Survey on distance metric learning and dimensionality reduction in data mining

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Reduced-Rank Local Distance Metric Learning

Mutual k-Nearest Neighbor Graph for Data Analysis: Application to Metric Space Clustering

Metric Learning in Dissimilarity Space for Improved Nearest Neighbor Performance

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Survey on distance metric learning and dimensionality reduction in data mining

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Reduced-Rank Local Distance Metric Learning

Mutual k-Nearest Neighbor Graph for Data Analysis: Application to Metric Space Clustering

Metric Learning in Dissimilarity Space for Improved Nearest Neighbor Performance

Explore related subjects

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation