Distance-Preserving Dimensionality Reduction (Wiley Interdisciplinary Reviews - Data Mining and Knowledge Discovery, Vol. 1, Issue 5) (2011)
Distance-preserving dimensionality reduction
Li Yang∗
This paper presents an overview of basic concepts and principles that deal with the problem of mapping high-dimensional data to a low-dimensional space such that distances between all or some pairs of data points are preserved. It introduces related techniques and systematizes today's methods into linear methods, methods using iterative optimization, methods preserving exact distances, methods using geodesic distances, and methods using alignments of local models. It discusses these methods by focusing on their basic ideas, by summarizing their common features and differences, and by comparing their strengths and weaknesses. This paper assumes no familiarity with dimensionality reduction. The main text should be readable by people with little technical background. Technical information on important algorithms is briefly presented in sidebars, the reading of which assumes basics in statistics and matrix computation.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1:369–380. DOI: 10.1002/widm.39
A commonly used linear method is principal component analysis (PCA),1 which maps data to a lower-dimensional space such that the variance of the data is maximized. Nonlinear methods often have the explicit goal of preserving distances. One commonly used idea is to construct a cost function, which measures the difference between the distances in the input space and the corresponding distances in the output space, and to project the data to a low-dimensional space by minimizing the cost function. Recently, research on manifold learning for dimensionality reduction has become active. One way to understand this is to assume that the data are located on a manifold in high-dimensional space and that the geodesic distance along the manifold provides a better measurement of dissimilarity between pairs of data points than the Euclidean distance. Using the analogy of a folded newspaper, the paper needs to be unfolded to reveal its contents.

BOX 1: THE INPUT DATA

We are given n data observations {x_1, ..., x_n}, each being a column vector x_i = [x_i1, ..., x_ip]^T representing a point in p-dimensional Euclidean space. The data set has a mean μ = E(x_i) and a covariance matrix Σ_{p×p} = E{(x_i − μ)(x_i − μ)^T}. For easy representation, we denote the input data in matrix form as X_{n×p} = [x_1, ..., x_n]^T. Assuming the data are centered, that is, μ = 0, the covariance matrix can be expressed as Σ = (1/n) X^T X. A popular measure of distance between x_i and x_j is the Minkowski distance, defined as d_ij = (|x_i1 − x_j1|^c + ... + |x_ip − x_jp|^c)^{1/c}, where c is a constant. When c = 2, it is the usual Euclidean distance. When c = 1, it is called the Manhattan distance or the city-block distance.
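The following minimal NumPy sketch computes the quantities defined in Box 1; the random data and variable names are illustrative, not part of the box itself.

    import numpy as np

    def minkowski_distance(xi, xj, c=2.0):
        # Minkowski distance between two points; c = 2 gives the Euclidean
        # distance and c = 1 gives the Manhattan (city-block) distance.
        return np.sum(np.abs(xi - xj) ** c) ** (1.0 / c)

    # X is the n-by-p data matrix with one observation per row.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))

    mu = X.mean(axis=0)            # sample mean
    Xc = X - mu                    # centered data (mean zero)
    Sigma = Xc.T @ Xc / len(Xc)    # covariance matrix (1/n) X^T X

    d12 = minkowski_distance(X[0], X[1])   # Euclidean distance between two points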
The above constitutes the major ideas used for distance-preserving dimensionality reduction. This paper introduces related techniques and systematizes major methods using these ideas. Throughout the paper, we assume that all dimensions (variables, features) are numerical and do not consider the practical issues of how to define distance measures for categorical variables and how to reduce the dimensionality of data with mixed categorical and numerical variables. In addition to using data coordinates as input, algorithms should also be able to directly accept distances as input, in which case all we need to do is to find a data configuration in Euclidean space with a fixed dimensionality such that the input distances are best preserved. These algorithms are often in duality with techniques using data coordinates as input and are also considered dimensionality reduction techniques. They have applications in areas such as psychometrics and perceptual mapping, where the collected data are interpoint distances.

TRADITIONAL METHODS

The two commonly used strategies for dimensionality reduction are feature selection and feature extraction. Feature selection reduces the dimensionality by selecting a subset of dimensions. Clearly, distance preservation is not an objective of feature selection. Feature extraction goes a step further by extracting new dimensions from the input data based on transformations or combinations of the original dimensions. Distance-preserving methods for dimensionality reduction are feature extraction methods. Traditional distance-preserving methods can be classified into two categories: linear methods and nonlinear methods. A linear method is characterized by a projection matrix that linearly transforms input data points to a low-dimensional space. As an example, PCA has the distinction of being the optimal linear transformation such that the projected data in the subspace have the largest variances. From a distance-preservation perspective, it offers the best linear mapping in the sense that the sum of changes of squared distances is minimized. Nonlinear methods are often derived either through extensions of linear methods or by minimizing cost functions that are defined explicitly in terms of changes of distances. These methods often try to preserve all distances and usually end with the case where no distance is exactly preserved. Nonlinear methods have also been developed to explicitly retain some exact distances.

Linear Methods

Linear methods include PCA; classical multidimensional scaling (classical MDS),2 which takes interpoint distances as input and is equivalent to PCA; higher-order generalizations such as independent component analysis (ICA);3 and random projection, where the projection matrix is randomly chosen. In various fields, PCA is also known as the Karhunen–Loève transform or the Hotelling transform.

PCA offers an orthogonal linear transformation that projects the data to a low-dimensional space. It seeks to reduce the dimensionality of the data by finding a few orthogonal linear combinations of the original dimensions such that the greatest variance by any projection of the data comes to lie on the first coordinate, the second greatest variance on the second coordinate, and so on, as shown in Figure 1. PCA achieves this through eigenvalue decomposition of the covariance matrix. For many data sets, a few principal coordinates explain most of the data variance, so that the rest of the coordinates can be disregarded with minimal information loss.

FIGURE 1 | Principal components of a Gaussian distribution.
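As a concrete illustration of the description above, the following sketch performs PCA by eigenvalue decomposition of the covariance matrix; the function name and the two-dimensional target space are illustrative choices, not a prescribed implementation.

    import numpy as np

    def pca(X, q):
        # Center the data and form the covariance matrix.
        Xc = X - X.mean(axis=0)
        Sigma = Xc.T @ Xc / len(Xc)
        # Eigen-decompose; eigh returns eigenvalues in ascending order.
        eigvals, eigvecs = np.linalg.eigh(Sigma)
        order = np.argsort(eigvals)[::-1][:q]
        W = eigvecs[:, order]      # p-by-q projection matrix
        return Xc @ W              # n-by-q projected coordinates

    Y = pca(np.random.default_rng(0).normal(size=(200, 5)), q=2)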
Classical MDS is a technique that takes interpoint distances as input and derives data coordinates through eigenvalue decomposition of the inner product matrix (Gram matrix) of all data points. Because distances are invariant to translations, we assume that the data set is centered at the origin in order to derive a unique inner product matrix. The inner product matrix is obtained from the input distances through a bidirectional centering process. Classical MDS is equivalent to PCA. In fact, when Euclidean distances are used as input distances, PCA and classical MDS produce identical results.
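To make the bidirectional (double) centering step concrete, here is a minimal NumPy sketch of classical MDS applied to a matrix of pairwise Euclidean distances; it is an illustrative sketch rather than a reference implementation.

    import numpy as np

    def classical_mds(D, q):
        # D is an n-by-n matrix of pairwise distances.
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
        B = -0.5 * J @ (D ** 2) @ J             # Gram (inner product) matrix
        eigvals, eigvecs = np.linalg.eigh(B)
        order = np.argsort(eigvals)[::-1][:q]   # largest q eigenvalues
        scale = np.sqrt(np.maximum(eigvals[order], 0.0))
        return eigvecs[:, order] * scale        # n-by-q configuration

    # With Euclidean input distances, the configuration reproduces the PCA
    # projection up to rotation and sign.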
Being based on the eigenvalue decomposition of the covariance matrix, PCA is a second-order method that projects the data to uncorrelated dimensions. Going one step further, ICA4 has the goal of producing statistically independent dimensions. This is a stronger requirement involving higher-order statistics. In order for the problem to have a solution, the input data set has to be indeed generated by a linear combination of independent components. In practice, ICA is usually implemented as an extra unitary transformation after the application of PCA.

Random projection5 is a simple and computationally efficient approximation to the above linear methods. In random projection, the linear transformation is randomly chosen. For distance preservation, we can constrain the projection matrix such that its columns have unit lengths. The key idea of random projection comes from the Johnson–Lindenstrauss lemma: if data points in a vector space are projected onto a randomly selected subspace of suitably high dimension, the distances between the points are approximately preserved. Strictly speaking, random projection is not a projection because the projection matrix may not be orthogonal. When the dimensionality is high, however, vectors having random directions might be sufficiently close to orthogonal to each other, and a random projection will well approximate an orthogonal projection.
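The following short sketch illustrates the random projection idea just described, with the columns of the random matrix normalized to unit length; the Gaussian choice of random matrix and the dimensions used are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, q = 500, 1000, 50             # n points, p input dims, q output dims

    X = rng.normal(size=(n, p))
    R = rng.normal(size=(p, q))
    R /= np.linalg.norm(R, axis=0)      # constrain columns to unit length
    Y = X @ R                           # randomly projected data

    # Relative interpoint distances are approximately preserved;
    # the overall scale depends on p and q.
    d_in = np.linalg.norm(X[0] - X[1])
    d_out = np.linalg.norm(Y[0] - Y[1])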
Kernel PCA

Kernel principal component analysis (kernel PCA)6 is a nonlinear extension of PCA using a technique called the kernel method. It is equivalent to mapping the data to a very high (up to infinite) dimensional space, namely a reproducing kernel Hilbert space (RKHS), and applying PCA in the RKHS. According to Mercer's theorem, the inner product of two mapped points can be expressed as a kernel function of the two points. Therefore, any algorithm that depends solely on the inner products between pairs of input vectors can be transformed into a kernelized version by using a kernel function to replace the otherwise intractable inner product operation. The kernelized version is equivalent to the algorithm operating in an RKHS. Because the kernel function is used, however, the mapping to the RKHS is never explicitly computed. As we have discussed, classical MDS is such an algorithm that involves only inner products of the input data. By replacing the inner product with a kernel operation, classical MDS is extended to a nonlinear version, which effectively performs PCA after a nonlinear mapping of the input data to the RKHS. Because of the nonlinear mapping, distance preservation is not an objective of kernel PCA, although PCA offers distance preservation in the RKHS.
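A minimal sketch of the kernelized computation described above, using a Gaussian (RBF) kernel as an illustrative choice; the kernel width and the feature-space centering step are common practical details assumed here.

    import numpy as np

    def rbf_kernel(X, gamma=0.5):
        # Kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.exp(-gamma * d2)

    def kernel_pca(X, q, gamma=0.5):
        n = X.shape[0]
        K = rbf_kernel(X, gamma)
        J = np.eye(n) - np.ones((n, n)) / n
        Kc = J @ K @ J                          # center in the implicit RKHS
        eigvals, eigvecs = np.linalg.eigh(Kc)
        order = np.argsort(eigvals)[::-1][:q]
        scale = np.sqrt(np.maximum(eigvals[order], 0.0))
        return eigvecs[:, order] * scale        # coordinates of the training points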
Distance Preservation by Iterative Optimization

Being linear methods, PCA and classical MDS preserve distances while reducing data dimensionality under the assumption that the data are distributed on or close to a hyperplane in high-dimensional space. Such an assumption is too restrictive in many applications. Therefore, methods have been developed to generate nonlinear maps with the explicit objective of distance preservation.

Many of these methods belong to an area called multidimensional scaling. Multidimensional scaling originated from early studies in psychometrics, where researchers were interested in giving quantitative scores to subjects and dimensionalities to abstract concepts. It searches for a configuration of data points in a low-dimensional space such that the Euclidean distances between the points in that space match the original dissimilarities as closely as possible. Thus it is most often used to project data when only interpoint dissimilarities are available. When the data coordinates are available, multidimensional scaling can also be used by calculating interpoint distances and creating a low-dimensional configuration of points that best preserves those distances.

To minimize distance changes, a simple idea is to define an error measure of distance changes and then design a procedure to minimize that measure. For the purpose of distance preservation, the error measure is usually defined as a weighted sum of differences between distances in the input space and the corresponding distances in the output space. Because of the complexity of the error measure, there is usually no closed-form solution to the minimization problem. A method using this idea usually employs an iterative procedure to minimize the error measure.
In the ε-neighborhood approach, two points are neighbors if the distance between them is smaller than a user-defined threshold ε.

For illustration, Figure 4(a) shows a synthetic data set of 1000 points randomly distributed on a 4 × 1 rectangle, which is then wrapped into a three-dimensional Swiss roll. Figure 4(b) shows its 5-NN (k = 5) neighborhood graph superimposed on the data. Figure 4(c) displays the neighborhood graph unwrapped in two dimensions. Figures 4(b) and (c) illustrate the shortest path between an example pair of data points A and B. The geodesic distance between A and B is estimated as the length of this path. Applying distance-preserving methods (such as classical MDS) to the estimated geodesic distances will unfold the data set.

FIGURE 4 | (a) A Swiss-roll data set, (b) its 5-NN neighborhood graph, and (c) its two-dimensional projection.
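To make this procedure concrete, the following sketch estimates geodesic distances as shortest-path lengths on a k-NN neighborhood graph and then applies classical MDS to them, in the spirit of Isomap; it relies on SciPy's graph routines and on a classical_mds function like the one sketched earlier, and the parameter values are illustrative.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse.csgraph import shortest_path

    def geodesic_embedding(X, q=2, k=5):
        D = squareform(pdist(X))                 # pairwise Euclidean distances
        # Keep only edges from each point to its k nearest neighbors.
        G = np.full_like(D, np.inf)
        idx = np.argsort(D, axis=1)[:, 1:k + 1]
        for i, neighbors in enumerate(idx):
            G[i, neighbors] = D[i, neighbors]
        G = np.minimum(G, G.T)                   # symmetrize the neighborhood graph
        # Graph (geodesic) distances by Dijkstra; assumes the graph is connected.
        geo = shortest_path(G, method='D', directed=False)
        return classical_mds(geo, q)             # unfold with classical MDS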
The k-NN and ε-neighborhood approaches have problems in constructing neighborhood graphs for geodesic distance estimation. Neither approach guarantees the connectivity of the constructed neighborhood graph, especially when the data are not evenly distributed. How to choose a proper value of k or ε becomes a difficult problem. If the parameter were chosen too small, the neighborhood graph would be disconnected. If it were chosen too large, a so-called 'short-circuit' problem would occur and the constructed neighborhood graph would not follow the manifold. This is shown in Figure 4(b), where the points A and B may be directly connected by an edge if the parameter is chosen too large.

In principle, graph distances approach the corresponding geodesic distances as the number of data points increases. Data projection using geodesic distances is guaranteed to converge asymptotically to the true intrinsic structure of the data. In practice, however, the neighborhood graph may not offer good estimates of geodesic distances, especially short ones that span a few edges on the neighborhood graph. For example, the constructed neighborhood graph shown in Figure 4(c) contains many holes and looks like a Swiss cheese. As illustrated by a path between points
C and D in the figure, the geodesic distance between a pair of points on opposite sides of a hole is overestimated by the length of the path along the bank of the hole. This means that the calculated graph distances, especially the short ones, each of which spans a few edges on the graph, may not be good estimates of the corresponding geodesic distances. This may explain the different behavior of Isomap, GeoNLM, and CDA. The underlying methods, NLM and CCA, used in GeoNLM and CDA emphasize the preservation of short distances, some of which may be overestimated by detoured paths on a neighborhood graph with holes.

ALIGNMENT APPROACHES

Another way to reduce the dimensionality of data distributed on a q-dimensional manifold is through direct alignment of local linear models. In principle, a q-dimensional smooth manifold locally resembles q-dimensional Euclidean space. A neighborhood can then be defined for each data point on the manifold, in which all points resume the behavior of q-dimensional Euclidean space. Therefore, q-dimensional local models can be derived at each neighborhood through linear methods. On the global scale, we are interested in data projection through an alignment of the local models. The local models are coordinated and aligned by affine transformations, one for each model, from its own variable space to a single global coordinate system. If the local models preserve distances, we expect the resulting global configuration to preserve local distances. The alignment is inherently an optimization problem that minimizes a predefined global alignment error. An exciting result is that, as long as the error can be expressed in a quadratic form of the resulting coordinates, it can be minimized by solving a sparse eigenvalue problem subject to constraints that make the problem well posed. The results of the alignment are globally optimized, and there is no problem with local optima. The underlying manifold may be highly nonlinear although both the local models and the alignment process are linear.
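As an illustration of the local modeling step just described, the following sketch fits a q-dimensional local model to each k-NN neighborhood by local PCA (a tangent-space approximation); the global alignment step, which turns a quadratic alignment error into a sparse eigenvalue problem, is illustrated for LLE in Box 5 below. The function and parameter choices are illustrative assumptions.

    import numpy as np

    def local_models(X, q=2, k=10):
        # Fit a q-dimensional linear model (local PCA) to each neighborhood.
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        neighborhoods = np.argsort(D, axis=1)[:, :k + 1]   # each point plus its k neighbors
        models = []
        for idx in neighborhoods:
            P = X[idx]
            Pc = P - P.mean(axis=0)
            # Leading right singular vectors span the local tangent directions.
            _, _, Vt = np.linalg.svd(Pc, full_matrices=False)
            local_coords = Pc @ Vt[:q].T                    # q-dimensional local coordinates
            models.append((idx, local_coords))
        return models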
BOX 5: LOCALLY LINEAR EMBEDDING

Although LLE does not preserve distances, we present its basic idea since it inspired other alignment methods with isometric local models. LLE uses the local linearity of the manifold to find a linear representation of each point in terms of its neighbors. Let W = {w_ij} be a matrix of weights, where each w_ij summarizes the contribution of x_j to x_i's reconstruction. The first step of LLE is to find W to minimize the reconstruction error Σ_i ||x_i − Σ_j w_ij x_j||^2 with the constraints Σ_j w_ij = 1, w_ii = 0, and w_ij = 0 if x_j is not within the neighborhood of x_i. Please note that each row of W is decoupled from the others and can be optimized independently. The second step of LLE finds a new set of points {y_1, ..., y_n} in q-dimensional space such that the global alignment error Σ_i ||y_i − Σ_j w_ij y_j||^2 is minimized. Both error measures have the same form, but the variables in the first error measure are the w_ij's, whereas the variables in the second error measure are the y_i's. The second error measure defines a quadratic form in the y_i's. Subject to constraints that make the problem well posed, it can be minimized by solving a sparse n × n eigenvector problem, whose eigenvectors corresponding to the smallest q nonzero eigenvalues provide an ordered set of orthogonal coordinates centered at the origin. It should be noted that, similar to PCA and classical MDS, a solution can be found for all values of q simultaneously: the best one-dimensional projection is simply the first coordinate of the best two-dimensional projection, and so on. Other methods extend LLE with different local models and alignment functions.
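A compact NumPy sketch of the two LLE steps summarized in Box 5; the regularization used when solving for the weights and the choice of k are common practical details assumed here, not prescribed by the box.

    import numpy as np

    def lle(X, q=2, k=10, reg=1e-3):
        n = X.shape[0]
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        knn = np.argsort(D, axis=1)[:, 1:k + 1]
        W = np.zeros((n, n))
        # Step 1: reconstruction weights for each point from its neighbors.
        for i in range(n):
            Z = X[knn[i]] - X[i]                   # neighbors relative to x_i
            C = Z @ Z.T
            C += reg * np.trace(C) * np.eye(k)     # regularize for numerical stability
            w = np.linalg.solve(C, np.ones(k))
            W[i, knn[i]] = w / w.sum()             # weights sum to one
        # Step 2: embedding minimizing the same error with the weights fixed.
        M = (np.eye(n) - W).T @ (np.eye(n) - W)
        eigvals, eigvecs = np.linalg.eigh(M)
        return eigvecs[:, 1:q + 1]                 # skip the constant eigenvector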
A local model can be derived using only the distances of a data point from the other data points in its neighborhood. LMDS takes advantage of this observation by deriving and aligning local models on a minimum set of overlapping neighborhoods. The set of overlapping neighborhoods can be obtained through a greedy approximation algorithm for the classical minimum set cover problem. Consequently, this makes LMDS scale with the number of overlapping neighborhoods instead of the number of data points. Another major difference between LTSA and LMDS concerns their requirements on the input data. Unlike PCA, which requires input data coordinates, classical MDS requires only local distances as input. This makes LMDS applicable in settings where each entity knows nothing beyond its neighborhood and there is a demand to derive global information from local neighborhoods.

SUMMARY AND COMPARISON

The objective of dimensionality reduction is to project a set of high-dimensional data observations to a low-dimensional space such that the projection preserves essential information (such as distances between data observations) and performs better than the original data in further processing. We summarize the major approaches used for distance-preserving dimensionality reduction as follows:

• A simple approach is linear mapping, which is characterized by a projection matrix that linearly transforms high-dimensional points to a low-dimensional space. Example methods are PCA and classical MDS. The projection matrix is designed so that the variations of the resulting data points are maximized. It offers the optimal linear mapping for distance preservation in the sense that the sum of changes of squared distances is minimized. In addition, random projection offers a simple and efficient approach to linear mapping. Random projection can project a small set of high-dimensional data points to a low-dimensional space in such a way that distances between data points are approximately preserved.

• Methods can be developed to explicitly minimize the change of distances before and after the projection. Usually, the minimization of such a cost function has no closed-form solution. Therefore, the cost function is iteratively minimized. Example methods include Kruskal's MDS, NLM, and CCA. A problem with iterative minimization is that it may terminate at a local minimum.

• Many methods take interpoint distances as input. In addition to the usual Euclidean distances, the input distances can be geodesic distances between data points along the manifold. The geodesic distance between a pair of data points is estimated by the length of the shortest path between the pair of points on a neighborhood graph, which connects every data point to its neighboring data points. Example methods include Isomap, GeoNLM, and CDA: Isomap applies classical MDS, GeoNLM applies NLM, and CDA applies CCA to the estimated geodesic distances.

• A manifold locally resembles a Euclidean space. Therefore, linear models can be built locally within local neighborhoods and then aligned on a global scale in the target space. The alignment process minimizes an error, which is usually defined as having a quadratic form in the resulting coordinates and can be elegantly minimized using
an eigensolver. These methods include LLE and its derivatives using various local models and alignment mechanisms. If the local models preserve distances, for example, in LTSA and LMDS, the result of the global alignment offers local isometry.

• There are also other distance-preserving methods that do not fit into the above framework. These include the triangular mapping method and the tetrahedral mapping method, which preserve some exact distances.
The linear methods (PCA and classical MDS), Isomap (which uses classical MDS), and the methods using local model alignments share an important feature: they provide closed-form solutions by solving eigenvalue problems of specially constructed matrices. This feature comes from the simple fact that their error measures are defined as quadratic forms of the unknown data configurations, the minimization of which can be solved using an eigensolver. These methods considered together are called spectral methods. Their computations are based on tractable optimization techniques such as shortest paths, least squares fits, and matrix eigenvalue decomposition. The solutions they give are globally optimized. In contrast, the situation changes dramatically if the error measure cannot be expressed in a quadratic form, in which case we have to appeal to iterative procedures to minimize the error measure. Indeed, the power of the mathematical tools at our disposal decides the scope of what we can do in this and many other research areas.
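To spell out the step from a quadratic error measure to an eigenvalue problem, the standard argument can be sketched as follows, where M stands for whatever sparse symmetric positive semidefinite matrix a particular method constructs and the constraints shown are typical rather than universal:

    \min_{Y \in \mathbb{R}^{n \times q}} \operatorname{tr}\left(Y^{\mathsf T} M Y\right)
    \quad \text{subject to} \quad Y^{\mathsf T} Y = I_q, \qquad Y^{\mathsf T} \mathbf{1} = 0 .

Under these constraints, the minimizer is given by the eigenvectors of M associated with its smallest nonzero eigenvalues, which is why such methods reduce to a (sparse) eigenvalue problem and return globally optimal solutions.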
Dimensionality reduction using geodesic distances and dimensionality reduction using model alignments represent active areas of research. In both areas, methods take bottom-up approaches by assigning a local neighborhood to each data point and combining local information to obtain a global data configuration. However, they are fundamentally different in terms of the distances they preserve. By applying classical MDS to the estimated geodesic distances between every pair of data points, Isomap attempts to offer global isometry. As we discussed, however, Isomap does not work well at preserving local distances. On the other hand, variations of LLE, such as LTSA and LMDS, try to maintain local isometry by applying distance-preserving local models within each neighborhood, at the cost of giving up global isometry. In recent years, there have been attempts, such as maximum variance unfolding,21 that try to maximize the overall variance of the embedding while preserving distances within local neighborhoods. In fact, a data projection preserving global distances, such as the one attempted by Isomap, is possible only when the manifold is flat. A simple counterexample is data distributed on the surface of a sphere, which cannot be projected to two-dimensional space with global distances preserved.

The success of both Isomap-like and LLE-like methods depends on how a neighborhood is assigned to each data point. The size of the neighborhood is a compromise: it must be large enough to build a connected neighborhood graph for Isomap or to allow good construction of local models for model alignment, but small enough for the data manifold on the neighborhood to have little or no curvature. Several approaches have been proposed22 to overcome these problems and build connected neighborhood graphs. The difficulty of choosing a proper neighborhood size and the contradiction between the preservation of global and local distances reflect a fundamental problem: what is global and what is local, a persistent question in pattern analysis and machine learning that may not be easily answered without domain-specific knowledge.

CONCLUSION

In this paper, we have introduced major ideas and important techniques for distance-preserving dimensionality reduction. These techniques and algorithms are classified into linear methods, methods using iterative optimization, methods using geodesic distances, and methods through alignments of local models. We discussed these methods by focusing on their basic ideas and by summarizing their common features and differences.

Dimensionality reduction using geodesic distances and through model alignment represents active areas of research. In both areas, the assignment of local neighborhoods is of ultimate importance to the success of dimensionality reduction. There is a risk of overfitting or a disconnected neighborhood graph if the neighborhood is too small, and a risk of mismatch of local geometry if the neighborhood is too large. Therefore, one direct question is how to make the neighborhood sizes adaptive to local curvatures and data densities. In general, studying data transformation and manipulation on manifolds opens a new arena of research for data processing, which invites ideas and theories from differential geometry. From an algorithmic point of view, other interesting topics include bidirectional mapping and incremental mapping of new data points. In fact, there exists an incremental extension23 of Isomap. We expect to see more extensions and improvements along these directions in the near future.
REFERENCES
1. Jolliffe I. Principal Component Analysis. 2nd ed. New York: Springer; 2002.
2. Cox T, Cox M. Multidimensional Scaling. 2nd ed. London: Chapman and Hall/CRC; 2000.
3. Stone JV. Independent Component Analysis: A Tutorial Introduction. Cambridge: The MIT Press; 2004.
4. Hyvärinen A, Karhunen J, Oja E. Independent Component Analysis. New York: Wiley-Interscience; 2001.
5. Bingham E, Mannila H. Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01). New York: ACM Press; 2001, 245–250.
6. Schölkopf B, Smola A, Müller K-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 1998, 10:1299–1319.
7. Kruskal JB, Wish M. Multidimensional Scaling. Beverly Hills, CA: Sage Publications; 1978.
8. Sammon JW. A nonlinear mapping for data structure analysis. IEEE Trans Comput 1969, C-18:401–409.
9. Demartines P, Herault J. Curvilinear component analysis: a self-organizing neural network for nonlinear mapping of data sets. IEEE Trans Neural Netw 1997, 8:148–154.
10. Niemann H, Weiss J. A fast converging algorithm for nonlinear mapping of high-dimensional data onto a plane. IEEE Trans Comput 1979, C-28:142–147.
11. Lee RC, Slagle JR, Blum H. A triangulation method for the sequential mapping of points from N-space to two-space. IEEE Trans Comput 1977, 26:288–292.
12. Yang L. Distance-preserving projection of high-dimensional data for nonlinear dimensionality reduction. IEEE Trans Pattern Anal Machine Intell 2004, 26:1243–1246.
13. Tenenbaum JB, de Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290:2319–2323.
14. Yang L. Sammon's nonlinear mapping using geodesic distances. In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Vol. 2. Cambridge: IEEE Computer Society; 2004.
15. Lee JA, Lendasse A, Donckers N, Verleysen M. A robust nonlinear projection method. In: Proceedings of the 8th European Symposium on Artificial Neural Networks (ESANN 2000), Bruges; 2000, 13–20.
16. Roweis ST, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290:2323–2326.
17. Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 2003, 15:1373–1396.
18. Donoho DL, Grimes C. Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci 2003, 100:5591–5596.
19. Zhang Z, Zha H. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J Sci Comput 2005, 26:313–338.
20. Yang L. Alignment of overlapping locally scaled patches for multidimensional scaling and dimensionality reduction. IEEE Trans Pattern Anal Machine Intell 2008, 30:438–450.
21. Weinberger KQ, Sha F, Saul LK. Learning a kernel matrix for nonlinear dimensionality reduction. In: Proceedings of the 21st International Conference on Machine Learning (ICML-04). Banff: ACM Press; 2004, 839–846.
22. Yang L. Building connected neighborhood graphs for isometric data embedding. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '05). Chicago, IL: ACM Press; 2005.
23. Law MH, Jain AK. Incremental nonlinear dimensionality reduction by manifold learning. IEEE Trans Pattern Anal Machine Intell 2006, 28:377–391.
FURTHER READING
Borg I, Groenen P. Modern Multidimensional Scaling: Theory and Applications. 2nd ed. New York: Springer; 2005.
Gorban AN, Kégl B, Wunsch DC, Zinovyev A, eds. Principal Manifolds for Data Visualization and Dimension Reduction.
Berlin: Springer; 2007.
Lee JA, Verleysen M. Nonlinear Dimensionality Reduction. New York: Springer; 2007.