Persistence Images: A Stable Vector Representation of Persistent Homology
Persistence Images: A Stable Vector Representation of Persistent Homology
Persistence Images: A Stable Vector Representation of Persistent Homology
Abstract
Many data sets can be viewed as a noisy sampling of an underlying space, and tools from
topological data analysis can characterize this structure for the purpose of knowledge dis-
covery. One such tool is persistent homology, which provides a multiscale description of
the homological features within a data set. A useful representation of this homological
information is a persistence diagram (PD). Efforts have been made to map PDs into spaces
with additional structure valuable to machine learning tasks. We convert a PD to a finite-
dimensional vector representation which we call a persistence image (PI), and prove the
stability of this transformation with respect to small perturbations in the inputs. The
2017
c Adams, et al.
License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided
at http://jmlr.org/papers/v18/16-337.html.
Adams, et al.
discriminatory power of PIs is compared against existing methods, showing significant per-
formance gains. We explore the use of PIs with vector-based machine learning tools, such
as linear sparse support vector machines, which identify features containing discriminating
topological information. Finally, high accuracy inference of parameter values from the dy-
namic output of a discrete dynamical system (the linked twist map) and a partial differential
equation (the anisotropic Kuramoto-Sivashinsky equation) provide a novel application of
the discriminatory power of PIs.
Keywords: topological data analysis, persistent homology, persistence images, machine
learning, dynamical systems
1. Introduction
In recent years, the field of topology has grown to include a large set of computational tools
(Edelsbrunner and Harer, 2010). One of the fundamental tools is persistent homology, which
tracks how topological features appear and disappear in a nested sequence of topological
spaces (Edelsbrunner and Harer, 2008; Zomorodian and Carlsson, 2005). This multiscale
information can be represented as a persistence diagram (PD), a collection of points in the
plane where each point (x, y) corresponds to a topological feature that appears at scale
x and disappears at scale y. We say the feature has a persistence value of y x. This
compact summary of topological characteristics by finite multi-sets of points in the plane is
responsible, in part, for the surge of interest in applying persistent homology to the analysis
of complex, often high-dimensional data. Computational topology has been successfully
applied to a broad range of data-driven disciplines (Perea and Harer, 2013; Dabaghian
et al., 2012; Chung et al., 2009; Heath et al.; Singh et al., 2008; Topaz et al., 2015; Pearson
et al., 2015).
Concurrent with this revolution in computational topology, a growing general interest
in data analysis has driven advances in data mining, pattern recognition, and machine
learning (ML). Since the space of PDs can be equipped with a metric structure (bottleneck
or Wasserstein (Mileyko et al., 2011; Turner et al., 2014)), and since these metrics reveal
the stability of PDs under small perturbations of the data they summarize (Cohen-Steiner
et al., 2007, 2010; Chazal et al., 2014), it is possible to perform a variety of ML techniques
using PDs as a statistic for clustering data sets. However, many other useful ML tools
and techniques (e.g., support vector machines (SVM), decision tree classification, neural
networks, feature selection, and dimension reduction methods) require more than a metric
structure. In addition, the cost of computing the bottleneck or Wasserstein distance grows
quickly as the number of off-diagonal points in the diagrams increases (Di Fabio and Ferri,
2015). To resolve these issues, considerable effort (which we review in 2) has been made
to map PDs into spaces which are suitable for other ML tools (Bubenik, 2015; Reininghaus
et al., 2015; Rouse et al., 2015; Adcock et al., 2016; Donatini et al., 1998; Ferri et al., 1997;
Chung et al., 2009; Pachauri et al., 2011; Bendich et al., 2016; Chen et al., 2015; Carrire
et al., 2015; Di Fabio and Ferri, 2015). With the benefits and drawbacks of these approaches
in mind, we pose the following question:
Problem Statement: How can we represent a persistence diagram so that
(i) the output of the representation is a vector in Rn ,
(ii) the representation is stable with respect to input noise,
2
Persistence Images
(iv) the representation maintains an interpretable connection to the original PD, and
(v) the representation allows one to adjust the relative importance of points in different
regions of the PD?
3
Adams, et al.
2. Related Work
The space of PDs can be equipped with the bottleneck or Wasserstein metric (defined in 3),
and one reason for the popularity of PDs is that these metrics are stable with respect to small
deviations in the inputs (Cohen-Steiner et al., 2007, 2010; Chazal et al., 2014). Furthermore,
the bottleneck metric allows one to define Frchet means and variances for a collection of
PDs (Mileyko et al., 2011; Turner et al., 2014). However, the structure of a metric space
alone is insufficient for many ML techniques, and a recent area of interest in the topological
data analysis community has been encoding PDs in ways that broaden the applicability of
persistence. For example, Adcock et al. (2016) study a ring of algebraic functions on the
space of persistence diagrams, and Verovek (2016) identifies tropical coordinates on the
space of diagrams. Ferri and Landi (1999) and Di Fabio and Ferri (2015) encode a PD using
the coefficients of a complex polynomial that has the points of the PD as its roots.
Bubenik (2015) develops the notion of a persistence landscape, a stable functional rep-
resentation of a PD that lies in a Banach space. A persistence landscape (PL) is a function
: N R [, ], which can equivalently be thought of as a sequence of functions
k : R [, ]. For 1 p the p-landscape distance between two landscapes
and 0 is defined as k 0 kp ; the -landscape distance is stable with respect to the bot-
tleneck distance on PDs, and the p-landscape distance is continuous with respect to the
p-Wasserstein distance on PDs. One of the motivations for defining persistence landscapes
is that even though Frchet means of PDs are not necessarily unique (Mileyko et al., 2011),
a set of persistence landscapes does have a unique mean. Unique means are also a feature
of PIs as they are vector representations. An advantage of PLs over PIs is that the map
from a PD to a PL is easily invertible; an advantage of PIs over PLs is that PIs live in
Euclidean space and hence are amenable to a broader range of ML techniques. In 6, we
compare PDs, PLs, and PIs in a classification task on synthetic data sampled from common
topological spaces. We find that PIs behave comparably or better than PDs when using
ML techniques available to both representations, but PIs are significantly more efficient to
compute. Also, PIs outperform PLs in the majority of the classification tasks and are of
comparable computational efficiency.
A vector representation of a PD, due to Carrire et al. (2015), can be obtained by
rearranging the entries of the distance matrix between points in a PD. In their Theorem 3.2,
they prove that both the L and L2 norms between their resulting vectors are stable with
respect to the bottleneck distance on PDs. They remark that while the L norm is useful
for nearest-neighbor classifiers, the L2 norm allows for more elaborate algorithms such as
SVM. However, though their stability result for the L norm is well-behaved, their constant
for the L2 norm scales undesirably with the number of points in the PD. We provide this as
motivation for our Theorem 10, in which we prove the L , L1 , and L2 norms for PI vectors
4
Persistence Images
are stable with respect to the 1-Wasserstein distance between PDs, and in which none of
the constants depend on the number of points in the PD.
By superimposing a grid over a PD and counting the number of topological features in
each bin, Rouse et al. (2015) create a feature vector representation. An advantage of this ap-
proach is that the output is easier to interpret than other more complicated representations,
but a disadvantage is that the vectors are not stable for two reasons:
(i) an arbitrarily small movement of a point in a PD may move it to another bin, and
Source (i) of instability can be improved by first smoothing a PD into a surface. This idea
has appeared multiple times in various formseven prior to the development of persistent
homology, Donatini et al. (1998) and Ferri et al. (1997) convert size functions (closely related
to 0-dimensional PDs) into surfaces by taking a sum of Gaussians centered on each point in
the diagram. This conversion is not stable due to (ii), and we view our work as a continued
study of these surfaces, now also in higher homological dimensions, in which we introduce
a weighting function3 to address (ii) and obtain stability. Chung et al. (2009) produce a
surface by convolving a PD with the characteristic function of a disk, and Pachauri et al.
(2011) produce a surface by centering a Gaussian on each point, but both of these methods
lack stability again due to (ii). Surfaces produced from random PDs are related to the
empirical intensity plots of Edelsbrunner et al. (2012).
Reininghaus et al. (2015) produce a stable surface from a PD by taking the sum of a
positive Gaussian centered on each PD point together with a negative Gaussian centered
on its reflection below the diagonal; the resulting surface is zero along the diagonal. This
approach is similar to PIs, and indeed we use a result of Reininghaus et al. (2015, Theo-
rem 3) to show that persistence surfaces are stable only with respect to the 1-Wasserstein
distance (Remark 6). Nevertheless, we propose our independently-developed surfaces as an
alternative stable representation of PDs with the following potential advantages. First, the
sum of non-negatively weighted Gaussians in PIs may be easier to interpret than a sum
including negative Gaussians. Second, we produce vectors from persistence surfaces with
well-behaved stability bounds, allowing one to use vector-based learning methods such as
linear SVM. Indeed, Zeppelzauer et al. (2016) report that while the kernel of Reininghaus
et al. (2015) can be used with nonlinear SVMs, in practice, this becomes inefficient for a
large number of training vectors because the entire kernel matrix must be computed. Third,
while the surface of Reininghaus et al. (2015) weights persistence points further from the
diagonal more heavily, there are situations in which one may prefer different weightings, as
discussed in 1 and item (v) of our Problem Statement. Hence, one may want weightings on
PD points that are non-increasing or even decreasing when moving away from the diagonal,
an option available in the PI approach.
We produce a persistence surface from a PD by taking a weighted sum of Gaussians
centered at each point. We create vectors, or PIs, by integrating our surfaces over a grid,
allowing ML techniques for finite-dimensional vector spaces to be applied to PDs. Persistence
images are stable, and distinct homology dimensions may be concatenated together into
a single vector to be analyzed simultaneously. Persistence surfaces are studied from the
3. Our weighting function is continuous and zero for points of zero persistence, i.e., points along the diagonal.
5
Adams, et al.
statistical point of view by Chen et al. (2015); their applications in Section 4 use the L1
norm between these surfaces, which can be justified as a reasonable notion of distance due
to Theorem 9 that proves the L1 distance between such surfaces is stable.
Zeppelzauer et al. (2016) apply PIs to 3D surface analysis for archeological data, in
which the machine learning task is to distinguish scans of natural rock surfaces from those
containing ancient human-made engravings. The authors state they select PIs over other
topological methods because PIs are computationally efficient and can be used with a broader
set of ML techniques. PIs are compared to an aggregate topological descriptor for a PD: the
first entry of this vector is the number of points in the diagram, and the remaining entries
are the minimum, maximum, mean, standard deviation, variance, 1st-quartile, median, 3rd-
quartile, sum of square roots, sum, and sum of squares of all the persistence values. In their
three experiments, the authors find the following.
When classifying natural rock surfaces from engravings using PDs produced from the
sublevel set filtration, PIs outperform the aggregate descriptor.
When the natural rock and engraved surfaces are first preprocessed using the completed
local binary pattern (CLBP) operator for texture classificiation (Guo et al., 2010), PIs
outperform the aggregate descriptor.
The authors added PIs and the aggregate descriptor to eleven different non-topological
baseline descriptors, and found that the classification accuracy of the baseline descrip-
tor was improved more by the addition of PIs than by the addition of the aggregate
descriptor.
Furthermore, Zeppelzauer et al. (2016, Table 1) demonstrate that for their machine learning
task, PIs have low sensitivity to the parameter choices of resolution and variance (4).
6
Persistence Images
features die after they are born, necessarily each point appears on or above the diagonal
line y = x. A PD is a multiset of such points, as distinct topological features may have
the same birth and death coordinates.5 Points near the diagonal are often considered to be
noise while those further from the diagonal represent more robust topological features.
In this paper, we produce PDs from two different types of input data:
(i) When the data is a point cloud, i.e., a finite set of points in some space, then we
produce PDs using the VietorisRips filtration.
(ii) When the data is a real-valued function, we produce PDs using the sublevel set filtra-
tion.6
For setting (i), point cloud data often comes equipped with a metric or a measure of inter-
nal (dis)similarity and is rich with latent geometric content. One approach to identifying
geometric shapes in data is to consider the data set as the vertices of a simplicial complex
and to add edges, triangles, tetrahedra, and higher-dimensional simplices whenever their
diameter is less than a fixed choice of scale. This topological space is called the Vietoris
Rips simplicial complex, which we introduce in more detail in A.2. The homology of the
VietorisRips complex depends crucially on the choice of scale, but persistent homology
eliminates the need for this choice by computing homology over a range of scales (Carlsson,
2009; Ghrist, 2008). In 6.16.4.1, we obtain PDs from point cloud data using the Vietoris
Rips filtered simplicial complex, and we use ML techniques to classify the point clouds by
their topological features.
In setting (ii), our input is a real valued function f : X R defined on some domain X.
One way to understand the behavior of map f is to understand the topology of its sublevel
sets f 1 ((, ]). By letting increase, we obtain an increasing sequence of topological
spaces, called the sublevel set filtration, which we introduce in more detail in A.3. In
6.4.2, we obtain PDs from surfaces u : [0, 1]2 R produced from the Kuramoto-Sivashinsky
equation, and we use ML techniques to perform parameter classification.
In both settings, the output of the persistent homology computation is a collection of
PDs encoding homological features of the data across a range of scales. Let D denote the
set of all PDs. The space D can be endowed with metrics as studied by Cohen-Steiner et al.
(2007) and Mileyko et al. (2011). The p-Wasserstein distance defined between two PDs B
and B 0 is given by
X 1/p
Wp (B, B 0 ) = inf 0 ||u (u)||p ,
:BB
uB
where 1 p < and ranges over bijections between B and B 0 . Another standard
choice of distance between diagrams is W (B, B 0 ) = inf 0 sup ||u (u)|| , referred to as
:BB uB
the bottleneck distance. These metrics allow us to measure the (dis)similarity between the
homological characteristics of two data sets.
5. By convention, all points on the diagonal are taken with infinite multiplicity. This facilitates the defini-
tions of the p-Wasserstein and bottleneck distances below.
6. As explained in A.3, (i) can be viewed as a special case of (ii).
7
Adams, et al.
4. Persistence Images
We propose a method for converting a PD into a vector while maintaining an interpretable
connection to the original PD. Figure 1 illustrates the pipeline from data to PI starting with
spectral and spatial information in R5 from an immunofluorescent image of a circulating
tumor cell (Emerson et al., 2015).
Precisely, let B be a PD in birth-death coordinates.7 Let T : R2 R2 be the linear
transformation T (x, y) = (x, y x), and let T (B) be the transformed multiset in birth-
persistence coordinates,8 where each point (x, y) B corresponds to a point (x, y x)
T (B). Let u : R2 R be a differentiable probability distribution with mean u = (ux , uy )
R2 . In all of our applications, we choose this distribution to be the normalized symmetric
Gaussian u = gu with mean u and variance 2 defined as
PIs provide a convenient way to combine PDs of different homological dimensions into a
single object. Indeed, suppose in an experiment the PDs for H0 , H1 , . . . , Hk are computed.
One can concatenate the PI vectors for H0 , H1 , . . . , Hk into a single vector representing all
homological dimensions simultaneously, and then use this concatenated vector as input into
ML algorithms.
When generating a PI, the user makes three choices: the resolution, the distribution
(and its associated parameters), and the weighting function. A strength of PIs is that they
are flexible; a weakness is that these choices are noncanonical.
7. We omit points that correspond to features with infinite persistence, e.g., the H0 feature corresponding
to the connectedness of the complete simplicial complex.
8. Instead of birth-persistence coordinates, one could also use other choices such as birth-death or (average
size)-persistence coordinates. Our stability results (5) still hold with only a slight modification to the
constants.
8
Persistence Images
persistence
death
birth birth
Resolution of the image: The resolution of the PI corresponds to the grid being
overlaid on the PD. The classification accuracy in the PI framework appears to be fairly
robust to choice of resolution, as discussed in 6.2 and by Zeppelzauer et al. (2016).
The Distribution: Our method requires the choice of a probability distribution asso-
ciated to each point in the PD. The examples in this paper use a Gaussian centered at each
point, but other distributions may be used. The Gaussian distribution depends on a choice
of variance: we leave this choice as an open problem, though the experiments in 6.2 and
those of Zeppelzauer et al. (2016) show a low sensitivity to the choice of variance.
The Weighting Function: In order for our stability results in 5 to hold, our weighting
function f : R2 R must be zero along the horizontal axis (the analogue of the diagonal
in birth-persistence coordinates), continuous, and piecewise differentiable. A simple choice
is a weighting function that depends only on the vertical persistence coordinate y. In order
to weight points of higher persistence more heavily, functions which are nondecreasing in y,
such as sigmoidal functions, are a natural choice. However, in certain ML tasks such as the
work of Bendich et al. (2016) the points of small or medium persistence may perform best,
and hence one may choose to use more general weighting functions. In our experiments
in 6, we use a piecewise linear weighting function f : R2 R which only depends on the
persistence coordinate y. Given b > 0, define wb : R R via
0 if t 0,
wb (t) = bt if 0 < t < b, and
1 if t b.
We use f (x, y) = wb (y), where b is the persistence value of the most persistent feature in all
trials of the experiment.
In the event that the birth coordinate is zero for all points in the PD, as is often the
case for H0 , it is possible to generate a 1-dimensional (instead of 2-dimensional) PI using
1-dimensional distributions. This is the approach we adopt. Appendix B displays examples
of PIs for the common topological spaces of a circle and a torus with various parameter
choices.
9
Adams, et al.
set to a PD is stable (Lipschitz) with respect to the bottleneck metric andgiven some
mild assumptions about the underlying datais also stable with respect to the Wasserstein
metrics (Edelsbrunner and Harer (2010)). In 5.1, we show that persistence surfaces and
images are stable with respect to the 1-Wasserstein distance between PDs. In 5.2, we
prove stability with improved constants when the PI is constructed using the Gaussian
distribution.
Theorem 4 The persistence surface is stable with respect to the 1-Wasserstein distance
between diagrams: for B, B 0 D we have
10 kf k || + kk |f | W1 (B, B 0 ).
kB B 0 k
10
Persistence Images
Proof Since we assume B and B 0 consist of finitely many points, there exists a matching
that achieves the infimum in the Wasserstein distance. Then
X X
kB B 0 k = k f (u)u f ((u))(u) k
uT (B) uT (B)
X
kf (u)u f ((u))(u) k
uT (B)
X
kf k || + kk |f | ku (u)k2 by Lemma 3.
uT (B)
X
2 kf k || + kk |f | ku (u)k since k k2 2k k in R2
uT (B)
X
10 kf k || + kk |f | ku (u)k since kT ()k2 5k k
uB
10 kf k || + kk |f | W1 (B, B 0 ).
=
The step transforming from a sum over all u T (B) to one over all u B is necessary be-
cause the Wasserstein distance is defined
using birth-death coordinates, not birth-persistence
coordinates. The bound kT ()k2 5k k follows from the fact the unit ball with respect
to the L norm in R2 (i.e., a square) gets mappedunder T to a parallelogram contained
inside a ball with respect to the L2 norm of radius 5.
Theorem 5 The persistence image I(B ) is stable with respect to the 1-Wasserstein distance
between diagrams. More precisely, if A is the maximum area of any pixel in the image, A0
is the total area of the image, and n is the number of pixels in the image, then
kI(B ) I(B 0 )k 10A kf k || + kk |f | W1 (B, B 0 )
kI(B ) I(B 0 )k1 10A0 kf k || + kk |f | W1 (B, B 0 )
kI(B ) I(B 0 )k2 10nA kf k || + kk |f | W1 (B, B 0 ).
The constant for the L2 norm bound containing n goes to infinity as the resolution of
the image increases. For this reason, in Theorem 10 we provide bounds with better constants
in the specific case of Gaussian distributions.
Proof Note for any pixel p with area A(p) we have
|I(B )p I(B 0 )p | = B dydz B 0 dydx
p p
= B B 0 dydx
p
A(p)kB B 0 k
10A(p) kf k || + kk |f | W1 (B, B 0 )
by Theorem 4.
11
Adams, et al.
Hence we have
10A kf k || + kk |f | W1 (B, B 0 )
kI(B ) I(B 0 )k
kI(B ) I(B 0 )k1 10A0 kf k || + kk |f | W1 (B, B 0 )
kI(B ) I(B 0 )k2 nkI(B ) I(B 0 )k
10nA kf k || + kk |f | W1 (B, B 0 ).
Remark 6 Recall D is the set of all PDs. The kernel k : D D R defined by k(B, B 0 ) =
hI(B ), I(B 0 )iRn is non-trivial and additive, and hence Theorem 3 of Reininghaus et al.
(2015) implies that k is not stable with respect to Wp for any 1 < p . That is, when
1 < p there is no constant c such that for all B, B 0 D we have kI(B ) I(B 0 )k2
cWp (B, B 0 ).
p root is za 0 2 ln (a/b)
positive real p while if b >0 a, the p unique positive real root is
zb i 2 ln (a/b). Since F (za ) = b 2// and F (zb ) = a 2//, we conclude that
r r
0 2 min{a, b} 2 min{a, b}
kF k = , and hence F (z) |a b| + |z|.
The result follows by letting z = v u.
12
Persistence Images
The proof of Lemma 8 is shown in Appendix C and uses a similar construction to that
of Lemma 7. We are prepared to prove the stability of persistence surfaces with Gaussian
distributions.
Theorem 9 The persistence surface with Gaussian distributions is stable with respect to
the 1-Wasserstein distance between diagrams: for B, B 0 D we have
!
r
10 kf k
kB B 0 k1 5|f | + W1 (B, B 0 ).
Proof Since we assume B and B 0 consist of finitely many off-diagonal points, there exists
a matching that achieves the infimum in the Wasserstein distance. Then
X X
kB B 0 k1 =
f (u)gu f ((u))g(u)
uT (B) uT (B)
1
X
kf (u)gu f ((u))g(u) k1
uT (B)
r !
2 kf k X
|f | + ku (u)k2 by Lemma 8, where min{f (u), f (v)} kf k
uT (B)
!
r
10 kf k X
5|f | + ku (u)k since kT ()k2 5k k
uB
!
r
10 kf k
= 5|f | + W1 (B, B 0 ).
Theorem 10 The persistence image I(B ) with Gaussian distributions is stable with respect
to the 1-Wasserstein distance between diagrams. More precisely,
!
r
10 kf k
kI(B ) I(B 0 )k1 5|f | + W1 (B, B 0 )
!
r
10 kf k
kI(B ) I(B 0 )k2 5|f | + W1 (B, B 0 )
!
r
10 kf k
kI(B ) I(B 0 )k 5|f | + W1 (B, B 0 ).
13
Adams, et al.
Proof We have
X
kI(B ) I(B 0 )k1 = B dydz B 0 dydx |B B 0 | dydz
p p p R2
!
r
10 kf k
= kB B 0 k1 5|f | + W1 (B, B 0 )
6. Experiments
We perform several experiments in order to assess the added value of our vector repre-
sentation of PDs. First, in 6.1, we compare the performance of PDs, PLs, and PIs in a
classification task for a synthetic data set consisting of point clouds sampled from six dif-
ferent topological spaces using K-medoids, which requires only a metric space (instead of a
vector space) structure. We find that PIs produce consistently high classification accuracy,
and furthermore, the computation time for PIs is significantly faster than computing bottle-
neck or Wasserstein distances between PDs. In 6.2, we explore the impact that the choices
of parameters determining our PIs have on classification accuracy. We find that the accuracy
is insensitive to the particular choices of PI resolution and distribution variance. In 6.3, we
combine PIs with a sparse support vector machine classifier to identify the most strongly
differentiating pixels for classification; this is an example of a ML task which is facilitated
by the fact that PIs are finite vectors. Finally, as a novel machine learning application,
we illustrate the utility of PIs to infer dynamical parameter values in both continuous and
discrete dynamical systems: a discrete time system called the linked twist map in 6.4.1,
and a partial differential equation called the anisotropic Kuramoto-Sivashinsky equation in
6.4.2.
Our synthetic data set consists of six shape classes: a unit cube, a circle of diameter one,
a sphere of diameter one, three clusters with centers randomly chosen in the unit cube,
three clusters within three clusters (where the centers of the minor clusters are chosen as
smalli.e., <0.1random perturbations from the major cluster centers), and a torus with
a major diameter of one and a minor diameter of one half. We produce 25 point clouds of
500 points sampled uniformly at random from each of the six shapes, and then add a level
of Gaussian noise. This gives 150 point clouds in total.
We then compute the H0 and H1 PDs for the VietorisRips filtration (A.2) built from
each point cloud which have been endowed with the ambient Euclidean metric on R3 .
Our goal is to compare various methods for transforming PDs into distance matrices
to be used to establish proximity of topological features extracted from data. We create
32 22 = 36 distance matrices of size 150 150, using three choices of representation (PDs,
14
Persistence Images
PLs, PIs), three choices of metric (L1 , L2 , L ),9 two choices of Gaussian noise ( = 0.05,
0.1), and two homological dimensions (H0 , H1 ). For example, the PD, H1 , L2 , = 0.1,
distance matrix contains the 2-Wasserstein distances between the H1 PDs for the random
point clouds with noise level 0.1. By contrast, the PI, H1 , L2 , = 0.1 distance matrix
contains all pairwise L2 distances between the PIs10 produced from the H1 PDs with noise
level 0.1.
We first compare these distance matrices based on how well they classify the random
point clouds into shape classes via K-medoids clustering (Kaufman and Rousseeuw, 1987;
Park and Jun, 2009). K-medoids produces a partition of a metric space into K clusters
by choosing K points from the data set called medoids and assigning each metric space
point to its closest medoid. The score of such a clustering is the sum of the distances from
each point to its closest medoid. The desired output of K-medoids is the clustering with
the minimal clustering score. Unfortunately, an exhaustive search for the global minimum
is often prohibitively expensive. A typical approach to search for this global minimum is
to choose a large selection of K random initial medoids, improve each selection of medoids
iteratively in rounds until the clustering score stabilizes and then return the identified final
clustering with the lowest score for each initialization. In our experiments, we choose 1,000
random initial selections of K = 6 medoids (as there are six shape classes) for each distance
matrix, improve each selection of medoids using the Voronoi iteration method (Park and
Jun, 2009), and return the clustering with the lowest classification score. To each K-medoids
clustering we assign an accuracy which is equal to the percentage of random point clouds
identifed with a medoid of the same shape class. In Table 1, we report the classification
accuracy of the K-medoids clustering with the lowest clustering score, for each distance
matrix.
Our second criterion for comparing methods to produce distance matrices is compu-
tational efficiency. In Table 1, we report the time required to produce each distance
matrix, starting with 150 precomputed PDs as input. In the case of PLs and PIs, this
time includes the intermediate step of transforming each PD into the alternate represen-
tation, as well as computing the pairwise distance matrix. All timings are computed on
a laptop with a 1.3 GHz Intel Core i5 processor and 4 GB of memory. We compute
bottleneck, 1-Wasserstein, and 2-Wasserstein distance matrices using the software of Ker-
ber et al. (2016). For PL computations, we use the Persistence Landscapes Toolbox by
Bubenik and Dlotko (2016). Our MATLAB code for producing PIs is publically available
at https://github.com/CSU-TDA/PersistenceImages.
We see in Table 1 that PI distance matrices have higher classification accuracy than
nearly every PL distance matrix, and higher classification accuracy than PDs in half of the
trials.
Furthermore, the computation times for PI distance matrices are significantly lower
than the time required to produce distance matrices from PDs using the bottleneck or p-
Wasserstein metrics. There is certainly no guarantee that PIs will outperform PDs or PLs
in any given machine learning task. However, in this experiment, persistent images provide
9. The L1 , L2 , L distances on PDs are more commonly known as the 1-Wasserstein, 2-Wasserstein, and
bottleneck distances.
10. For PIs in this experiment, we use variance = 0.1, resolution 20 20, and the weighting function
defined in 4.
15
Adams, et al.
Table 1: Comparing classification accuracy and times of PDs, PLs, and PIs. The timings
contain the computation time in seconds for producing a 150 150 distance matrix from 150
precomputed PDs. In the case of PLs and PIs, this requires first transforming each PD into
its alternate representation and then computing a distance matrix. We consider 36 distinct
distance matrices: three representations (PDs, PLs, PIs), two homological dimensions (H0 ,
H1 ), three choices of metric (L1 , L2 , L ), and two levels of Gaussian noise ( = 0.05, 0.1).
Accuracy Time Accuracy Time
Distance Matrix (Noise 0.05) (Noise 0.05) (Noise 0.1) (Noise 0.1)
PD, H0 , L1 96.0% 37346s 96.0% 42613s
PD, H0 , L2 91.3% 24656s 91.3% 25138s
PD, H0 , L 60.7% 1133s 63.3% 1149s
PD, H1 , L1 100% 657s 96.0% 703s
PD, H1 , L2 100% 984s 97.3% 1042s
PD, H1 , L 81.3% 527s 66.7% 564s
PL, H0 , L1 92.7% 29s 96.7% 33s
PL, H0 , L2 77.3% 29s 82.0% 34s
PL, H0 , L 60.7% 2s 63.3% 2s
PL, H1 , L1 83.3% 36s 80.7% 48s
PL, H1 , L2 83.3% 50s 66.7% 69s
PL, H1 , L 74.7% 8s 66.7% 9s
PI, H0 , L1 93.3% 9s 95.3% 9s
PI, H0 , L2 92.7% 9s 95.3% 9s
PI, H0 , L 94.0% 9s 96.0% 9s
PI, H1 , L1 100% 17s 95.3% 18s
PI, H1 , L2 100% 17s 96.0% 18s
PI, H1 , L 100% 17s 96.0% 18s
a representation of persistent diagrams which is both useful for the classification task and
also computationally efficient.
16
Persistence Images
(a) (b)
1 1
0.8 0.8
accuracy 0.6 0.6
0.4 0.4
0.2 0.2
0 0
20 40 60 80 100 20 40 60 80 100
resolution resolution
(c) (d)
1 1
0.8 0.8
accuracy
H0
0.6 0.6
0.4 0.4
0.2 0.2 H1
0 0
0.05 0.1 0.15 0.2 0.05 0.1 0.15 0.2
variance variance
Figure 2: K-medoids classification accuracy as a function of resolution and variance for the
data set of six shape classes. First column: noise level = 0.05. Second column: noise level
= 0.1. First row: fixed variance 0.1 with resolutions ranging from 5 5 to 100 100 in
increments of 5. Second row: fixed resolution 20 20 with variances ranging from 0.01 to
0.2 in increments of 0.01.
clustering with the minimum clustering score on the two sets of noise levels for the homology
dimensions H0 and H1 . We observe that the classification accuracy is insensitive to the choice
of resolution and variance.
The plots in Figure 2 are characteristic of the 2-dimensional accuracy surface over all
combinations of parameters in the ranges of variances and resolutions we tested. In an
application to archeology, Zeppelzauer et al. (2016) find a similar robustness of PIs to the
choices of resolution and variance.
17
Adams, et al.
using the 1-norm results in minimizing the number of kernel functions, not the number of
features in the input space (i.e., pixels in our application) (Fung and Mangasarian, 2004).
Hence, for the purpose of feature selection or more precisely, PI pixel selection, we employ
the linear SSVM.
We adopt the one-against-all (OAA) SSVM on the sets of H0 and H1 PIs from the six
class shape data. In a one-against-all SSVM, there is one binary SSVM for each class to
separate members of that class from members of all other classes. The PIs were generated
using resolution 20 20, variance 0.0001, and noise level 0.05. Note that because of the
resolution parameter choice of 20 20, each PI is a 400-dimensional vector, and the selected
features will be a subset of indices corresponding to pixels within the PI. Using 5-fold cross-
validated SSVM resulted in 100% accuracy comparing six sparse models with indications
of the discriminatory features. Feature selection is performed by retaining the features
(again, in this application, pixels) with non-zero SSVM weights, determined by magnitude
comparison using weight ratios; for details see Chepushtanova et al. (2014). Figure 3 provides
two examples, indicating the pixels of H1 PIs that discriminate circles and tori from the other
classes in the synthetic data set.
Figure 3: SSVM-based feature (pixel) selection for H1 PIs from two classes of the synthetic
data. Selected pixels are marked by blue crosses. (a) A noisy circle with the two selected
pixels (indices 21 and 22 out of 400). (b) A noisy torus with the two selected pixels (indices
59 and 98 out of 400). The PI parameters used are resolution 20 20 and variance 104 ,
for noise level 0.05.
Feature selection produces highly interpretable results. The discriminatory pixels in the
H1 PIs that separate circles from the other classes correspond to the region where highly
persistent H1 topological features exist across all samples of a noisy circle (highlighted
in Figure 3a). Alternatively, the discriminatory pixels in H1 PIs that separate tori from
the other classes correspond to points of short to moderate persistence (see Figure 3b).
In this way, Figure 3b reiterates an observation of Bendich et al. (2016) that points of
short to moderate persistence can contain important discriminatory information. Similar
conclusions can be drawn from the discriminatory pixels of others classes (Appendix D).
Our classification accuracy of 100% is obtained using only those pixels selected by SSVM (a
cumulative set of only 10 distinct pixels).
18
Persistence Images
19
Adams, et al.
Figure 4: Examples of the first 1000 iterations, {(xn , yn ) : n = 0, . . . , 1000}, of the linked
twist map with parameter values r = 2, 3.5, 4.0, 4.1 and 4.3, respectively.
Figure 5: Truncated orbits, {(xn , yn ) : n = 0, . . . , 1000}, of the linked twist map with fixed
r = 4.3 for different initial conditions (x0 , y0 ).
(of a fixed dimension), and classifies and assigns a score to each point based on the current
subspace. The final classification arises from an average of the scores of each data point over
all learners (Ho, 1998). We perform 10 trials and average the classification accuracies. For
the concatenated H0 and H1 PIs, this method achieves a classification accuracy of 82.5%;
compared to 49.8% when using only H0 PIs and 65.7% when using H1 PIs. This experiment
highlights two strengths of PIs: they offer flexibility in choosing a ML algorithm that is
well suited to the data under consideration, and homological information from multiple
dimensions may be leveraged simultaneously for greater discriminatory power.
This application is a brief example of the utility of PIs in classification of data from dy-
namical systems and modeling real-world phenomena, which provides a promising direction
for further applications of PIs.
2 2
u = 2 u 2 2 u + r u + u , (4)
t x y
20
Persistence Images
2 2
where 2 = x 2 + y 2 , and the real parameter r controls the degree of anisotropy. At
a fixed time t , u(x, y, t ) is a patterned surface (periodic in both x and y) defined over
the (x, y)-plane. Visibly, the anisotropy appears as a slight tendency for the pattern to be
elongated in the vertical or horizontal direction.
Numerical simulations of the aKS equation for a range of parameter values (columns) and
simulation times (rows) are shown in Figure 6. For all simulations, the initial conditions were
low-amplitude white noise. We employed a Fourier spectral method with periodic boundary
conditions on a 512 512 spatial grid, with a fourth-order exponential time differencing
Runge-Kutta method for the time stepping. Five values for the parameter r were chosen,
namely r = 1, 1.25, 1.5, 1.75 and 2, and thirty trials were performed for each parameter
value. Figure 7 shows the similarity between surfaces associated to two parameter values
r = 1.75 and r = 2 at an early time.
Figure 7: To illustrate the difficulty of our classification task, consider five instances of
surfaces u(x, y, 3) for r = 1.75 or r = 2, plotted on the same color axis. These surfaces
are found by numerical integration of Equation (4), starting from random initial conditions.
Can you group the images by eye?
Answer: (from left) r = 1.75, 2, 1.75, 2, 2.
21
Adams, et al.
We aim to identify the anisotropy parameter for each simulation using snapshots of
surfaces u(x, y, ) as they evolve in time. Inference of the parameter using the surface
alone proves difficult for several reasons. First, Equation (4) exhibits sensitivity to initial
conditions: initially nearby solutions diverge quickly. Second, although the surface u(x, y, t )
at a fixed time is an approximation due to the finite discretization of its domain, the spatial
resolution is still very large: in fact, these surfaces may be thought of as points in R266144 .
We were unable to perform standard classification techniques in this space. It was therefore
necessary to perform some sort of dimension reduction. One such method is to simply
resize the surface by coarsening the discretization of the spatial domain after computing
the simulation at a high resolution by replacing a block of grid elements with their average
surface height. The surfaces were resized in this way to a resolution of 10 10 and a
subspace discriminant ensemble was used to perform classification. Unsurprisingly, this
method performs very poorly at all times (first row of Table 2).
The anisotropy parameter also influences the mean and amplitude of the surface pattern.
We eliminate differences in the mean by mean-centering each surface after the simulation. To
assess the impact of the variance of surface height on our task, classification was performed
using a normal distribution-based classifier built on the variances of the surface heights. In
this classifier, a normal distribution was fit to a training set of 2/3 of the variances for each
parameter value, and the testing data was classified based on a z-test for each of the different
models. That is, a p-value for each new variance was computed for membership to the five
normal distributions (corresponding to the five parameter choices of r), and the surface
was classified based on the model yielding the highest p-value. After the pattern has more
fully emerged (by, say, time t = 5) this method of classification yields 75% accuracy,11 as
shown in Table 2. However, early on in the formation of the pattern, this classifier performs
very poorly because height variance is not yet a discriminating feature. Figure 8 shows the
normal distribution fit to the variance of the surfaces for each parameter value at times t = 3
and 5, and illustrates why the variance of surface height is informative only after a surface
is allowed to evolve for a sufficiently long time.
0.30
0.06
0.10
0.02
19 20 21 22 23 24 25 40 50 60 70 80 90 100
variance variance
Figure 8: Histograms of the variances of surface heights for each parameter value, and the
normal distribution fit to each histogram, for times (a) t = 3 and (b) t = 5.
11. Accuracy reported is averaged over 100 different training and testing partitions.
22
Persistence Images
Table 2: Classification accuracies at different times of the aKS solution, using different
classification approaches. Classification of times t = 15 and 20 result in accuracies similar
to t =10.
Variance of a surface is reflected in its sublevel set filtration (see A.3 for more details)
PD. Yet, the PD and the subsequent PI contain additional topological structure, which
may reveal other influences of the anisotropy parameter on the evolution of the surface.
Persistence diagrams were computed using the sublevel set filtration, and PIs were generated
with resolution 10 10 and a Gaussian with variance = 0.01. We think of our pipeline
to a PI as a dimensionality reduction in this case, taking a surface which in actuality is
a very high-dimensional point and producing a much lower dimensional one that retains
meaningful characteristics of the original surface.
As we observed in 6.4.1, concatenating H0 and H1 PIs can notably improve the clas-
sification accuracy over either feature vector individually. We again note that classification
accuracy appears insensitive to the PI parameters. For example, when the variance of the
Guassians used to generate the PIs was varied from 0.0001 to 0.1, the classification accuracy
of the H0 PIs, changed by less than one percentage point. The classification accuracy for H1
fluctuated in a range of approximately five points. For a fixed variance, when the resolution
of the image was varied from 5 to 20, the H0 accuracy varied by little more than three points
until the accuracy dropped by six points for a resolution of 25.
PIs performed remarkably well in this classification task, allowing one to capitalize on
subtle structural differences in the patterns and significantly reduce the dimension of the
data for classification. There is more to be explored in the realm of pattern formation and
persistence that is outside the scope of this paper.
23
Adams, et al.
7. Conclusion
PIs offer a stable representation of the topological characteristics captured by a PD. Through
this vectorization, we open the door to a myriad of ML tools. This serves as a vital bridge
between the fields of ML and topological data analysis and enables one to capitalize on
topological structure (even in multiple homological dimensions) in the classification of data.
We have shown PIs yield improved classification accuracy over PLs and PDs on sampled
data of common topological spaces at multiple noise levels using K-medoids. Additionally,
computing distances between PIs requires significantly less computation time compared to
computing distances between PDs, and comparable computation times with PLs. Through
PIs, we have gained access to a wide variety of ML tools, such as SSVM which can be used
for feature selection. Features (pixels) selected as discriminatory in a PI are interpretable
because they correspond to regions of a PD. We have explored data sets derived from
dynamical systems and illustrated that topological information of solutions can be used for
inference of parameters since PIs encapsulate this information in a form amenable to ML
tools, resulting in high accuracy rates for data that is difficult to classify.
The classification accuracy is robust to the choice of parameters for building PIs, pro-
viding evidence that it is not necessary to perform large-scale parameter searches to achieve
reasonable classification accuracy. This indicates the utility of PIs even when there is not
prior knowledge of the underlying data (i.e., high noise level, expected holes, etc.). The
flexibility of PIs allows for customization tailored to a wide variety of real-world data sets.
Acknowledgments
We would like to acknowledge the research group of Paul Bendich at Duke University for
allowing us access to a persistent homology package, which greatly reduced computational
time and made analysis of large point clouds feasible. This code can be accessed via GitLab
after submitting a request to Paul Bendich. This research is partially supported by the
National Science Foundation under Grants No. DMS-1228308, DMS-1322508, NSF DMS-
1115668, NSF DMS-1412674, NSF DMS-1615909, and DMR-1305449 as well as the DOD-
USAF under Award Number FA9550-12-1-0408.
24
Persistence Images
k+1k
Ck+1 Ck Ck1 .
Here, vector space Ck consists of all F-linear combinations of the k simplices of S, and has
as a basis the set of all k-simplices. The linear map k : Ck Ck1 , known as the boundary
operator, maps a k-simplex to its boundary, a sum of its (k 1)-faces. More formally, the
boundary map acts on a k-simplex [v0 , v1 , . . . , vk ] by
k
X
k ([v0 , v1 , . . . , vk ]) = (1)i [v0 , . . . , vi , . . . , vk ],
i=0
25
Adams, et al.
a large number of connected components, and selecting too large results in a topological
space that is contractible (equivalent to a single point).
The idea of persistent homology is to compute homology at many scales and observe
which topological features persist across those scales (Ghrist, 2008; Carlsson, 2009; Edels-
brunner and Harer, 2008). Indeed, if 1 2 . . . m is an increasing sequence of
scales, then the corresponding VietorisRips simplicial complexes form a filtered sequence
S1 S2 . . . Sm . As varies, so does the homology of S , and for any homological
dimension k we get a sequence of linear maps Hk (S1 ) Hk (S2 ) . . . Hk (Sm ). Per-
sistent homology tracks the homological features over a range of values of . Those features
which persist over a larger range are considered to be true topological characteristics, while
short-lived features are often considered as noise.
For each choice of homological dimension k, the information measured by persistent
homology can be presented as a persistence diagram (PD), a multiset of points in the plane.
Each point (x, y) = (, 0 ) corresponds to a topological feature that appears (is born) at
scale parameter and which no longer remains (dies) at scale 0 . Since all topological
features die after they are born, this is an embedding into the upper half plane, above the
diagonal line y = x. Points near the diagonal are considered to be noise while those further
from the diagonal represent more robust topological features.
If X is a simplicial complex, then one can produce an increasing sequence of simplicial com-
plexes using a modification of this procedure called the lower star filtration (Edelsbrunner
and Harer, 2010). Similarly, if X is a cubical complex (an analogue of a simplicial complex
that is instead a union of vertices, edges, squares, cubes, and higher-dimensional cubes),
then one can produce an increasing sequence of cubical complexes.
In 6.4.2, we study surfaces u : [0, 1]2 R produced from the Kuramoto-Sivashinsky
equation. The domain [0, 1]2 is discretized into a grid of 512 512 vertices, i.e., a 2-
dimensional cubical complex with 5122 vertices, 511 512 horizontal edges, 511 512 vertical
edges, and 5112 squares. We produce an increasing sequence of cubical complexes as follows:
Our PDs are obtained by taking the persistent homology of the sublevel set filtration of this
cubical complex.
26
Persistence Images
We remark that PDs from point cloud data in A.2 can be viewed as a specific case of
PDs from functions. Indeed, given a data set X in some metric space (M, d), let dX : M R
be the distance function to set X, defined by dX (m) = inf xX d(x, m) for all m M . Note
that d1
X ((, ]) is the union of the metric balls of radius centered at each point in X.
For 1 2 . . . m , the persistent homology of
d1 1 1
X ((, 1 ]) dX ((, 2 ]) . . . dX ((, m ])
is identical to the persistent homology of a simplicial complex filtration called the ech
complex. Furthermore, the persistent homology of the VietorisRips complex is an approx-
imation of the persistent homology of the ech complex (Edelsbrunner and Harer, 2010,
Section III.2).
Figure 9: Examples of PIs for homology dimension H1 arising from a noisy circle with a
variety of resolutions and variances. The first row has resolution 5 5 while the second has
50 50. The columns have variance = 0.2, = 0.01, and = 0.0001, respectively.
27
Adams, et al.
Figure 10: Examples of PIs for homology dimension H1 arising from a noisy torus with a
variety of resolutions and variances. The first row has resolution 5 5 while the second has
50 50. The columns have variance = 0.2, = 0.01, and = 0.0001, respectively.
(
|a b| if z = 0
F (z) = z 2 +2 2 ln(a/b)
2
z +2 2 ln(a/b)
aErf z2 2
bErf
z2 2
otherwise.
Proof If v = u then the statement follows from the fact that gu and gv are normalized to
have unit area under the curve. Hence we may assume u 6= v.
For u 6= v a straightforward calculation shows there is a unique real solution z to
agu (z) = bgv (z), namely
v 2 u2 + 2 2 ln(a/b)
z = .
2(v u)
Note
z
kagu bgv k1 = |agu (z)bgv (z)|dz = agu (z) bgv (z)dz + bgv (z) agu (z)dz .
z
(5)
28
Persistence Images
There are four integrals to compute, and we do each one in turn. We have
z z
a 2 2
agu (z) dz = e(zu) /2 dz
2
P (vu)
a 2
= et dt by substitution t = zu
2
0 P (vu) !
a 2 2
= et dt + et dt
0
a
= + Erf(P (v u))
2 2
a
= (1 + Erf(P (v u))) ,
2
z 2 +2 2 ln(a/b)
where P (z) = z2 2
. Nearly identical calculations show
a
(1 Erf(P (v u)))
agu (z) dz =
z 2
z
b
bgv (z) dz = (1 + Erf(Q(v u)))
2
b
bgv (z) dz = (1 Erf(Q(v u))) ,
z 2
z 2 +2 2 ln(a/b)
where Q(z) =
z2 2
. Plugging back into (5) gives kagu bgv k1 = F (v u).
r !
2 min{f (u), f (v)}
kf (u)gu f (v)gv k1 |f | + ku vk2 .
Proof The result will follow from the observation that we can reduce the two-dimensional
case involving Gaussians centered at u, v R2 to one-dimensional Gaussians centered at 0
and r = ku vk2 . Let u = (ux , uy ) and v = (vx , vy ); we may assume ux > vx w.l.o.g. Let
(r, ) be the magnitude and angle of vector u v when expressed in polar coordinates. The
change of variables (z, w) = R (x vx , y vy ), where R is the clockwise rotation of the
29
Adams, et al.
plane by , gives
kf (u)gu f (v)gv k1
f (u) [(xu )2 +(yu )2 ]/22 f (v) [(xv )2 +(yv )2 ]/2 2
=
2
e x y
e x y dy dx
2
2 2
f (u) [w2 +(zr)2 ]/22 f (v) [w2 +z 2 ]/22
= 2 2 e
e dz dw
2 2
1 w2 /2 2
f (u) (zr)2 /22 f (v) z 2 /22
= e 2 e
e dz dw
2 2
1 2 2
=kf (u)gr f (v)g0 k1 ew /2 dw with g0 , gr 1-dimensional Gaussians
2
=kf (u)gr f (v)g0 k1
r
2 min{f (u), f (v)}
|f (u) f (v)| + ku vk2 by Lemma 7
r !
2 min{f (u), f (v)}
|f | + ku vk2 .
We performed feature selection using one-against-all (OAA) SSVM on the six classes of
synthetic data with noise level = 0.05. The PIs used in the experiments were generated
from the H1 PDs, with the parameter choices of resolution 20 20 and variance = 0.0001.
Note that because of the resolution parameter choice of 20 20, each PI is a vector in
R400 , and the selected features will be a subset of indices corresponding to pixels within
the PI. We trained an OAA SSVM model for PIs of dimension H1 . In the experiment,
we used 5-fold cross-validation and obtained 100% overall accuracy. Feature selection was
performed by retaining the features with non-zero SSVM weights, determined by magnitude
comparison using weight ratios (Chepushtanova et al., 2014). The resulting six sparse models
contain subsets of discriminatory features for each class. Note that one can use only these
selected features for classification without loss in accuracy. These features correspond to
discriminatory pixels in the persistence images.
Figure 11 shows locations of pixels in the vectorized PIs selected by OAA SSVM that
discriminate each class from all the others.
30
Persistence Images
Figure 11: SSVM-based feature (pixel) selection for H1 PIs from the six classes of the
synthetic data. The parameters used are resolution 20 20 and variance 0.0001, for noise
level 0.05. Selected pixels are marked by blue crosses. (a) A noisy solid cube with the two
selected pixels (indices 59 and 79 out of 400). (b) A noisy torus with the two selected pixels
(indices 59 and 98 out of 400). (c) A noisy sphere with the five selected pixels (indices 58,
59, 60, 79, and 98 out of 400). (d) Noisy three clusters with the one selected pixel (index
20 out of 400). (e) Noisy three clusters within three clusters with the seven selected pixels
(indices 20, 40, 59, 60, 79, 80, and 98 out of 400). (f) A noisy circle with the two selected
pixels (indices 21 and 22 out of 400).
References
Aaron Adcock, Erik Carlsson, and Gunnar Carlsson. The ring of algebraic functions on
persistence bar codes. Homology, Homotopy and Applications, 18(1):381402, 2016.
Paul Bendich. Analyzing Stratified Spaces Using Persistent Versions of Intersection and
Local Homology. PhD thesis, Duke University, 2009.
Paul Bendich, James S. Marron, Ezra Miller, Alex Pieloch, and Sean Skwerer. Persistent
homology analysis of brain artery trees. Ann. Appl. Stat., 10(1):198218, 2016.
Paul S. Bradley and Olvi L. Mangasarian. Feature selection via concave minimization and
support vector machines. In Proceedings of the Fifteenth International Conference on
Machine Learning, pages 8290, San Francisco, CA, 1998.
Peter Bubenik. Statistical topological data analysis using persistence landscapes. The Jour-
nal of Machine Learning Research, 16(1):77102, 2015.
31
Adams, et al.
Peter Bubenik and Pawel Dlotko. A persistence landscapes toolbox for topological statistics.
Journal of Symbolic Computation, 2016. Accepted.
Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46
(2):255308, 2009.
Mathieu Carrire, Steve Y. Oudot, and Maks Ovsjanikov. Stable topological signatures for
points on 3d shapes. In Computer Graphics Forum, volume 34, pages 112, 2015.
Frdric Chazal, Vin de Silva, and Steve Oudot. Persistence stability for geometric com-
plexes. Geometriae Dedicata, 173(1):193214, 2014.
Yen-Chi Chen, Daren Wang, Alessandro Rinaldo, and Larry Wasserman. Statistical analysis
of persistence intensity functions. arXiv preprint arXiv:1510.02502, 2015.
Sofya Chepushtanova, Christopher Gittins, and Michael Kirby. Band selection in hyper-
spectral imagery using sparse support vector machines. In Proceedings SPIE DSS 2014,
volume 9088, pages 90881F90881F15, 2014.
Moo K. Chung, Peter Bubenik, and Peter T. Kim. Persistence diagrams of cortical surface
data. In Information Processing in Medical Imaging, pages 386397. Springer, 2009.
David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence dia-
grams. Discrete & Computational Geometry, 37(1):103120, 2007.
David Cohen-Steiner, Herbert Edelsbrunner, John Harer, and Yuriy Mileyko. Lipschitz
functions have Lp -stable persistence. Foundations of computational mathematics, 10(2):
127139, 2010.
Barbara Di Fabio and Massimo Ferri. Comparing persistence diagrams through complex
vectors. In International Conference on Image Analysis and Processing 2015 Part I;
Editors V. Murino, E. Puppo, LNCS 9279, pages 294305, 2015.
Pietro Donatini, Patrizio Frosini, and Alberto Lovato. Size functions for signature recog-
nition. In SPIEs International Symposium on Optical Science, Engineering, and Instru-
mentation, pages 178183, 1998.
32
Persistence Images
Herbert Edelsbrunner, Alexandr Ivanov, and Roman Karasev. Current open problems in
discrete and computational geometry. Modelirovanie i Analiz Informats. Sistem, 19(5):
517, 2012.
Tegan Emerson, Michael Kirby, Kelly Bethel, Anand Kolatkar, Madelyn Luttgen, Stephen
OHara, Paul Newton, and Peter Kuhn. Fourier-ring descriptor to characterize rare circu-
lating cells from images generated using immunofluorescence microscopy. Computerized
Medical Imaging and Graphics, 40:7087, 2015.
Massimo Ferri and Claudia Landi. Representing size functions by complex polynomials.
Proc. Math. Met. in Pattern Recognition, 9:1619, 1999.
Massimo Ferri, Patrizio Frosini, Alberto Lovato, and Chiara Zambelli. Point selection: A
new comparison scheme for size functions (with an application to monogram recognition).
In Computer Vision ACCV98, pages 329337. Springer, 1997.
Glenn M. Fung and Olvi L. Mangasarian. A feature selection newton method for support
vector machine classification. Computational Optimization and Applications, 28(2):185
202, 2004.
Robert Ghrist. Barcodes: The persistent topology of data. Bulletin of the American Math-
ematical Society, 45(1):6175, 2008.
Zhenhua Guo, Lei Zhang, and David Zhang. A completed modeling of local binary pattern
operator for texture classification. IEEE Transactions on Image Processing, 19(6):1657
1663, 2010.
Kyle Heath, Natasha Gelfand, Maks Ovsjanikov, Mridul Aanjaneya, and Leonidas J Guibas.
Image webs: Computing and exploiting connectivity in image collections. In 2010 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition.
Jan-Martin Hertzsch, Rob Sturman, and Stephen Wiggins. DNA microarrays: Design prin-
ciples for maximizing ergodic, chaotic mixing. Small, 3(2):202218, 2007.
Tin Kam Ho. The random subspace method for constructing decision forests. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence, 20(8):832844, 1998.
Michael Kerber, Dmitriy Morozov, and Arnur Nigmetov. Geometry helps to compare persis-
tence diagrams. In 2016 Proceedings of the Eighteenth Workshop on Algorithm Engineering
and Experiments (ALENEX), pages 103112, 2016.
33
Adams, et al.
Yuriy Mileyko, Sayan Mukherjee, and John Harer. Probability measures on the space of
persistence diagrams. Inverse Problems, 27(12):124007, 2011.
Francis C. Motta, Patrick D. Shipman, and R. Mark Bradley. Highly ordered nanoscale
surface ripples produced by ion bombardment of binary compounds. Journal of Physics
D: Applied Physics, 45(12):122001, 2012.
Deepti Pachauri, Christian Hinrichs, Moo K. Chung, Sterling C. Johnson, and Vikas Singh.
Topology-based kernels with application to inference problems in Alzheimers disease.
IEEE Transactions on Medical Imaging, 30(10):17601770, 2011.
Hae-Sang Park and Chi-Hyuck Jun. A simple and fast algorithm for k-medoids clustering.
Expert Systems with Applications, 36(2):33363341, 2009.
Daniel A. Pearson, R. Mark Bradley, Francis C. Motta, and Patrick D. Shipman. Pro-
ducing nanodot arrays with improved hexagonal order by patterning surfaces before ion
sputtering. Phys. Rev. E, 92:062401, Dec 2015.
Jose A. Perea and John Harer. Sliding windows and persistence: An application of topolog-
ical methods to signal analysis. Foundations of Computational Mathematics, pages 140,
2013.
Jan Reininghaus, Stefan Huber, Ulrich Bauer, and Roland Kwitt. A stable multi-scale kernel
for topological machine learning. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 47414748, 2015.
Martin Rost and Joachim Krug. Anisotropic Kuramoto-Sivashinsky equation for surface
growth and erosion. Physical Review Letters, 75:3894, 1995.
David Rouse, Adam Watkins, David Porter, John Harer, Paul Bendich, Nate Strawn, Eliz-
abeth Munch, Jonathan DeSena, Jesse Clarke, Jeffrey Gilbert, Peter Chin, and Andrew
Newman. Feature-aided multiple hypothesis tracking using topological and statistical
behavior classifiers. In SPIE Proceedings, volume 9474, page 94740L, 2015.
Gurjeet Singh, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, and
Dario L. Ringach. Topological analysis of population activity in visual cortex. Journal of
Vision, 8(8):11, 2008.
Chad M. Topaz, Lori Ziegelmeier, and Tom Halverson. Topological data analysis of biological
aggregation models. PloS One, 10(5):e0126383, 2015.
Katharine Turner, Yuriy Mileyko, Sayan Mukherjee, and John Harer. Frchet means for
distributions of persistence diagrams. Discrete & Computational Geometry, 52(1):4470,
2014.
Sara Kalinik Verovek. Tropical coordinates on the space of persistence barcodes. arXiv
preprint arXiv:1604.00113, 2016.
J. Villain. Continuum models of crystal growth from atomic beams with and without des-
orption. J. Phys. I France, 1:1942, 1991.
34
Persistence Images
Dietrich E. Wolf. Kinetic roughening of vicinal surfaces. Physical Review Letters, 67:1783,
1991.
Matthias Zeppelzauer, Bartosz Zieliski, Mateusz Juda, and Markus Seidl. Topological
descriptors for 3d surface analysis. In Computational Topology in Image Context: 6th
International Workshop Proceedings, pages 7787, Marseille, France, 2016.
Li Zhang and Weida Zhou. On the sparseness of 1-norm support vector machines. Neural
Networks, 23(3):373385, 2010.
Ji Zhu, Saharon Rosset, Trevor Hastie, and Rob Tibshirani. 1-norm support vector machines.
Advances in neural information processing systems, 16(1):4956, 2004.
Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. Discrete & Com-
putational Geometry, 33(2):249274, 2005.
35