Abstract
One of the primary areas of interest in applied algebraic topology is persistent homology, and, more specifically, the persistence diagram. Persistence diagrams have also become objects of interest in topological data analysis. However, persistence diagrams do not naturally lend themselves to statistical goals, such as inferring certain population characteristics, because their complicated structure makes common algebraic operations—such as addition, division, and multiplication—challenging (e.g., the mean might not be unique). To bypass these issues, several functional summaries of persistence diagrams have been proposed in the literature (e.g. landscape and silhouette functions). The problem of analyzing a set of persistence diagrams then becomes the problem of analyzing a set of functions, which is a topic that has been studied for decades in statistics. First, we review the various functional summaries in the literature and propose a unified framework for the functional summaries. Then, we generalize the definition of persistence landscape functions, establish several theoretical properties of the persistence functional summaries, and demonstrate and discuss their performance in the context of classification using simulated prostate cancer histology data, and two-sample hypothesis tests comparing human and monkey fibrin images, after developing a simulation study using a new data generator we call the Pickup Sticks Simulator (STIX).
Notes
Actually, as long as we have a filtration, we can define a persistence diagram.
Some literature considers the lower level set \(f^{-1}((-\infty , \lambda ))\). Both definitions are valid; here we use upper level sets because they yield a very straightforward definition when we consider image data or certain functions estimated from a point cloud, such as kernel density estimates (Wasserman 2006).
Adams et al. (2017) also presented conditions under which persistence images are stable with respect to changes in the corresponding persistence diagrams.
A finitely discrete probability measure Q puts probability mass on only finitely many points in \({\mathcal {B}}_F\).
While Fasy et al. (2018) will release a version of the gland generator that is fully verified with a pathologist, the preliminary version of this software that we used for the experiments in this paper will be made available upon request.
R (R Core Team 2017) is used to produce the STIX images, and the thickness t of the segments is set by the lwd plotting option.
The p-Wasserstein distance between persistence diagrams \(D_1\) and \(D_2\) is \(d_p(D_1, D_2) = \left( \inf _{\phi :D_1 \rightarrow D_2}\sum _{d\in D_1}\Vert d - \phi (d)\Vert _p^p\right) ^{1/p}\), where \(\phi \) is a bijection between the two persistence diagrams. See Robinson and Turner (2017) for more details.
This is important to ensure that the tests do not consistently incorrectly reject the null hypothesis.
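The p-Wasserstein distance defined in the notes above can be illustrated on very small diagrams. The following is a minimal sketch, not the implementation used in the paper. It assumes the common convention that each diagram is padded with diagonal slots so that a bijection always exists, and it searches all bijections by brute force, which is only feasible for a handful of points. The function names and the example diagrams `d_a` and `d_b` are hypothetical.

```python
from itertools import permutations


def _lp_cost(a, b, p):
    """p-th power of the l_p distance between two points in the plane."""
    return abs(a[0] - b[0]) ** p + abs(a[1] - b[1]) ** p


def _diag_cost(a, p):
    """p-th power of the l_p distance from a point to its diagonal projection."""
    mid = (a[0] + a[1]) / 2.0
    return _lp_cost(a, (mid, mid), p)


def wasserstein_distance(d1, d2, p=2):
    """Brute-force p-Wasserstein distance between two tiny diagrams.

    d1, d2: lists of (birth, death) pairs. Each diagram is padded with
    diagonal slots (assumed convention) so the two sides have equal size.
    """
    n, m = len(d1), len(d2)
    size = n + m
    if size == 0:
        return 0.0

    def cost(i, j):
        if i < n and j < m:
            return _lp_cost(d1[i], d2[j], p)  # point matched to point
        if i < n:
            return _diag_cost(d1[i], p)       # d1 point sent to the diagonal
        if j < m:
            return _diag_cost(d2[j], p)       # d2 point sent to the diagonal
        return 0.0                            # diagonal matched to diagonal

    # Infimum over bijections phi, taken here as a minimum over permutations.
    best = min(
        sum(cost(i, phi[i]) for i in range(size))
        for phi in permutations(range(size))
    )
    return best ** (1.0 / p)


# Hypothetical toy diagrams.
d_a = [(0.0, 2.0), (1.0, 1.2)]
d_b = [(0.0, 3.0)]
dist_ab = wasserstein_distance(d_a, d_b, p=2)
```

For diagrams of realistic size, the minimum over bijections is normally computed with an assignment solver rather than by enumeration; see Kerber et al. (2017) for geometric speed-ups.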
References
Abdollahi, A., Meysamie, A., Sheikhbahaei, S., Ahmadi, A., Tabriz, H.M., Bakhshandeh, M., Hosseinzadeh, H.: Inter/intra-observer reproducibility of Gleason scoring in prostate adenocarcinoma in Iranian pathologists. Urol. J. 9(2), 486–490 (2012)
Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., Ziegelmeier, L.: Persistence images: a stable vector representation of persistent homology. J. Mach. Learn. Res. 18(1), 218–252 (2017)
Adcock, A., Carlsson, E., Carlsson, G.: The ring of algebraic functions on persistence bar codes. Homol. Homotopy Appl. 18(1), 381–402 (2016)
Adler, R.J.: On excursion sets, tube formulas and maxima of random fields. Ann. Appl. Probab. 10, 1–74 (2000)
Adler, R.J., Agami, S.: Modelling persistence diagrams with planar point processes, and revealing topology with bagplots. J. Appl. Comput. Topol. 3(3), 139–183 (2019)
Adler, R.J., Bobrowski, O., Borman, M.S., Subag, E., Weinberger, S.: Persistent homology for random fields and complexes. In: Borrowing Strength: Theory Powering Applications—A Festschrift for Lawrence D. Brown, Vol. 6, pp. 124–143. Institute of Mathematical Statistics (2010)
Bendich, P., Marron, J.S., Miller, E., Pieloch, A., Skwerer, S.: Persistent homology analysis of brain artery trees. Ann. Appl. Stat. 10(1), 198 (2016)
Biscio, C., Møller, J.: The accumulated persistence function, a new useful functional summary statistic for topological data analysis, with a view to brain artery trees and spatial point process applications. arXiv preprint arXiv:1611.00630 (2016)
Bubenik, P.: Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16(1), 77–102 (2015)
Campbell, R.A., Overmyer, K.A., Selzman, C.H., Sheridan, B.C., Wolberg, A.S.: Contributions of extravascular and intravascular cells to fibrin network formation, structure, and stability. Blood 114(23), 4886–4896 (2009)
Carlsson, G.: Topology and data. Bull. Am. Math. Soc. 46(2), 255–308 (2009)
Carlsson, G., Zomorodian, A., Collins, A., Guibas, L.J.: Persistence barcodes for shapes. Int. J. Shape Model. 11(2), 149–187 (2005)
Center, M.M., Jemal, A., Lortet-Tieulent, J., Ward, E., Ferlay, J., Brawley, O., Bray, F.: International variation in prostate cancer incidence and mortality rates. Eur. Urol. 61(6), 1079–1092 (2012)
Chazal, F., Fasy, B.T., Lecci, F., Rinaldo, A., Wasserman, L.: Stochastic convergence of persistence landscapes and silhouettes. In: Proceedings of the Thirtieth Annual Symposium on Computational Geometry, p. 474. ACM (2014)
Chen, Y.-C., Wang, D., Rinaldo, A., Wasserman, L.: Statistical analysis of persistence intensity functions. arXiv preprint arXiv:1510.02502 (2015)
Ciollaro, M., Genovese, C., Lei, J., Wasserman, L.: The functional mean-shift algorithm for mode hunting and clustering in infinite dimensions. arXiv preprint arXiv:1408.1187 (2014)
Cisewski, J., Croft, R.A., Freeman, P.E., Genovese, C.R., Khandai, N., Ozbek, M., Wasserman, L.: Non-parametric 3D map of the intergalactic medium using the Lyman-alpha forest. Mon. Not. R. Astron. Soc. 440(3), 2599–2609 (2014)
Cohen-Steiner, D., Edelsbrunner, H., Harer, J.: Stability of persistence diagrams. Discrete Comput. Geom. 37(1), 103–120 (2007)
Crawford, L., Monod, A., Chen, A.X., Mukherjee, S., Rabadán, R.: Functional data analysis using a topological summary statistic: the smooth Euler characteristic transform. arXiv preprint arXiv:1611.06818 (2016a)
Crawford, L., Monod, A., Chen, A.X., Mukherjee, S., Rabadán, R.: Topological summaries of tumor images improve prediction of disease free survival in glioblastoma multiforme. arXiv preprint arXiv:1611.06818 (2016b)
Degras, D.A.: Simultaneous confidence bands for nonparametric regression with functional data. Stat. Sin. 21, 1735–1765 (2011)
Edelsbrunner, H., Harer, J.: Persistent homology—a survey. Contemp. Math. 453, 257–282 (2008)
Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simplification. Discrete Comput. Geom. 28(4), 511–533 (2002)
Edelsbrunner, H., Morozov, D.: Persistent homology: theory and practice. In: Proceedings of the European Congress of Mathematics, pp. 31–50 (2012)
Engers, R.: Reproducibility and reliability of tumor grading in urological neoplasms. World J. Urol. 25(6), 595–605 (2007)
Epstein, J.I., Egevad, L., Amin, M.B., Delahunt, B., Srigley, J.R., Humphrey, P.A., Committee, G.: The 2014 International Society of Urological Pathology (ISUP) consensus conference on Gleason grading of prostatic carcinoma: definition of grading patterns and proposal for a new grading system. Am. J. Surg. Pathol. 40(2), 244–252 (2016)
Evans, S.M., Patabendi Bandarage, V., Kronborg, C., Earnest, A., Millar, J., Clouston, D.: Gleason group concordance between biopsy and radical prostatectomy specimens: a cohort study from Prostate Cancer Outcome Registry-Victoria. Prostate Int. 4(4), 145–151 (2016)
Fasy, B.T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., Singh, A.: Confidence sets for persistence diagrams. Ann. Stat. 42(6), 2301–2339 (2014)
Fasy, B.T., Payne, S., Schenfish, A., Schupback, J., Stouffer, N.: Simulating prostate cancer slide scans (2018) (forthcoming)
Ferlay, J., Soerjomataram, I., Dikshit, R., Eser, S., Mathers, C., Rebelo, M., Parkin, D.M., Forman, D., Bray, F.: Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int. J. Cancer 136(5), E359–E386 (2015)
Ferraty, F., Vieu, P.: Nonparametric Functional Data Analysis: Theory and Practice. Springer, New York (2006)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer Series in Statistics, vol. 1. Springer, New York (2001)
Gameiro, M., Mischaikow, K., Kalies, W.: Topological characterization of spatial-temporal chaos. Phys. Rev. E 70(3), 035203 (2004)
Ghrist, R.: Barcodes: the persistent topology of data. Bull. Am. Math. Soc. 45(1), 61–75 (2008)
Ghrist, R.W.: Elementary Applied Topology. Createspace, Seattle (2014)
Goodman, M., Ward, K.C., Osunkoya, A.O., Datta, M.W., Luthringer, D., Young, A.N., Marks, K., Cohen, V., Kennedy, J.C., Haber, M.J., Amin, M.B.: Frequency and determinants of disagreement and error in gleason scores: a population-based study of prostate cancer. Prostate 72(13), 1389–1398 (2012)
Helpap, B., Kristiansen, G., Beer, M., Köllermann, J., Oehler, U., Pogrebniak, A., Fellbaum, C.: Improving the reproducibility of the gleason scores in small foci of prostate cancer—suggestion of diagnostic criteria for glandular fusion. Pathol. Oncol. Res. 18(3), 615–621 (2012)
Humphrey, P.A.: Gleason grading and prognostic factors in carcinoma of the prostate. Mod. Pathol. 17(3), 292–306 (2004)
Ieva, F., Paganoni, A., Pigoli, D., Vitelli, V.: Multivariate functional clustering for the analysis of ECG curves morphology. In: Cladag 2011 (8th International Meeting of the Classification and Data Analysis Group), pp. 1–4 (2011)
Jacques, J., Preda, C.: Functional data clustering: a survey. Adv. Data Anal. Classif. 8(3), 231–255 (2014)
Kerber, M., Morozov, D., Nigmetov, A.: Geometry helps to compare persistence diagrams. J. Exp. Algorithm. JEA 22(1), 1–4 (2017)
Khasawneh, F.A., Munch, E.: Exploring equilibria in stochastic delay differential equations using persistent homology. In: ASME 2014 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, pp. V008T11A034–V008T11A034. American Society of Mechanical Engineers (2014)
Khasawneh, F.A., Munch, E.: Chatter detection in turning using persistent homology. Mech. Syst. Signal Process. 70, 527–541 (2016)
Kosorok, M.R.: Introduction to Empirical Processes and Semiparametric Inference. Springer, New York (2007)
Lamprecht, M.R., Sabatini, D.M., Carpenter, A.E.: CellProfiler: free, versatile software for automated biological image analysis. Biotechniques 42(1), 71–75 (2007)
Lawson, P., Berry, E., Brown, J.Q., Fasy, B.T., Wenk, C.: Topological descriptors for quantitative prostate cancer morphology analysis. In: Conference on Digital Pathology, Part of SPIE Medical Imaging. Honorable Mention Poster Award (2017)
Lawson, P., Sholl, A.B., Brown, J.Q., Fasy, B.T., Wenk, C.: Persistent homology for the quantitative evaluation of architectural features in prostate cancer histology. Sci. Rep. 9(1), 1–15 (2019)
Lei, J., Rinaldo, A., Wasserman, L.: A conformal prediction approach to explore functional data. Ann. Math. Artif. Intell. 74(1–2), 29–43 (2015)
Li, B., Yu, Q.: Classification of functional data: a segmentation approach. Comput. Stat. Data Anal. 52(10), 4790–4800 (2008)
Ma, S., Yang, L., Carroll, R.J.: A simultaneous confidence band for sparse longitudinal regression. Stat. Sin. 22, 95–122 (2012)
Mileyko, Y., Mukherjee, S., Harer, J.: Probability measures on the space of persistence diagrams. Inverse Prob. 27(12), 124007 (2011)
Monod, A., Kališnik, S., Patiño-Galindo, J.Á., Crawford, L.: Tropical sufficient statistics for persistent homology with a parametric application to infectious viral disease. SIAM J. Appl. Algebra Geom. 3(2), 337–371 (2019)
Munkres, J.R.: Algebraic Topology. Prentice Hall, Upper Saddle River (1964)
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H., Hahn, U.: Global envelope tests for spatial processes. J. R. Stat. Soc. Ser. B Stat. Methodol. 79(2), 381–404 (2017)
Obayashi, I., Hiraoka, Y., Kimura, M.: Persistence diagrams with linear machine learning models. J. Appl. Comput. Topol. 1(3–4), 421–449 (2018)
Padellini, T., Brutti, P.: Persistence flamelets: multiscale persistent homology for kernel density exploration. arXiv preprint arXiv:1709.07097 (2017)
Perea, J.A., Harer, J.: Sliding windows and persistence: an application of topological methods to signal analysis. Found. Comput. Math. 15(3), 799–838 (2015)
Pretorius, E., Vieira, W., Oberholzer, H., Auer, R.: Comparative scanning electron microscopy of platelets and fibrin networks of human and differents animals. Int. J. Morphol. 27(1), 69–76 (2009)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2017)
Ramsay, J.O.: Functional Data Analysis. Wiley, New York (2006)
Robins, V.: Computational topology for point data: betti numbers of \(\alpha \)-shapes. In: Mecke, K.R., Stoyan, D. (eds.) Morphology of Condensed Matter, pp. 261–274. Springer, New York (2002)
Robinson, A., Turner, K.: Hypothesis testing for topological data analysis. J. Appl. Comput. Topol. 1(2), 241–261 (2017)
Rossi, F., Villa, N.: Support vector machine for functional data classification. Neurocomputing 69(7–9), 730–742 (2006)
Rubin, H.: Uniform convergence of random functions with applications to statistics. Ann. Math. Stat. 27(1), 200–203 (1956)
Shafer, G., Vovk, V.: A tutorial on conformal prediction. J. Mach. Learn. Res. 9(Mar), 371–421 (2008)
Singh, N., Couture, H.D., Marron, J.S., Perou, C., Niethammer, M.: Topological descriptors of histology images. In: International Workshop on Machine Learning in Medical Imaging, pp. 231–239. Springer (2014)
Sousbie, T., Pichon, C., Kawahara, H.: The persistent cosmic web and its filamentary structure–II. Illustrations. Mon. Not. R. Astron. Soc. 414(1), 384–403 (2011)
Tarpey, T., Kinateder, K.K.: Clustering functional data. J. Classif. 20(1), 093–114 (2003)
Topaz, C.M., Ziegelmeier, L., Halverson, T.: Topological data analysis of biological aggregation models. PLoS ONE 10(5), e0126383 (2015)
Truesdale, M.D., Cheetham, P.J., Turk, A.T., Sartori, S., Hruby, G.W., Dinneen, E.P., Benson, M.C., Badani, K.K.: Gleason score concordance on biopsy-confirmed prostate cancer: is pathological re-evaluation necessary prior to radical prostatectomy? BJU Int. 107(5), 749–754 (2011)
Truong, M., Slezak, J.A., Lin, C.P., Iremashvili, V., Sado, M., Razmaria, A.A., Leverson, G., Soloway, M.S., Eggener, S.E., Abel, E.J., Downs, T.M., Jarrard, D.F.: Development and multi-institutional validation of an upgrading risk tool for Gleason 6 prostate cancer. Cancer 119(22), 3992–4002 (2013)
Turner, K.: Means and medians of sets of persistence diagrams. arXiv preprint arXiv:1307.8300 (2013)
Turner, K., Mileyko, Y., Mukherjee, S., Harer, J.: Fréchet means for distributions of persistence diagrams. Discrete Comput. Geom. 52(1), 44–70 (2014)
Turner, K., Mukherjee, S., Boyer, D.M.: Persistent homology transform for modeling shapes and surfaces. Inf. Inference J. IMA 3(4), 310–344 (2014)
Van de Weygaert, R., Vegter, G., Edelsbrunner, H., Jones, B.J., Pranav, P., Park, C., Hellwing, W.A., Eldering, B., Kruithof, N., Bos, E., et al.: Alpha, betti and the megaparsec universe: on the topology of the cosmic web. In: Transactions on Computational Science XIV, pp. 60–101. Springer (2011)
Van der Vaart, A.W.: Asymptotic Statistics (Cambridge Series in Statistical and Probabilistic Mathematics), vol. 3. Cambridge University Press, Cambridge (2000)
Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Vovk, V., Gammerman, A., Shafer, G.: Algorithmic Learning in a Random World. Springer, New York (2005)
Wang, J.-L., Chiou, J.-M., Müller, H.-G.: Review of functional data analysis. Annu. Rev. Stat. Appl. 3, 257–295 (2016)
Wasserman, L.: All of Nonparametric Statistics. Springer, New York (2006)
Wasserman, L.: Topological data analysis. Annu. Rev. Stat. Appl. 5, 501–532 (2016)
Worsley, K.J.: The geometry of random images. Chance 9(1), 27–40 (1996)
Yuan, K.-H.: A theorem on uniform convergence of stochastic functions with applications. J. Multivar. Anal. 62(1), 100–109 (1997)
Zhu, X.: Persistent homology: an introduction and a new text representation for natural language processing. In: IJCAI, pp. 1953–1959 (2013)
Zomorodian, A.J.: Topology for Computing. Cambridge Monographs, vol. 16. Cambridge University Press, Cambridge (2005)
Acknowledgements
Berry acknowledges support from NSF DMS 1812055 and DMS 1945639. Berry and Fasy acknowledge support from NIH and NSF under Grant Numbers NSF DMS 1664858 and NSF DMS 1557716, as well as the research teams at MSU and Tulane University working on the prostate cancer histology project. Chen is supported by NIH U01 AG016976 and NSF DMS 1810960. Cisewski-Kehe thanks the Yale Center for Research Computing for guidance and use of the research computing infrastructure. Cisewski-Kehe and Fasy acknowledge support from NSF under Grant Numbers DMS 1854220 and 1854336. Fasy is additionally supported by NSF CCF 1618605.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendices
A Proofs
The proofs of the propositions of Sect. 3 are presented below.
Proof of Proposition 1
This proof uses ideas from Rubin (1956) and Yuan (1997).
Let \(\epsilon >0\) be given and \({\widehat{F}}_n = {\widehat{F}} = \frac{1}{n}\sum _{i=1}^n F_i\). Because \({\mathcal {B}}_F\) is equicontinuous, the collection of differences
is also equicontinuous. Since \({\mathbb {T}}\) is compact, there exists a number \(M>0\) and points \(t_1,\ldots ,t_M\in {\mathbb {T}}\) such that
Namely, \(t_1,\ldots ,t_M\) form an \(\epsilon /2\)-covering of \({\mathcal {B}}_{\varDelta }\).
Let \(\varDelta _n(t) = {\widehat{F}}(t)-{\bar{F}}(t)\), and note that \(\varDelta _n\in {\mathcal {B}}_{\varDelta }\). Then
By (5), every \(t\in {\mathbb {T}}\) satisfies \({{\mathbb {E}}}|F_i(t)|<\infty \) so by the strong law of large numbers,
for every \(j=1,\ldots ,M\) and this implies that for any \(\delta >0\), there exists \(N>0\) such that
for every \(j=1,\ldots ,M\).
As a result,
and the result follows. \(\square \)
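The covering argument in this proof can be checked numerically on a toy family. The following is a minimal sketch under assumptions that are not taken from the paper: the random functions \(F_i(t) = A_i\sin (2\pi t) + B_i\) on \({\mathbb {T}} = [0,1]\), with \(A_i\) uniform on (0, 2) and \(B_i\) uniform on (-1, 1), are hypothetical stand-ins for functional summaries, chosen because their mean \({\bar{F}}(t) = \sin (2\pi t)\) is known in closed form. The sketch compares the supremum of \(|\varDelta _n|\) over a fine grid with the maximum over a finite cover plus a Lipschitz slack term.

```python
import math
import random


def sample_mean_coeffs(n, rng):
    """Average the coefficients (A_i, B_i) of F_i(t) = A_i*sin(2*pi*t) + B_i."""
    a_bar = sum(rng.uniform(0.0, 2.0) for _ in range(n)) / n
    b_bar = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
    return a_bar, b_bar


def delta(t, a_bar, b_bar):
    """Delta_n(t) = F_hat(t) - F_bar(t), with F_bar(t) = sin(2*pi*t)."""
    return (a_bar - 1.0) * math.sin(2.0 * math.pi * t) + b_bar


def sup_abs(a_bar, b_bar, grid):
    """Largest absolute value of Delta_n over the given grid."""
    return max(abs(delta(t, a_bar, b_bar)) for t in grid)


rng = random.Random(0)
fine_grid = [j / 1000 for j in range(1001)]  # proxy for the supremum over T
cover = [j / 10 for j in range(11)]          # finite cover t_1, ..., t_M

results = {}
for n in (50, 5000):
    a_bar, b_bar = sample_mean_coeffs(n, rng)
    lipschitz = 2.0 * math.pi * abs(a_bar - 1.0)  # Lipschitz constant of Delta_n
    results[n] = {
        "sup_fine": sup_abs(a_bar, b_bar, fine_grid),
        "max_cover": sup_abs(a_bar, b_bar, cover),
        "slack": lipschitz * 0.05,  # every t lies within 0.05 of a cover point
    }
```

In each run the supremum over the fine grid is bounded by the maximum over the cover plus the slack, which mirrors the decomposition used in the proof; the strong law then drives the cover term to zero as n grows.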
Proof of Corollary 1
The case of persistence landscapes is a special case of the generalized landscape, so we only prove the cases of the persistence intensity and the generalized landscape. Chazal et al. (2014) already show that the power-weighted persistence silhouette is 1-Lipschitz, from which equicontinuity follows, so we omit this case.
Persistence intensity Given a persistence diagram \({\textsf {D}}\), the persistence intensity is the function
Thus, for any \((t_1,s_1)\) and \((t_2,s_2)\),
where \(L_0\) is the Lipschitz constant of the kernel function K, i.e., \(|K(x)-K(y)|\le L_0|x-y| \). The bound \(\sup _{a,b\in {\mathbb {T}}}\Vert \omega (a,b)\Vert L_0/h\) is uniform for all \((t_1,s_1)\) and \((t_2,s_2)\) so the corresponding persistence intensity functions form an equicontinuous family.
Generalized landscape Recall that generalized landscapes use the kth maximum value of \(\{{\widetilde{\varLambda }}_j(t; h):j =1,\ldots , |{\textsf {D}}|\}\), where
and \(y_j =|b_j-d_j|\) is the persistence, \(x_j = \frac{b_j+d_j}{2}\), and \(K_h(\cdot ) = \frac{1}{h}K(\cdot /h)\). Therefore, for any \(t_1,t_2\),
where \(L_0\) is the Lipschitz constant of the kernel function K and \(\mathsf{diam}({\mathbb {T}})\) is the diameter of \({\mathbb {T}}\). Thus, for a given \(h>0\), \({\widetilde{\varLambda }}_j(t; h)\) is Lipschitz with a constant \(\frac{\mathsf{diam}({\mathbb {T}}) L_0}{K(0)h} \). Moreover, this analysis applies to every \(j=1,\ldots , |{\textsf {D}}|\) so the kth maximum value of \(\{{\widetilde{\varLambda }}_j(t; h):j =1,\ldots , |{\textsf {D}}|\}\) is still Lipschitz with the same constant. This implies that the generalized landscape forms an equicontinuous family. \(\square \)
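The Lipschitz bound for the generalized landscape can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that each bump has the form \({\widetilde{\varLambda }}_j(t; h) = y_j K_h(t-x_j)/K_h(0) = y_j K((t-x_j)/h)/K(0)\), which is consistent with the Lipschitz constant \(\frac{\mathsf{diam}({\mathbb {T}}) L_0}{K(0)h}\) stated above, and it uses the triangle kernel \(K(u) = \max (0, 1-|u|)\), for which \(L_0 = 1\) and \(K(0) = 1\). The example diagram and the function names are hypothetical.

```python
def triangle_kernel(u):
    """Triangle kernel K(u) = max(0, 1 - |u|); Lipschitz constant L_0 = 1."""
    return max(0.0, 1.0 - abs(u))


def bump(t, birth, death, h, kernel=triangle_kernel):
    """Assumed kernel bump y_j * K((t - x_j) / h) / K(0) for one diagram point."""
    x = (birth + death) / 2.0  # midpoint x_j
    y = abs(death - birth)     # persistence y_j
    return y * kernel((t - x) / h) / kernel(0.0)


def generalized_landscape(t, diagram, k, h, kernel=triangle_kernel):
    """k-th largest bump value at t (0 if the diagram has fewer than k points)."""
    values = sorted((bump(t, b, d, h, kernel) for b, d in diagram), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0


# Hypothetical diagram with births and deaths in T = [0, 1], so diam(T) = 1.
diagram = [(0.1, 0.5), (0.2, 0.9), (0.4, 0.6)]
h = 0.2
lipschitz_bound = 1.0 * 1.0 / (1.0 * h)  # diam(T) * L_0 / (K(0) * h)

grid = [j / 200 for j in range(201)]
max_slope = 0.0
for k in (1, 2, 3):
    vals = [generalized_landscape(t, diagram, k, h) for t in grid]
    for j in range(1, len(grid)):
        slope = abs(vals[j] - vals[j - 1]) / (grid[j] - grid[j - 1])
        max_slope = max(max_slope, slope)
```

Because the kth maximum of functions sharing a Lipschitz constant is Lipschitz with the same constant, the largest finite-difference slope over all orders k stays below `lipschitz_bound`.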
Proof of Proposition 2
Note that by Eq. (5), the functional summary satisfies
Pointwise normality The first assertion \(\sqrt{n}({\widehat{F}}(t)-{\bar{F}}(t))\rightarrow N(0,\sigma ^2(t))\) follows from the usual central limit theorem because the variance is finite.
Normality of integrated difference To show the normality for the integrated difference, note that
where \({\bar{Y}}_n = \frac{1}{n}\sum _{i=1}^n Y_i\) and \(Y_1,\ldots ,Y_n\) are i.i.d. with mean \({{\mathbb {E}}}(Y_i) = 0\) and variance
Thus, by the usual central limit theorem again, we obtain the normality.
Convergence to a Gaussian process The proof of convergence toward a Gaussian process involves the concept of a Donsker class; see Section 2.2 of Kosorok (2007) for an introduction. Simply put, given i.i.d. random elements \(S_1,\ldots , S_n\in {\mathcal {S}}\) and a collection of mappings \(f_t: {\mathcal {S}}\mapsto {\mathbb {R}}\) with an index set \(t\in {\mathbb {T}}\), the collection \(\{f_t: t\in {\mathbb {T}}\}\) is called Donsker if the average function \({\bar{f}}(t) = \frac{1}{n}\sum _{i=1}^nf_t(S_i)\), centered at its mean and scaled by \(\sqrt{n}\), converges to a tight Gaussian process in the \({\mathcal {L}}_\infty \) metric. The assumption in Eq. (6) along with Theorem 2.5 in Kosorok (2007) implies that the class \({\mathcal {W}}\) is Donsker, which implies that \(\sqrt{n}({\widehat{F}}(t)-{\bar{F}}(t))\) converges to a Gaussian process. \(\square \)
Proof of Proposition 3
Because Eq. (6) implies that the class \({\mathcal {W}}\) is Donsker, this proposition is a well-known result in empirical process theory. For instance, see the discussion on page 21 of Kosorok (2007). Here we briefly highlight the basic idea.
By Proposition 2 and the continuous mapping theorem [see, e.g., page 16 of Kosorok (2007)],
In the case of the bootstrap, applying the same proposition to the bootstrap version, we obtain
where \(\overset{D|F^{\otimes n}}{\rightarrow }\) denotes convergence in distribution given \(F_1,\ldots ,F_n\).
Therefore, the bootstrap quantile converges to the corresponding quantile of the original difference. \(\square \)
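The bootstrap quantile discussed in this proof can be approximated by resampling. The following is a minimal sketch under stated assumptions, not the authors' implementation: functional summaries are represented as lists of values on a common grid, the supremum is taken over that grid, and the toy curves (tent functions with random amplitudes) are hypothetical. The \(\sqrt{n}\) scaling is omitted because it cancels when the quantile is used as a band half-width.

```python
import math
import random


def mean_function(curves):
    """Pointwise mean of curves sampled on a common grid."""
    n = len(curves)
    return [sum(c[j] for c in curves) / n for j in range(len(curves[0]))]


def sup_distance(f, g):
    """Supremum distance between two curves on the common grid."""
    return max(abs(a - b) for a, b in zip(f, g))


def bootstrap_band(curves, alpha=0.05, n_boot=200, seed=0):
    """Return (F_hat, c): the sample mean and a bootstrap band half-width c."""
    rng = random.Random(seed)
    n = len(curves)
    f_hat = mean_function(curves)
    stats = []
    for _ in range(n_boot):
        # Resample the curves with replacement and recompute the mean.
        resample = [curves[rng.randrange(n)] for _ in range(n)]
        stats.append(sup_distance(mean_function(resample), f_hat))
    stats.sort()
    # Empirical (1 - alpha) quantile of the bootstrap sup-differences.
    idx = min(n_boot - 1, math.ceil((1.0 - alpha) * n_boot) - 1)
    return f_hat, stats[idx]


# Hypothetical toy data: tent functions with random amplitudes on [0, 1].
data_rng = random.Random(1)
grid = [j / 50 for j in range(51)]
curves = [
    [amp * max(0.0, 1.0 - abs(2.0 * t - 1.0)) for t in grid]
    for amp in (data_rng.uniform(0.5, 1.5) for _ in range(40))
]
f_hat, half_width = bootstrap_band(curves)
lower = [v - half_width for v in f_hat]
upper = [v + half_width for v in f_hat]
```

The pair `lower`, `upper` gives an approximate simultaneous band for the mean functional summary at level \(1-\alpha \), in the sense described by the proposition.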
Proof of Proposition 4
This proof consists of two parts. In the first part, we prove that \({\widehat{q}}_\gamma \overset{P}{\rightarrow } q_\gamma \). In the second part, we prove the desired result.
Part 1 Given \(F_1,\ldots ,F_n\), we define
to be the empirical version of Q(t) and define
to be the corresponding \(\gamma \) quantile.
Because \({\widehat{Q}}\) is just the empirical distribution function (EDF) of \(Y_1,\ldots ,Y_n\) where \(Y_i = d(F_i,{\bar{F}})\) and Q(t) is the cumulative distribution function (CDF) of \(Y_i\),
By the mean value theorem and the fact that \({\widehat{Q}}({\widetilde{q}}_\gamma ) -Q({\widetilde{q}}_\gamma ) = o_P(1)\),
for some \(q_\gamma ^* \in [q_\gamma , {\widetilde{q}}_\gamma ]\). By assumption, \(Q'(q_\gamma ^*)\ge q_0>0\) so
Now, because \(\max _{i=1,\ldots ,n}|d(F_i,{\bar{F}})-d(F_i,{\widehat{F}})| \le d({\bar{F}},{\widehat{F}})\), \({\widetilde{q}}_\gamma \), the quantile of \(d(F_1,{\bar{F}}),\ldots , d(F_n,{\bar{F}})\), and \({\widehat{q}}_\gamma \), the quantile of \(d(F_1,{\widehat{F}}),\ldots , d(F_n,{\widehat{F}})\), are bounded by
which implies
Part 2 Let
be the probability of interest. By the triangle inequality,
Thus,
Thus,
which completes the proof. \(\square \)
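The quantile \({\widehat{q}}_\gamma \) of the distances \(d(F_1,{\widehat{F}}),\ldots , d(F_n,{\widehat{F}})\) used in this proof is straightforward to compute. The following is a minimal sketch under assumptions not taken from the paper: d is the supremum distance on a common grid, the empirical quantile is the smallest order statistic whose empirical distribution function value is at least \(\gamma \), and the toy curves are hypothetical.

```python
import math
import random


def mean_function(curves):
    """Pointwise mean of curves sampled on a common grid."""
    n = len(curves)
    return [sum(c[j] for c in curves) / n for j in range(len(curves[0]))]


def sup_distance(f, g):
    """Supremum distance between two curves on the common grid."""
    return max(abs(a - b) for a, b in zip(f, g))


def empirical_quantile(values, gamma):
    """Smallest order statistic whose EDF value is at least gamma."""
    ordered = sorted(values)
    idx = max(0, math.ceil(gamma * len(ordered)) - 1)
    return ordered[idx]


def distance_quantile(curves, gamma):
    """Return (F_hat, q_hat): the sample mean and the gamma quantile of d(F_i, F_hat)."""
    f_hat = mean_function(curves)
    distances = [sup_distance(c, f_hat) for c in curves]
    return f_hat, empirical_quantile(distances, gamma)


# Hypothetical toy data: tent functions with random amplitudes on [0, 1].
rng = random.Random(2)
grid = [j / 50 for j in range(51)]
curves = [
    [amp * max(0.0, 1.0 - abs(2.0 * t - 1.0)) for t in grid]
    for amp in (rng.uniform(0.5, 1.5) for _ in range(200))
]
gamma = 0.9
f_hat, q_hat = distance_quantile(curves, gamma)

# Fraction of sample curves within distance q_hat of the sample mean.
covered = sum(sup_distance(c, f_hat) <= q_hat for c in curves) / len(curves)
```

By construction, at least a fraction \(\gamma \) of the sample curves lies within \({\widehat{q}}_\gamma \) of \({\widehat{F}}\); the proposition concerns how this in-sample quantile behaves relative to the population quantile \(q_\gamma \).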
B STIX simulation results
For completeness, results for the STIX simulation study from Sect. 4.2.1 are displayed for \(t_2 = 5, 5.25, 5.5, 5.75, 6, 7\). Figures 16 and 17 include the 0th and 1st homology dimensions, respectively, for the landscape and generalized landscape functions for their first 10 function orders; a subset of the results for the 1st homology dimension are displayed in Fig. 12 and discussed in the main text. Note that the results for the setting with \(t_2 = 8\) are not displayed because the permutation p values for all function orders for the 1st homology dimension achieve the minimum possible p value, and most of the function orders for the 0th homology dimension achieve this minimum as well.
Similarly, the results for the other functional summaries and the concatenated orders of the landscape and generalized landscape functions for \(t_2 = 5, 5.25, 5.5, 5.75, 6, 7\) are displayed in Figs. 18 and 19 for the 0th and 1st homology dimensions, respectively. A subset of the results for the 1st homology dimension are displayed in Fig. 13 and discussed in the main text. Note that the results for the setting with \(t_2 = 8\) are not displayed because the permutation p values for all functional summaries for the 1st homology dimension achieve the minimum possible p value, and all functional summaries (except the PDT for the 0th homology dimension) achieve this minimum as well.
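The "minimum possible p value" mentioned above arises because a permutation test with B random permutations cannot return a p value below \(1/(B+1)\). The following is a minimal sketch of a two-sample permutation test on functional summaries, under assumptions not taken from this appendix: the test statistic (a grid sum of squared differences between the two group mean functions), the number of permutations, and the toy curves are all hypothetical choices made for illustration.

```python
import random


def mean_function(curves):
    """Pointwise mean of curves sampled on a common grid."""
    n = len(curves)
    return [sum(c[j] for c in curves) / n for j in range(len(curves[0]))]


def sq_mean_difference(group_a, group_b):
    """Grid sum of squared differences between the two group mean functions."""
    mean_a, mean_b = mean_function(group_a), mean_function(group_b)
    return sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))


def permutation_p_value(group_a, group_b, n_perm=199, seed=0):
    """Permutation p value; the smallest attainable value is 1 / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = sq_mean_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled curves
        if sq_mean_difference(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    # The observed labelling counts as one permutation, hence the +1 terms.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: tent functions whose amplitude differs by group.
rng = random.Random(3)
grid = [j / 20 for j in range(21)]


def tent(amp):
    return [amp * max(0.0, 1.0 - abs(2.0 * t - 1.0)) for t in grid]


group_1 = [tent(1.0 + 0.05 * rng.gauss(0.0, 1.0)) for _ in range(12)]
group_2 = [tent(2.0 + 0.05 * rng.gauss(0.0, 1.0)) for _ in range(12)]

p_separated = permutation_p_value(group_1, group_2)
p_identical = permutation_p_value(group_1, group_1)
p_floor = 1.0 / (199 + 1)
```

When the two groups are well separated, as with \(t_2 = 8\) in the STIX study, essentially every permutation gives a smaller statistic than the observed one, so the p value sits at or near the floor for every function order and the plots carry no further information.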
C Alternative generalized landscape function kernel results
C.1 Simulated Gleason data
Figure 20 displays classification test error results for generalized landscape functions using an alternative kernel function (the tricubic kernel) to the one considered in Sect. 4.1 (the triangle kernel). The ratios of the classification test errors are plotted, with the tricubic-kernel error in the numerator and the corresponding triangle-kernel error in the denominator. The ratios vary between about 0.98 and 1.02, suggesting that the kernel choice does not strongly affect the results in this setting.
C.2 Pick-up sticks simulation data
Figure 21 displays the two-sample hypothesis test results for generalized landscape functions using an alternative kernel function (the tricubic kernel) for the setting with samples drawn from the null population, \({\mathcal {P}}_{F,1}\), with an average thickness of \(t_1 = 5\) and an alternative hypothesis with an average thickness of \(t_2 = 5.75\). Additionally, a comparison between the median p values using the tricubic kernel and those using the triangle kernel considered in Sect. 4.2.1 is displayed. There appears to be more variability in these results than in "Appendix C.1"; however, the p values in this setting tend to be smaller than the classification test error values, so a larger relative variation may correspond to only a small difference in magnitude. For \(H_0\), the largest magnitude difference in p values (across all function orders and thicknesses considered) is 0.089 (in the setting with \(t_2 = 5\), function order 3, bandwidth of 0.60), with an interquartile range of 0 to 0.014. For \(H_1\), the largest magnitude difference in p values (across all function orders and thicknesses considered) is 0.065 (in the setting with \(t_2 = 5\), function order 2, bandwidth of 0.40), with an interquartile range of 0 to 0.006. There does not appear to be a discernible pattern in the variations between the tricubic and triangle kernel function results.
Berry, E., Chen, YC., Cisewski-Kehe, J. et al. Functional summaries of persistence diagrams. J Appl. and Comput. Topology 4, 211–262 (2020). https://doi.org/10.1007/s41468-020-00048-w
DOI: https://doi.org/10.1007/s41468-020-00048-w