Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Statistical analysis and parameter selection for mapper

Published: 01 January 2018 Publication History

Abstract

In this article, we study the question of the statistical convergence of the 1-dimensional Mapper to its continuous analogue, the Reeb graph. We show that the Mapper is an optimal estimator of the Reeb graph, which gives, as a byproduct, a method to automatically tune its parameters and compute confidence regions on its topological features, such as its loops and ares. This allows to circumvent the issue of testing a large grid of parameters and keeping the most stable ones in the brute-force setting, which is widely used in visualization, clustering and feature selection with the Mapper.

References

[1]
Aravindakshan Babu. Zigzag Coarsenings, Mapper Stability and Gene-network Analyses. PhD Thesis, 2013.
[2]
Ulrich Bauer, Xiaoyin Ge, and Yusu Wang. Measuring Distance Between Reeb Graphs. In Proceedings of the 30th Symposium on Computational Geometry, pages 464-473, 2014.
[3]
Ulrich Bauer, Elizabeth Munch, and Yusu Wang. Strong Equivalence of the Interleaving and Functional Distortion Metrics for Reeb Graphs. In Proceedings of the 31st Symposium on Computational Geometry, 2015.
[4]
Silvia Biasotti, Daniela Giorgi, Michela Spagnuolo, and Bianca Falcidieno. Reeb Graphs for Shape Analysis and Applications. Theoretical Computer Science, 392:5-22, 2008.
[5]
Gérard Biau and André Mas. PCA-Kernel estimation. Statistics and Risk Modeling with Applications in Finance and Insurance, 29(1):19-46, 2012.
[6]
Gilles Blanchard, Olivier Bousquet, and Laurent Zwald. Statistical properties of kernel principal component analysis. Machine Learning, 66(2-3):259-294, 2007.
[7]
Mickaël Buchet, Frédéric Chazal, Steve Oudot, and Donald Sheehy. Efficient and Robust Persistent Homology for Measures. In Proceedings of the 26th Symposium on Discrete Algorithms, pages 168-180, 2015.
[8]
Mathieu Carrière. Cover complex. In GUDHI User and Reference Manual. GUDHI Editorial Board, 2017. URL http://gudhi.gforge.inria.fr/doc/latest/groupícoverícomplex.html.
[9]
Mathieu Carrière and Steve Oudot. Local Equivalence and Induced Metrics for Reeb Graphs. In Proceedings of the 33rd Symposium on Computational Geometry, 2017a.
[10]
Mathieu Carrière and Steve Oudot. Structure and Stability of the 1-Dimensional Mapper. Foundations of Computational Mathematics, 2017b.
[11]
Frédéric Chazal, David Cohen-Steiner, and Quentin Mérigot. Geometric Inference for Probability Measures. Foundations of Computational Mathematics, 11(6):733-751, 2011.
[12]
Frédéric Chazal, Brittany Fasy, Fabrizio Lecci, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman. On the Bootstrap for Persistence Diagrams and Landscapes. Modeling and Analysis of Information Systems, 20(6):111-120, 2013.
[13]
Frédéric Chazal, Brittany Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman. Robust topological inference: distance to a measure and kernel distance. CoRR, abs/1412.7197, 2014. Accepted for publication in Journal of Machine Learning Research.
[14]
Frédéric Chazal, Brittany Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman. Subsampling Methods for Persistent Homology. In Proceedings of the 32nd International Conference on Machine Learning, pages 2143-2151, 2015a.
[15]
Frédéric Chazal, Marc Glisse, Catherine Labruére, and Bertrand Michel. Convergence rates for persistence diagram estimation in topological data analysis. Journal of Machine Learning Research, 16:3603-3635, 2015b.
[16]
Frédéric Chazal, Vin de Silva, Marc Glisse, and Steve Oudot. The Structure and Stability of Persistence Modules. Springer, 2016a.
[17]
Frédéric Chazal, Pascal Massart, and Bertrand Michel. Rates of convergence for robust geometric inference. Electronic Journal of Statistics, 10(2):2243-2286, 2016b.
[18]
X. Chen, A. Golovinskiy, and T. Funkhouser. A Benchmark for 3D Mesh Segmentation. ACM Transactions on Graphics, 28(3):1-12, 2009.
[19]
David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of Persistence Diagrams. Discrete and Computational Geometry, 37(1):103-120, 2007.
[20]
David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Extending persistence using Poincaré and Lefschetz duality. Foundation of Computational Mathematics, 9(1):79-103, 2009.
[21]
Antonio Cuevas. Set estimation: another bridge between statistics and geometry. Boletín de Estadística e Investigación Operativa, 25(2):71-85, 2009.
[22]
Antonio Cuevas and Alberto Rodríguez-Casal. On boundary estimation. Advances in Applied Probability, pages 340-354, 2004.
[23]
Vin de Silva, Elizabeth Munch, and Amit Patel. Categorified Reeb Graphs. Discrete and Computational Geometry, 55:854-906, 2016.
[24]
Ronald DeVore and George Lorentz. Constructive approximation, volume 303. Springer Science & Business Media, 1993.
[25]
Tamal Dey and Yusu Wang. Reeb Graphs: Approximation and Persistence. Discrete and Computational Geometry, 49(1):46-73, 2013.
[26]
Tamal Dey, Facundo Mémoli, and Yusu Wang. Topological Analysis of Nerves, Reeb Spaces, Mappers, and Multiscale Mappers. In Proceedings of the 33rd Symposium on Computational Geometry, volume 77, pages 36:1-36:16, 2017.
[27]
Barbara di Fabio and Claudia Landi. The Edit Distance for Reeb Graphs of Surfaces. Discrete and Computational Geometry, 55(2):423-461, 2016.
[28]
Herbert Edelsbrunner and John Harer. Computational Topology: an introduction. AMS Bookstore, 2010.
[29]
Brittany Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, and Aarti Singh. Confidence Sets for Persistence Diagrams. The Annals of Statistics, 42(6):2301-2339, 2014.
[30]
Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning. Springer series in statistics Springer, Berlin, 2001.
[31]
Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Manifold estimation and singular deconvolution under Hausdorff loss. The Annals of Statistics, 40:941-963, 2012a.
[32]
Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Minimax Manifold Estimation. Journal of Machine Learning Research, 13:1263-1291, 2012b.
[33]
TS. Hinks, X. Zhou, KJ. Staples, BD. Dimitrov, A. Manta, T. Petrossian, P. Lum, CG. Smith, JA. Ward, PH Howarth, AF. Walls, SD. Gadola, and R. Djukanovic. Innate and adaptive t cells in asthmatic patients: Relationship to severity and disease mechanisms. Journal of Allergy and Clinical Immunology, 136(2):323-333, 2015.
[34]
P. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, and G. Carlsson. Extracting insights from the shape of complex data using topology. Scientific Reports, 3, 2013.
[35]
Dmitriy Morozov. Homological Illusions of Persistence and Stability. Ph.D. dissertation, Department of Computer Science, Duke University, 2008.
[36]
Elizabeth Munch and Bei Wang. Convergence between Categorical Representations of Reeb Space and Mapper. In Proceedings of the 32nd Symposium on Computational Geometry, volume 51, pages 53:1-53:16, 2016.
[37]
Sameer Nene, Shree Nayar, and Hiroshi Murase. Columbia Object Image Library (COIL- 100). Technical Report CUCS-006-96, 1996.
[38]
Jessica Nielson, Jesse Paquette, Aiwen Liu, Cristian Guandique, Amy Tovar, Tomoo Inoue, Karen-Amanda Irvine, John Gensel, Jennifer Kloke, Tanya Petrossian, Pek Lum, Gunnar Carlsson, Geo_rey Manley, Wise Young, Michael Beattie, Jacqueline Bresnahan, and Adam Ferguson. Topological data analysis for discovery in preclinical spinal cord injury and traumatic brain injury. Nature Communications, 6, 2015.
[39]
Steve Oudot. Persistence Theory: From Quiver Representations to Data Analysis. Number 209 in Mathematical Surveys and Monographs. American Mathematical Society, 2015.
[40]
Gerald Reaven and Rupert Miller. An attempt to define the nature of chemical diabetes using a multidimensional analysis. Diabetologia, 16(1):17-24, 1979.
[41]
Georges Reeb. Sur les points singuliers d'une forme de Pfaff complètement intégrable ou d'une fonction numérique. Compte Rendu de l'Académie des Science de Paris, 222:847- 849, 1946.
[42]
Matteo Rucco, Emanuela Merelli, Damir Herman, Devi Ramanan, Tanya Petrossian, Lorenzo Falsetti, Cinzia Nitti, and Aldo Salvi. Using topological data analysis for diagnosis pulmonary embolism. Journal of Theoretical and Applied Computer Science, 9 (1):41-55, 2015.
[43]
G. Sarikonda, J. Pettus, S. Phatak, S. Sachithanantham, JF. Miller, JD. Wesley, E. Cadag, J. Chae, L. Ganesan, R. Mallios, S. Edelman, B. Peters, and M. von Herrath. Cd8 t-cell reactivity to islet antigens is unique to type 1 while cd4 t-cell reactivity exists in both type 1 and type 2 diabetes. Journal of Autoimmunity, 50:77-82, 2014.
[44]
John Shawe-Taylor, Christopher Williams, Nello Cristianini, and Jaz Kandola. On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA. IEEE Transactions on Information Theory, 51(7):2510-2522, 2005.
[45]
Gurjeet Singh, Facundo Mémoli, and Gunnar Carlsson. Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. In Symposium on Point Based Graphics, pages 91-100, 2007.
[46]
Yuan Yao, Jian Sun, Xuhui Huang, Greg Bowman, Gurjeet Singh, Michael Lesnick, Leonidas Guibas, Vijay Pande, and Gunnar Carlsson. Topological methods for exploring low-density states in biomolecular folding pathways. Journal of Chemical Physics, 130 (14), 2009.
[47]
Bin Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423-435. Springer, 1997.

Cited By

View all
  • (2024)Topological Interpretability for Deep LearningProceedings of the Platform for Advanced Scientific Computing Conference10.1145/3659914.3659935(1-11)Online publication date: 3-Jun-2024
  • (2020)ShapeVis: High-dimensional Data Visualization at ScaleProceedings of The Web Conference 202010.1145/3366423.3380058(2920-2926)Online publication date: 20-Apr-2020
  1. Statistical analysis and parameter selection for mapper

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image The Journal of Machine Learning Research
    The Journal of Machine Learning Research  Volume 19, Issue 1
    January 2018
    3249 pages
    ISSN:1532-4435
    EISSN:1533-7928
    Issue’s Table of Contents

    Publisher

    JMLR.org

    Publication History

    Revised: 01 May 2018
    Published: 01 January 2018
    Published in JMLR Volume 19, Issue 1

    Author Tags

    1. confidence regions
    2. extended persistence
    3. mapper
    4. parameter selection
    5. topological data analysis

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)34
    • Downloads (Last 6 weeks)3
    Reflects downloads up to 20 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Topological Interpretability for Deep LearningProceedings of the Platform for Advanced Scientific Computing Conference10.1145/3659914.3659935(1-11)Online publication date: 3-Jun-2024
    • (2020)ShapeVis: High-dimensional Data Visualization at ScaleProceedings of The Web Conference 202010.1145/3366423.3380058(2920-2926)Online publication date: 20-Apr-2020

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media