Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3366423.3380058acmconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

ShapeVis: High-dimensional Data Visualization at Scale

Published: 20 April 2020 Publication History

Abstract

We present ShapeVis, a scalable visualization technique for point cloud data inspired from topological data analysis. Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation. Much success has been reported by the data visualization technique Mapper, that discreetly approximates the Reeb graph of a filter function on the data. However, when using standard dimensionality reduction algorithms as the filter function, Mapper suffers from considerable computational cost. This makes it difficult to scale to high-dimensional data. Our proposed technique relies on finding a subset of points called landmarks along the data manifold to construct a weighted witness-graph over it. This graph captures the structural characteristics of the point cloud, and its weights are determined using a Finite Markov Chain. We further compress this graph by applying induced maps from standard community detection algorithms. Using techniques borrowed from manifold tearing, we prune and reinstate edges in the induced graph based on their modularity to summarize the shape of data. We empirically demonstrate how our technique captures the structural characteristics of real and synthetic data sets. Further, we compare our approach with Mapper using various filter functions like t-SNE, UMAP, LargeVis and show that our algorithm scales to millions of data points while preserving the quality of data visualization.

References

[1]
Mikhail Belkin and Partha Niyogi. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation 15, 6 (2003), 1373–1396.
[2]
Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35, 8(2013), 1798–1828.
[3]
Silvia Biasotti, Daniela Giorgi, Michela Spagnuolo, and Bianca Falcidieno. 2008. Reeb graphs for shape analysis and applications. Theoretical computer science 392, 1-3 (2008), 5–22.
[4]
Mickaël Buchet, Yasuaki Hiraoka, and Ippei Obayashi. 2018. Persistent homology and materials informatics. In Nanoinformatics. Springer, Singapore, 75–95.
[5]
Pablo G Cámara. 2017. Topological methods for genomics: present and future directions. Current opinion in systems biology 1 (2017), 95–101.
[6]
Pablo G Camara, Daniel IS Rosenbloom, Kevin J Emmett, Arnold J Levine, and Raul Rabadan. 2016. Topological data analysis generates high-resolution, genome-wide maps of human recombination. Cell systems 3, 1 (2016), 83–94.
[7]
Gunnar Carlsson, Tigran Ishkhanov, Vin De Silva, and Afra Zomorodian. 2008. On the local behavior of spaces of natural images. International journal of computer vision 76, 1 (2008), 1–12.
[8]
Mathieu Carriere, Bertrand Michel, and Steve Oudot. 2018. Statistical analysis and parameter selection for Mapper. The Journal of Machine Learning Research 19, 1 (2018), 478–516.
[9]
Corrie J Carstens and Kathy J Horadam. 2013. Persistent homology of collaboration networks. Mathematical problems in engineering 2013 (2013).
[10]
Anne Collins, Afra Zomorodian, Gunnar Carlsson, and Leonidas J Guibas. 2004. A barcode shape descriptor for curve point cloud data. Computers & Graphics 28, 6 (2004), 881–894.
[11]
Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars. 2008. Delaunay triangulations: height interpolation. Computational Geometry: Algorithms and Applications (2008), 191–218.
[12]
Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, and Alessandro Provetti. 2011. Generalized louvain method for community detection in large networks. In 2011 11th International Conference on Intelligent Systems Design and Applications. IEEE, 88–93.
[13]
Vin de Silva and Gunnar Carlsson. 2004. Topological estimation using witness complexes. In Proceedings of the First Eurographics conference on Point-Based Graphics. Eurographics Association, 157–166. https://doi.org/10.2312/SPBG/SPBG04/157-166
[14]
Vin de Silva and Robert Ghrist. 2007. Coverage in sensor networks via persistent homology. Algebr. Geom. Topol. 7, 1 (2007), 339–358. https://doi.org/10.2140/agt.2007.7.339
[15]
Bishal Deb, Ankita Sarkar, Nupur Kumari, Akash Rupela, Piyush Gupta, and Balaji Krishnamurthy. 2018. Multimapper: Data Density Sensitive Topological Visualization. In 2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 1054–1061.
[16]
Marian Gidea. 2017. Topological data analysis of critical transitions in financial networks. In International Conference and School on Network Science. Springer, 47–59.
[17]
Leonidas J Guibas and Steve Y Oudot. 2008. Reconstruction using witness complexes. Discrete & computational geometry 40, 3 (2008), 325–356.
[18]
Allen Hatcher. 2002. Algebraic topology. Cambridge University Press, Cambridge. xii+544 pages.
[19]
Geoffrey E Hinton and Sam T Roweis. 2003. Stochastic neighbor embedding. In Advances in neural information processing systems. 857–864.
[20]
Violeta Kovacev-Nikolic, Peter Bubenik, Dragan Nikolić, and Giseon Heo. 2016. Using persistent homology and dynamical distances to analyze protein binding. Statistical applications in genetics and molecular biology 15, 1(2016), 19–38.
[21]
Johannes F Kruiger, Paulo E Rauber, Rafael M Martins, Andreas Kerren, Stephen Kobourov, and Alexandru C Telea. 2017. Graph Layouts by t-SNE. In Computer Graphics Forum, Vol. 36. Wiley Online Library, 283–294.
[22]
Joseph B Kruskal and Myron Wish. 1978. Multidimensional scaling. Vol. 11. Sage.
[23]
Yan Lecun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, and L.D. Jackel. 1989. Backpropagation applied to handwritten zip code recognition.
[24]
Hyekyoung Lee, Moo K Chung, Hyejin Kang, Bung-Nyun Kim, and Dong Soo Lee. 2011. Discriminative persistent homology of brain networks. In 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE, 841–844.
[25]
John Aldo Lee and Michel Verleysen. 2005. Nonlinear dimensionality reduction of data manifolds with essential loops. Neurocomputing 67(2005), 29–53.
[26]
Yongjin Lee, Senja D Barthel, Paweł Dłotko, Seyed Mohamad Moosavi, Kathryn Hess, and Berend Smit. 2018. High-throughput screening approach for nanoporous materials genome using topological data analysis: application to zeolites. Journal of chemical theory and computation 14, 8 (2018), 4427–4437.
[27]
Ying-Ke Lei, Zhu-Hong You, Tianbao Dong, Yun-Xiao Jiang, and Jun-An Yang. 2013. Increasing reliability of protein interactome by fast manifold embedding. Pattern Recognition Letters 34, 4 (2013), 372–379.
[28]
David Letscher and Jason Fritts. 2007. Image segmentation using topological persistence. In International Conference on Computer Analysis of Images and Patterns. Springer, 587–595.
[29]
Dong Liang, Chen Qiao, and Zongben Xu. 2015. Enhancing both efficiency and representational capability of isomap by extensive landmark selection. Mathematical Problems in Engineering 2015 (2015).
[30]
Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, Nov (2008), 2579–2605.
[31]
Leland McInnes, John Healy, and James Melville. 2018. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426(2018).
[32]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.
[33]
James Munkres. 2014. Topology. Pearson Education.
[34]
Monica Nicolau, Arnold J Levine, and Gunnar Carlsson. 2011. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences 108, 17(2011), 7265–7270.
[35]
Alice Patania, Giovanni Petri, and Francesco Vaccarino. 2017. The shape of collaborations. EPJ Data Science 6, 1 (2017), 18.
[36]
Nicola Pezzotti, Thomas Höllt, B Lelieveldt, Elmar Eisemann, and Anna Vilanova. 2016. Hierarchical stochastic neighbor embedding. In Computer Graphics Forum, Vol. 35. Wiley Online Library, 21–30.
[37]
Ashutosh Saxena, Abhinav Gupta, and Amitabha Mukerjee. 2004. Non-linear dimensionality reduction by locally linear isomaps. In International Conference on Neural Information Processing. Springer, 1038–1043.
[38]
Michael Sedlmair, Tamara Munzner, and Melanie Tory. 2013. Empirical Guidance on Scatterplot and Dimension Reduction Technique Choices.IEEE Trans. Vis. Comput. Graph. 19, 12 (2013), 2634–2643. http://dblp.uni-trier.de/db/journals/tvcg/tvcg19.html#SedlmairMT13
[39]
Lee M Seversky, Shelby Davis, and Matthew Berger. 2016. On time-series topological data analysis: New data and opportunities. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 59–67.
[40]
Hao Shi, Baoqun Yin, Yu Kang, Chao Shao, and Jie Gui. 2017. Robust l-Isomap with a novel landmark selection method. Mathematical Problems in Engineering 2017 (2017).
[41]
Vin D Silva and Joshua B Tenenbaum. 2003. Global versus local methods in nonlinear dimensionality reduction. In Advances in neural information processing systems. 721–728.
[42]
Gurjeet Singh, Facundo Mémoli, and Gunnar E Carlsson. 2007. Topological methods for the analysis of high dimensional data sets and 3d object recognition. In SPBG. 91–100.
[43]
Jian Tang, Jingzhou Liu, Ming Zhang, and Qiaozhu Mei. 2016. Visualizing large-scale and high-dimensional data. In Proceedings of the 25th international conference on world wide web. International World Wide Web Conferences Steering Committee, 287–297.
[44]
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web. International World Wide Web Conferences Steering Committee, 1067–1077.
[45]
Vincent A Traag, Ludo Waltman, and Nees Jan van Eck. 2019. From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports 9(2019).
[46]
Laurens Van Der Maaten. 2014. Accelerating t-SNE using tree-based algorithms. The Journal of Machine Learning Research 15, 1 (2014), 3221–3245.
[47]
Martin Wattenberg, Fernanda Viégas, and Ian Johnson. 2016. How to Use t-SNE Effectively. Distill (2016). https://doi.org/10.23915/distill.00002
[48]
Svante Wold, Kim Esbensen, and Paul Geladi. 1987. Principal component analysis. Chemometrics and intelligent laboratory systems 2, 1-3 (1987), 37–52.
[49]
Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv:cs.LG/cs.LG/1708.07747
[50]
Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42, 1 (2015), 181–213.

Cited By

View all
  • (2020)GPU-Embedding of kNN-Graph Representing Large and High-Dimensional DataComputational Science – ICCS 202010.1007/978-3-030-50417-5_24(322-336)Online publication date: 15-Jun-2020

Index Terms

  1. ShapeVis: High-dimensional Data Visualization at Scale
          Index terms have been assigned to the content through auto-classification.

          Recommendations

          Comments

          Please enable JavaScript to view thecomments powered by Disqus.

          Information & Contributors

          Information

          Published In

          cover image ACM Conferences
          WWW '20: Proceedings of The Web Conference 2020
          April 2020
          3143 pages
          ISBN:9781450370233
          DOI:10.1145/3366423
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Sponsors

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 20 April 2020

          Permissions

          Request permissions for this article.

          Check for updates

          Author Tags

          1. manifold learning
          2. topological data analysis
          3. visualization

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          WWW '20
          Sponsor:
          WWW '20: The Web Conference 2020
          April 20 - 24, 2020
          Taipei, Taiwan

          Acceptance Rates

          Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

          Contributors

          Other Metrics

          Bibliometrics & Citations

          Bibliometrics

          Article Metrics

          • Downloads (Last 12 months)24
          • Downloads (Last 6 weeks)3
          Reflects downloads up to 16 Dec 2024

          Other Metrics

          Citations

          Cited By

          View all
          • (2020)GPU-Embedding of kNN-Graph Representing Large and High-Dimensional DataComputational Science – ICCS 202010.1007/978-3-030-50417-5_24(322-336)Online publication date: 15-Jun-2020

          View Options

          Login options

          View options

          PDF

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader

          HTML Format

          View this article in HTML Format.

          HTML Format

          Media

          Figures

          Other

          Tables

          Share

          Share

          Share this Publication link

          Share on social media