Article

Persistent homology: an introduction and a new text representation for natural language processing

Author:

Xiaojin ZhuAuthors Info & Claims

IJCAI '13: Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Pages 1953 - 1959

Published: 03 August 2013 Publication History

Abstract

Persistent homology is a mathematical tool from topological data analysis. It performs multi-scale analysis on a set of points and identifies clusters, holes, and voids therein. These latter topological structures complement standard feature representations, making persistent homology an attractive feature extractor for artificial intelligence. Research on persistent homology for AI is in its infancy, and is currently hindered by two issues: the lack of an accessible introduction to AI researchers, and the paucity of applications. In response, the first part of this paper presents a tutorial on persistent homology specifically aimed at a broader audience without sacrificing mathematical rigor. The second part contains one of the first applications of persistent homology to natural language processing. Specifically, our Similarity Filtration with Time Skeleton (SIFTS) algorithm identifies holes that can be interpreted as semantic "tie-backs" in a text document, providing a new document structure representation. We illustrate our algorithm on documents ranging from nursery rhymes to novels, and on a corpus with child and adolescent writings.

References

[1]

Sivaraman Balakrishnan, Alessandro Rinaldo, Don Sheehy, Aarti Singh, and Larry A. Wasserman. Minimax rates for homology inference. In The fifteenth international conference on Artificial Intelligence and Statistics (AISTATS), pages 64-72, 2012.

[2]

Sivaraman Balakrishnan, Brittany Fasy, Fabrizio Lecci, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman. Statistical inference for persistent homology. In arXiv:1303.7117, 2013.

[3]

Gunnar Carlsson. Topology and data. Bulletin (New Series) of the American Mathematical Society, 46(2):255-308, 2009.

[4]

Moo K. Chung, Peter Bubenik, Peter T. Kim, Kim M. Dalton, and Richard J. Davidson. Persistence diagrams of cortical surface data. In Information Processing in Medical Imaging, pages 386-397, 2009.

[5]

Vin de Silva and Robert Ghrist. Coverage in sensor networks via persistent homology. Algebraic & Geometric Topology, 7:339-358, 2007.

[6]

Vin de Silva and Robert Ghrist. Homological sensor networks. Notices of the American Mathematical Society, 54, 2007.

[7]

H. Edelsbrunner and J. Harer. Persistent homology -- a survey. In Twenty Years After, eds. J. E. Goodman, J. Pach and R. Pollack, AMS., 2007.

[8]

H. Edelsbrunner and J. Harer. Computational Topology: An Introduction. Applied mathematics. Amer Mathematical Society, 2010.

[9]

Daniel Freedman and Chao Chen. Algebraic topology for computer vision. In Sota R. Yoshida, editor, Computer Vision, chapter 5, pages 239-268. Nova Science Pub. Inc., 2011.

[10]

Jennifer Gamble and Giseon Heo. Exploring uses of persistent homology for statistical analysis of landmark-based shape data. J. Multivariate Analysis, 101(9):2184-2199, 2010.

[11]

P. Giblin. Graphs, Surfaces and Homology. Cambridge University Press, 2010.

[12]

Alan Gous. Spherical subfamily models. Technical report, 1999.

[13]

Keith Hall and Thomas Hofmann. Learning curved multinomial subfamilies for natural language processing and information retrieval. In ICML, pages 351-358, 2000.

[14]

Allen Hatcher. Algebraic Topology. Cambridge University Press, first edition, December 2001.

[15]

Peter M. Kasson, Afra Zomorodian, Sanghyun Park, Nina Singhal, Leonidas J. Guibas, and Vijay S. Pande. Persistent voids: a new structural metric for membrane fusion. Bioinformatics, 23(14):1753-1759, 2007.

[16]

Guy Lebanon, Yi Mao, and Joshua V. Dillon. The locally weighted bag of words framework for document representation. Journal of Machine Learning Research, 8:2405-2441, 2007.

[17]

Guy Lebanon. Sequential document representations and simplicial curves. In UAI. AUAI Press, 2006.

[18]

Günter Rote and Gert Vegter. Computational topology: an introduction. In Jean-Daniel Boissonnat and Monique Teillaud, editors, Effective Computational Geometry for Curves and Surfaces, Mathematics and Visualization, chapter 7, pages 277-312. Springer-Verlag, 2006.

[19]

G. Salton, editor. The SMART Retrieval System Experiments in Automatic Document Processing. Englewood Cliffs: Prentice-Hall, 1971.

[20]

Geoffrey R. Sampson. The structure of children's writing: moving from spoken to adult written norms. In S. Granger and S. Petch-Tyson, editors, Extending the Scope of Corpus-Based Research, pages 177-93. Rodopi, 2003. http://www.grsampson.net/RLucy.html.

[21]

Gurjeet Singh, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, and Dario L. Ringach. Topological analysis of population activity in visual cortex. J. Vis., 8(8):1-18, 6 2008.

[22]

Andrew Tausz, Mikael Vejdemo-Johansson, and Henry Adams. Javaplex: A research software package for persistent (co)homology. Software available at http://code.google.com/javaplex, 2011.

[23]

Afra Joze Zomorodian. Computing and comprehending topology: persistence and hierarchical Morse complexes. PhD thesis, University of Illinois at Urbana-Champaign, 2001.

Cited By

Singh RMalott NSauerwein BMcgrogan NWilsey P(2023)Generating High Dimensional Test Data for Topological Data AnalysisBenchmarking, Measuring, and Optimizing10.1007/978-981-97-0316-6_2(18-37)Online publication date: 3-Dec-2023
https://dl.acm.org/doi/10.1007/978-981-97-0316-6_2
Gasarch WFasy BWang B(2017)Open Problems in Computational TopologyACM SIGACT News10.1145/3138860.313886748:3(32-36)Online publication date: 7-Sep-2017
https://dl.acm.org/doi/10.1145/3138860.3138867
Sami IFarrahi KLiu QLienhart RWang HChen SBoll SChen PFriedland GLi JYan S(2017)A Simplified Topological Representation of Text for Local and Global ContextProceedings of the 25th ACM international conference on Multimedia10.1145/3123266.3123330(1451-1456)Online publication date: 23-Oct-2017
https://dl.acm.org/doi/10.1145/3123266.3123330
Show More Cited By

Recommendations

Computing Persistent Homology

We show that the persistent homology of a filtered d-dimensional simplicial complex is simply the standard homology of a particular graded module over a polynomial ring. Our analysis establishes the existence of a simple description of persistent homology ...
Persistent Intersection Homology

The theory of intersection homology was developed to study the singularities of a topologically stratified space. This paper incorporates this theory into the already developed framework of persistent homology. We demonstrate that persistent ...
Distributing Persistent Homology via Spectral Sequences
Abstract
We set up the theory for a distributed algorithm for computing persistent homology. For this purpose we develop linear algebra of persistence modules. We present bases of persistence modules, together with an operation $⊞$ that leads to a method for ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings

IJCAI '13: Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

August 2013

3266 pages

ISBN:9781577356332

Editor:
Francesca Rossi
University of Padova

Sponsors

The International Joint Conferences on Artificial Intelligence, Inc. (IJCAI)

Publisher

AAAI Press

Publication History

Published: 03 August 2013

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
11
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 22 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Singh RMalott NSauerwein BMcgrogan NWilsey P(2023)Generating High Dimensional Test Data for Topological Data AnalysisBenchmarking, Measuring, and Optimizing10.1007/978-981-97-0316-6_2(18-37)Online publication date: 3-Dec-2023
https://dl.acm.org/doi/10.1007/978-981-97-0316-6_2
Gasarch WFasy BWang B(2017)Open Problems in Computational TopologyACM SIGACT News10.1145/3138860.313886748:3(32-36)Online publication date: 7-Sep-2017
https://dl.acm.org/doi/10.1145/3138860.3138867
Sami IFarrahi KLiu QLienhart RWang HChen SBoll SChen PFriedland GLi JYan S(2017)A Simplified Topological Representation of Text for Local and Global ContextProceedings of the 25th ACM international conference on Multimedia10.1145/3123266.3123330(1451-1456)Online publication date: 23-Oct-2017
https://dl.acm.org/doi/10.1145/3123266.3123330
Zhu XVartanian ABansal MNguyen DBrandl L(2016)Stochastic multiresolution persistent homology kernelProceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence10.5555/3060832.3060964(2449-2455)Online publication date: 9-Jul-2016
https://dl.acm.org/doi/10.5555/3060832.3060964

View Options

View options

Media

Figures

Other

Tables

View Table of Contents