Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/2540128.2540408guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

Persistent homology: an introduction and a new text representation for natural language processing

Published: 03 August 2013 Publication History

Abstract

Persistent homology is a mathematical tool from topological data analysis. It performs multi-scale analysis on a set of points and identifies clusters, holes, and voids therein. These latter topological structures complement standard feature representations, making persistent homology an attractive feature extractor for artificial intelligence. Research on persistent homology for AI is in its infancy, and is currently hindered by two issues: the lack of an accessible introduction to AI researchers, and the paucity of applications. In response, the first part of this paper presents a tutorial on persistent homology specifically aimed at a broader audience without sacrificing mathematical rigor. The second part contains one of the first applications of persistent homology to natural language processing. Specifically, our Similarity Filtration with Time Skeleton (SIFTS) algorithm identifies holes that can be interpreted as semantic "tie-backs" in a text document, providing a new document structure representation. We illustrate our algorithm on documents ranging from nursery rhymes to novels, and on a corpus with child and adolescent writings.

References

[1]
Sivaraman Balakrishnan, Alessandro Rinaldo, Don Sheehy, Aarti Singh, and Larry A. Wasserman. Minimax rates for homology inference. In The fifteenth international conference on Artificial Intelligence and Statistics (AISTATS), pages 64-72, 2012.
[2]
Sivaraman Balakrishnan, Brittany Fasy, Fabrizio Lecci, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman. Statistical inference for persistent homology. In arXiv:1303.7117, 2013.
[3]
Gunnar Carlsson. Topology and data. Bulletin (New Series) of the American Mathematical Society, 46(2):255-308, 2009.
[4]
Moo K. Chung, Peter Bubenik, Peter T. Kim, Kim M. Dalton, and Richard J. Davidson. Persistence diagrams of cortical surface data. In Information Processing in Medical Imaging, pages 386-397, 2009.
[5]
Vin de Silva and Robert Ghrist. Coverage in sensor networks via persistent homology. Algebraic & Geometric Topology, 7:339-358, 2007.
[6]
Vin de Silva and Robert Ghrist. Homological sensor networks. Notices of the American Mathematical Society, 54, 2007.
[7]
H. Edelsbrunner and J. Harer. Persistent homology -- a survey. In Twenty Years After, eds. J. E. Goodman, J. Pach and R. Pollack, AMS., 2007.
[8]
H. Edelsbrunner and J. Harer. Computational Topology: An Introduction. Applied mathematics. Amer Mathematical Society, 2010.
[9]
Daniel Freedman and Chao Chen. Algebraic topology for computer vision. In Sota R. Yoshida, editor, Computer Vision, chapter 5, pages 239-268. Nova Science Pub. Inc., 2011.
[10]
Jennifer Gamble and Giseon Heo. Exploring uses of persistent homology for statistical analysis of landmark-based shape data. J. Multivariate Analysis, 101(9):2184-2199, 2010.
[11]
P. Giblin. Graphs, Surfaces and Homology. Cambridge University Press, 2010.
[12]
Alan Gous. Spherical subfamily models. Technical report, 1999.
[13]
Keith Hall and Thomas Hofmann. Learning curved multinomial subfamilies for natural language processing and information retrieval. In ICML, pages 351-358, 2000.
[14]
Allen Hatcher. Algebraic Topology. Cambridge University Press, first edition, December 2001.
[15]
Peter M. Kasson, Afra Zomorodian, Sanghyun Park, Nina Singhal, Leonidas J. Guibas, and Vijay S. Pande. Persistent voids: a new structural metric for membrane fusion. Bioinformatics, 23(14):1753-1759, 2007.
[16]
Guy Lebanon, Yi Mao, and Joshua V. Dillon. The locally weighted bag of words framework for document representation. Journal of Machine Learning Research, 8:2405-2441, 2007.
[17]
Guy Lebanon. Sequential document representations and simplicial curves. In UAI. AUAI Press, 2006.
[18]
Günter Rote and Gert Vegter. Computational topology: an introduction. In Jean-Daniel Boissonnat and Monique Teillaud, editors, Effective Computational Geometry for Curves and Surfaces, Mathematics and Visualization, chapter 7, pages 277-312. Springer-Verlag, 2006.
[19]
G. Salton, editor. The SMART Retrieval System Experiments in Automatic Document Processing. Englewood Cliffs: Prentice-Hall, 1971.
[20]
Geoffrey R. Sampson. The structure of children's writing: moving from spoken to adult written norms. In S. Granger and S. Petch-Tyson, editors, Extending the Scope of Corpus-Based Research, pages 177-93. Rodopi, 2003. http://www.grsampson.net/RLucy.html.
[21]
Gurjeet Singh, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, and Dario L. Ringach. Topological analysis of population activity in visual cortex. J. Vis., 8(8):1-18, 6 2008.
[22]
Andrew Tausz, Mikael Vejdemo-Johansson, and Henry Adams. Javaplex: A research software package for persistent (co)homology. Software available at http://code.google.com/javaplex, 2011.
[23]
Afra Joze Zomorodian. Computing and comprehending topology: persistence and hierarchical Morse complexes. PhD thesis, University of Illinois at Urbana-Champaign, 2001.

Cited By

View all
  • (2023)Generating High Dimensional Test Data for Topological Data AnalysisBenchmarking, Measuring, and Optimizing10.1007/978-981-97-0316-6_2(18-37)Online publication date: 3-Dec-2023
  • (2017)Open Problems in Computational TopologyACM SIGACT News10.1145/3138860.313886748:3(32-36)Online publication date: 7-Sep-2017
  • (2017)A Simplified Topological Representation of Text for Local and Global ContextProceedings of the 25th ACM international conference on Multimedia10.1145/3123266.3123330(1451-1456)Online publication date: 23-Oct-2017
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings
IJCAI '13: Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
August 2013
3266 pages
ISBN:9781577356332

Sponsors

  • The International Joint Conferences on Artificial Intelligence, Inc. (IJCAI)

Publisher

AAAI Press

Publication History

Published: 03 August 2013

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 22 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Generating High Dimensional Test Data for Topological Data AnalysisBenchmarking, Measuring, and Optimizing10.1007/978-981-97-0316-6_2(18-37)Online publication date: 3-Dec-2023
  • (2017)Open Problems in Computational TopologyACM SIGACT News10.1145/3138860.313886748:3(32-36)Online publication date: 7-Sep-2017
  • (2017)A Simplified Topological Representation of Text for Local and Global ContextProceedings of the 25th ACM international conference on Multimedia10.1145/3123266.3123330(1451-1456)Online publication date: 23-Oct-2017
  • (2016)Stochastic multiresolution persistent homology kernelProceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence10.5555/3060832.3060964(2449-2455)Online publication date: 9-Jul-2016

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media