DOI:10.1145/1008992.1009012
Article

Locality preserving indexing for document representation

Published: 25 July 2004

Abstract

Document representation and indexing is a key problem for document analysis and processing tasks such as clustering, classification, and retrieval. Conventionally, Latent Semantic Indexing (LSI) is considered effective in deriving such an indexing. LSI essentially detects the most representative features for document representation rather than the most discriminative features. Therefore, LSI might not be optimal in discriminating documents with different semantics. In this paper, a novel algorithm called Locality Preserving Indexing (LPI) is proposed for document indexing. Each document is represented by a vector with low dimensionality. In contrast to LSI, which discovers the global structure of the document space, LPI discovers the local structure and obtains a compact document representation subspace that best detects the essential semantic structure. We compare the proposed LPI approach with LSI on two standard databases. Experimental results show that LPI provides better representation in the sense of semantic structure.
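
The contrast between the global view of LSI and the local view of LPI can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes that LPI can be approximated by a generic locality-preserving linear projection: build a nearest-neighbor graph over documents using cosine similarity, form the graph Laplacian L = D - W, and solve the generalized eigenproblem X L X^T a = λ X D X^T a for the smallest eigenvalues. The function names `lsi_embed` and `lpi_embed`, the neighbor count, and the small ridge term `reg` are assumptions introduced here for illustration.

```python
import numpy as np


def lsi_embed(X, k):
    """LSI-style embedding (global structure).

    X is a (terms x documents) matrix. Documents are projected onto the
    top-k left singular vectors of X. Returns a (k x documents) array.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k].T @ X


def lpi_embed(X, k, n_neighbors=2, reg=1e-6):
    """Locality-preserving linear embedding (local structure), as a sketch.

    Assumed recipe, not taken from the paper's text:
      1. cosine-similarity k-nearest-neighbor graph W over documents,
      2. degree matrix D and Laplacian L = D - W,
      3. generalized eigenproblem X L X^T a = lambda X D X^T a,
         keeping the eigenvectors with the smallest eigenvalues.
    `reg` is a hypothetical ridge term that keeps X D X^T invertible.
    """
    n_terms, n_docs = X.shape

    # Cosine similarity between document columns.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    Xn = X / norms
    S = Xn.T @ Xn
    np.fill_diagonal(S, -np.inf)  # a document is not its own neighbor

    # Symmetric k-nearest-neighbor weight matrix.
    W = np.zeros((n_docs, n_docs))
    nearest = np.argsort(-S, axis=1)[:, :n_neighbors]
    for i in range(n_docs):
        W[i, nearest[i]] = S[i, nearest[i]]
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))
    L = D - W

    A = X @ L @ X.T
    B = X @ D @ X.T + reg * np.eye(n_terms)

    # Reduce A a = lambda B a to a standard symmetric problem using the
    # Cholesky factor B = C C^T, with a = C^{-T} y.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0  # remove numerical asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)  # ascending eigenvalues

    projection = C_inv.T @ eigvecs[:, :k]
    return projection.T @ X
```

Both functions take a term-by-document matrix and return one low-dimensional column per document, so the two embeddings can be compared directly on the same data. The dense eigen-solver is chosen only for brevity and would not scale to a realistic vocabulary.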

Published In

SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
July 2004
624 pages
ISBN:1581138814
DOI:10.1145/1008992

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dimensionality reduction
  2. document representation and indexing
  3. latent semantic indexing
  4. locality preserving indexing
  5. similarity measure
  6. vector space model

Qualifiers

  • Article

Conference

SIGIR04
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
