Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2783258.2783309acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Dimensionality Reduction Via Graph Structure Learning

Published: 10 August 2015 Publication History

Abstract

We present a new dimensionality reduction setting for a large family of real-world problems. Unlike traditional methods, the new setting aims to explicitly represent and learn an intrinsic structure from data in a high-dimensional space, which can greatly facilitate data visualization and scientific discovery in downstream analysis. We propose a new dimensionality-reduction framework that involves the learning of a mapping function that projects data points in the original high-dimensional space to latent points in a low-dimensional space that are then used directly to construct a graph. Local geometric information of the projected data is naturally captured by the constructed graph. As a showcase, we develop a new method to obtain a discriminative and compact feature representation for clustering problems. In contrast to assumptions used in traditional clustering methods, we assume that centers of clusters should be close to each other if they are connected in a learned graph, and other cluster centers should be distant. Extensive experiments are performed that demonstrate that the proposed method is able to obtain discriminative feature representations yielding superior clustering performance, and correctly recover the intrinsic structures of various real-world datasets including curves, hierarchies and a cancer progression path.

References

[1]
R. K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. JMLR, 6:1817--1853, 2005.
[2]
M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, 2001.
[3]
K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is "nearest neighbor"meaningful? In Database Theory-ICD'99, 1999.
[4]
C. M. Bishop, M. Svensén, and C. K. I. Williams. GTM: the generative topographic mapping. Neural Comput, 10(1):215--234, 1998.
[5]
S. Boyd and L. Vandenberghe. Conex Optimization. Cambridge University Press, 2004.
[6]
C. J. C. Burges. Dimension reduction: a guided tour. FTML, 2(4):275--365, 2009.
[7]
L. Cayton. Algorithms for manifold learning. Technical report, UCSD, 2005.
[8]
Y. Cheng. Mean shift, mode seeking, and clustering. IEEE T-PAMI, 17(8):790--799, 1995.
[9]
M. Cheung. Minimum-cost spanning trees. http://people.orie.cornell.edu/dpw/orie6300fall2008/Recitations/rec09.pdf.
[10]
C. Curtis, S. P. Shah, S. Chin, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486(7403):346--352, 2012.
[11]
C. Ding and X. He. K-means clustering via principal component analysis. In ICML, 2004.
[12]
E. Erwin, K. Obermayer, and K. Schulten. Self-organizing maps: ordering, convergence properties and energy functions. Biol Cybern, 67:47--55, 1992.
[13]
M. Filippone, F. Camastra, F. Masulli, and S. Rovetta. A survey of kernel and spectral methods for clustering. Pattern Recogn, 41:176--190, 2008.
[14]
A. Gorban, B. Kégl, D. Wunsch, and A. Zinovyev. Principal Manifolds for Data Visualisation and Dimension Reduction, volume 58. Springer, 2007.
[15]
M. Greaves and C. C. Maley. Clonal evolution in cancer. Nature, 481(7381):306--313, 2012.
[16]
T. Hastie and W. Stuetzle. Principal curves. JASA, 84:502--516, 1989.
[17]
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 2012.
[18]
J. T. Jolliffe. Principal Component Analysis. Springer-Verlag, Berlin, 1986.
[19]
B. Kégl and A. Kryzak. Piecewise linear skeletonization using principal curves. IEEE T-PAMI, 24(1):59--74, 2002.
[20]
T. Kohonen. Self-organizing Maps. Springer, 1997.
[21]
Q. Mao, I. Tsang, S. Gao, and L. Wang. Generalized multiple kernel learning with data-dependent priors. IEEE T-NNLS, 29(6):1134--1148, 2015.
[22]
Q. Mao, L. Yang, L. Wang, S. Goodison, and Y. Sun. SimplePPT: A simple principal tree algorithm. In SDM, 2015.
[23]
J. Parker, M. Mullins, M. Cheang, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol, 27(8):1160--1167, 2009.
[24]
M. Radovanović, A. Nanopoulos, and M. Ivanović. Hubs in space: popular nearest neighbors in high-dimensional data. JMLR, 11:2487--2531, 2010.
[25]
S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323--2326, 2000.
[26]
L. Saul and S. Roweis. Think globally, fit locally: unsupervised learning of low dimensional mainfolds. JMLR, 4:119--155, 2003.
[27]
B. Schölkopf, A. Smola, and K. Muller. Kernel principal component analysis. Advances in Kernel Methods - Support Vector Learning, pages 327--352, 1999.
[28]
A. J. Smola, S. Mika, B. Schölkopf, and R. C. Williamson. Regularized principal manifolds. JMLR, 1:179--209, 2001.
[29]
L. Song, A. Smola, A. Gretton, and K. Borgwardt. A dependence maximization view of clustering. In ICML, 2007.
[30]
Y. Sun, J. Yao, N. Nowak, and S. Goodison. Cancer progression modeling using static sample data. Genome Biol, 15(8):440, 2014.
[31]
J. B. Tenenbaum, V. deSilva, and J. C. Landford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319--2323, 2000.
[32]
R. Tibshirani. Principal curves revisited. Stat Comput, 2:183--190, 1992.
[33]
K. Q. Weinberger and L. K. Saul. An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In AAAI, 2006.
[34]
J. Yao, Q. Mao, S. Goodison, V. Mai, and Y. Sun. Feature selection for unsupervised learning through local learning. Pattern Recogn Lett, 53:100--107, 2015.

Cited By

View all
  • (2024)Development and Validation of a Robust and Interpretable Early Triaging Support System for Patients Hospitalized With COVID-19: Predictive Algorithm Modeling and Interpretation StudyJournal of Medical Internet Research10.2196/5213426(e52134)Online publication date: 11-Jan-2024
  • (2024)Prostate Cancer Progression Modeling Provides Insight into Dynamic Molecular Changes Associated with Progressive Disease StatesCancer Research Communications10.1158/2767-9764.CRC-24-02104:10(2783-2798)Online publication date: 24-Oct-2024
  • (2024)Robust Structure-Aware Graph-based Semi-Supervised Learning: Batch and Recursive ProcessingACM Transactions on Intelligent Systems and Technology10.1145/3653986Online publication date: 26-Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN:9781450336642
DOI:10.1145/2783258
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 August 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. clustering
  2. dimensionality reduction
  3. graph structure learning
  4. unsupervised learning

Qualifiers

  • Research-article

Funding Sources

Conference

KDD '15
Sponsor:

Acceptance Rates

KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)193
  • Downloads (Last 6 weeks)27
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Development and Validation of a Robust and Interpretable Early Triaging Support System for Patients Hospitalized With COVID-19: Predictive Algorithm Modeling and Interpretation StudyJournal of Medical Internet Research10.2196/5213426(e52134)Online publication date: 11-Jan-2024
  • (2024)Prostate Cancer Progression Modeling Provides Insight into Dynamic Molecular Changes Associated with Progressive Disease StatesCancer Research Communications10.1158/2767-9764.CRC-24-02104:10(2783-2798)Online publication date: 24-Oct-2024
  • (2024)Robust Structure-Aware Graph-based Semi-Supervised Learning: Batch and Recursive ProcessingACM Transactions on Intelligent Systems and Technology10.1145/3653986Online publication date: 26-Mar-2024
  • (2024)Reverse Graph Learning for Graph Neural NetworkIEEE Transactions on Neural Networks and Learning Systems10.1109/TNNLS.2022.316103035:4(4530-4541)Online publication date: Apr-2024
  • (2024)scINRB: single-cell gene expression imputation with network regularization and bulk RNA-seq dataBriefings in Bioinformatics10.1093/bib/bbae14825:3Online publication date: 9-Apr-2024
  • (2024)Sparse robust adaptive unsupervised subspace learning for dimensionality reductionEngineering Applications of Artificial Intelligence10.1016/j.engappai.2023.107582129(107582)Online publication date: Mar-2024
  • (2024)Dysfunction of CCT3-associated network signals for the critical state during progression of hepatocellular carcinomaBiochimica et Biophysica Acta (BBA) - Molecular Basis of Disease10.1016/j.bbadis.2024.1670541870:4(167054)Online publication date: Apr-2024
  • (2024)A novel robust adaptive subspace learning framework for dimensionality reductionApplied Intelligence10.1007/s10489-024-05602-y54:19(8939-8967)Online publication date: 6-Jul-2024
  • (2023)Lineage Plasticity and Stemness Phenotypes in Prostate Cancer: Harnessing the Power of Integrated “Omics” Approaches to Explore Measurable MetricsCancers10.3390/cancers1517435715:17(4357)Online publication date: 1-Sep-2023
  • (2023)LVPT: Lazy Velocity Pseudotime Inference MethodBiomolecules10.3390/biom1308124213:8(1242)Online publication date: 12-Aug-2023
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media