DOI: 10.1145/3314545.3314549

Examining Intermediate Data Reduction Algorithms for use with t-SNE

Published: 14 March 2019

Abstract

t-distributed Stochastic Neighbor Embedding (t-SNE) is a flexible, nonparametric method for mapping high-dimensional data onto a two- or three-dimensional subspace for visualization. This paper examines the effect of first reducing the data to an intermediate subspace with different data reduction algorithms (e.g., Principal Component Analysis, Independent Component Analysis, Linear Discriminant Analysis, Sammon Mapping, and Locally Linear Embedding) before applying t-SNE. Our research shows that no intermediate step in the visualization process is trivial, and that application-dependent knowledge should be used to ensure the best possible visualization in the lower-dimensional space. Experimental results on several common data sets illustrate that, for clustering applications and for visualizing class separation in multi-class data, each algorithm tested produces a significantly different mapping.
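
The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses the small digits data set bundled with that library as a stand-in for the paper's data, and picks an intermediate dimension of 9 and a perplexity of 30 arbitrarily. The helper names `make_reducers` and `reduce_then_tsne` are hypothetical. Sammon Mapping is left out because scikit-learn does not provide it.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE, LocallyLinearEmbedding


def make_reducers(n_intermediate=9, seed=0):
    """Build intermediate reducers that all map to the same dimension.

    The dimension 9 is an assumption chosen so that LDA (limited to
    n_classes - 1 components) can be compared on a 10-class problem.
    """
    return {
        "PCA": PCA(n_components=n_intermediate, random_state=seed),
        "ICA": FastICA(n_components=n_intermediate, random_state=seed,
                       max_iter=1000),
        "LDA": LinearDiscriminantAnalysis(n_components=n_intermediate),
        "LLE": LocallyLinearEmbedding(n_components=n_intermediate,
                                      n_neighbors=15, random_state=seed),
    }


def reduce_then_tsne(X, y, reducer, seed=0):
    """Reduce X to the intermediate subspace, then embed it in 2-D with t-SNE."""
    # LDA is supervised and uses y; the unsupervised reducers ignore it.
    Z = reducer.fit_transform(X, y)
    tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=seed)
    return tsne.fit_transform(Z)


if __name__ == "__main__":
    digits = load_digits()
    # A 300-sample subset keeps the sketch fast; it still contains all 10 classes.
    X, y = digits.data[:300], digits.target[:300]
    for name, reducer in make_reducers().items():
        Y = reduce_then_tsne(X, y, reducer)
        print(name, Y.shape)
```

Plotting each returned 2-D array colored by `y` gives a side-by-side view of how the choice of intermediate reducer changes the final map. Note that LDA sees the class labels while the other reducers do not, so its embedding is not directly comparable when the goal is unsupervised clustering.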



    Published In

    ICCDA '19: Proceedings of the 2019 3rd International Conference on Compute and Data Analysis
    March 2019
    163 pages
    ISBN: 9781450366342
    DOI: 10.1145/3314545

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. clustering
    2. data reduction
    3. data visualization

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICCDA 2019
