DOI: 10.1145/956750.956806
Article

Mining high dimensional data for classifier knowledge

Published: 24 August 2003

Abstract

This paper addresses the problem of discovering sets of attribute-value pairs in high dimensional data sets that are of interest not because of co-occurrence alone, but because of their value as cores for potential classifiers of clusters. We present our algorithm in the context of a gene-expression data set. Gene expression data is, in most situations, insufficient for clustering algorithms and for statistical inference, because for 6000+ genes typically only tens, and at most hundreds, of data points are available. It is difficult to use statistical techniques to design a classifier for such severely under-specified data. The observed data, though statistically insufficient, still contains some information about the domain. Our goal is to discover as much information as possible about all potential classifiers from the data and then to summarize this knowledge. The summary provides insight into the composition of potential classifiers. We present algorithms and methods for mining this information from a high dimensional data set, exemplified by a gene expression data set.
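
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes discretized expression values ("hi"/"lo"), known class labels, and a brute-force enumeration of small attribute-value sets; the function name `discriminative_sets` and the thresholds `min_cover` and `max_leak` are hypothetical. A set is kept when it matches most samples of one class and few samples outside it, which is one simple reading of "interesting because of classification value rather than co-occurrence alone".

```python
from itertools import combinations


def discriminative_sets(rows, labels, max_size=2, min_cover=1.0, max_leak=0.0):
    """Enumerate small sets of (attribute, value) pairs and keep those that
    match at least `min_cover` of one class and at most `max_leak` of the rest.

    Illustrative assumption only: exhaustive search is feasible here because
    the toy data has three attributes, not thousands of genes.
    """
    n_attrs = len(rows[0])
    # Every (attribute index, value) pair that occurs in the data.
    items = sorted({(j, row[j]) for row in rows for j in range(n_attrs)})
    classes = sorted(set(labels))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            # Skip sets that assign two values to the same attribute.
            if len({j for j, _ in combo}) < size:
                continue
            matches = [all(row[j] == v for j, v in combo) for row in rows]
            for c in classes:
                inside = [m for m, y in zip(matches, labels) if y == c]
                outside = [m for m, y in zip(matches, labels) if y != c]
                cover = sum(inside) / len(inside)
                leak = sum(outside) / len(outside) if outside else 0.0
                if cover >= min_cover and leak <= max_leak:
                    found.append((combo, c, cover, leak))
    return found


# Toy data: 6 samples, 3 discretized "genes", 2 classes (all values made up).
rows = [
    ("hi", "lo", "hi"),
    ("hi", "hi", "hi"),
    ("hi", "lo", "hi"),
    ("hi", "hi", "lo"),
    ("lo", "hi", "hi"),
    ("lo", "lo", "lo"),
]
labels = ["A", "A", "A", "B", "B", "B"]

for combo, cls, cover, leak in discriminative_sets(rows, labels):
    print(combo, cls, cover, leak)
```

In this toy data neither attribute 0 = "hi" nor attribute 2 = "hi" separates class A on its own, since each also matches one class B sample, but the two together match every class A sample and no class B sample. With thousands of genes and only tens of samples, many such sets can fit the data by chance, which is why the abstract speaks of summarizing knowledge about potential classifiers rather than selecting one.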


    Published In

    KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2003
    736 pages
    ISBN: 1581137370
    DOI: 10.1145/956750
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. high dimensional data
    2. pattern recognition

    Qualifiers

    • Article

    Conference

    KDD03
    Acceptance Rates

    KDD '03 Paper Acceptance Rate 46 of 298 submissions, 15%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 477
    • Downloads (Last 12 months): 2
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 09 Nov 2024
