DOI: 10.1145/1014052.1014130

A framework for ontology-driven subspace clustering

Published: 22 August 2004

Abstract

Traditional clustering is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes. Although domain knowledge is always the best way to justify a clustering, few clustering algorithms take it into consideration. In this paper, domain knowledge is represented by a hierarchical ontology. We develop a framework that incorporates domain knowledge directly into the clustering process, yielding a set of clusters with strong ontological implications. During clustering, ontology information is used to efficiently prune the exponential search space of subspace clustering algorithms. In addition, the algorithm automatically generates an interpretation of the clustering result by mapping the naturally hierarchically organized subspace clusters with significant categorical enrichment onto the ontology hierarchy. Our experiments on a set of gene expression data using the Gene Ontology demonstrate that our ontology-driven pruning technique significantly improves clustering performance with minimal degradation of cluster quality. Many hierarchical organizations of gene clusters corresponding to sub-hierarchies in the Gene Ontology were also successfully captured.
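
The abstract does not say how "significant categorical enrichment" is scored. A common choice in gene expression analysis is an upper-tail hypergeometric test, and the sketch below illustrates the idea under that assumption; it is not the authors' method. The toy ontology (root, A, A1, B), the gene names, the 0.05 threshold, and all function names are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, cluster_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: total number of genes; annotated: genes carrying the term;
    cluster_size: genes in the cluster; hits: cluster genes carrying the term.
    """
    upper = min(annotated, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, cluster_size)

def ancestors(term, parent):
    """Walk a child -> parent map up to the root of the hierarchy."""
    found = []
    while term in parent:
        term = parent[term]
        found.append(term)
    return found

def propagate(direct, parent):
    """Close each gene's term set upward: a gene annotated with a term
    is also counted under every ancestor of that term."""
    closed = {}
    for gene, terms in direct.items():
        full = set(terms)
        for term in terms:
            full.update(ancestors(term, parent))
        closed[gene] = full
    return closed

def enriched_terms(cluster, annotations, alpha=0.05):
    """Return {term: p_value} for terms enriched in the cluster."""
    population = len(annotations)
    all_terms = set().union(*annotations.values())
    result = {}
    for term in all_terms:
        annotated = sum(term in terms for terms in annotations.values())
        hits = sum(term in annotations[gene] for gene in cluster)
        p = enrichment_p_value(population, annotated, len(cluster), hits)
        if p <= alpha:
            result[term] = p
    return result

# Hypothetical toy hierarchy: A1 is a child of A; A and B are children of root.
PARENT = {"A1": "A", "A": "root", "B": "root"}

# Hypothetical direct annotations for ten genes.
DIRECT = {f"g{i}": {"A1"} for i in range(4)}
DIRECT.update({f"g{i}": {"A"} for i in range(4, 6)})
DIRECT.update({f"g{i}": {"B"} for i in range(6, 10)})

ANNOTATIONS = propagate(DIRECT, PARENT)
CLUSTER = {"g0", "g1", "g2", "g3"}
ENRICHED = enriched_terms(CLUSTER, ANNOTATIONS)
```

In this toy setup only the most specific term, A1, passes the threshold for the cluster; its parent A is shared with too many genes outside the cluster. Mapping each cluster to its enriched terms in this way is one plausible reading of how clusters could be placed onto an ontology hierarchy.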




    Published In

    KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2004
    874 pages
    ISBN: 1581138881
    DOI: 10.1145/1014052
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. ontology
    2. subspace clustering
    3. tendency preserving

    Qualifiers

    • Article

    Conference

    KDD04

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


