Double selection based semi-supervised clustering ensemble for tumor clustering from gene expression profiles

Published: 01 July 2014

Abstract

Tumor clustering is one of the important techniques for tumor discovery from cancer gene expression profiles, and is useful for the diagnosis and treatment of cancer. While different algorithms have been proposed for tumor clustering, few make use of expert knowledge to improve the performance of tumor discovery. In this paper, we first view the expert's knowledge as constraints in the process of clustering, and propose a feature selection based semi-supervised cluster ensemble framework (FS-SSCE) for tumor clustering from bio-molecular data. Compared with traditional tumor clustering approaches, FS-SSCE has two distinguishing properties: (1) it adopts feature selection techniques to dispel the effect of noisy genes, and (2) it employs a binate (pairwise) constraint based K-means algorithm to take the experts' knowledge into account. Then, we propose a double selection based semi-supervised cluster ensemble framework (DS-SSCE), which not only applies feature selection to perform gene selection on the gene dimension, but also selects an optimal subset of representative clustering solutions in the ensemble, and improves the performance of tumor clustering using the normalized cut algorithm. DS-SSCE also introduces a confidence factor into the construction of the consensus matrix by considering the prior knowledge of the data set. Finally, we design a modified double selection based semi-supervised cluster ensemble framework (MDS-SSCE), which adopts multiple clustering solution selection strategies and an aggregated solution selection function to choose an optimal subset of clustering solutions. The experimental results on cancer gene expression profiles show that (i) FS-SSCE, DS-SSCE and MDS-SSCE are suitable for performing tumor clustering from bio-molecular data, and (ii) MDS-SSCE outperforms a number of state-of-the-art tumor clustering approaches on most of the data sets.
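
The two FS-SSCE ingredients named in the abstract (filter out noisy genes, then cluster samples under expert constraints) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, uses a simple top-variance gene filter as a stand-in for the paper's feature selection techniques, and models the binate constraints as must-link/cannot-link sample pairs handled in the style of COP-KMeans. The names `select_top_variance_genes`, `violates` and `cop_kmeans` are hypothetical.

```python
import numpy as np

def select_top_variance_genes(X, n_genes):
    """Keep the n_genes columns (genes) of X with the largest variance across samples.
    Stand-in filter (assumption); the paper's feature selection may differ."""
    keep = np.argsort(X.var(axis=0))[::-1][:n_genes]
    return X[:, np.sort(keep)]  # preserve the original gene order

def violates(i, c, labels, must_link, cannot_link):
    """True if assigning sample i to cluster c breaks a pairwise constraint
    with a sample that has already been assigned in the current pass."""
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] != -1 and labels[j] != c:
            return True
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] == c:
            return True
    return False

def cop_kmeans(X, k, must_link=(), cannot_link=(), n_iter=50, seed=0):
    """COP-KMeans-style constrained K-means (hypothetical sketch): each sample
    goes to the nearest center that does not break a constraint."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        new = np.full(len(X), -1)
        for i in range(len(X)):
            # Try centers from nearest to farthest until one is feasible.
            for c in np.argsort(((centers - X[i]) ** 2).sum(axis=1)):
                if not violates(i, c, new, must_link, cannot_link):
                    new[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for sample {i}")
        if np.array_equal(new, labels):
            break  # assignments stopped changing
        labels = new
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Usage: samples 0-2 and 3-5 form two groups; the "expert" states that
# samples 0 and 1 belong together and samples 0 and 3 do not.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels = cop_kmeans(X, k=2, must_link=[(0, 1)], cannot_link=[(0, 3)])
```

Running `cop_kmeans` repeatedly on different gene subsets or seeds is one plausible way to obtain the ensemble members that the later consensus step combines; the abstract does not specify how the authors generate them.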
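
DS-SSCE combines the ensemble members into a consensus matrix, adjusts it with prior knowledge through a confidence factor, and partitions it with the normalized cut. The NumPy sketch below rests on explicit assumptions: the consensus matrix is taken to be the usual co-association matrix; the confidence factor `beta` is a hypothetical blending weight that pulls must-link entries toward 1 and cannot-link entries toward 0 (the abstract does not say how the authors define it); and the normalized cut is solved with a common k-way spectral relaxation (the k leading eigenvectors of D^(-1/2) W D^(-1/2), rows normalized, then a small k-means). The names `co_association`, `apply_confidence` and `normalized_cut` are hypothetical.

```python
import numpy as np

def co_association(solutions):
    """Fraction of ensemble members that place samples i and j in the same cluster."""
    S = np.asarray(solutions)  # shape: (n_solutions, n_samples)
    M = np.zeros((S.shape[1], S.shape[1]))
    for labels in S:
        M += labels[:, None] == labels[None, :]
    return M / len(S)

def apply_confidence(M, must_link, cannot_link, beta=0.5):
    """Hypothetical confidence factor: blend constrained entries toward
    1 (must-link) or 0 (cannot-link) with weight beta in [0, 1]."""
    M = M.copy()
    for i, j in must_link:
        M[i, j] = M[j, i] = (1 - beta) * M[i, j] + beta
    for i, j in cannot_link:
        M[i, j] = M[j, i] = (1 - beta) * M[i, j]
    return M

def normalized_cut(W, k, n_iter=100):
    """k-way spectral relaxation of the normalized cut on affinity matrix W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))  # assumes positive row sums
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(A)  # eigenvalues come back in ascending order
    U = vecs[:, -k:]  # k leading eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Farthest-first initialisation, then plain k-means on the rows of U.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([U[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Usage: three ensemble members over six samples, plus one must-link pair.
ensemble = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 2]]
M = apply_confidence(co_association(ensemble), must_link=[(3, 5)], cannot_link=[], beta=0.5)
labels = normalized_cut(M, k=2)
```

Because the co-association matrix has ones on its diagonal, every row sum is at least 1, so the degree normalization in this sketch never divides by zero.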
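
The solution selection step of DS-SSCE and MDS-SSCE keeps only a subset of ensemble members before the consensus step. The sketch below is hypothetical: it scores each solution's quality as its average normalized mutual information (NMI) with the other members and its diversity as one minus its highest NMI with the solutions already chosen, then combines the two with a weight `w`. This is a common quality/diversity trade-off, but the specific selection strategies and the aggregated selection function used in MDS-SSCE are not given in the abstract, so `nmi`, `select_solutions` and `w` are illustrative assumptions.

```python
import math
from collections import Counter

def nmi(a, b):
    """Normalized mutual information I(a;b) / sqrt(H(a) H(b)) between two labelings."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in pa.values())
    h_b = -sum(c / n * math.log(c / n) for c in pb.values())
    if h_a == 0 or h_b == 0:  # a single-cluster labeling carries no information
        return 1.0 if h_a == h_b else 0.0
    mi = sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    return mi / math.sqrt(h_a * h_b)

def select_solutions(solutions, m, w=0.5):
    """Greedily pick m solutions by w * quality + (1 - w) * diversity (hypothetical)."""
    n = len(solutions)
    m = min(m, n)
    pair = [[nmi(solutions[i], solutions[j]) for j in range(n)] for i in range(n)]
    # Quality: average agreement with the other ensemble members.
    quality = [(sum(pair[i]) - pair[i][i]) / (n - 1) for i in range(n)]

    def score(i, chosen):
        # Diversity: dissimilarity to the closest already selected solution.
        diversity = 1.0 - max(pair[i][j] for j in chosen)
        return w * quality[i] + (1 - w) * diversity

    selected = [max(range(n), key=lambda i: quality[i])]
    while len(selected) < m:
        rest = [i for i in range(n) if i not in selected]
        selected.append(max(rest, key=lambda i: score(i, selected)))
    return selected

# Usage: three consistent solutions and one that disagrees with them.
ensemble = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 1, 0, 1, 0, 1]]
print(select_solutions(ensemble, m=2, w=1.0))  # quality only
print(select_solutions(ensemble, m=2, w=0.5))  # quality and diversity
```

With `w=1.0` the dissimilar fourth solution is never chosen; with `w=0.5` its diversity outweighs its low quality and it is picked second. Which setting helps tumor clustering is an empirical question that this sketch does not settle.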

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 11, Issue 4, July/August 2014, 160 pages

Publisher

IEEE Computer Society Press, Washington, DC, United States

Publication History

Published: 01 July 2014
Accepted: 20 March 2014
Revised: 13 February 2014
Received: 16 July 2013
Published in TCBB Volume 11, Issue 4

Author Tags

1. cluster ensemble
2. feature selection
3. gene expression profiles
4. semi-supervised clustering
5. tumor clustering

Qualifiers

• Article

Article Metrics

• Downloads (last 12 months): 4
• Downloads (last 6 weeks): 0
Reflects downloads up to 20 Nov 2024.

Cited By

• (2023) Adaptive Ensemble Clustering With Boosting BLS-Based Autoencoder. IEEE Transactions on Knowledge and Data Engineering, 35:12, pp. 12369-12383. DOI: 10.1109/TKDE.2023.3271120. Online publication date: 1-Dec-2023.
• (2022) An intelligent disease prediction and monitoring system using feature selection, multi-neural network and fuzzy rules. Neural Computing and Applications, 34:22, pp. 19877-19893. DOI: 10.1007/s00521-022-07527-4. Online publication date: 1-Nov-2022.
• (2020) A survey on ensemble learning. Frontiers of Computer Science: Selected Publications from Chinese Universities, 14:2, pp. 241-258. DOI: 10.1007/s11704-019-8208-z. Online publication date: 1-Apr-2020.
• (2020) Multi-population adaptive genetic algorithm for selection of microarray biomarkers. Neural Computing and Applications, 32:15, pp. 11897-11918. DOI: 10.1007/s00521-019-04671-2. Online publication date: 1-Aug-2020.
• (2020) Multiple clustering and selecting algorithms with combining strategy for selective clustering ensemble. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 24:20, pp. 15129-15141. DOI: 10.1007/s00500-020-05264-1. Online publication date: 1-Oct-2020.
• (2019) A fuzzy clustering ensemble based on cluster clustering and iterative Fusion of base clusters. Applied Intelligence, 49:7, pp. 2567-2581. DOI: 10.1007/s10489-018-01397-x. Online publication date: 1-Jul-2019.
• (2018) Semi-Supervised Ensemble Clustering Based on Selected Constraint Projection. IEEE Transactions on Knowledge and Data Engineering, 30:12, pp. 2394-2407. DOI: 10.1109/TKDE.2018.2818729. Online publication date: 1-Dec-2018.
• (2018) Bi-level and Bi-objective p-Median Type Problems for Integrative Clustering. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15:1, pp. 46-59. DOI: 10.1109/TCBB.2016.2622692. Online publication date: 1-Jan-2018.
• (2018) A new validity index adapted to fuzzy clustering algorithm. Multimedia Tools and Applications, 77:9, pp. 11339-11361. DOI: 10.1007/s11042-017-5550-8. Online publication date: 1-May-2018.
• (2017) Adaptive Ensembling of Semi-Supervised Clustering Solutions. IEEE Transactions on Knowledge and Data Engineering, 29:8, pp. 1577-1590. DOI: 10.1109/TKDE.2017.2695615. Online publication date: 1-Aug-2017.
