An expert system to classify microarray gene expression data using gene selection by decision tree

Published: 01 July 2009

Abstract

Gene selection can help the analysis of microarray gene expression data. However, it is very difficult to obtain a satisfactory classification result with machine learning techniques because of both the curse-of-dimensionality problem and the over-fitting problem: the number of features (genes) is very large while the number of samples is very small. In this study, we designed an approach that attempts to avoid these two problems and used it to select a small set of significant biomarker genes for diagnosis. We then used these markers for the classification of cancer. The approach was tested on a number of microarray datasets in order to demonstrate that it performs well and is both useful and reliable.
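
The abstract does not describe the algorithm in detail, so the following is only a minimal sketch of the general idea of gene selection by decision tree, not the authors' method. It assumes an entropy-based binary tree grown on synthetic data, and it treats the genes used at the tree's split nodes as the selected markers. All names (`build_tree`, `selected_genes`, `make_sample`, the informative gene index 7) are hypothetical, and the data are generated rather than taken from any microarray dataset.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    """Return (gain, gene, threshold) for the best single-gene split, or None."""
    base, best = entropy(y), None
    for g in range(len(X[0])):
        values = sorted({row[g] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [lab for row, lab in zip(X, y) if row[g] <= t]
            right = [lab for row, lab in zip(X, y) if row[g] > t]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, g, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Grow a small tree. Leaves are labels; nodes are (gene, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None or split[0] <= 0:
        return majority
    _, g, t = split
    left = [(r, lab) for r, lab in zip(X, y) if r[g] <= t]
    right = [(r, lab) for r, lab in zip(X, y) if r[g] > t]
    return (g, t,
            build_tree([r for r, _ in left], [lab for _, lab in left],
                       depth + 1, max_depth),
            build_tree([r for r, _ in right], [lab for _, lab in right],
                       depth + 1, max_depth))

def selected_genes(tree):
    """Collect the gene indices used at the internal nodes of the tree."""
    if not isinstance(tree, tuple):
        return set()
    g, _, left, right = tree
    return {g} | selected_genes(left) | selected_genes(right)

def predict(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, tuple):
        g, t, left, right = tree
        tree = left if row[g] <= t else right
    return tree

# Synthetic stand-in for microarray data (an assumption for illustration):
# many genes, few samples, and a single gene that carries the class signal.
random.seed(0)
N_GENES, INFORMATIVE = 200, 7

def make_sample(label):
    row = [random.random() for _ in range(N_GENES)]
    row[INFORMATIVE] += 2 * label  # shift the informative gene for class 1
    return row

y_train = [0, 1] * 20
X_train = [make_sample(lab) for lab in y_train]
tree = build_tree(X_train, y_train)
genes = sorted(selected_genes(tree))

y_test = [0, 1] * 10
X_test = [make_sample(lab) for lab in y_test]
accuracy = sum(predict(tree, r) == lab
               for r, lab in zip(X_test, y_test)) / len(y_test)
print("selected genes:", genes)
print("held-out accuracy:", accuracy)
```

In this sketch the tree doubles as the classifier. A pipeline closer to the abstract's description would presumably pass only the selected genes to a separate classifier, which keeps the feature count small relative to the sample count.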



Published In

Expert Systems with Applications: An International Journal, Volume 36, Issue 5
July, 2009
894 pages

Publisher

Pergamon Press, Inc.
United States

Author Tags

1. Bioinformatics
2. Decision tree
3. Expert system
4. Machine learning
5. Microarray gene expression

Cited By

• (2022) A Novel Feature-Based SHM Assessment and Predication Approach for Robust Evaluation of Damage Data Diagnosis Systems. Wireless Personal Communications: An International Journal, 124:4 (3387-3411). DOI: 10.1007/s11277-022-09518-z. Online publication date: 1-Jun-2022
• (2019) Multiclass Benchmarking Framework for Automated Acute Leukaemia Detection and Classification Based on BWM and Group-VIKOR. Journal of Medical Systems, 43:7 (1-32). DOI: 10.1007/s10916-019-1338-x. Online publication date: 1-Jul-2019
• (2019) CBR-PSO: cost-based rough particle swarm optimization approach for high-dimensional imbalanced problems. Neural Computing and Applications, 31:10 (6345-6363). DOI: 10.1007/s00521-018-3469-2. Online publication date: 1-Oct-2019
• (2018) Systematic Review of an Automated Multiclass Detection and Classification System for Acute Leukaemia in Terms of Evaluation and Benchmarking, Open Challenges, Issues and Methodological Aspects. Journal of Medical Systems, 42:11 (1-36). DOI: 10.1007/s10916-018-1064-9. Online publication date: 1-Nov-2018
• (2015) Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive K-nearest neighborhood technique. Expert Systems with Applications: An International Journal, 42:1 (612-627). DOI: 10.1016/j.eswa.2014.08.014. Online publication date: 1-Jan-2015
• (2013) Computational intelligence techniques in bioinformatics. Computational Biology and Chemistry, 47:C (37-47). DOI: 10.5555/2772765.2772843. Online publication date: 1-Dec-2013
• (2010) Partition-conditional ICA for Bayesian classification of microarray data. Expert Systems with Applications: An International Journal, 37:12 (8188-8192). DOI: 10.1016/j.eswa.2010.05.068. Online publication date: 1-Dec-2010
