Active Mining Discriminative Gene Sets

Feng Chu &
Lipo Wang

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4029)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

Searching for good discriminative gene sets (DGSs) in microarray data is important for many problems, such as precise cancer diagnosis, correct treatment selection, and drug discovery. Small, good DGSs can help researchers eliminate “irrelevant” genes and focus on “critical” genes that may be used as biomarkers or that are related to the development of cancers. In addition, small DGSs do not impose demanding hardware requirements on classifiers, such as high-speed CPUs or large memories. Furthermore, if DGSs are used as diagnostic measures in the future, small DGSs will simplify the test and therefore reduce its cost. Here, we propose an algorithm for searching for DGSs, which we call active mining discriminative gene sets (AM-DGS). The search scheme of AM-DGS is as follows. A gene with a large t-statistic is assigned as the seed, i.e., the first feature of the DGS. We then classify the samples in a data set using a support vector machine (SVM). Next, we add to the DGS the gene with the greatest power to correct the misclassified samples, that is, the gene with the largest t-statistic evaluated on only the misclassified samples. We keep adding genes to the DGS according to the SVM's misclassified samples until no error remains or overfitting occurs. We tested the proposed method on the well-known leukemia data set. On this data set, our method obtained two 2-gene DGSs that achieved 94.1% testing accuracy and a 4-gene DGS that achieved 97.1% testing accuracy. This result shows that our method obtained better accuracy with much smaller DGSs than three widely used methods, i.e., t-statistics, F-statistics, and SVM-based recursive feature elimination (SVM-RFE).
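The search scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`t_statistic`, `nearest_centroid_predict`, `am_dgs`) are hypothetical, the t-statistic is taken to be the absolute Welch two-sample form, the seed is simply the gene with the single largest t-statistic, a nearest-centroid rule stands in for the SVM so that the sketch needs only the Python standard library, and a fixed cap on the set size (`max_genes`) stands in for the overfitting check, which the abstract does not specify.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(values, labels):
    """Absolute Welch t-statistic of one gene between class 0 and class 1.

    Assumption: the abstract does not say which t-statistic variant is used.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    if len(a) < 2 or len(b) < 2:
        return 0.0  # too few samples in one class to estimate a variance
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = abs(mean(a) - mean(b))
    if se == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / se


def nearest_centroid_predict(train_X, train_y, X):
    """Stand-in classifier (NOT an SVM): pick the closer class centroid.

    Assumes both classes occur in train_y.
    """
    centroids = {}
    for c in (0, 1):
        rows = [x for x, y in zip(train_X, train_y) if y == c]
        centroids[c] = [mean(col) for col in zip(*rows)]

    def dist2(x, centre):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, centre))

    return [min((0, 1), key=lambda c: dist2(x, centroids[c])) for x in X]


def am_dgs(X, y, max_genes=5, predict=nearest_centroid_predict):
    """Greedy gene-set search. X is a samples-by-genes list of lists, y holds 0/1."""
    n_genes = len(X[0])
    all_rows = list(range(len(X)))

    def column(g, rows):
        return [X[i][g] for i in rows]

    # Seed: gene with the largest t-statistic over all samples (assumption).
    seed = max(range(n_genes), key=lambda g: t_statistic(column(g, all_rows), y))
    dgs = [seed]

    # max_genes is a crude stand-in for the unspecified overfitting check.
    while len(dgs) < max_genes:
        sub = [[row[g] for g in dgs] for row in X]
        pred = predict(sub, y, sub)  # classify the same samples used for fitting
        wrong = [i for i in all_rows if pred[i] != y[i]]
        if not wrong:
            break  # no error remains
        wrong_y = [y[i] for i in wrong]
        candidates = [g for g in range(n_genes) if g not in dgs]
        if not candidates:
            break
        # Score each remaining gene on the misclassified samples only.
        scores = {g: t_statistic(column(g, wrong), wrong_y) for g in candidates}
        best = max(scores, key=scores.get)
        if scores[best] == 0.0:
            break  # no gene separates the misclassified samples
        dgs.append(best)
    return dgs
```

Calling `am_dgs(X, y)` returns the selected gene indices in the order they were added. Any classifier with the same `predict(train_X, train_y, X)` shape, including an SVM wrapper, could be passed through the `predict` argument. Note that the misclassified subset needs at least two samples per class for the t-statistic to be defined here; how the original method handles smaller subsets is not stated in the abstract.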

Author information

Authors and Affiliations

College of Information Engineering, Xiangtan University, Xiangtan, Hunan, China
Feng Chu & Lipo Wang
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Feng Chu & Lipo Wang

Editor information

Editors and Affiliations

Department of Artificial Intelligence, Academy of Humanities and Economics, Poland
Leszek Rutkowski
Institute of Automatics, AGH University of Science and Technology, Al. Mickiewicza 30, PL-30-059, Kraków, Poland
Ryszard Tadeusiewicz
Department of Electrical Engineering and Computer Sciences, Berkeley Initiative in Soft Computing (BISC), University of California, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh
Department of Electrical Engineering, University of Louisville, 40292, Louisville, KY, U.S.A
Jacek M. Żurada

About this paper

Cite this paper

Chu, F., Wang, L. (2006). Active Mining Discriminative Gene Sets. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2006. ICAISC 2006. Lecture Notes in Computer Science, vol 4029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11785231_92

DOI: https://doi.org/10.1007/11785231_92
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35748-3
Online ISBN: 978-3-540-35750-6
eBook Packages: Computer Science (R0)
