A Clustering Based Hybrid System for Mass Spectrometry Data Analysis

Pengyi Yang & Zili Zhang

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5265)

Included in the following conference series:

IAPR International Conference on Pattern Recognition in Bioinformatics


Abstract

Recently, much attention has been given to mass spectrometry (MS) based disease classification, diagnosis, and protein-based biomarker identification. As with microarray-based investigations, the proteomic data generated by such high-throughput experiments often have a high feature-to-sample ratio. Moreover, the biological information and patterns in these data are mixed with noise, redundancy, and outliers. The development of algorithms and procedures for analyzing and interpreting such data is therefore of paramount importance. In this paper, we propose a hybrid system for analyzing such high-dimensional data. The proposed method uses a feature extraction and selection procedure based on the k-means clustering algorithm to bridge filter and wrapper selection methods. The potentially informative mass/charge (m/z) markers selected by filters are grouped by k-means clustering to reduce correlation and redundancy, and a multi-objective genetic algorithm (GA) selector is then employed to identify discriminative m/z markers among those produced by the clustering step. Experimental results indicate that the proposed method is suitable for m/z biomarker selection and MS-based sample classification.
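The three-stage pipeline described above (filter, then k-means, then a GA wrapper) can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' implementation: the abstract does not name the filter, the classifier, or the GA's objectives, so the sketch assumes a Welch t-statistic filter; k-means over each marker's z-scored intensity profile (standardizing makes the squared Euclidean distance between two markers proportional to 1 minus their Pearson correlation, so correlated markers fall into the same cluster), keeping the member nearest each centroid as the cluster's representative; and a GA whose two objectives, leave-one-out nearest-centroid accuracy and marker count, are scalarized into a weighted sum. The function names `t_filter`, `kmeans_features`, `loo_nearest_centroid_acc`, and `ga_select`, all parameter values, and the synthetic data are hypothetical.

```python
import numpy as np


def t_filter(X, y, top):
    """Filter step (assumed): rank m/z columns by absolute Welch t-statistic."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    return np.argsort(t)[::-1][:top]


def kmeans_features(F, k, iters=50, seed=0):
    """Lloyd's k-means over feature profiles (rows of F).

    Returns the index of the member nearest each non-empty cluster's centroid,
    i.e. one representative marker per group of mutually correlated markers.
    """
    rng = np.random.default_rng(seed)
    centers = F[rng.choice(len(F), size=k, replace=False)]
    for _ in range(iters):
        d = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous center.
        new = np.array([F[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    d = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    reps = [np.flatnonzero(labels == j)[d[labels == j, j].argmin()]
            for j in range(k) if np.any(labels == j)]
    return np.array(reps)


def loo_nearest_centroid_acc(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in wrapper classifier)."""
    correct = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        Xt, yt = X[keep], y[keep]
        c0, c1 = Xt[yt == 0].mean(axis=0), Xt[yt == 1].mean(axis=0)
        pred = int(((X[i] - c1) ** 2).sum() < ((X[i] - c0) ** 2).sum())
        correct += int(pred == y[i])
    return correct / len(y)


def ga_select(X, y, pop=30, gens=30, w_size=0.05, p_mut=0.1, seed=0):
    """GA wrapper: bit-mask chromosomes; fitness = LOO accuracy - w_size * fraction of markers kept."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    masks = rng.random((pop, n)) < 0.5

    def fitness(m):
        if not m.any():
            return -1.0  # an empty marker set is invalid
        return loo_nearest_centroid_acc(X[:, m], y) - w_size * m.sum() / n

    for _ in range(gens):
        fit = np.array([fitness(m) for m in masks])
        i, j = rng.integers(pop, size=(2, pop))
        parents = masks[np.where(fit[i] >= fit[j], i, j)]  # binary tournaments
        swap = rng.random((pop, n)) < 0.5
        children = np.where(swap, parents, np.roll(parents, 1, axis=0))  # uniform crossover
        children ^= rng.random((pop, n)) < p_mut  # bit-flip mutation
        children[0] = masks[fit.argmax()]  # elitism: carry the best mask over unchanged
        masks = children
    fit = np.array([fitness(m) for m in masks])
    return np.flatnonzero(masks[fit.argmax()])


# Synthetic data (hypothetical): 20 spectra per class, 200 m/z intensities,
# with the first 10 m/z columns shifted upward in class 1.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[y == 1, :10] += 1.5

top = t_filter(X, y, top=40)                                         # 1) filter
Z = (X[:, top] - X[:, top].mean(axis=0)) / X[:, top].std(axis=0)     # z-score each marker
reps = top[kmeans_features(Z.T, k=10)]                               # 2) one marker per cluster
chosen = reps[ga_select(X[:, reps], y)]                              # 3) GA wrapper
print("selected m/z columns:", sorted(chosen.tolist()))
```

Scalarizing the two objectives into a weighted sum is the simplest way to trade classification accuracy against the number of markers; a Pareto-based GA such as NSGA-II is a common alternative, and which formulation the paper actually uses cannot be determined from the abstract.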





Author information

Authors and Affiliations

Intelligent Software and Software Engineering Laboratory, Faculty of Computer and Information Science, Southwest University, Chongqing, 400715, China
Pengyi Yang & Zili Zhang
School of Engineering and Information Technology, Deakin University, Geelong, Victoria, 3217, Australia
Zili Zhang


Editor information

Editors and Affiliations

Gippsland School of IT, Monash University, Churchill, Victoria, 3842, Australia
Madhu Chetty
University of Windsor, 401 Sunset Avenue, Windsor, N9B 3P4, Ontario, Canada
Alioune Ngom
National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, 5670085, Osaka, Japan
Shandar Ahmad


About this paper

Cite this paper

Yang, P., Zhang, Z. (2008). A Clustering Based Hybrid System for Mass Spectrometry Data Analysis. In: Chetty, M., Ngom, A., Ahmad, S. (eds) Pattern Recognition in Bioinformatics. PRIB 2008. Lecture Notes in Computer Science (LNBI), vol 5265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88436-1_9


DOI: https://doi.org/10.1007/978-3-540-88436-1_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88434-7
Online ISBN: 978-3-540-88436-1
eBook Packages: Computer Science (R0)

Societies and partnerships: The International Association for Pattern Recognition