Abstract
Mass spectrometry is becoming an important tool in the biological sciences. Tissue samples or easily obtained biological fluids (serum, plasma, urine) are analysed by a variety of mass spectrometry methods, producing spectra characterized by very high dimensionality and a high level of noise. Here we present a feature extraction method for mass spectra that consists of two main steps. In the first step, an algorithm for low-level preprocessing of mass spectra is applied, comprising denoising with the Shift-Invariant Discrete Wavelet Transform (SIDWT), smoothing, baseline correction, peak detection and normalization of the resulting peak-lists. After this step, we claim to have reduced the dimensionality and redundancy of the initial mass spectra representation while keeping all the meaningful features (potential biomarkers) required for disease-related proteomic patterns to be identified. In the second step, the peak-lists are aligned and fed to a Support Vector Machine (SVM), which classifies the mass spectra. This procedure was applied to SELDI-QqTOF spectral data collected from normal and ovarian cancer serum samples. The classification performance was assessed for distinct values of the parameters involved in the feature extraction pipeline. The method described here for low-level preprocessing of mass spectra results in 98.3% sensitivity, 98.3% specificity and an AUC (Area Under Curve) of 0.981 in spectra classification.
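To make the first step more concrete, the following is a minimal sketch of a spectrum-to-peak-list pipeline in plain Python. It is not the authors' implementation: the SIDWT denoising is replaced here by a simple moving average, the baseline is estimated with a rolling minimum, and normalization divides by the total peak intensity. All function names, window sizes and the threshold are hypothetical choices made for illustration only. Peak-list alignment and the SVM classifier of the second step are not shown.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def moving_average(y: List[float], half_width: int) -> List[float]:
    """Smooth y with a centered window that is truncated at the edges.

    Stand-in for wavelet denoising and smoothing (assumption, not SIDWT).
    """
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def rolling_min_baseline(y: List[float], half_width: int) -> List[float]:
    """Estimate the baseline as the minimum inside a centered window."""
    n = len(y)
    return [
        min(y[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]


def detect_peaks(mz: List[float], y: List[float], threshold: float) -> List[Peak]:
    """Return local maxima whose intensity exceeds the threshold."""
    peaks = []
    for i in range(1, len(y) - 1):
        if y[i] > threshold and y[i] > y[i - 1] and y[i] >= y[i + 1]:
            peaks.append((mz[i], y[i]))
    return peaks


def normalize(peaks: List[Peak]) -> List[Peak]:
    """Scale peak intensities so that they sum to 1."""
    total = sum(height for _, height in peaks)
    if total == 0:
        return peaks
    return [(m, height / total) for m, height in peaks]


def preprocess(
    mz: List[float],
    intensity: List[float],
    smooth_hw: int = 1,      # hypothetical smoothing half-width
    baseline_hw: int = 5,    # hypothetical baseline half-width
    threshold: float = 1.0,  # hypothetical peak-height threshold
) -> List[Peak]:
    """Smooth, subtract the baseline, pick peaks, then normalize."""
    smoothed = moving_average(intensity, smooth_hw)
    baseline = rolling_min_baseline(smoothed, baseline_hw)
    corrected = [s - b for s, b in zip(smoothed, baseline)]
    return normalize(detect_peaks(mz, corrected, threshold))
```

Calling `preprocess(mz, intensity)` on a raw spectrum returns a short list of `(m/z, relative intensity)` pairs, which is the kind of reduced representation the abstract describes as input to alignment and classification. The window sizes and threshold correspond loosely to the pipeline parameters whose values the study varies.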
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Floros, X.E., Spyrou, G.M., Vougas, K.N., Tsangaris, G.T., Nikita, K.S. (2006). Study on Preprocessing and Classifying Mass Spectral Raw Data Concerning Human Normal and Disease Cases. In: Maglaveras, N., Chouvarda, I., Koutkias, V., Brause, R. (eds) Biological and Medical Data Analysis. ISBMDA 2006. Lecture Notes in Computer Science, vol 4345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946465_35
DOI: https://doi.org/10.1007/11946465_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68063-5
Online ISBN: 978-3-540-68065-9
eBook Packages: Computer Science, Computer Science (R0)