Abstract
To guarantee meaningful interpretation of data in basic and translational medicine, it is critical to ensure the quality of biological samples. Mass spectrometers have become promising instruments for acquiring proteomic information, which is known to be associated with sample quality. However, a universally applicable mass spectrometry data analysis platform for quality assessment is still greatly needed. We present a comprehensive pattern recognition study to facilitate the development of such a platform. The study comprises feature extraction, binary classification, and feature ranking. We develop classifiers that achieve classification accuracy above 90% in distinguishing human serum samples stored for different lengths of time. We also derive fingerprint patterns of serum peptides that can be conveniently used for temporal classification.
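The abstract names three analysis steps. The sketch below strings them together in plain Python for a single binary task, such as day-0 versus day-10 serum. It is a hedged illustration, not the authors' method: the fixed 100-Da binning, the t-statistic-style ranking score, and the nearest-centroid classifier are swapped-in choices, and every function name is hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical feature grid: the 800-4000 Da range (from the appendix) cut into
# fixed 100-Da bins. The bin width is an illustrative assumption.
MZ_MIN, MZ_MAX, BIN_WIDTH = 800.0, 4000.0, 100.0


def extract_features(peaks):
    """Feature extraction: sum (m/z, intensity) peaks into fixed-width m/z bins."""
    n_bins = int((MZ_MAX - MZ_MIN) / BIN_WIDTH)
    features = [0.0] * n_bins
    for mz, intensity in peaks:
        if MZ_MIN <= mz < MZ_MAX:
            features[int((mz - MZ_MIN) // BIN_WIDTH)] += intensity
    return features


def rank_features(class_a, class_b):
    """Feature ranking: order bins by a two-sample t-statistic-like score, best first."""
    scores = []
    for j in range(len(class_a[0])):
        a = [row[j] for row in class_a]
        b = [row[j] for row in class_b]
        diff = abs(mean(a) - mean(b))
        spread = math.sqrt(pstdev(a) ** 2 / len(a) + pstdev(b) ** 2 / len(b))
        if spread > 0:
            scores.append(diff / spread)
        else:
            # Zero spread: a perfect separator if the means differ, useless otherwise.
            scores.append(math.inf if diff > 0 else 0.0)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def train_nearest_centroid(class_a, class_b, feature_idx):
    """Binary classification: store each class's mean vector over the chosen bins."""
    def centroid(rows):
        return [mean(row[j] for row in rows) for j in feature_idx]
    return centroid(class_a), centroid(class_b)


def predict(model, x, feature_idx):
    """Return 0 (class A) or 1 (class B), whichever centroid is closer (Euclidean)."""
    centroid_a, centroid_b = model
    v = [x[j] for j in feature_idx]
    dist_a = sum((p - q) ** 2 for p, q in zip(v, centroid_a))
    dist_b = sum((p - q) ** 2 for p, q in zip(v, centroid_b))
    return 0 if dist_a <= dist_b else 1


# Synthetic toy spectra (not study data): "day 0" carries a stronger peak near
# 1510 Da than "day 10"; a peak near 2500 Da is identical in both groups.
day0 = [extract_features([(1510.0, 9.0 + i), (2500.0, 5.0)]) for i in range(3)]
day10 = [extract_features([(1510.0, 1.0 + i), (2500.0, 5.0)]) for i in range(3)]
top = rank_features(day0, day10)[:2]
model = train_nearest_centroid(day0, day10, top)
print(top[0], predict(model, extract_features([(1510.0, 10.0)]), top))  # prints: 7 0
```

Ranking before classifying restricts the classifier to the few bins that best separate the two storage times, which is also what yields a compact peptide fingerprint. In a real analysis the ranking should be recomputed inside each cross-validation fold; selecting features on the full data set inflates the estimated accuracy.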
Funding
This study received financial support from NSF grant DMS#1246818 and an industry grant from the Chinese Academy of Sciences Holding Co., Ltd.
Appendix
Before storage, the samples were left at room temperature for 1 h to allow coagulation and were then centrifuged at 1400 × g for 15 min at 4 °C. To avoid aspirating fluid from the buffy-coat layer, the serum was carefully aspirated and collected in polypropylene tubes. After aliquoting, the samples were stored under one of two conditions: room temperature or 4 °C. For both cohorts, mass spectrometry data for each sample were collected on the day the sample was taken and again 1, 2, 5, and 10 days later.

For each measurement, a 1-μL aliquot of the sample was processed on a mesoporous silicon wafer that had been prepared by pre-baking in an oven at 120 °C. The sample was then spotted on the MALDI target plate and allowed to air-dry. Afterwards, 1 μL of matrix solution in 50% acetonitrile containing 0.1% TFA was spotted on the dried sample spot, and the two were allowed to co-crystallize. Mass spectra were acquired on a SHIMADZU AXIMA Resonance MALDI-IT-TOF mass spectrometer equipped with a nitrogen laser emitting at 337 nm, with an adjustable mass range of 800 to 4000 Da. Positive ions were detected in reflectron mode. Spectra from 500 laser shots were usually averaged to obtain the final spectrum for each sample. The optimized accelerating voltage was 50 kV.
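The acquisition design above can be summarized in a short sketch: two storage conditions crossed with five measurement days give ten (condition, day) groups, and each spectrum is an average over laser shots. The function names are hypothetical, and the code assumes the per-shot spectra already share a common m/z axis, which the protocol does not state.

```python
from itertools import product

# Design as described in the appendix: two storage conditions, five measurement days.
STORAGE_CONDITIONS = ("room temperature", "4 °C")
DAYS_AFTER_COLLECTION = (0, 1, 2, 5, 10)


def acquisition_schedule():
    """Every (storage condition, day) pair at which spectra are acquired."""
    return list(product(STORAGE_CONDITIONS, DAYS_AFTER_COLLECTION))


def average_shots(shot_spectra):
    """Average per-shot intensity vectors into one spectrum.

    Assumption: all shots are already on the same m/z axis (same length, same bins).
    """
    if not shot_spectra:
        raise ValueError("need at least one shot")
    length = len(shot_spectra[0])
    if any(len(shot) != length for shot in shot_spectra):
        raise ValueError("all shots must share the same m/z axis")
    n_shots = len(shot_spectra)
    # zip(*...) walks the spectra column by column, i.e. one m/z bin at a time.
    return [sum(values) / n_shots for values in zip(*shot_spectra)]


# Usage with synthetic shots (not instrument data): 500 shots, as in the protocol.
shots = [[0.0, 4.0]] * 250 + [[2.0, 0.0]] * 250
print(len(acquisition_schedule()), average_shots(shots))  # prints: 10 [1.0, 2.0]
```

Averaging many shots reduces shot-to-shot noise in the final spectrum: for independent noise, the standard deviation of the average falls roughly with the square root of the number of shots.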
About this article
Cite this article
Manchanda, S., Meyer, M., Li, Q. et al. On Comprehensive Mass Spectrometry Data Analysis for Proteome Profiling of Human Blood Samples. J Healthc Inform Res 2, 305–318 (2018). https://doi.org/10.1007/s41666-018-0022-0
DOI: https://doi.org/10.1007/s41666-018-0022-0