Abstract
Tandem mass spectrometry is a well-known technique for identifying protein sequences from an "in vitro" sample. To identify the sequences from spectra captured by a spectrometer, a similarity search in a database of hypothetical mass spectra is often used. For this purpose, the hypothetical spectra are generated from a database of known protein sequences. Since the number of sequences in these databases grows rapidly over time, several approaches have been proposed for indexing databases of mass spectra. In this paper, we improve an approach based on non-metric similarity search, in which the M-tree and the TriGen algorithm are employed for fast approximate search. We show that preprocessing the mass spectra by clustering speeds up the identification of sequences more than 100× with respect to a sequential scan of the entire database. Moreover, when the protein candidates are refined by a sequential scan in the postprocessing step, the whole approach exhibits precision similar to that of a sequential scan over the entire database (over 90%).
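The filter-then-refine idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the method of the paper: the M-tree, TriGen, and the clustering step are not reproduced, a simple precursor-mass window stands in for the index, and the names `peak_distance`, `filter_and_refine`, `tol`, and `window` are hypothetical. The toy dissimilarity is deliberately asymmetric, so it is not a metric.

```python
from bisect import bisect_left, bisect_right


def peak_distance(a, b, tol=0.5):
    """Toy dissimilarity (an assumption, not the paper's measure):
    fraction of peaks in `a` that have no partner in `b` within `tol`."""
    if not a:
        return 1.0
    unmatched = sum(1 for x in a if not any(abs(x - y) <= tol for y in b))
    return unmatched / len(a)


def filter_and_refine(query, precursor, database, window=2.0, k=1):
    """Return the labels of the k best candidates for `query`.

    `database` is a list of (precursor_mass, peaks, label) tuples
    sorted by precursor_mass.
    """
    masses = [mass for mass, _, _ in database]
    # Filter step: a cheap precursor-mass window, standing in for the index.
    lo = bisect_left(masses, precursor - window)
    hi = bisect_right(masses, precursor + window)
    candidates = database[lo:hi]
    # Refine step: sequential scan over the small candidate set only.
    ranked = sorted(candidates, key=lambda entry: peak_distance(query, entry[1]))
    return [label for _, _, label in ranked[:k]]


if __name__ == "__main__":
    # Made-up spectra and labels, for illustration only.
    database = [
        (500.0, [100.0, 200.0, 300.0], "PEPA"),
        (500.5, [100.0, 250.0, 350.0], "PEPB"),
        (800.0, [100.0, 200.0, 300.0], "PEPC"),
    ]
    print(filter_and_refine([100.1, 200.2, 299.8], 500.2, database))
```

The expensive comparison runs only on the entries that survive the filter, which is where a speed-up over a full sequential scan would come from. How much precision is retained depends on how well the filter preserves the true match.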
This work was supported by the Czech Science Foundation (GAČR) projects P202/11/0968, P202/12/P297, and 201/09/H057, and by the Grant Agency of Charles University (GAUK) project No. 430711.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Novák, J., Hoksza, D., Lokoč, J., Skopal, T. (2012). On Optimizing the Non-metric Similarity Search in Tandem Mass Spectra by Clustering. In: Bleris, L., Măndoiu, I., Schwartz, R., Wang, J. (eds) Bioinformatics Research and Applications. ISBRA 2012. Lecture Notes in Computer Science, vol 7292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30191-9_18
DOI: https://doi.org/10.1007/978-3-642-30191-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30190-2
Online ISBN: 978-3-642-30191-9
eBook Packages: Computer Science (R0)