Indexing and Searching a Mass Spectrometry Database

Søren Besenbacher¹⁹,
Benno Schwikowski²⁰ &
Jens Stoye²¹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 6060))

1141 Accesses
1 Citations

Abstract

Database preprocessing in order to create an index often permits considerable speedup in search compared to the iterated query of an unprocessed database. In this paper we apply index-based database lookup to a range search problem that arises in mass spectrometry-based proteomics: given a large collection of sparse integer sets and a sparse query set, find all the sets from the collection that have at least k integers in common with the query set. This problem arises when searching for a mass spectrum in a database of theoretical mass spectra using the shared peaks count as similarity measure. The algorithms can easily be modified to use the more advanced shared peaks intensity measure instead of the shared peaks count. We introduce three different algorithms solving these problems. We conclude by presenting some experiments using the algorithms on realistic data showing the advantages and disadvantages of the algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

PepQuery2 democratizes public MS proteomics data for rapid peptide searching

Article Open access 18 April 2023

Comparing Peptide Spectra Matches Across Search Engines

MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry–based proteomics

Article 10 April 2017

References

Aebersold, R., Mann, M.: Mass spectrometry-based proteomics. Nature 422(6928), 198–207 (2003)
Article Google Scholar
Dutta, D., Chen, T.: Speeding up tandem mass spectrometry database search: metric embeddings and fast near neighbor search. Bioinformatics 23(5), 612–618 (2007)
Article Google Scholar
Elias, J.E., Gibbons, F.D., King, O.D., Roth, F.P., Gygi, S.P.: Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol. 22(2), 214–219 (2004)
Article Google Scholar
Frank, A., Tanner, S., Pevzner, P.A.: Peptide sequence tags for fast database search in mass-spectrometry. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds.) RECOMB 2005. LNCS (LNBI), vol. 3500, pp. 326–341. Springer, Heidelberg (2005)
Google Scholar
Havilio, M., Haddad, Y., Smilansky, Z.: Intensity-based statistical scorer for tandem mass spectrometry. Anal. Chem. 75(3), 435–444 (2003)
Article Google Scholar
Izumi, T., Yokomaru, T., Takahashi, A., Kajitani, Y.: Computational complexity analysis of set-bin-packing problem. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E81-A(5), 842–849 (1998)
Google Scholar
Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R., Shoemaker, D.D.: Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302(5653), 2141–2144 (2003)
Article Google Scholar
Kersey, P.J., Duarte, J., Williams, A., Karavidopoulou, Y., Birney, E., Apweiler, R.: The international protein index: An integrated database for proteomics experiments. proteomics 4(7), 1985–1988 (2004)
Article Google Scholar
Mann, M., Jensen, O.N.: Proteomic analysis of post-translational modifications. Nature Biotechnol. 21(3), 255–261 (2003)
Article Google Scholar
Mann, M., Wilm, M.: Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66(24), 4390–4399 (1994)
Article Google Scholar
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L., Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F., Rothberg, J.M.: Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057), 376–380 (2005)
Google Scholar
McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23(2), 262–272 (1976)
Article MATH MathSciNet Google Scholar
Palagi, P.M., Hernandez, P., Walther, D., Appel, R.D.: Proteome informatics I: Bioinformatics tools for processing experimental data. Proteomics 6(20), 5435–5444 (2006)
Article Google Scholar
Ramakrishnan, S.R., Mao, R., Nakorchevskiy, A.A., Prince, J.T., Willard, W.S., Xu, W., Marcotte, E.M., Miranker, D.P.: A fast coarse filtering method for peptide identification by mass spectrometry. Bioinformatics 22(12), 1524–1531 (2006)
Article Google Scholar
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
Article MATH MathSciNet Google Scholar
Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory, pp. 1–11. IEEE Press, Los Alamitos (1973)
Chapter Google Scholar
Whitfield, E.J., Pruess, M., Apweiler, R.: Bioinformatics database infrastructure for biotechnology research. J. Biotechnol. 124(4), 629–639 (2006)
Article Google Scholar

Download references

Author information

Authors and Affiliations

deCODE Genetics, Reykjavik, Iceland
Søren Besenbacher
Institut Pasteur, Systems Biology Group, Paris, France
Benno Schwikowski
Technische Fakultät, Universität Bielefeld, Germany
Jens Stoye

Authors

Søren Besenbacher
View author publications
You can also search for this author in PubMed Google Scholar
Benno Schwikowski
View author publications
You can also search for this author in PubMed Google Scholar
Jens Stoye
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Software Systems, Tampere University of Technology, P. O. Box 553, 33101, Tampere, Finland
Tapio Elomaa
Department of Information and Computer Science, Aalto University School of Science and Technology, P.O. Box 17800, 00076, Aalto, Finland
Heikki Mannila
Department of Information and Computer Science, Aalto University School of Science and Technology, P.O. Box 15400, 00076, Aalto, Finland
Pekka Orponen

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Besenbacher, S., Schwikowski, B., Stoye, J. (2010). Indexing and Searching a Mass Spectrometry Database. In: Elomaa, T., Mannila, H., Orponen, P. (eds) Algorithms and Applications. Lecture Notes in Computer Science, vol 6060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12476-1_4

Download citation

DOI: https://doi.org/10.1007/978-3-642-12476-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12475-4
Online ISBN: 978-3-642-12476-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics