Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3402)

Abstract

Parallel processing is frequently used in bioinformatics programs and in database management systems to improve their performance. Parallelism can also be used to improve the performance of combinations of programs in bioinformatics workflows. This work presents a characterization of parallel processing in scientific workflows and reports real experimental results for different configurations of data and program distribution during bioinformatics workflow execution. The implementation used real structural genomics and automatic comparative annotation workflows, and the experiments were run on a cluster of PCs.
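To illustrate the data-distribution side of this characterization, the sketch below fragments a set of input sequences, runs the same workflow activity on each fragment concurrently, and merges the partial results in input order. It is a minimal sketch under stated assumptions, not the authors' implementation: `fragment`, `gc_content`, and `data_parallel` are hypothetical names, a local thread pool stands in for the nodes of a PC cluster, and the toy GC-content activity stands in for a real program such as BLAST, which a real worker would typically launch as an external process.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fragment(items: Sequence[T], n_fragments: int) -> List[List[T]]:
    """Split items into n_fragments contiguous fragments whose sizes differ by at most one."""
    if n_fragments < 1:
        raise ValueError("n_fragments must be at least 1")
    size, extra = divmod(len(items), n_fragments)
    fragments: List[List[T]] = []
    start = 0
    for i in range(n_fragments):
        # The first `extra` fragments take one additional item each.
        end = start + size + (1 if i < extra else 0)
        fragments.append(list(items[start:end]))
        start = end
    return fragments


def gc_content(seq: str) -> float:
    """Hypothetical stand-in activity: fraction of G/C bases in a nucleotide sequence."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


def data_parallel(activity: Callable[[T], R], items: Sequence[T], n_workers: int) -> List[R]:
    """Run the same activity on every fragment concurrently and merge results in input order."""
    fragments = fragment(items, n_workers)

    def run_fragment(frag: List[T]) -> List[R]:
        # Each worker applies the activity sequentially to its own fragment.
        return [activity(item) for item in frag]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in submission order, so concatenating
        # the per-fragment lists restores the original input order.
        partial_results = list(pool.map(run_fragment, fragments))
    return [result for part in partial_results for result in part]


seqs = ["ATGC", "GGCC", "ATAT", "GCGA", "TTTT"]
print(data_parallel(gc_content, seqs, n_workers=2))  # → [0.5, 1.0, 0.0, 0.75, 0.0]
```

Contiguous fragments keep the merge step a simple concatenation. On an actual cluster one would also have to decide how the data each program reads is placed on the nodes (for example, replicated on every node or partitioned across them), which is the kind of distribution configuration the experiments vary.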

This work is partially funded by the Brazilian agencies CNPq and CAPES.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Meyer, L.A.V.C., Rössle, S.C., Bisch, P.M., Mattoso, M. (2005). Parallelism in Bioinformatics Workflows. In: Daydé, M., Dongarra, J., Hernández, V., Palma, J.M.L.M. (eds) High Performance Computing for Computational Science - VECPAR 2004. VECPAR 2004. Lecture Notes in Computer Science, vol 3402. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11403937_44

  • DOI: https://doi.org/10.1007/11403937_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25424-9

  • Online ISBN: 978-3-540-31854-5

  • eBook Packages: Computer Science, Computer Science (R0)