Abstract
Parallel processing is frequently used in bioinformatics programs and in database management systems to improve performance. Parallelism can also be used to improve the performance of combinations of programs in bioinformatics workflows. This work presents a characterization of parallel processing in scientific workflows and reports experimental results for different configurations of data and program distribution during bioinformatics workflow execution. The implementation used real structural genomic and automatic comparative annotation workflows, and the experiments were run on a cluster of PCs.
This work is partially funded by the Brazilian agencies CNPq and CAPES.
© 2005 Springer-Verlag Berlin Heidelberg
Meyer, L.A.V.C., Rössle, S.C., Bisch, P.M., Mattoso, M. (2005). Parallelism in Bioinformatics Workflows. In: Daydé, M., Dongarra, J., Hernández, V., Palma, J.M.L.M. (eds) High Performance Computing for Computational Science - VECPAR 2004. VECPAR 2004. Lecture Notes in Computer Science, vol 3402. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11403937_44
DOI: https://doi.org/10.1007/11403937_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25424-9
Online ISBN: 978-3-540-31854-5
eBook Packages: Computer Science (R0)