An adaptive parallel execution strategy for cloud-based scientific workflows

Published: 01 September 2012

Abstract

Many large-scale scientific experiments modeled as scientific workflows are compute-intensive. Some scientific workflow management systems already exploit parallel techniques, such as parameter sweep and data fragmentation, to improve performance. In those systems, computing resources execute many computational tasks in high-performance environments, such as multiprocessor machines or clusters. Meanwhile, cloud computing provides scalable and elastic resources that can be instantiated on demand during the course of a scientific experiment, without requiring users to acquire expensive infrastructure or configure many pieces of software. Because of these advantages, some scientists have already adopted the cloud model in their scientific experiments. However, this model also raises many challenges. When scientists execute scientific workflows that require parallelism, it is hard to decide a priori how many resources to use and for how long they will be needed, because the allocation of these resources is elastic and based on demand. In addition, scientists have to manage new aspects such as the initialization of virtual machines and the impact of data staging. SciCumulus is a middleware that manages the parallel execution of scientific workflows in cloud environments. In this paper, we introduce an adaptive approach for executing parallel scientific workflows in the cloud. This approach adapts itself to the availability of resources during workflow execution: it checks the available computational power and dynamically tunes the workflow activity size to achieve better performance. Experimental evaluation showed the benefits of parallelizing scientific workflows using the adaptive approach of SciCumulus, which improved performance by up to 47.1%. Copyright © 2011 John Wiley & Sons, Ltd.
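
The idea of tuning the activity size to the currently available computational power can be illustrated with a small sketch. This is not the SciCumulus algorithm, which the abstract does not specify; it is a hypothetical heuristic that assumes a parameter sweep of independent tasks, a periodic reading of the number of available cores, and a target of a fixed number of task groups per core. The names `chunk_size`, `adaptive_rounds`, and `chunks_per_core` are invented for illustration.

```python
from math import ceil


def chunk_size(num_tasks, available_cores, chunks_per_core=2):
    """Hypothetical heuristic: size each group so that every available
    core receives roughly `chunks_per_core` groups of tasks."""
    if num_tasks <= 0:
        return 0
    cores = max(1, available_cores)
    return max(1, ceil(num_tasks / (cores * chunks_per_core)))


def adaptive_rounds(tasks, core_readings, chunks_per_core=2):
    """Dispatch tasks in rounds, recomputing the group size from the
    latest core reading before each round (assumed monitoring input).
    Each round hands out one group per available core."""
    remaining = list(tasks)
    readings = iter(core_readings)
    cores = 1
    rounds = []
    while remaining:
        # Reuse the last known reading once the readings are exhausted.
        cores = max(1, next(readings, cores))
        size = chunk_size(len(remaining), cores, chunks_per_core)
        take = size * cores
        current, remaining = remaining[:take], remaining[take:]
        rounds.append([current[i:i + size] for i in range(0, len(current), size)])
    return rounds


if __name__ == "__main__":
    # 100 sweep tasks; 4 cores at first, then 16 after the cloud scales out.
    for number, groups in enumerate(adaptive_rounds(range(100), [4, 16]), start=1):
        print(f"round {number}: {len(groups)} groups of up to {len(groups[0])} tasks")
```

With few cores the sketch forms a small number of large groups; when more cores become available it switches to many small groups, which is one simple way to trade scheduling overhead against load balance.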


Published In

Concurrency and Computation: Practice & Experience  Volume 24, Issue 13
September 2012
158 pages
ISSN:1532-0626
EISSN:1532-0634
Publisher

John Wiley and Sons Ltd.

United Kingdom

Author Tags

  1. cloud computing
  2. parallelism
  3. scientific workflows

Qualifiers

  • Article

Cited By

  • (2021) Basin futures, a novel cloud-based system for preliminary river basin modelling and planning. Environmental Modelling & Software 141:C. DOI: 10.1016/j.envsoft.2021.105049. Online publication date: 1-Jul-2021
  • (2016) A method for trust quantification in cloud computing environments. International Journal of Distributed Sensor Networks 2016 (1-1). DOI: 10.1155/2016/5052614. Online publication date: 1-Jan-2016
  • (2015) A Survey of Data-Intensive Scientific Workflow Management. Journal of Grid Computing 13:4 (457-493). DOI: 10.1007/s10723-015-9329-8. Online publication date: 1-Dec-2015
  • (2014) A particle swarm optimisation algorithm for cloud-oriented workflow scheduling based on reliability. International Journal of Computer Applications in Technology 50:3/4 (220-225). DOI: 10.1504/IJCAT.2014.066731. Online publication date: 1-Jan-2014
  • (2013) Dimensioning the virtual cluster for parallel scientific workflows in clouds. Proceedings of the 4th ACM workshop on Scientific cloud computing (5-12). DOI: 10.1145/2465848.2465852. Online publication date: 17-Jun-2013
  • (2013) Capturing and querying workflow runtime provenance with PROV. Proceedings of the Joint EDBT/ICDT 2013 Workshops (282-289). DOI: 10.1145/2457317.2457365. Online publication date: 18-Mar-2013
  • (2012) A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds. Journal of Grid Computing 10:3 (521-552). DOI: 10.1007/s10723-012-9227-2. Online publication date: 1-Sep-2012
  • (2012) Enabling re-executions of parallel scientific workflows using runtime provenance data. Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes (229-232). DOI: 10.1007/978-3-642-34222-6_22. Online publication date: 19-Jun-2012
  • (2012) Using domain-specific data to enhance scientific workflow steering queries. Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes (152-167). DOI: 10.1007/978-3-642-34222-6_12. Online publication date: 19-Jun-2012