DOI: 10.1109/CCGRID.2010.91
Article

SAGA BigJob: An Extensible and Interoperable Pilot-Job Abstraction for Distributed Applications and Systems

Published: 17 May 2010

Abstract

The uptake of distributed infrastructures by scientific applications has been held back by the limited availability of extensible, pervasive and simple-to-use abstractions, which are required at multiple levels: the development, deployment and execution stages of scientific applications. The Pilot-Job abstraction has been shown to be effective in addressing many requirements of scientific applications. Specifically, Pilot-Jobs support the decoupling of workload submission from resource assignment; this results in a flexible execution model, which in turn enables the distributed scale-out of applications on multiple and possibly heterogeneous resources. Most Pilot-Job implementations, however, are tied to a specific infrastructure. In this paper, we describe the design and implementation of a SAGA-based Pilot-Job, which supports a wide range of application types and is usable over a broad range of infrastructures, i.e., it is general-purpose and extensible and, as we will argue, is also interoperable with Clouds. We discuss how the SAGA-based Pilot-Job is used for different application types and supports concurrent usage across multiple heterogeneous distributed infrastructures, including concurrent usage across Clouds and traditional Grids/Clusters. Further, we show how Pilot-Jobs can help to support dynamic execution models and thus introduce new opportunities for distributed applications. We also demonstrate, for the first time that we are aware of, the use of multiple Pilot-Job implementations to solve the same problem; specifically, we use the SAGA-based Pilot-Job on high-end resources such as the TeraGrid and the native Condor Pilot-Job (Glide-in) on Condor resources. Importantly, both are invoked via the same interface without changes at the development or deployment level; the choice between them is only an execution (run-time) decision.
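The decoupling of workload submission from resource assignment described in the abstract can be illustrated with a minimal Python sketch. This is not the BigJob or SAGA API: the names `Pilot`, `Task`, `PilotManager`, and the resource labels are hypothetical, and the first-fit placement rule is an assumption chosen only to keep the example short. The point it shows is late binding: tasks are queued without naming a resource, and are bound to a pilot only when one with free cores exists.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    """A unit of work that needs a number of cores (hypothetical type)."""
    name: str
    cores: int


@dataclass
class Pilot:
    """A placeholder job holding `cores` on one resource (hypothetical type)."""
    resource: str
    cores: int
    running: list = field(default_factory=list)

    def free_cores(self) -> int:
        return self.cores - sum(task.cores for task in self.running)


class PilotManager:
    """Late-binds queued tasks to whichever pilot has enough free cores."""

    def __init__(self):
        self.pilots = []
        self.queue = deque()
        self.placement = {}

    def add_pilot(self, pilot: Pilot) -> None:
        self.pilots.append(pilot)

    def submit(self, task: Task) -> None:
        # Submission only enqueues the task; no resource is chosen here.
        self.queue.append(task)

    def schedule(self) -> dict:
        # Assumed policy: first pilot with enough free cores wins.
        waiting = deque()
        while self.queue:
            task = self.queue.popleft()
            pilot = next(
                (p for p in self.pilots if p.free_cores() >= task.cores), None
            )
            if pilot is None:
                waiting.append(task)  # stays queued until capacity appears
            else:
                pilot.running.append(task)
                self.placement[task.name] = pilot.resource
        self.queue = waiting
        return self.placement


# Tasks are submitted before any resource is known.
manager = PilotManager()
for i in range(3):
    manager.submit(Task(f"replica-{i}", cores=4))

# Pilots on two different (hypothetical) infrastructures are added afterwards.
manager.add_pilot(Pilot("grid-cluster", cores=8))
manager.add_pilot(Pilot("cloud-vm", cores=4))

placement = manager.schedule()
print(placement)
```

Because the application code only calls `submit`, which pilots exist and where they run can be decided at run time, which mirrors the abstract's claim that the choice of infrastructure is an execution decision rather than a development or deployment one.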

Published In

CCGRID '10: Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
May 2010
863 pages
ISBN: 9780769540399

Publisher

IEEE Computer Society

United States

Author Tags

  1. Cloud
  2. Distributed Computing
  3. Grid
  4. Pilot-Job
  5. SAGA

Qualifiers

  • Article

Cited By

  • (2022) Using unused. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.5555/3571885.3571938. Online publication date: 13-Nov-2022
  • (2018) A Comprehensive Perspective on Pilot-Job Systems. ACM Computing Surveys 51:2, pp. 1-32. DOI: 10.1145/3177851. Online publication date: 17-Apr-2018
  • (2017) MaDaTS. Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, pp. 41-52. DOI: 10.1145/3078597.3078611. Online publication date: 26-Jun-2017
  • (2016) Programming Scalable Cloud Services with AEON. Proceedings of the 17th International Middleware Conference, pp. 1-14. DOI: 10.1145/2988336.2988352. Online publication date: 28-Nov-2016
  • (2015) SIM-CITY. Procedia Computer Science 51:C, pp. 2327-2336. DOI: 10.1016/j.procs.2015.05.399. Online publication date: 1-Sep-2015
  • (2015) Pilot-Data. Journal of Parallel and Distributed Computing 79:C, pp. 16-30. DOI: 10.1016/j.jpdc.2014.09.009. Online publication date: 1-May-2015
  • (2015) GWpilot. Future Generation Computer Systems 45:C, pp. 25-52. DOI: 10.1016/j.future.2014.10.003. Online publication date: 1-Apr-2015
  • (2015) Elastic grid resource provisioning with WoBinGO. Future Generation Computer Systems 42:C, pp. 44-54. DOI: 10.1016/j.future.2014.09.004. Online publication date: 1-Jan-2015
  • (2014) Advancing next-generation sequencing data analytics with scalable distributed infrastructure. Concurrency and Computation: Practice & Experience 26:4, pp. 894-906. DOI: 10.1002/cpe.3013. Online publication date: 25-Mar-2014
  • (2013) Stretch optimization for virtual screening on multi-user pilot-agent platforms on grid/cloud. Proceedings of the 4th Symposium on Information and Communication Technology, pp. 301-310. DOI: 10.1145/2542050.2542063. Online publication date: 5-Dec-2013
