DOI: 10.1145/3150994.3151000
research-article
Public Access

On the use of burst buffers for accelerating data-intensive scientific workflows

Published: 12 November 2017

Abstract

Scientific applications frequently produce and consume large volumes of data, but moving this data to and from compute resources can be challenging, because parallel file system performance is not keeping pace with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on the performance of real-world workflow applications is not always clear. In this paper, we examine that impact using the remote-shared, allocatable burst buffers on the Cori system at NERSC. By running a subset of the SCEC CyberShake workflow, a production seismic hazard analysis workflow, we find that burst buffers improve read and write performance by about an order of magnitude, and that these improvements increase job performance, even for long-running CPU-bound jobs.
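The last claim, that faster I/O helps even long-running CPU-bound jobs, can be reasoned about with Amdahl's law. If a fraction f of a job's runtime is spent on I/O and only that phase becomes s times faster, the whole job speeds up by 1 / ((1 - f) + f / s). The sketch below is a generic back-of-the-envelope model, not the measurement method used in the paper. The I/O fractions and the 10x factor are hypothetical values chosen to echo the "order of magnitude" figure, and `overall_speedup` is an illustrative name.

```python
def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Amdahl's-law estimate of end-to-end job speedup when only I/O is accelerated.

    io_fraction: share of the original runtime spent on I/O (hypothetical input).
    io_speedup:  factor by which the I/O phase gets faster (e.g. 10 for ~10x).
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be in [0, 1]")
    if io_speedup <= 0:
        raise ValueError("io_speedup must be positive")
    # Compute time is unchanged; only the I/O share shrinks by io_speedup.
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


if __name__ == "__main__":
    # Hypothetical jobs with different I/O shares, all given a 10x faster I/O path.
    for f in (0.05, 0.2, 0.5):
        print(f"I/O fraction {f:.2f}: overall speedup {overall_speedup(f, 10.0):.2f}x")
```

Under these assumed numbers, the modeled gain grows with the I/O share of runtime but remains above 1x even when I/O is a small fraction. Real gains depend on the measured I/O fraction and on effects such as contention that this simple model ignores.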




Information

Published In

WORKS '17: Proceedings of the 12th Workshop on Workflows in Support of Large-Scale Science
November 2017
87 pages
ISBN: 9781450351294
DOI: 10.1145/3150994
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. burst buffers
  2. high-performance computing
  3. in transit processing
  4. scientific workflows

Qualifiers

  • Research-article

Conference

SC '17

Acceptance Rates

  • WORKS '17 paper acceptance rate: 8 of 25 submissions (32%)
  • Overall acceptance rate: 30 of 54 submissions (56%)


Article Metrics

  • Downloads (last 12 months): 92
  • Downloads (last 6 weeks): 22
Reflects downloads up to 22 Sep 2024.

Cited By
  • (2021) A lightweight method for evaluating in situ workflow efficiency. Journal of Computational Science 48 (101259). DOI: 10.1016/j.jocs.2020.101259. Online publication date: Jan-2021.
  • (2021) Data‐aware and simulation‐driven planning of scientific workflows on IaaS clouds. Concurrency and Computation: Practice and Experience 34:14. DOI: 10.1002/cpe.6719. Online publication date: 14-Nov-2021.
  • (2019) In-situ Data Analysis System for High Resolution Meteorological Large Eddy Simulation Model. Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, 155-158. DOI: 10.1145/3365109.3368769. Online publication date: 2-Dec-2019.
  • (2019) Toward an Elastic Data Transfer Infrastructure. 2019 15th International Conference on eScience (eScience), 262-265. DOI: 10.1109/eScience.2019.00036. Online publication date: Sep-2019.
  • (2019) Sizing and Partitioning Strategies for Burst-Buffers to Reduce IO Contention. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 631-640. DOI: 10.1109/IPDPS.2019.00072. Online publication date: May-2019.
  • (2019) Elastic Data Transfer Infrastructure (DTI) on the Chameleon Cloud. 2019 IEEE 27th International Conference on Network Protocols (ICNP), 1-2. DOI: 10.1109/ICNP.2019.8888073. Online publication date: Oct-2019.
  • (2019) Measuring the impact of burst buffers on data-intensive scientific workflows. Future Generation Computer Systems 101:C, 208-220. DOI: 10.1016/j.future.2019.06.016. Online publication date: 1-Dec-2019.
  • (2019) Approaches of enhancing interoperations among high performance computing and big data analytics via augmentation. Cluster Computing. DOI: 10.1007/s10586-019-02960-y. Online publication date: 3-Aug-2019.
