research-article

I/O Scheduling Strategy for Periodic Applications

Published: 23 July 2019

Abstract

With the ever-growing data needs of HPC applications, congestion at the I/O level is becoming critical in supercomputers. Architectural enhancements such as burst buffers and prefetching have been added to machines, but they are not sufficient to prevent congestion. Recent online I/O scheduling strategies have been put in place, but they introduce an additional congestion point and add overhead to the applications' computation.
In this work, we show how to take advantage of the periodic nature of HPC applications to develop efficient periodic scheduling strategies for their I/O transfers. During the job scheduling phase, our strategy computes, once, a pattern that defines the I/O behavior of each application; afterwards, the applications run independently, performing their I/O at the specified times. Our strategy limits congestion at the I/O node level and can be easily integrated into current job schedulers. We validate this model through extensive simulations and experiments on an HPC cluster, comparing it to state-of-the-art online solutions. The results show that our scheduler not only has the advantage of being decentralized, and thus avoids the overhead of online schedulers, but also outperforms the other solutions, improving application dilation by up to 16% and maximum system efficiency by up to 18%.
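To make the idea of a pattern that is "computed once" more concrete, here is a minimal Python sketch. It is not the authors' algorithm. The names `App`, `Slot`, `build_pattern`, and `dilations` are hypothetical, and the sketch assumes that each application alternates a compute phase of length `w` with an I/O phase of length `io` (measured at full bandwidth), runs exactly one instance per period, and performs its I/O alone, so I/O windows never overlap. Dilation is taken here as the period divided by the application's isolated instance length, a simplified stand-in for the metric used in the paper. After the pattern is built, each application only needs its own offsets within the period, so no central scheduler is consulted at run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """Hypothetical periodic application: `w` time units of computation,
    then `io` time units of I/O when it has the whole I/O bandwidth."""
    name: str
    w: float
    io: float


@dataclass(frozen=True)
class Slot:
    """Offsets (within one period) that an application follows at run time."""
    name: str
    compute_start: float  # when its compute phase begins (mod period)
    io_start: float       # when its exclusive I/O window opens
    io_end: float         # when that window closes


def build_pattern(apps):
    """Build a toy periodic pattern once, before the applications start.

    Assumptions: one instance per application per period, and I/O windows
    that never overlap, so each transfer gets the full bandwidth.
    """
    # The period must fit the longest instance and all I/O windows in a row.
    period = max(max(a.w + a.io for a in apps), sum(a.io for a in apps))
    slots, t = [], 0.0
    for a in apps:
        # Consecutive windows: this app's I/O starts where the previous ends.
        # Its computation starts `w` earlier, wrapping around the period.
        slots.append(Slot(a.name, (t - a.w) % period, t, t + a.io))
        t += a.io
    return period, slots


def dilations(apps, period):
    """Slowdown of each app versus running alone (one instance per period)."""
    return {a.name: period / (a.w + a.io) for a in apps}


if __name__ == "__main__":
    apps = [App("A", 10, 2), App("B", 6, 4), App("C", 3, 3)]
    period, slots = build_pattern(apps)
    print("period:", period)
    for s in slots:
        print(s)
    print(dilations(apps, period))
```

With these three example applications, the period is 12 and the I/O windows are [0, 2), [2, 6), and [6, 9), so transfers never collide; the dilations are 1.0, 1.2, and 2.0. The short application C suffers most because it idles for the rest of each period; a richer pattern could, for instance, fit several instances of a short application into one period.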




Published In

ACM Transactions on Parallel Computing, Volume 6, Issue 2
June 2019
109 pages
ISSN: 2329-4949
EISSN: 2329-4957
DOI: 10.1145/3343018
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 July 2019
Accepted: 01 May 2019
Revised: 01 March 2019
Received: 01 June 2018
Published in TOPC Volume 6, Issue 2


Author Tags

  1. HPC
  2. I/O
  3. periodicity
  4. scheduling
  5. supercomputers

Qualifiers

  • Research-article
  • Research
  • Refereed


Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 2
Reflects downloads up to 21 Sep 2024


Cited By

  • (2024) Timing-accurate scheduling and allocation for parallel I/O operations in real-time systems. Journal of Systems Architecture, 103158. DOI: 10.1016/j.sysarc.2024.103158. Online publication date: May-2024.
  • (2024) Revisiting I/O bandwidth-sharing strategies for HPC applications. Journal of Parallel and Distributed Computing, 104863. DOI: 10.1016/j.jpdc.2024.104863. Online publication date: Feb-2024.
  • (2024) Probabilistic scheduling of dynamic I/O requests via application clustering for burst‐buffers equipped high‐performance computing. Concurrency and Computation: Practice and Experience, 36:19. DOI: 10.1002/cpe.8142. Online publication date: 27-Jun-2024.
  • (2023) I/O Access Patterns in HPC Applications: A 360-Degree Survey. ACM Computing Surveys, 56:2, 1-41. DOI: 10.1145/3611007. Online publication date: 15-Sep-2023.
  • (2022) Adaptively Periodic I/O Scheduling for Concurrent HPC Applications. Electronics, 11:9, 1318. DOI: 10.3390/electronics11091318. Online publication date: 21-Apr-2022.
