DOI:10.1145/3295500.3356192
research-article
Open access

Preparation and optimization of a diverse workload for a large-scale heterogeneous system

Published: 17 November 2019 Publication History

Abstract

Productivity from day one on supercomputers that leverage new technologies requires significant preparation. An institution that procures a novel system architecture often lacks sufficient institutional knowledge and skills to prepare for it. Thus, the "Center of Excellence" (CoE) concept has emerged to prepare for systems such as Summit and Sierra, currently the top two systems in the Top 500. This paper documents CoE experiences that prepared a workload of diverse applications and math libraries for a heterogeneous system. We describe our approach to this preparation, including our management and execution strategies, and detail our experiences with and reasons for using different programming approaches. Our early science and performance results show that the project enabled significant early seismic science with up to a 14X throughput increase over Cori. In addition to our successes, we discuss our challenges and failures so others may benefit from our experience.

Cited By

  • (2022) Enabling New Flexibility in the SUNDIALS Suite of Nonlinear and Differential/Algebraic Equation Solvers. ACM Transactions on Mathematical Software 48:3 (1-24). DOI:10.1145/3539801. Online publication date: 10-Sep-2022
  • (2021) Enabling GPU accelerated computing in the SUNDIALS time integration library. Parallel Computing 108:C. DOI:10.1016/j.parco.2021.102836. Online publication date: 1-Dec-2021
  • (2020) Porting a 3D seismic modeling code (SW4) to CORAL machines. IBM Journal of Research and Development 64:3/4 (17:1-17:11). DOI:10.1147/JRD.2019.2960218. Online publication date: 1-May-2020

Published In

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2019
1921 pages
ISBN:9781450362290
DOI:10.1145/3295500
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPUs
  2. heterogeneous systems
  3. large-scale applications
  4. performance
  5. programming models
  6. project management

Qualifiers

  • Research-article

Conference

SC '19
Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 162
  • Downloads (Last 6 weeks): 29
Reflects downloads up to 14 Dec 2024
