DOI: 10.1145/3468044.3468046
invited-talk

On the Inevitability of Integrated HPC Systems and How they will Change HPC System Operations

Published: 21 June 2021

Abstract

High-Performance Computing (HPC) is at an inflection point in its evolution. General-purpose architectures are approaching their limits in speed and power/energy, requiring the development of specialized architectures to deliver accelerated performance. Additionally, the arrival of new user communities and workloads (including machine learning, data analytics, and quantum simulation) increases the breadth of application characteristics we need to support, which adds to the complexity of the architectural portfolio. At the same time, data movement has been identified as a main culprit of energy waste, pushing hardware designers towards a tighter integration of the different technologies. The resulting integrated systems offer great opportunities in terms of power/performance tradeoffs, but also lead to challenges on the software side.
In this position paper, we highlight the trends leading us to integrated systems and describe their substantial advantages over simpler, single accelerated designs. Further, we highlight their impact on the corresponding software stack, the challenges this raises, and the consequences for the user. This introduces a different way to design, program, and operate HPC systems, and ultimately the need to drop some long-held dogmas or beliefs in HPC systems.

Cited By

  • (2024) Malleability in Modern HPC Systems: Current Experiences, Challenges, and Future Opportunities. IEEE Transactions on Parallel and Distributed Systems 35:9 (1551-1564). DOI: 10.1109/TPDS.2024.3406764. Online publication date: Sep-2024
  • (2023) Sustainability in HPC: Vision and Opportunities. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (1876-1880). DOI: 10.1145/3624062.3624271. Online publication date: 12-Nov-2023
  • (2022) Analyzing the Energy Consumption of Synchronous and Asynchronous Checkpointing Strategies. 2022 IEEE/ACM Third International Symposium on Checkpointing for Supercomputing (SuperCheck) (1-9). DOI: 10.1109/SuperCheck56652.2022.00006. Online publication date: Nov-2022

      Published In

      HEART '21: Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies
      June 2021
      76 pages
      ISBN:9781450385497
      DOI:10.1145/3468044
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      In-Cooperation

      • German Research Foundation

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Adaptive Systems
      2. Co-Design
      3. HPC Architectures

      Qualifiers

      • Invited-talk
      • Research
      • Refereed limited

      Conference

      HEART '21

      Acceptance Rates

      Overall Acceptance Rate 22 of 50 submissions, 44%

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 24
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 21 Nov 2024