DOI: 10.1145/2304576.2304609
keynote

Blue Gene/Q: design for sustained multi-petaflop computing

Published: 25 June 2012

Abstract

The Blue Gene/Q system is the third generation of Blue Gene servers optimized for high-performance computing (HPC) and provides a platform for continued growth in HPC performance and capability. Blue Gene/Q started with a new design of the hardware platform while retaining and significantly expanding an established, trusted, and successful software environment.
To deliver a system that lets users fully exploit the promise of high-performance computing for both traditional HPC applications and new commercial application areas, the Blue Gene/Q system architecture combines hardware and software innovations to overcome traditional bottlenecks, most famously the memory and power walls that have become emblematic of modern computing systems. At the same time, to deliver a platform for sustainable petascale computing, and beyond that for exascale, we had to address a new set of "walls" through the innovations described below: a scalability wall, a communication wall, and a reliability wall.
The new Blue Gene/Q system increases overall system performance with a new node architecture. Each node offers more thread-level parallelism through a coherent SMP design consisting of eighteen 64-bit PowerPC cores with 4-way simultaneous multithreading. Each core better exploits data-level parallelism with a new 4-way quad-vector processing unit (QPU). The memory subsystem integrates memory speculation support, which can be used to implement both Transactional Memory and Speculative Execution programming models.
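As a rough illustration of the node-level parallelism described above, the following Python sketch multiplies out the figures quoted in the abstract (eighteen cores, 4-way multithreading, 4-way vector unit). The function names are hypothetical, and the abstract does not say how many of the eighteen cores run application code, so the core count is left as a parameter rather than assumed.

```python
# Figures quoted in the abstract.
CORES_PER_NODE = 18        # 64-bit PowerPC cores per node
SMT_WAYS_PER_CORE = 4      # 4-way simultaneous multithreading
VECTOR_LANES_PER_CORE = 4  # 4-way quad-vector processing unit (QPU)


def hardware_threads(cores: int = CORES_PER_NODE,
                     smt_ways: int = SMT_WAYS_PER_CORE) -> int:
    """Hardware threads exposed by `cores` cores (hypothetical helper)."""
    return cores * smt_ways


def vector_lanes(cores: int = CORES_PER_NODE,
                 lanes: int = VECTOR_LANES_PER_CORE) -> int:
    """Total vector lanes across `cores` cores (hypothetical helper)."""
    return cores * lanes


if __name__ == "__main__":
    print("hardware threads per node:", hardware_threads())
    print("vector lanes per node:", vector_lanes())
```

Passing a smaller `cores` value models a configuration in which some cores are reserved for system use; whether and how many are reserved is not stated in the abstract.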
The compute nodes are connected in a five-dimensional torus using 10 point-to-point links, for a total network bandwidth of 44 GB/s per node. The on-chip messaging unit provides an optimized interface between the network routing logic and the memory subsystem, with enough bandwidth to keep all the links busy. It also offloads communication protocol processing by implementing collective broadcast and reduction operations, including integer and floating-point sum, min, and max.
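The figure of 10 links follows from the topology: in a five-dimensional torus each node has one neighbor in the positive and one in the negative direction of every dimension. The Python sketch below enumerates those neighbors with wraparound. The function name and the example dimension sizes are hypothetical; the abstract does not give the torus extents of any real machine.

```python
from typing import List, Sequence, Tuple


def torus_neighbors(coord: Sequence[int],
                    dims: Sequence[int]) -> List[Tuple[int, ...]]:
    """Return the coordinates reached by each link of a node in a torus.

    One link per direction (+1 and -1) per dimension, with wraparound
    at the edges. In a dimension of size 2 both links reach the same
    neighbor, so the list can contain repeated coordinates.
    """
    if len(coord) != len(dims):
        raise ValueError("coord and dims must have the same length")
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (coord[axis] + step) % size  # wrap around the ring
            neighbors.append(tuple(n))
    return neighbors


if __name__ == "__main__":
    dims = (4, 4, 4, 4, 4)       # illustrative extents, not a real system
    origin = (0, 0, 0, 0, 0)
    links = torus_neighbors(origin, dims)
    print(len(links), "links from", origin)
    for n in links:
        print(" ", n)
```

Because every dimension wraps around, no node sits on a boundary, and every node has the same number of links regardless of its position.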
An efficient software stack runs on top of the Blue Gene hardware design. It builds on several generations of Blue Gene software interfaces, extends their capabilities, and adds new functions to support new hardware capabilities. The hardware functions were designed with a focus on providing efficient primitives upon which to build the rich software environment.
To ensure reliable operation of a petascale system, reliability has to be a pervasive design consideration. At the architecture level, new QPX store-and-indicate instructions support the detection of programming errors. To ensure reliable operation in the presence of transient faults, we conducted exhaustive single-event-upset simulations based on fault injection into the simulated design. The operating system was structured to use firmware in a small on-chip boot eDRAM to avoid silent system hangs.
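To make the idea of fault injection concrete, the toy Python sketch below flips each bit of a stored word in turn, mimicking a single event upset, and counts how many flips a parity check catches. This is only an illustration of the general technique, not the simulation methodology used for Blue Gene/Q. The use of parity as the detection mechanism, the 64-bit word width, and all function names are assumptions made for the example.

```python
from typing import Tuple


def parity(word: int) -> int:
    """Even-parity bit of a non-negative integer (1 if odd number of ones)."""
    return bin(word).count("1") % 2


def inject_seu(word: int, bit: int) -> int:
    """Model a single event upset by flipping one bit of the word."""
    return word ^ (1 << bit)


def run_campaign(word: int, width: int = 64) -> Tuple[int, int]:
    """Exhaustively inject one bit flip per position and count detections.

    Returns (detected, injected). Parity here is an assumed stand-in for
    whatever protection the real design applies to a given structure.
    """
    stored_parity = parity(word)
    detected = 0
    for bit in range(width):
        faulty = inject_seu(word, bit)
        if parity(faulty) != stored_parity:
            detected += 1
    return detected, width


if __name__ == "__main__":
    detected, injected = run_campaign(0xDEADBEEF)
    print(f"detected {detected} of {injected} single-bit upsets")
```

A single flip always changes the parity, so this campaign detects every injected fault. Two flips in the same word cancel out and would go unnoticed, which is the kind of coverage gap an exhaustive campaign is meant to expose.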
Together, the hardware and software innovations pioneered in Blue Gene/Q give application developers a platform and framework to develop and deploy sustained petascale computing applications. These petascale applications will allow their users to make new scientific discoveries and gain new business insights, which will be the true measure of the success of the new Blue Gene/Q systems.

    Published In

    ICS '12: Proceedings of the 26th ACM international conference on Supercomputing
    June 2012
    400 pages
    ISBN: 9781450313162
    DOI: 10.1145/2304576

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. SIMD
    2. blue gene
    3. blue gene/q
    4. communication wall
    5. design for reliability
    6. interconnection networks
    7. memory wall
    8. petascale computing
    9. power wall
    10. quad-vector processing extensions (QPX)
    11. quad-vector processing unit (QPU)
    12. reliability wall
    13. scalability wall
    14. speculative execution
    15. supercomputing applications
    16. transactional memory

    Qualifiers

    • Keynote

    Conference

    ICS'12
    ICS'12: International Conference on Supercomputing
    June 25 - 29, 2012
    San Servolo Island, Venice, Italy

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Cited By

    • (2019) FusedOS. Operating Systems for Supercomputers and High Performance Computing, pp. 227-239. DOI: 10.1007/978-981-13-6624-6_14. Online publication date: 16-Oct-2019.
    • (2017) Optimizing the efficiency of deep learning through accelerator virtualization. IBM Journal of Research and Development 61:4-5, pp. 12:1-12:11. DOI: 10.1147/JRD.2017.2716598. Online publication date: 1-Jul-2017.
    • (2017) Parallel Deep Neural Network Training for Big Data on Blue Gene/Q. IEEE Transactions on Parallel and Distributed Systems 28:6, pp. 1703-1714. DOI: 10.1109/TPDS.2016.2626289. Online publication date: 1-Jun-2017.
    • (2016) The BLIS Framework. ACM Transactions on Mathematical Software 42:2, pp. 1-19. DOI: 10.1145/2755561. Online publication date: 3-Jun-2016.
    • (2014) Parallel deep neural network training for big data on blue gene/Q. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 745-753. DOI: 10.1109/SC.2014.66. Online publication date: 16-Nov-2014.
    • (2014) A system software approach to proactive memory-error avoidance. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 707-718. DOI: 10.1109/SC.2014.63. Online publication date: 16-Nov-2014.
    • (2013) Use of SIMD Vector Operations to Accelerate Application Code Performance on Low-Powered ARM and Intel Platforms. Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp. 1107-1116. DOI: 10.1109/IPDPSW.2013.207. Online publication date: 20-May-2013.
    • (2012) FusedOS. Proceedings of the 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing, pp. 211-218. DOI: 10.1109/SBAC-PAD.2012.14. Online publication date: 24-Oct-2012.
