research-article

Entering the petaflop era: the architecture and performance of Roadrunner

Authors:

Kevin J. Barker,

Darren J. Kerbyson,

Jose C. Sancho

SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing

Article No.: 1, Pages 1-11

Published: 15 November 2008

Abstract

Roadrunner is a 1.38 Pflop/s-peak (double precision) hybrid-architecture supercomputer developed by Los Alamos National Laboratory (LANL) and IBM. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and an in-depth performance analysis of the system. We also include a case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture, and we compare its performance with that of the same code on a previous implementation of the Cell Broadband Engine architecture (the Cell BE) and on multi-core processors. Using validated performance models combined with Roadrunner-specific microbenchmarks, we identify performance issues in the early, pre-delivery system and infer how well the final Roadrunner configuration will perform once the system software stack has matured.
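The peak figure quoted above can be reproduced from the processor counts with simple arithmetic. The following is a back-of-the-envelope sketch, not the authors' method, and the per-chip rates it uses are assumptions that the abstract does not state: 108.8 Gflop/s double precision per PowerXCell 8i (eight SPEs at 12.8 Gflop/s each plus 6.4 Gflop/s from the PPE, at 3.2 GHz) and 3.6 Gflop/s per Opteron core (1.8 GHz at 2 flops per cycle). The per-node counts of four PowerXCell 8i processors and four Opteron cores follow from dividing 12,240 by 3,060.

```python
# Back-of-the-envelope check of Roadrunner's double-precision peak.
# ASSUMPTIONS (not stated in the abstract): per-chip DP rates below.

NODES = 3_060
CELLS_PER_NODE = 4          # 12,240 PowerXCell 8i / 3,060 nodes
OPTERON_CORES_PER_NODE = 4  # 12,240 Opteron cores / 3,060 nodes

# Assumed DP peak per PowerXCell 8i: 8 SPEs x 12.8 Gflop/s + 6.4 Gflop/s (PPE).
CELL_GFLOPS = 8 * 12.8 + 6.4
# Assumed DP peak per Opteron core: 1.8 GHz x 2 flops/cycle.
OPTERON_CORE_GFLOPS = 1.8 * 2


def peak_pflops():
    """Return (Cell, Opteron, total) peak rates in Pflop/s."""
    cell_gflops = NODES * CELLS_PER_NODE * CELL_GFLOPS
    opteron_gflops = NODES * OPTERON_CORES_PER_NODE * OPTERON_CORE_GFLOPS
    # 1 Pflop/s = 1e6 Gflop/s
    return cell_gflops / 1e6, opteron_gflops / 1e6, (cell_gflops + opteron_gflops) / 1e6


if __name__ == "__main__":
    cell, opteron, total = peak_pflops()
    print(f"PowerXCell 8i: {cell:.3f} Pflop/s")
    print(f"Opteron:       {opteron:.3f} Pflop/s")
    print(f"Total peak:    {total:.2f} Pflop/s")
    # A sustained Linpack rate above 1 Pflop/s exceeds this fraction of peak.
    print(f"Linpack efficiency lower bound: {1.0 / total:.1%}")
```

Under these assumptions the Cell processors contribute about 1.332 Pflop/s and the Opteron cores about 0.044 Pflop/s, for roughly 1.376 Pflop/s in total, which rounds to the quoted 1.38 Pflop/s. The same arithmetic shows that almost all of the peak comes from the PowerXCell 8i processors, and that a sustained Linpack rate above 1 Pflop/s corresponds to more than about 72.7% of peak.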




Recommendations

369 Tflop/s molecular dynamics simulations on the petaflop hybrid supercomputer ‘Roadrunner’
Exploring the Frontiers of Computing Science and Technology: Adapting Emerging Multi- and Many-core Processors

We describe the implementation of a short-range parallel molecular dynamics (MD) code, SPaSM, on the heterogeneous general-purpose Roadrunner supercomputer. Each Roadrunner ‘TriBlade’ compute node consists of two AMD Opteron dual-core microprocessors ...
The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
CF '12: Proceedings of the 9th conference on Computing Frontiers

With the rise of general purpose computing on graphics processing units (GPGPU), the influence from consumer markets can now be seen across the spectrum of computer architectures. In fact, many of the high-ranking Top500 HPC systems now include these ...
Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures

Multicore architectures provide scalable performance with a lower hardware design effort than single core processors. Our article presents a design methodology and an embedded multicore architecture, focusing on reducing the software design complexity ...


Information

Published In

SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing

November 2008

739 pages

ISBN: 9781424428359

Conference Chair:
Patricia Teller

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

IEEE Press

Qualifiers

Research-article

Conference

SC '08: International Conference for High Performance Computing, Networking, Storage and Analysis

November 15 - 21, 2008

Austin, Texas

Acceptance Rates

SC '08 Paper Acceptance Rate: 59 of 277 submissions, 21%

Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%


Bibliometrics & Citations

Article Metrics

Total citations: 46

Total downloads: 1,139

Downloads (last 12 months): 3

Downloads (last 6 weeks): 1

Reflects downloads up to 18 Nov 2024

Cited By

Pearce O (2018). Experiences Using CPUs and GPUs for Cooperative Computation in a Multi-Physics Simulation. Workshop Proceedings of the 47th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3229710.3229711. Online publication date: 13-Aug-2018.
https://dl.acm.org/doi/10.1145/3229710.3229711
Tang H, Byna S, Tessier F, Wang T, Dong B, Mu J, Koziol Q, Soumagne J, Vishwanath V, Liu J, Warren R, El-Araby E, El-Ghazawi T, Panda D (2018). Toward scalable and asynchronous object-centric data management for HPC. Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 113-122. DOI: 10.1109/CCGRID.2018.00026. Online publication date: 1-May-2018.
https://dl.acm.org/doi/10.1109/CCGRID.2018.00026
Grass T, Allande C, Armejach A, Rico A, Ayguadé E, Labarta J, Valero M, Casas M, Moreto M, West J (2016). MUSA. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.5555/3014904.3014965. Online publication date: 13-Nov-2016.
https://dl.acm.org/doi/10.5555/3014904.3014965
Brown W, Semin A, Hebenstreit M, Khvostov S, Raman K, Plimpton S, West J (2016). Increasing molecular dynamics simulation rates with an 8-fold increase in electrical power efficiency. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.5555/3014904.3014915. Online publication date: 13-Nov-2016.
https://dl.acm.org/doi/10.5555/3014904.3014915
Trobec R, Vasiljević R, Tomašević M, Milutinović V, Beivide R, Valero M (2016). Interconnection Networks in Petascale Computer Systems. ACM Computing Surveys, 49(3), pp. 1-24. DOI: 10.1145/2983387. Online publication date: 16-Sep-2016.
https://dl.acm.org/doi/10.1145/2983387
Kerbyson D, Barker K, Vishnu A, Hoisie A (2014). A performance comparison of current HPC systems. Future Generation Computer Systems, 30(C), pp. 291-304. DOI: 10.5555/2747903.2748195. Online publication date: 1-Jan-2014.
https://dl.acm.org/doi/10.5555/2747903.2748195
Silva G, Correa J, Bentes C, Guedes S, Gabioux M (2014). The Experience in Designing and Evaluating the High Performance Cluster Netuno. International Journal of Parallel Programming, 42(2), pp. 265-286. DOI: 10.1007/s10766-012-0224-7. Online publication date: 1-Apr-2014.
https://dl.acm.org/doi/10.1007/s10766-012-0224-7
Rico A, Cabarcas F, Villavieja C, Pavlovic M, Vega A, Etsion Y, Ramirez A, Valero M (2012). On the simulation of large-scale architectures using multiple application abstraction levels. ACM Transactions on Architecture and Code Optimization, 8(4), pp. 1-20. DOI: 10.1145/2086696.2086715. Online publication date: 26-Jan-2012.
https://dl.acm.org/doi/10.1145/2086696.2086715
David T, Jacquelin M, Marchal L (2012). Scheduling streaming applications on a complex multicore platform. Concurrency and Computation: Practice & Experience, 24(15), pp. 1726-1750. DOI: 10.1002/cpe.1874. Online publication date: 1-Oct-2012.
https://dl.acm.org/doi/10.1002/cpe.1874
Rajovic N, Puzovic N, Vilanova L, Villavieja C, Ramirez A, Alexandrov V, Geist A, Dongarra J (2011). The low-power architecture approach towards exascale computing. Proceedings of the second workshop on Scalable algorithms for large-scale systems, pp. 1-2. DOI: 10.1145/2133173.2133175. Online publication date: 14-Nov-2011.
https://dl.acm.org/doi/10.1145/2133173.2133175
