DOI: 10.5555/1413370.1413372
research-article

Entering the petaflop era: the architecture and performance of Roadrunner

Published: 15 November 2008

Abstract

Roadrunner is a hybrid-architecture supercomputer with a peak double-precision performance of 1.38 Pflop/s, developed by LANL and IBM. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and a detailed performance analysis of the system. We also include a case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture, and we compare the performance of Sweep3D on Roadrunner with that of the same code on a previous implementation of the Cell Broadband Engine architecture (the Cell BE) and on multi-core processors. Using validated performance models combined with Roadrunner-specific microbenchmarks, we identify performance issues in the early, pre-delivery system and infer how well the final Roadrunner configuration will perform once the system software stack has matured.
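The node and peak figures quoted above can be cross-checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not the paper's own analysis. The per-chip double-precision peaks it uses (12.8 Gflop/s per SPE, eight SPEs plus a 6.4 Gflop/s PPE per PowerXCell 8i, and 3.6 Gflop/s per Opteron core) are assumptions that do not appear in the abstract. Under those assumptions, the totals reproduce the quoted 1.38 Pflop/s peak and give an average of four PowerXCell 8i processors and four Opteron cores per compute node.

```python
"""Back-of-the-envelope check of Roadrunner's node composition and peak rate.

The component counts come from the abstract; the per-chip peak rates are
ASSUMPTIONS (nominal clock x double-precision flops per cycle).
"""

NODES = 3_060
CELL_PROCESSORS = 12_240   # IBM PowerXCell 8i processors
OPTERON_CORES = 12_240     # AMD Opteron cores

# Assumed double-precision peaks, in Gflop/s (not stated in the abstract).
SPE_DP_GFLOPS = 12.8       # assumed: 3.2 GHz x 4 DP flops/cycle per SPE
SPES_PER_CELL = 8
PPE_DP_GFLOPS = 6.4        # assumed: 3.2 GHz x 2 DP flops/cycle
OPTERON_CORE_GFLOPS = 3.6  # assumed: 1.8 GHz x 2 DP flops/cycle


def per_node(count: int, nodes: int = NODES) -> float:
    """Average number of a component per compute node."""
    return count / nodes


def cell_gflops() -> float:
    """Assumed peak of one PowerXCell 8i: eight SPEs plus the PPE."""
    return SPES_PER_CELL * SPE_DP_GFLOPS + PPE_DP_GFLOPS


def system_peak_pflops() -> float:
    """Sum the assumed per-chip peaks over the whole machine, in Pflop/s."""
    total_gflops = (CELL_PROCESSORS * cell_gflops()
                    + OPTERON_CORES * OPTERON_CORE_GFLOPS)
    return total_gflops / 1e6  # Gflop/s -> Pflop/s


if __name__ == "__main__":
    print(f"Cell processors per node: {per_node(CELL_PROCESSORS):.0f}")
    print(f"Opteron cores per node:   {per_node(OPTERON_CORES):.0f}")
    print(f"Estimated peak:           {system_peak_pflops():.2f} Pflop/s")
```

Under these assumptions the Cell processors dominate the total: they contribute about 1.33 Pflop/s, while the Opteron cores add roughly 0.04 Pflop/s.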




Published In

ACM Conferences
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing
November 2008
739 pages
ISBN:9781424428359

Publisher

IEEE Press


Author Tags

  1. Roadrunner
  2. accelerators
  3. heterogeneous
  4. performance analysis
  5. petascale computing

Qualifiers

  • Research-article

Conference

SC '08

Acceptance Rates

SC '08 Paper Acceptance Rate 59 of 277 submissions, 21%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
