Instruction-level simulation of a cluster at scale

Authors:

Edgar A. León,

Arthur B. Maccabe,

Patrick G. Bridges

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Article No.: 3, Pages 1 - 12

https://doi.org/10.1145/1654059.1654063

Published: 14 November 2009

Abstract

Instruction-level simulation is necessary to evaluate new architectures. However, single-node simulation cannot predict the behavior of a parallel application on a supercomputer. We present a scalable simulator that couples a cycle-accurate node simulator with a supercomputer network model. Our simulator executes individual instances of IBM's Mambo PowerPC simulator on hundreds of cores. We integrated a NIC emulator into Mambo and model the network instead of fully simulating it. This decouples the individual node simulators and makes our design scalable.
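The decoupling described above can be illustrated with a minimal sketch. It assumes a simple analytic model (fixed latency plus serialization time); the abstract does not state which network model the simulator actually uses, so `NetworkModel`, `transit_cycles`, `delivery_cycle`, and every parameter below are hypothetical names chosen for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkModel:
    """Hypothetical analytic network model (an assumption, not the paper's)."""
    latency_cycles: int      # assumed fixed one-way latency, in simulated cycles
    bytes_per_cycle: float   # assumed link bandwidth

    def transit_cycles(self, size_bytes: int) -> int:
        # Fixed latency plus the time to push the payload through the link.
        return self.latency_cycles + math.ceil(size_bytes / self.bytes_per_cycle)


def delivery_cycle(model: NetworkModel, send_cycle: int, size_bytes: int,
                   receiver_cycle: int) -> int:
    """Cycle at which the receiving node's NIC emulator sees the message."""
    arrival = send_cycle + model.transit_cycles(size_bytes)
    # A receiver that has already run past the modeled arrival time can only
    # observe the message at its current cycle (a simplification of this sketch).
    return max(arrival, receiver_cycle)


if __name__ == "__main__":
    model = NetworkModel(latency_cycles=1000, bytes_per_cycle=2.0)
    print(delivery_cycle(model, send_cycle=500, size_bytes=4096, receiver_cycle=0))
```

Because the arrival time is computed from the message itself, each node simulator needs only the send timestamp and payload size, not a shared, cycle-by-cycle simulation of the network. That is one way to picture why modeling the network, rather than fully simulating it, keeps the node simulators independent.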

Our simulator runs unmodified parallel message-passing applications on hundreds of nodes. We can vary both network parameters and detailed node parameters, inject network traffic directly into caches, and apply different policies to decide when injection is advantageous.
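As a rough illustration of what a policy for cache injection might look like, the sketch below chooses between writing an incoming message to memory and injecting it into a cache. The three policies and the size threshold are assumptions made for this example; the abstract does not list the policies the simulator supports.

```python
from enum import Enum


class InjectionPolicy(Enum):
    """Hypothetical policies, named here for illustration only."""
    NEVER = "never"            # always write incoming data to main memory
    ALWAYS = "always"          # always inject incoming data into the cache
    SMALL_ONLY = "small_only"  # inject only messages up to a size threshold


def should_inject(policy: InjectionPolicy, size_bytes: int,
                  threshold_bytes: int = 4096) -> bool:
    """Return True if the NIC should place this message in the cache."""
    if policy is InjectionPolicy.NEVER:
        return False
    if policy is InjectionPolicy.ALWAYS:
        return True
    # SMALL_ONLY: large payloads could evict useful cache lines, so this
    # sketch (by assumption) injects only messages that fit the threshold.
    return size_bytes <= threshold_bytes


if __name__ == "__main__":
    for policy in InjectionPolicy:
        print(policy.value, should_inject(policy, size_bytes=8192))
```

Keeping the decision in a single function makes it easy to swap policies between simulation runs and compare application performance under each.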

This paper describes our simulator in detail, evaluates it, and demonstrates its scalability. We show its suitability for architecture research by evaluating the impact of cache injection on parallel application performance.

Information

Published In

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

November 2009

778 pages

ISBN: 9781605587448

DOI: 10.1145/1654059

Conference Chair:
Wilfred Pinfold

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

International Business Machines Corporation

Conference

SC '09

SC '09: International Conference for High Performance Computing, Networking, Storage and Analysis

November 14 - 20, 2009

Portland, Oregon

Acceptance Rates

SC '09 paper acceptance rate: 59 of 261 submissions (23%)

Overall acceptance rate: 1,516 of 6,373 submissions (24%)
