DOI: 10.1145/1654059.1654063
research-article

Instruction-level simulation of a cluster at scale

Published: 14 November 2009

Abstract

Instruction-level simulation is necessary to evaluate new architectures. However, single-node simulation cannot predict the behavior of a parallel application on a supercomputer. We present a scalable simulator that couples a cycle-accurate node simulator with a supercomputer network model. Our simulator executes individual instances of IBM's Mambo PowerPC simulator on hundreds of cores. We integrated a NIC emulator into Mambo and model the network instead of fully simulating it. This decouples the individual node simulators and makes our design scalable.
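
The abstract does not describe the network model's internals, but a small example shows why modeling the network, rather than simulating it, decouples the node simulators. This minimal Python sketch assumes a purely analytic model: a message arrives at its send time plus a fixed latency plus its size divided by link bandwidth. The class `NetworkModel`, its parameters, and this cost formula are hypothetical illustrations, not the paper's model. Each node simulator only hands messages to the model and asks which messages have arrived by its current simulated time, so node simulators never exchange state with each other directly.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Delivery:
    # Ordered by (arrival_ns, seq); seq breaks ties in send order.
    arrival_ns: float
    seq: int
    src: int = field(compare=False)
    dst: int = field(compare=False)
    size_bytes: int = field(compare=False)


class NetworkModel:
    """Hypothetical analytic network model (illustrative assumption only).

    arrival = send time + latency + size / bandwidth.
    No packets or flits are simulated; contention and topology are ignored.
    """

    def __init__(self, latency_ns: float, bandwidth_bytes_per_ns: float):
        self.latency_ns = latency_ns
        self.bandwidth = bandwidth_bytes_per_ns
        self._pending: list[Delivery] = []  # min-heap keyed on arrival time
        self._seq = 0

    def send(self, src: int, dst: int, size_bytes: int, send_ns: float) -> float:
        """Record a message injected by node `src`; return its modeled arrival time."""
        arrival = send_ns + self.latency_ns + size_bytes / self.bandwidth
        heapq.heappush(self._pending, Delivery(arrival, self._seq, src, dst, size_bytes))
        self._seq += 1
        return arrival

    def deliver_until(self, now_ns: float) -> list[Delivery]:
        """Pop every message whose arrival time is <= now_ns, in arrival order."""
        ready = []
        while self._pending and self._pending[0].arrival_ns <= now_ns:
            ready.append(heapq.heappop(self._pending))
        return ready
```

For example, with 1000 ns latency and 2 bytes/ns, a 4096-byte message sent at t = 0 arrives at 3048 ns, while a 64-byte message sent at 100 ns arrives earlier, at 1132 ns. A realistic model would add contention and topology effects; this one deliberately omits both.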
Our simulator runs unmodified parallel message-passing applications on hundreds of nodes. We can change network and detailed node parameters, inject network traffic directly into caches, and apply different policies to decide when doing so is advantageous.
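
Cache injection can save memory accesses when the processor soon reads the incoming data, but injecting too much can evict useful cache lines (cache pollution). That trade-off is why a policy is needed. The following Python sketch shows one possible shape for such a policy hook. The four policies, the function `bytes_to_inject`, and its defaults (a 64-byte header, 25% of the cache) are hypothetical examples chosen for illustration, not the policies evaluated in the paper.

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical injection policies (illustrative assumptions only).
    NONE = "none"            # all incoming data goes to main memory
    ALL = "all"              # inject as much of every message as fits in the cache
    HEADER = "header"        # inject only the first header_bytes of each message
    THRESHOLD = "threshold"  # inject a whole message only if it is small enough


def bytes_to_inject(policy: Policy, msg_bytes: int, cache_bytes: int,
                    header_bytes: int = 64, max_fraction: float = 0.25) -> int:
    """Return how many bytes of an incoming message the NIC writes into the cache.

    Hypothetical decision function: the remaining bytes are assumed to be
    written to main memory as usual. Defaults are illustrative, not measured.
    """
    if policy is Policy.NONE:
        return 0
    if policy is Policy.ALL:
        return min(msg_bytes, cache_bytes)
    if policy is Policy.HEADER:
        return min(msg_bytes, header_bytes)
    if policy is Policy.THRESHOLD:
        # Avoid pollution: skip injection for messages larger than a cache fraction.
        return msg_bytes if msg_bytes <= max_fraction * cache_bytes else 0
    raise ValueError(f"unknown policy: {policy}")
```

With a 1 MiB cache, the HEADER policy injects 64 bytes of a 4 KiB message, while THRESHOLD injects the whole 4 KiB message but none of a 512 KiB one.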
This paper describes our simulator in detail, evaluates it, and demonstrates its scalability. We show its suitability for architecture research by evaluating the impact of cache injection on parallel application performance.

Published In

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
November 2009
778 pages
ISBN: 9781605587448
DOI: 10.1145/1654059
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Conference

SC '09

Acceptance Rates

SC '09 paper acceptance rate: 59 of 261 submissions (23%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)

Cited By

• (2023) Dependency-Driven Interconnection Network Simulation Using MPI Traces. 2023 International Conference on Ubiquitous Communication (Ucom), pp. 227-231. DOI: 10.1109/Ucom59132.2023.10257644. Online publication date: 7-Jul-2023.
• (2021) Teaching parallel and distributed computing concepts in simulation with WRENCH. Journal of Parallel and Distributed Computing, 156, pp. 53-63. DOI: 10.1016/j.jpdc.2021.05.009. Online publication date: Oct-2021.
• (2019) Teaching Parallel and Distributed Computing Concepts in Simulation with WRENCH. 2019 IEEE/ACM Workshop on Education for High-Performance Computing (EduHPC), pp. 1-9. DOI: 10.1109/EduHPC49559.2019.00006. Online publication date: Nov-2019.
• (2018) WRENCH: A Framework for Simulating Workflow Management Systems. 2018 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS), pp. 74-85. DOI: 10.1109/WORKS.2018.00013. Online publication date: Nov-2018.
• (2017) Simulating MPI Applications: The SMPI Approach. IEEE Transactions on Parallel and Distributed Systems, 28(8), pp. 2387-2400. DOI: 10.1109/TPDS.2017.2669305. Online publication date: 1-Aug-2017.
• (2017) Cluster Performance Simulation for Spark Deployment Planning, Evaluation and Optimization. Simulation and Modeling Methodologies, Technologies and Applications, pp. 34-51. DOI: 10.1007/978-3-319-69832-8_3. Online publication date: 28-Oct-2017.
• (2016) MUSA. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.5555/3014904.3014965. Online publication date: 13-Nov-2016.
• (2016) Virtualized I/O. Attaining High Performance Communications, pp. 261-282. DOI: 10.1201/b10249-17. Online publication date: 19-Apr-2016.
• (2016) WBSP. IEEE Transactions on Computers, 65(3), pp. 992-1005. DOI: 10.1109/TC.2015.2439253. Online publication date: 1-Mar-2016.
• (2016) MUSA: A Multi-level Simulation Approach for Next-Generation HPC Machines. SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 526-537. DOI: 10.1109/SC.2016.44. Online publication date: Nov-2016.
