research-article

ZSim: fast and accurate microarchitectural simulation of thousand-core systems

Authors:

Daniel Sanchez,

Christos Kozyrakis

ACM SIGARCH Computer Architecture News, Volume 41, Issue 3

Pages 475 - 486

https://doi.org/10.1145/2508148.2485963

Published: 23 June 2013

Abstract

Architectural simulation is time-consuming, and the trend towards hundreds of cores is making sequential simulation even slower. Existing parallel simulation techniques either scale poorly due to excessive synchronization, or sacrifice accuracy by allowing event reordering and using simplistic contention models. As a result, most researchers use sequential simulators and model small-scale systems with 16-32 cores. With 100-core chips already available, developing simulators that scale to thousands of cores is crucial.

We present three novel techniques that, together, make thousand-core simulation practical. First, we speed up detailed core models (including OOO cores) with instruction-driven timing models that leverage dynamic binary translation. Second, we introduce bound-weave, a two-phase parallelization technique that scales parallel simulation on multicore hosts efficiently with minimal loss of accuracy. Third, we implement lightweight user-level virtualization to support complex workloads, including multiprogrammed, client-server, and managed-runtime applications, without the need for full-system simulation, sidestepping the lack of scalable OSs and ISAs that support thousands of cores.

We use these techniques to build zsim, a fast, scalable, and accurate simulator. On a 16-core host, zsim models a 1024-core chip at speeds of up to 1,500 MIPS using simple cores and up to 300 MIPS using detailed OOO cores, 2-3 orders of magnitude faster than existing parallel simulators. Simulator performance scales well with both the number of modeled cores and the number of host cores. We validate zsim against a real Westmere system on a wide variety of workloads, and find performance and microarchitectural events to be within a narrow range of the real system.

Published In

ACM SIGARCH Computer Architecture News Volume 41, Issue 3

ISCA '13

June 2013

666 pages

ISSN: 0163-5964

DOI: 10.1145/2508148

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN: 9781450320795
DOI: 10.1145/2485922
General Chair: Avi Mendelson (Technion)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States
