research-article

Ramulator: A Fast and Extensible DRAM Simulator

Authors:

Onur Mutlu

IEEE Computer Architecture Letters, Volume 15, Issue 1

Pages 45 - 49

https://doi.org/10.1109/LCA.2015.2414456

Published: 01 January 2016

Abstract

Recently, both industry and academia have proposed many different roadmaps for the future of DRAM. Consequently, there is a growing need for an extensible DRAM simulator that can be easily modified to judge the merits of today's DRAM standards as well as those of tomorrow. In this paper, we present Ramulator, a fast and cycle-accurate DRAM simulator that is built from the ground up for extensibility. Unlike existing simulators, Ramulator is based on a generalized template for modeling a DRAM system, which is only later infused with the specific details of a DRAM standard. Thanks to this decoupled and modular design, Ramulator provides out-of-the-box support for a wide array of DRAM standards (DDR3/4, LPDDR3/4, GDDR5, WIO1/2, and HBM) as well as some academic proposals (SALP, AL-DRAM, TL-DRAM, RowClone, and SARP). Importantly, Ramulator does not sacrifice simulation speed to gain extensibility: according to our evaluations, Ramulator is 2.5$\times$ faster than the next fastest simulator. Ramulator is released under the permissive BSD license.
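The idea of a generalized template that is only later filled in with the details of a standard can be illustrated with a small sketch. The code below is not Ramulator's implementation (Ramulator itself is distributed as separate source code, and this abstract does not show it); it is a minimal Python illustration under stated assumptions. The names `Standard`, `Node`, `DDR3_LIKE`, and `DDR4_LIKE` are hypothetical, the timing values are placeholders rather than figures taken from any JEDEC document, and timing is tracked per node only, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standard:
    """Hypothetical, data-only description of a DRAM standard."""
    name: str
    levels: tuple   # hierarchy levels, outermost first
    fanout: dict    # number of children under each non-leaf level
    timing: dict    # (previous command, next command) -> minimum gap in cycles


class Node:
    """Generic node of the DRAM hierarchy; contains no standard-specific logic."""

    def __init__(self, spec, depth=0):
        self.spec = spec
        self.level = spec.levels[depth]
        self.last_issued = {}  # command -> cycle at which it was last issued here
        is_leaf = depth == len(spec.levels) - 1
        fanout = 0 if is_leaf else spec.fanout[self.level]
        self.children = [Node(spec, depth + 1) for _ in range(fanout)]

    def earliest(self, cmd):
        """Earliest cycle at which cmd satisfies this node's timing table."""
        ready = 0
        for (prev, nxt), gap in self.spec.timing.items():
            if nxt == cmd and prev in self.last_issued:
                ready = max(ready, self.last_issued[prev] + gap)
        return ready

    def issue(self, cmd, cycle):
        """Record cmd at the given cycle, rejecting timing violations."""
        if cycle < self.earliest(cmd):
            raise ValueError(f"{cmd} at cycle {cycle} violates {self.spec.name} timing")
        self.last_issued[cmd] = cycle

    def count(self, level):
        """Number of nodes of the given level in this subtree."""
        return int(self.level == level) + sum(c.count(level) for c in self.children)


# Two illustrative "standards": only data differs, the Node class is shared.
# All numbers below are placeholders chosen for the example.
DDR3_LIKE = Standard(
    name="DDR3-like",
    levels=("channel", "rank", "bank"),
    fanout={"channel": 1, "rank": 8},
    timing={("ACT", "RD"): 11, ("ACT", "PRE"): 28},
)
DDR4_LIKE = Standard(
    name="DDR4-like",
    levels=("channel", "rank", "bankgroup", "bank"),
    fanout={"channel": 1, "rank": 4, "bankgroup": 4},
    timing={("ACT", "RD"): 16, ("ACT", "PRE"): 39},
)

if __name__ == "__main__":
    for spec in (DDR3_LIKE, DDR4_LIKE):
        root = Node(spec)
        print(spec.name, "banks:", root.count("bank"))
```

In this sketch, supporting another standard means adding another `Standard` value; the `Node` class stays unchanged, which is the kind of decoupling the abstract describes.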

Information

Published In

IEEE Computer Architecture Letters, Volume 15, Issue 1

January 2016

65 pages

ISSN: 1556-6056

Copyright © 2016.

Publisher

IEEE Computer Society

United States
