research-article

Ramulator: A Fast and Extensible DRAM Simulator

Published: 01 January 2016

Abstract

Recently, both industry and academia have proposed many different roadmaps for the future of DRAM. Consequently, there is a growing need for an extensible DRAM simulator, which can be easily modified to judge the merits of today's DRAM standards as well as those of tomorrow. In this paper, we present Ramulator, a fast and cycle-accurate DRAM simulator that is built from the ground up for extensibility. Unlike existing simulators, Ramulator is based on a generalized template for modeling a DRAM system, which is only later infused with the specific details of a DRAM standard. Thanks to such a decoupled and modular design, Ramulator is able to provide out-of-the-box support for a wide array of DRAM standards: DDR3/4, LPDDR3/4, GDDR5, WIO1/2, HBM, as well as some academic proposals (SALP, AL-DRAM, TL-DRAM, RowClone, and SARP). Importantly, Ramulator does not sacrifice simulation speed to gain extensibility: according to our evaluations, Ramulator is 2.5$\times$ faster than the next fastest simulator. Ramulator is released under the permissive BSD license.
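To make the idea of a generalized template that is later infused with standard-specific details more concrete, here is a minimal Python sketch. It is not Ramulator's actual code or API: the names `DramSpec`, `GenericDram`, `TOY_A`, and `TOY_B` are hypothetical, and the timing values are made-up placeholders rather than figures from any real DRAM standard. The sketch only illustrates how a generic timing engine can stay unchanged while the per-standard data is swapped.

```python
from dataclasses import dataclass


@dataclass
class DramSpec:
    """Standard-specific details only (hypothetical, illustrative values)."""
    name: str
    # (previous command, next command) -> minimum gap in cycles
    timing: dict


class GenericDram:
    """Generic timing engine; it contains no knowledge of any one standard."""

    def __init__(self, spec):
        self.spec = spec
        self.last_issued = {}  # command -> cycle at which it was last issued

    def earliest_cycle(self, cmd):
        """Return the first cycle at which `cmd` satisfies every constraint."""
        earliest = 0
        for (prev, nxt), gap in self.spec.timing.items():
            if nxt == cmd and prev in self.last_issued:
                earliest = max(earliest, self.last_issued[prev] + gap)
        return earliest

    def issue(self, cmd, cycle):
        """Record `cmd` at `cycle`, rejecting it if a constraint is violated."""
        ready = self.earliest_cycle(cmd)
        if cycle < ready:
            raise ValueError(f"{cmd} not ready until cycle {ready}")
        self.last_issued[cmd] = cycle


# Two toy "standards" that differ only in their data, not in the engine.
TOY_A = DramSpec("ToyA", {("ACT", "RD"): 11, ("RD", "PRE"): 6})
TOY_B = DramSpec("ToyB", {("ACT", "RD"): 18, ("RD", "PRE"): 8})

if __name__ == "__main__":
    for spec in (TOY_A, TOY_B):
        dram = GenericDram(spec)
        dram.issue("ACT", 0)
        print(spec.name, "earliest RD after ACT at cycle 0:",
              dram.earliest_cycle("RD"))
```

Under these assumptions, supporting a further standard means writing another `DramSpec` instance; `GenericDram` itself is untouched, which is the kind of decoupling the abstract describes.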

Published In

IEEE Computer Architecture Letters, Volume 15, Issue 1
January 2016
65 pages

Publisher

IEEE Computer Society

United States
