Abstract
To meet growing demands for memory capacity, bandwidth, and energy efficiency, researchers are exploring heterogeneous memory systems that combine faster 3D-DRAMs, DDRx DRAM, and non-volatile memories (NVMs). In this paper, we evaluate prefetching in a flat-addressable heterogeneous memory comprising High Bandwidth Memory (HBM) and phase change memory (PCM). We find that large prefetch buffers (64 MB) can outperform smaller ones (2 MB); however, it is not feasible to place such large buffers on the processor die. We therefore evaluate an HBM-resident prefetch buffer that provides larger capacity and takes advantage of HBM's higher bandwidth. We also present new prefetching policies that accommodate the differences in data path relative to traditional prefetchers. We show that reserving a small fraction (1/16th) of HBM to host a hardware prefetch buffer can improve IPC for a set of SPEC CPU2006 and HPC benchmarks by an average of 34% and a maximum of 98% over a baseline system with no prefetching. Prefetching reduces total PCM traffic by 10% on average and shifts more memory traffic to the faster HBM, which improves overall performance. We found that such prefetching outperforms the CAMEO and Alloy cache schemes by 60% and 10% on average, respectively.
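The general idea of a prefetch buffer sitting in front of the slower PCM can be illustrated with a toy model. The sketch below is not the paper's design or its prefetching policies, which the abstract does not detail. It assumes a simple next-line prefetcher triggered on misses, LRU replacement, and 64-byte lines, and the names `PrefetchBuffer`, `degree`, and `capacity_lines` are hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size, not taken from the paper


class PrefetchBuffer:
    """Toy LRU buffer standing in for an HBM-resident prefetch buffer.

    Assumption: a next-line prefetcher fires on each buffer miss.
    The paper's actual policies are not described in the abstract.
    """

    def __init__(self, capacity_lines, degree=4):
        self.capacity = capacity_lines   # buffer size in lines
        self.degree = degree             # lines prefetched per miss
        self.lines = OrderedDict()       # line number -> True, in LRU order
        self.hits = 0                    # reads served from the buffer (HBM)
        self.pcm_demand_reads = 0        # demand reads that went to PCM
        self.pcm_prefetch_reads = 0      # prefetch reads issued to PCM

    def _insert(self, line):
        # Evict the least recently used line when the buffer is full.
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[line] = True

    def access(self, addr):
        """Return True if the read hits in the prefetch buffer."""
        line = addr // LINE_BYTES
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)  # refresh LRU position
            return True
        # Miss: the demand read goes to PCM, then the next lines are prefetched.
        self.pcm_demand_reads += 1
        for offset in range(1, self.degree + 1):
            nxt = line + offset
            if nxt not in self.lines:
                self.pcm_prefetch_reads += 1
                self._insert(nxt)
        return False


if __name__ == "__main__":
    buf = PrefetchBuffer(capacity_lines=1024, degree=4)
    for addr in range(0, 10 * LINE_BYTES, LINE_BYTES):  # sequential stream
        buf.access(addr)
    print(buf.hits, buf.pcm_demand_reads, buf.pcm_prefetch_reads)
```

In this toy model a sequential stream is mostly served from the buffer once the first miss has triggered prefetching. Capacity matters for access patterns with longer reuse distances, which is the motivation the abstract gives for hosting a much larger buffer in HBM rather than on the processor die.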
M. Meswani—The author did the work while employed at AMD.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Islam, M., Kavi, K.M., Meswani, M., Banerjee, S., Jayasena, N. (2017). HBM-Resident Prefetching for Heterogeneous Memory System. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds) Architecture of Computing Systems - ARCS 2017. ARCS 2017. Lecture Notes in Computer Science(), vol 10172. Springer, Cham. https://doi.org/10.1007/978-3-319-54999-6_10
DOI: https://doi.org/10.1007/978-3-319-54999-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54998-9
Online ISBN: 978-3-319-54999-6
eBook Packages: Computer Science (R0)