DOI: 10.5555/2523721.2523740
research-article

Writeback-aware bandwidth partitioning for multi-core systems with PCM

Published: 07 October 2013

Abstract

Phase-Change Memory (PCM) has emerged as a promising low-power candidate to replace DRAM in main memory. A hybrid memory architecture composed of a large PCM and a small DRAM is a popular way to mitigate the undesirable characteristics of PCM writes. Because PCM writes are much slower than reads, writebacks from the last-level cache (LLC) consume a large portion of memory bandwidth and thus degrade performance. Effectively utilizing shared resources, such as the last-level cache and the memory bandwidth, is crucial to achieving high performance in multi-core systems. Although existing memory bandwidth allocation schemes improve system performance, no current approach uses writeback information to partition bandwidth for hybrid memory. We use a writeback-aware analytic model to derive the allocation strategy for bandwidth partitioning of phase-change memory. From the derivation of the model, we propose Writeback-aware Bandwidth Partitioning (WBP), a new runtime mechanism that partitions PCM service cycles among applications. WBP uses a partitioning weight to indicate the importance of writebacks (in addition to LLC misses) to bandwidth allocation. A companion Dynamic Weight Adjustment (DWA) scheme dynamically selects the partitioning weight to maximize system performance. Simulation results show that, in an 8-core system, WBP and DWA improve performance by 24.9% (weighted speedup) over bandwidth partitioning schemes that do not take writebacks into consideration.
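The abstract does not state the allocation formula, so the following is only a minimal sketch of the idea, not the paper's mechanism. It assumes each application's share of PCM service cycles is proportional to its LLC misses plus a weight times its writebacks, and that the weight is chosen by comparing a caller-supplied performance score across candidate values. All function and parameter names are hypothetical. Weighted speedup is computed with its usual definition, the sum of each application's shared-mode IPC divided by its stand-alone IPC.

```python
from typing import Callable, List, Sequence


def partition_service_cycles(
    misses: Sequence[float],
    writebacks: Sequence[float],
    weight: float,
    total_cycles: float,
) -> List[float]:
    """Split total_cycles among applications.

    ASSUMPTION (not from the paper): the share of application i is
    proportional to misses[i] + weight * writebacks[i].
    """
    if len(misses) != len(writebacks):
        raise ValueError("misses and writebacks must have the same length")
    demand = [m + weight * w for m, w in zip(misses, writebacks)]
    total = sum(demand)
    if total == 0:
        # No memory traffic observed: fall back to an equal split.
        return [total_cycles / len(demand)] * len(demand)
    return [total_cycles * d / total for d in demand]


def weighted_speedup(
    ipc_shared: Sequence[float], ipc_alone: Sequence[float]
) -> float:
    """Sum of per-application IPC when sharing divided by IPC when alone."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))


def select_weight(
    candidates: Sequence[float], evaluate: Callable[[float], float]
) -> float:
    """Pick the candidate weight with the highest score.

    ASSUMPTION: evaluate(weight) returns a measured or estimated
    performance score; how DWA obtains it is not given in the abstract.
    """
    return max(candidates, key=evaluate)
```

With a weight of 0 the split depends on LLC misses alone, which corresponds to the writeback-oblivious baseline the abstract compares against; larger weights shift cycles toward write-heavy applications.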

      Published In

      PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
      October 2013
      422 pages
      ISBN:9781479910212

      Publisher

      IEEE Press

      Author Tags

      1. analytic model
      2. memory bandwidth
      3. partitioning
      4. phase change memory

      Qualifiers

      • Research-article

      Acceptance Rates

      PACT '13 Paper Acceptance Rate 36 of 208 submissions, 17%;
      Overall Acceptance Rate 121 of 471 submissions, 26%

      Cited By

      • (2016) Symmetry-Agnostic Coordinated Management of the Memory Hierarchy in Multicore Systems. ACM Transactions on Architecture and Code Optimization 12:4 (1-26). DOI: 10.1145/2847254. Online publication date: 4-Jan-2016
      • (2015) Real-Time In-Memory Checkpointing for Future Hybrid Memory Systems. Proceedings of the 29th ACM on International Conference on Supercomputing (263-272). DOI: 10.1145/2751205.2751212. Online publication date: 8-Jun-2015
      • (2015) A Comprehensive Analytical Performance Model of DRAM Caches. Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering (157-168). DOI: 10.1145/2668930.2688044. Online publication date: 28-Jan-2015
      • (2014) ANATOMY. ACM SIGMETRICS Performance Evaluation Review 42:1 (505-517). DOI: 10.1145/2637364.2591995. Online publication date: 16-Jun-2014
      • (2014) ANATOMY. The 2014 ACM international conference on Measurement and modeling of computer systems (505-517). DOI: 10.1145/2591971.2591995. Online publication date: 16-Jun-2014
