research-article

Architecting a chunk-based memory race recorder in modern CMPs

Authors:

Cristiano Pereira,

Ali-Reza Adl-Tabatabai

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

Pages 576 - 585

https://doi.org/10.1145/1669112.1669183

Published: 12 December 2009

Abstract

Prior work on hardware support for memory race recording piggybacks timestamps on coherence messages and logs the outcome of memory races using point-to-point or chunk-based approaches. These memory race recorder (MRR) techniques are effective, but they require modifications to the cache coherence protocol that can hurt performance. In addition, prior work has mostly focused on directory coherence and considered only CMP systems with single-level cache hierarchies. Most modern CMP systems shipped today, however, implement snoop coherence and feature multilevel cache hierarchies. To be practical, an MRR must target CMPs with multilevel caches, mitigate the coherence overhead due to piggybacking, and emphasize replay speed to broaden the applicability of deterministic replay.

This paper contributes three new solutions for making chunk-based MRR practical for modern CMPs. We show that MRR interactions with a cache hierarchy can degrade performance and present a novel mechanism that mitigates this degradation. We propose new mechanisms for snoop-based caches that eliminate coherence traffic overhead due to piggybacking. Finally, we propose new techniques for improving replay speed and introduce a novel framework for evaluating the replay speed potential of MRR designs.
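The abstract does not spell out how a chunk-based recorder operates. As a rough illustration only, the Python sketch below models the general idea it refers to: each core groups its memory operations into a chunk, a conflicting access from another core ends that chunk and logs its size with a logical timestamp, and the timestamp travels back with the coherence reply. All names here (`Core`, `Recorder`, `replay_order`) are hypothetical, exact address sets stand in for whatever hardware structure tracks a chunk's reads and writes, and the sketch models neither a cache hierarchy nor the mechanisms this paper proposes.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    cid: int
    clock: int = 0                              # Lamport-style logical timestamp
    reads: set = field(default_factory=set)     # addresses read in the current chunk
    writes: set = field(default_factory=set)    # addresses written in the current chunk
    count: int = 0                              # memory operations in the current chunk
    log: list = field(default_factory=list)     # (chunk_size, timestamp) entries

    def end_chunk(self):
        """Log the current chunk (if non-empty) and start a new one."""
        if self.count:
            self.log.append((self.count, self.clock))
        self.clock += 1
        self.reads.clear()
        self.writes.clear()
        self.count = 0


class Recorder:
    """Toy model: one sequentially consistent access at a time."""

    def __init__(self, n_cores):
        self.cores = [Core(i) for i in range(n_cores)]

    def access(self, cid, addr, is_write):
        me = self.cores[cid]
        for other in self.cores:
            if other is me:
                continue
            # A race exists if the other chunk wrote addr, or read it and we write.
            conflict = addr in other.writes or (is_write and addr in other.reads)
            if conflict:
                other.end_chunk()
                # Assumed piggybacking: the reply carries the other core's clock,
                # so our current chunk is ordered after the chunk just logged.
                me.clock = max(me.clock, other.clock)
        (me.writes if is_write else me.reads).add(addr)
        me.count += 1

    def finish(self):
        for core in self.cores:
            core.end_chunk()


def replay_order(rec):
    """Return (timestamp, core_id, chunk_size) tuples in a valid replay order."""
    entries = [(ts, c.cid, size) for c in rec.cores for size, ts in c.log]
    return sorted(entries)


rec = Recorder(2)
rec.access(0, 0x10, False)   # core 0 reads
rec.access(1, 0x10, True)    # core 1 writes: ends core 0's chunk
rec.access(0, 0x10, False)   # core 0 reads again: ends core 1's chunk
rec.finish()
order = replay_order(rec)
```

In this toy run, `order` lists core 0's first chunk, then core 1's chunk, then core 0's second chunk, which is the order a replayer would need to respect. Chunks from different cores that share a timestamp never conflicted, so their relative order is free.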

Index Terms

Architecting a chunk-based memory race recorder in modern CMPs
1. Computer systems organization
  1. Architectures

Published In

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

December 2009

601 pages

ISBN:9781605587981

DOI:10.1145/1669112

General Chairs: David Albonesi (Cornell), Margaret Martonosi (Princeton)
Program Chairs: David August (Princeton/Parakinetics), José Martínez (Cornell)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS TG u-Arch

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

Micro-42

Sponsor:

SIGMICRO

Micro-42: The 42nd Annual IEEE/ACM International Symposium on Microarchitecture

December 12 - 16, 2009

New York, New York

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
