Architecting a chunk-based memory race recorder in modern CMPs

Published: 12 December 2009

Abstract

Prior work on hardware support for memory race recording piggybacks timestamps on coherence messages and logs the outcome of memory races using point-to-point or chunk-based approaches. These memory race recorder (MRR) techniques are effective, but they require modifications to the cache coherence protocol that can hurt performance. In addition, prior work has mostly focused on directory coherence and considered only CMP systems with single-level cache hierarchies. Most modern CMP systems shipped today, however, implement snoop coherence and feature multilevel cache hierarchies. To be practical, an MRR must target CMPs with multilevel caches, mitigate the coherence overhead due to piggybacking, and emphasize replay speed to broaden the applicability of deterministic replay.
This paper contributes three new solutions for making chunk-based MRR practical for modern CMPs. First, we show that MRR interactions with a cache hierarchy can degrade performance, and we present a novel mechanism that mitigates this degradation. Second, we propose new mechanisms for snoop-based caches that eliminate the coherence traffic overhead due to piggybacking. Finally, we propose new techniques for improving replay speed and introduce a novel framework for evaluating the replay speed potential of MRR designs.
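
To make the term "chunk-based" concrete, here is a minimal software sketch of generic chunk-based race recording. It is not the mechanism proposed in this paper, whose details the abstract does not give. It assumes that each core groups its memory operations into a chunk, tracks the chunk's read and write addresses, and closes the chunk when another core makes a conflicting access. Python sets stand in for hardware address signatures, and the timestamp exchange is modeled as a direct field update rather than a value piggybacked on a coherence message. All names (`Core`, `access`, `record`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    cid: int
    clock: int = 0                 # Lamport-style logical timestamp
    count: int = 0                 # memory operations in the open chunk
    reads: set = field(default_factory=set)    # stand-in for a read signature
    writes: set = field(default_factory=set)   # stand-in for a write signature

    def close_chunk(self, log):
        """Log the open chunk (if non-empty), start a new one, return its timestamp."""
        ts = self.clock
        if self.count:
            log.append((ts, self.cid, self.count))
        self.count = 0
        self.reads.clear()
        self.writes.clear()
        self.clock = ts + 1
        return ts


def access(cores, log, cid, addr, is_write):
    """Perform one memory access, closing any remote chunk it conflicts with."""
    me = cores[cid]
    for other in cores:
        if other is me:
            continue
        # Conflict: the remote chunk wrote addr, or we write what it has read.
        if addr in other.writes or (is_write and addr in other.reads):
            ts = other.close_chunk(log)
            # Our open chunk must be ordered after the chunk just closed.
            me.clock = max(me.clock, ts + 1)
    (me.writes if is_write else me.reads).add(addr)
    me.count += 1


def record(trace, n_cores):
    """Return the chunk log as (timestamp, core, size) tuples in replay order."""
    cores = [Core(i) for i in range(n_cores)]
    log = []
    for cid, addr, is_write in trace:
        access(cores, log, cid, addr, is_write)
    for core in cores:
        core.close_chunk(log)      # flush chunks still open at the end
    return sorted(log)


# Core 0 writes A, core 1 reads A then writes B, core 0 reads B.
trace = [(0, "A", True), (1, "A", False), (1, "B", True), (0, "B", False)]
print(record(trace, 2))
```

For this trace the log is `[(0, 0, 1), (1, 1, 2), (2, 0, 1)]`: replaying the chunks in timestamp order reproduces the inter-core dependences, while chunks with equal timestamps are independent of each other and can be replayed in any order.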

    Published In

    MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
    December 2009
    601 pages
    ISBN: 9781605587981
    DOI: 10.1145/1669112
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. determinism
    2. deterministic replay
    3. memory race recorder

    Qualifiers

    • Research-article

    Conference

    Micro-42
    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Cited By

    • (2019) Processor-Oblivious Record and Replay. ACM Transactions on Parallel Computing 6:4 (1-28). DOI: 10.1145/3365659. Online publication date: 17-Dec-2019
    • (2018) Leveraging Hardware-Assisted Virtualization for Deterministic Replay on Commodity Multi-Core Processors. IEEE Transactions on Computers 67:1 (45-58). DOI: 10.1109/TC.2017.2727492. Online publication date: 1-Jan-2018
    • (2018) The Processing-in-Memory Paradigm: Mechanisms to Enable Adoption. Beyond-CMOS Technologies for Next Generation Computer Design (133-194). DOI: 10.1007/978-3-319-90385-9_5. Online publication date: 21-Aug-2018
    • (2017) Processor-Oblivious Record and Replay. ACM SIGPLAN Notices 52:8 (145-161). DOI: 10.1145/3155284.3018764. Online publication date: 26-Jan-2017
    • (2017) Processor-Oblivious Record and Replay. Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (145-161). DOI: 10.1145/3018743.3018764. Online publication date: 26-Jan-2017
    • (2016) SReplay. Proceedings of the 2016 International Conference on Supercomputing (1-13). DOI: 10.1145/2925426.2926264. Online publication date: 1-Jun-2016
    • (2015) Efficient Deterministic Replay of Multithreaded Executions in a Managed Language Virtual Machine. Proceedings of the Principles and Practices of Programming on The Java Platform (90-101). DOI: 10.1145/2807426.2807434. Online publication date: 8-Sep-2015
    • (2015) Deterministic Replay. ACM Computing Surveys 48:2 (1-47). DOI: 10.1145/2790077. Online publication date: 24-Sep-2015
    • (2014) Replay debugging. Proceeding of the 41st annual international symposium on Computer architecuture (445-456). DOI: 10.5555/2665671.2665737. Online publication date: 14-Jun-2014
    • (2014) Pacifier. Proceeding of the 41st annual international symposium on Computer architecuture (433-444). DOI: 10.5555/2665671.2665736. Online publication date: 14-Jun-2014
