Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Hybrid binary rewriting for memory access instrumentation

Published: 09 March 2011 Publication History

Abstract

Memory access instrumentation is fundamental to many applications such as software transactional memory systems, profiling tools and race detectors. We examine the problem of efficiently instrumenting memory accesses in x86 machine code to support software transactional memory and profiling. We aim to automatically instrument all shared memory accesses in critical sections of x86 binaries, while achieving overhead close to that obtained when performing manual instrumentation at the source code level.
The two primary options in building such an instrumentation system are static and dynamic binary rewriting: the former instruments binaries at link time before execution, while the latter binary rewriting instruments binaries at runtime. Static binary rewriting offers extremely low overhead but is hampered by the limits of static analysis. Dynamic binary rewriting is able to use runtime information but typically incurs higher overhead. This paper proposes an alternative: hybrid binary rewriting. Hybrid binary rewriting is built around the idea of a persistent instrumentation cache (PIC) that is associated with a binary and contains instrumented code from it. It supports two execution modes when using instrumentation: active and passive modes. In the active execution mode, a dynamic binary rewriting engine (PIN) is used to intercept execution, and generate instrumentation into the PIC, which is an on-disk file. This execution mode can take full advantage of runtime information. Later, passive execution can be used where instrumented code is executed out of the PIC. This allows us to attain overheads similar to those incurred with static binary rewriting.
This instrumentation methodology enables a variety of static and dynamic techniques to be applied. For example, in passive mode, execution occurs directly from the original executable save for regions that require instrumentation. This has allowed us to build a low-overhead transactional memory profiler. We also demonstrate how we can use the combination of static and dynamic techniques to eliminate instrumentation for accesses to locations that are thread-private.

References

[1]
M. Abadi, T. Harris, and M. Mehrara. Transactional memory with strong atomicity using off-the-shelf memory protection hardware. In Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 185--196, 2009.
[2]
V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, pages 1--12, 2000.
[3]
D. Bruening, E. Duesterwald, and S. Amarasinghe. Design and implementation of a dynamic optimization framework for windows. In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization, 2000.
[4]
D. Bruening and V. Kiriansky. Process-shared and persistent code caches. In VEE '08: Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, pages 61--70, 2008.
[5]
C. Cao Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In Proceedings of the IEEE International Symposium on Workload Characterization, pages 35--46, 2008.
[6]
D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proceedings of the 20th International Symposium on Distributed Computing, pages 194--208, 2006.
[7]
A. Dragojevic, Y. Ni, and A.-R. Adl-Tabatabai. Optimizing transactions for captured memory. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, pages 214--222, 2009.
[8]
A. Eustace and A. Srivastava. Atom: a flexible interface for building high performance program analysis tools. In Proceedings of the USENIX 1995 Technical Conference Proceedings, pages 303--314, 1995.
[9]
V. Gajinov, F. Zyulkyarov, O. S. Unsal, A. Cristal, E. Ayguade, T. Harris, and M. Valero. Quake™: parallelizing a complex sequential application using transactional memory. In Proceedings of the 23rd international conference on Supercomputing, pages 126--135, 2009.
[10]
C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pages 190--200, 2005.
[11]
N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 89--100, 2007.
[12]
M. Olszewski, J. Cutler, and J. G. Steffan. Judostm: A dynamic binary-rewriting approach to software transactional memory. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 365--375, 2007.
[13]
M. Payer and T. R. Gross. Generating low-overhead dynamic binary translators. In Proceedings of the 3rd Annual Haifa Experimental Systems Conference, pages 22:1--22:14, 2010.
[14]
P. Ratanaworabhan, M. Burtscher, D. Kirovski, B. Zorn, R. Nagpal, and K. Pattabiraman. Detecting and tolerating asymmetric races. In Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 173--184, 2009.
[15]
V. J. Reddi, D. Connors, R. Cohn, and M. D. Smith. Persistent code caching: Exploiting code reuse across executions and applications. In Proceedings of the International Symposium on Code Generation and Optimization, pages 74--88, 2007.
[16]
A. Roy, S. Hand, and T. Harris. Exploring the limits of disjoint access parallelism. In Proceedings of the First USENIX conference on Hot topics in parallelism, 2009.
[17]
B. Schwarz, S. Debray, G. Andrews, and M. Legendre. PLTO: A link-time optimizer for the Intel IA-32 architecture. In Proceedings of the 2001 Workshop on Binary Translation, 2001.
[18]
J. Seward and N. Nethercote. Using valgrind to detect undefined value errors with bit-precision. In Proceedings of the annual conference on USENIX Annual Technical Conference, 2005.
[19]
S. Sridhar, J. S. Shapiro, and P. P. Bungale. HDTrans: a low-overhead dynamic translator. SIGARCH Computer Architecture News, 35(1):135--140, 2007.
[20]
T. Usui, R. Behrends, J. Evans, and Y. Smaragdakis. Adaptive locks: Combining transactions and locks for efficient concurrency. In Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques, pages 3--14, 2009.
[21]
L. Van Put, D. Chanet, B. De Bus, B. De Sutter, and K. De Bosschere. DIABLO: a reliable, retargetable and extensible link-time rewriting framework. In Proceedings of the 2005 IEEE International Symposium On Signal Processing And Information Technology, pages 7--12, 2005.
[22]
C. von Praun, R. Bordawekar, and C. Cascaval. Modeling optimistic concurrency using quantitative dependence analysis. In Proceedings of the 13th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 185--196, 2008.
[23]
A. Welc, B. Saha, and A.-R. Adl-Tabatabai. Irrevocable transactions and their applications. In Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures, pages 285--296, 2008.

Cited By

View all

Index Terms

  1. Hybrid binary rewriting for memory access instrumentation

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 46, Issue 7
    VEE '11
    July 2011
    231 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2007477
    Issue’s Table of Contents
    • cover image ACM Conferences
      VEE '11: Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
      March 2011
      250 pages
      ISBN:9781450306874
      DOI:10.1145/1952682
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 March 2011
    Published in SIGPLAN Volume 46, Issue 7

    Check for updates

    Author Tags

    1. binary rewriting
    2. transactional memory

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)32
    • Downloads (Last 6 weeks)14
    Reflects downloads up to 18 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Informed Memory Access MonitoringPerformance Analysis of Parallel Applications for HPC10.1007/978-981-99-4366-1_4(73-97)Online publication date: 19-Jun-2023
    • (2018)NUMAPROF, A NUMA Memory ProfilerEuro-Par 2018: Parallel Processing Workshops10.1007/978-3-030-10549-5_13(159-170)Online publication date: 27-Aug-2018
    • (2015)An Instruction-Set Randomization Using Length-Preserving PermutationProceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 0110.1109/Trustcom.2015.397(376-383)Online publication date: 20-Aug-2015
    • (2024)Towards Realistic Results for Instrumentation-Based Profilers for JIT-Compiled SystemsProceedings of the 21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes10.1145/3679007.3685058(82-89)Online publication date: 13-Sep-2024
    • (2023)Efficient Dynamic Binary Translation with Accumulative Persistent Code Caching2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS)10.1109/ICPADS56603.2022.00066(458-466)Online publication date: Jan-2023
    • (2014)Computation Integrity Measurement Based on Branch TransferProceedings of the 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications10.1109/TrustCom.2014.75(590-597)Online publication date: 24-Sep-2014
    • (2012)A lightweight hybrid hardware/software approach for object-relative memory profilingProceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software10.1109/ISPASS.2012.6189205(46-57)Online publication date: 1-Apr-2012

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media