Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1993498.1993508acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article

Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator

Published: 04 June 2011 Publication History

Abstract

Dynamic Binary Translation (DBT) is the key technology behind cross-platform virtualization and allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Under the hood, DBT is typically implemented using Just-In-Time (JIT) compilation of frequently executed program regions, also called traces. The main challenge is translating frequently executed program regions as fast as possible into highly efficient native code. As time for JIT compilation adds to the overall execution time, the JIT compiler is often decoupled and operates in a separate thread independent from the main simulation loop to reduce the overhead of JIT compilation. In this paper we present two innovative contributions. The first contribution is a generalized trace compilation approach that considers all frequently executed paths in a program for JIT compilation, as opposed to previous approaches where trace compilation is restricted to paths through loops. The second contribution reduces JIT compilation cost by compiling several hot traces in a concurrent task farm. Altogether we combine generalized light-weight tracing, large translation units, parallel JIT compilation and dynamic work scheduling to ensure timely and efficient processing of hot traces. We have evaluated our industry-strength, LLVM-based parallel DBT implementing the ARCompact ISA against three benchmark suites (EEMBC, BioPerf and SPEC CPU2006) and demonstrate speedups of up to 2.08 on a standard quad-core Intel Xeon machine. Across short- and long-running benchmarks our scheme is robust and never results in a slowdown. In fact, using four processors total execution time can be reduced by on average 11.5% over state-of-the-art decoupled, parallel (or asynchronous) JIT compilation.

References

[1]
J. Aycock. A brief history of just-in-time. ACM Comput. Surv., 35: 97--113, June 2003.
[2]
D. Bader, Y. Li, T. Li, and V. Sachdeva. Bioperf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications. In Proceedings of the IEEE International Workload Characterization Symposium, IISWC'05, pages 163--173, 2005.
[3]
V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI '00, pages 1--12, New York, NY, USA, 2000. ACM.
[4]
M. Bebenita, M. Chang, G. Wagner, A. Gal, C. Wimmer, and M. Franz. Trace-based compilation in execution environments without interpreters. In Proceedings of the 8th International Conference on the Principles and Practice of Programming in Java, PPPJ '10, pages 59--68, New York, NY, USA, 2010. ACM.
[5]
F. Bellard. QEMU, a fast and portable dynamic translator. In Proceedings of the USENIX Annual Technical Conference, ATEC '05, page 41, Berkeley, CA, USA, 2005. USENIX Association.
[6]
I. Böhm, B. Franke, and N. Topham. Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator. In Proceedings of the International Conference on Embedded Computer Systems, IC-SAMOS'10, pages 1--10, 2010.
[7]
F. Brandner, A. Fellnhofer, A. Krall, and D. Riegler. Fast and accurate simulation using the llvm compiler framework. In Proceedings of the 1st Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, RAPIDO'09, pages 1--6, 2009.
[8]
S. Campanoni, G. Agosta, and S. C. Reghizzi. A parallel dynamic compiler for CIL bytecode. SIGPLAN Not., 43: 11--20, April 2008.
[9]
A. Chernoff, M. Herdeg, R. Hookway, C. Reeve, N. Rubin, T. Tye, S. Bharadwaj Yadavalli, and J. Yates. FX!32: a profile-directed binary translator. Micro, IEEE, 18 (2): 56--64, 1998.
[10]
B. Cmelik and D. Keppel. Shade: a fast instruction-set simulator for execution profiling. In Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '94, pages 128--137, New York, NY, USA, 1994. ACM.
[11]
A. Cohen and E. Rohou. Processor virtualization and split compilation for heterogeneous multicore embedded systems. In Proceedings of the 47th Design Automation Conference, DAC '10, pages 102--107, New York, NY, USA, 2010. ACM.
[12]
S. Farfeleder, A. Krall, and N. Horspool. Ultra fast cycle-accurate compiled emulation of inorder pipelined architectures. J. Syst. Archit., 53: 501--510, August 2007.
[13]
A. Gal, C. W. Probst, and M. Franz. HotpathVM: an effective JIT compiler for resource-constrained devices. In Proceedings of the 2nd International Conference on Virtual Execution Environments, VEE '06, pages 144--153, New York, NY, USA, 2006. ACM.
[14]
A. Gal, M. Bebenita, M. Chang, and M. Franz. Making the compilation "pipeline" explicit: Dynamic compilation using trace tree serialization. Technical Report 07-12, University of California, Irvine, 2007.
[15]
A. Gal, B. Eich, M. Shaver, D. Anderson, D. Mandelin, M. R. Haghighat, B. Kaplan, G. Hoare, B. Zbarsky, J. Orendorff, J. Ruderman, E. W. Smith, R. Reitmaier, M. Bebenita, M. Chang, and M. Franz. Trace-based just-in-time type specialization for dynamic languages. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '09, pages 465--478, New York, NY, USA, 2009. ACM.
[16]
J. Ha, M. Haghighat, S. Cong, and K. McKinley. A concurrent trace-based just-in-time compiler for single-threaded JavaScript. In Proceedings of the Workshop on Parallel Execution of Sequential Programs on Multicore Architectures, PESPMA'09, 2009.
[17]
M. Hauswirth, P. F. Sweeney, A. Diwan, and M. Hind. Vertical profiling: understanding the behavior of object-priented applications. In Proceedings of the 19th annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '04, pages 251--269, New York, NY, USA, 2004. ACM.
[18]
J. L. Henning. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34: 1--17, September 2006.
[19]
D. Jones and N. Topham. High speed CPU simulation using LTU dynamic binary translation. In High Performance Embedded Architectures and Compilers, volume 5409 of Lecture Notes in Computer Science, pages 50--64. Springer Berlin Heidelberg, 2009.
[20]
C. J. Krintz, D. Grove, V. Sarkar, and B. Calder. Reducing the overhead of dynamic compilation. Software: Practice and Experience, 31 (8): 717--738, July 2001.
[21]
R. Lantz. Fast functional simulation with parallel Embra. In Proceedings of the 4th Annual Workshop on Modeling, Benchmarking and Simulation, MOBS'08, 2008.
[22]
C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization, CGO '04, page 75, Washington, DC, USA, 2004. IEEE Computer Society.
[23]
C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. PIN: building customized program analysis tools with dynamic instrumentation. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '05, pages 190--200, New York, NY, USA, 2005. ACM.
[24]
C. May. MIMIC: a fast System/370 simulator. In Papers of the Symposium on Interpreters and Interpretive Techniques, SIGPLAN '87, pages 1--13, New York, NY, USA, 1987. ACM.
[25]
J. Miller, H. Kasture, G. Kurian, C. Gruenwald, N. Beckmann, C. Celio, J. Eastep, and A. Agarwal. Graphite: A distributed parallel simulator for multicores. In Proceedings of the IEEE 16th International Symposium on High Performance Computer Architecture, HPCA'10, pages 1--12, 2010.
[26]
Mozilla Foundation. Tamarin project, 17 November 2010. URL http://www.mozilla.org/projects/tamarin/.
[27]
Mozilla Foundation. Tracemonkey, 17 November 2010. URL https://wiki.mozilla.org/JavaScript:TraceMonkey.
[28]
A. Nohl, G. Braun, O. Schliebusch, R. Leupers, H. Meyr, and A. Hoffmann. A universal technique for fast and flexible instruction-set architecture simulation. In Proceedings of the 39th annual Design Automation Conference, DAC '02, pages 22--27, New York, NY, USA, 2002. ACM.
[29]
Oracle Corporation. HotSpot VM, 17 November 2010. URL http://java.sun.com/performance/reference/whitepapers/.
[30]
W. Qin, J. D'Errico, and X. Zhu. A multiprocessing approach to accelerate retargetable and portable dynamic-compiled instruction-set simulation. In Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, CODES
[31]
M. Reshadi, P. Mishra, and N. Dutt. Instruction set compiled simulation: a technique for fast and flexible instruction set simulation. In Proceedings of the 40th annual Design Automation Conference, DAC '03, pages 758--763, New York, NY, USA, 2003. ACM.
[32]
M. Reshadi, P. Mishra, and N. Dutt. Instruction set compiled simulation: a technique for fast and flexible instruction set simulation. In Proceedings of the 40th annual Design Automation Conference, DAC '03, pages 758--763, New York, NY, USA, 2003. ACM.
[33]
E. Stahl and M. Anand. A comparison of PowerVM and x86-based virtualization performance. Technical Report WP101574, IBM Techdocs White Papers, 2010.
[34]
T. Suganuma, T. Yasue, and T. Nakatani. A region-based compilation technique for dynamic compilers. ACM Trans. Program. Lang. Syst., 28: 134--174, January 2006.
[35]
V. Tan. Asynchronous just-in-time compilation. United States Patent Application, WO/2007/055883, 2007.
[36]
The LLVM Compiler Infrastructure. Users of LLVM JIT compilation engine, 17 November 2010. URL http://llvm.org/Users.html.
[37]
The WebKit Open Source Project. SquirrelFish, 17 November 2010. URL http://trac.webkit.org/wiki/SquirrelFish.
[38]
N. Topham and D. Jones. High speed CPU simulation using JIT binary translation. In Proceedings of the Annual Workshop on Modelling, Benchmarking and Simulation, MOBS '07, 2007.
[39]
P. Unnikrishnan, M. Kandemir, and F. Li. Reducing dynamic compilation overhead by overlapping compilation and execution. In Proceedings of the 2006 Asia and South Pacific Design Automation Conference, ASP-DAC '06, pages 929--934, Piscataway, NJ, USA, 2006. IEEE Press.
[40]
K. Wang, Y. Zhang, H. Wang, and X. Shen. Parallelization of IBM Mambo system simulator in functional modes. SIGOPS Oper. Syst. Rev., 42: 71--76, January 2008.
[41]
J. Whaley. Partial method compilation using dynamic profile information. In Proceedings of the 16th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '01, pages 166--179, New York, NY, USA, 2001. ACM.
[42]
B. Yee, D. Sehr, G. Dardyk, J. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native Client: A sandbox for portable, untrusted x86 native code. In 30th IEEE Symposium on Security and Privacy, pages 79--93, May 2009.
[43]
M. Zaleski, A. D. Brown, and K. Stoodley. YETI: a gradually extensible trace interpreter. In Proceedings of the 3rd International Conference on Virtual Execution Environments, VEE '07, pages 83--93, New York, NY, USA, 2007. ACM.
[44]
C. Zheng and C. Thompson. PA-RISC to IA-64: transparent execution, no recompilation. Computer, 33 (3): 47--52, Mar. 2000.
[45]
J. Zhu and D. D. Gajski. A retargetable, ultra-fast instruction set simulator. In Proceedings of the Conference on Design, Automation and Test in Europe, DATE '99, New York, NY, USA, 1999. ACM.

Cited By

View all
  • (2019)A retargetable system-level DBT hypervisorProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358850(505-520)Online publication date: 10-Jul-2019
  • (2019)Mitigating JIT compilation latency in virtual execution environmentsProceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments10.1145/3313808.3313818(101-107)Online publication date: 14-Apr-2019
  • (2019)Cross-ISA machine instrumentation using fast and scalable dynamic binary translationProceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments10.1145/3313808.3313811(74-87)Online publication date: 14-Apr-2019
  • Show More Cited By

Index Terms

  1. Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2011
    668 pages
    ISBN:9781450306638
    DOI:10.1145/1993498
    • General Chair:
    • Mary Hall,
    • Program Chair:
    • David Padua
    • cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 46, Issue 6
      PLDI '11
      June 2011
      652 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/1993316
      Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 June 2011

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. dynamic binary translation
    2. dynamic work scheduling
    3. just-in-time compilation
    4. parallelization
    5. task farm

    Qualifiers

    • Research-article

    Conference

    PLDI '11
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 406 of 2,067 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)27
    • Downloads (Last 6 weeks)4
    Reflects downloads up to 14 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2019)A retargetable system-level DBT hypervisorProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358850(505-520)Online publication date: 10-Jul-2019
    • (2019)Mitigating JIT compilation latency in virtual execution environmentsProceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments10.1145/3313808.3313818(101-107)Online publication date: 14-Apr-2019
    • (2019)Cross-ISA machine instrumentation using fast and scalable dynamic binary translationProceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments10.1145/3313808.3313811(74-87)Online publication date: 14-Apr-2019
    • (2019)An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral CompilationACM Transactions on Architecture and Code Optimization10.1145/329344915:4(1-23)Online publication date: 8-Jan-2019
    • (2019)Blaze-TasksACM Transactions on Architecture and Code Optimization10.1145/329344815:4(1-25)Online publication date: 8-Jan-2019
    • (2019)Decoupled Fused CacheACM Transactions on Architecture and Code Optimization10.1145/329344715:4(1-23)Online publication date: 8-Jan-2019
    • (2019)A System-Level Simulator for RRAM-Based Neuromorphic Computing ChipsACM Transactions on Architecture and Code Optimization10.1145/329105415:4(1-24)Online publication date: 8-Jan-2019
    • (2019)The Art of Getting Deep Neural Networks in ShapeACM Transactions on Architecture and Code Optimization10.1145/329105315:4(1-21)Online publication date: 8-Jan-2019
    • (2019)Predicting New Workload or CPU Performance by Analyzing Public DatasetsACM Transactions on Architecture and Code Optimization10.1145/328412715:4(1-21)Online publication date: 8-Jan-2019
    • (2018)Reusing the Optimized Code for JavaScript Ahead-of-Time CompilationACM Transactions on Architecture and Code Optimization10.1145/329105615:4(1-20)Online publication date: 19-Dec-2018
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media