A real system evaluation of hardware atomicity for software speculation

Published: 13 March 2010

Abstract

In this paper we evaluate the atomic region compiler abstraction by incorporating it into a commercial system. We find that atomic regions are simple and intuitive to integrate into an x86 binary-translation system. Furthermore, doing so trivially enables additional optimization opportunities beyond those achievable by a high-performance dynamic optimizer, which already implements superblocks.
We show that atomic regions can suffer from severe performance penalties if misspeculations are left uncontrolled, but that a simple software control mechanism is sufficient to rein in all detrimental side effects. We evaluate using full reference runs of the SPEC CPU2000 integer benchmarks and find that atomic regions enable up to a 9% (3% on average) improvement beyond the performance of a tuned product.
These performance improvements are achieved without any negative side effects. Performance side effects such as code bloat are absent with atomic regions; in fact, static code size is reduced. The hardware necessary is synergistic with other needs and was already available on the commercial product used in our evaluation. Finally, the software complexity is minimal, as a single developer was able to incorporate atomic regions into a sophisticated 300,000-line code base in three months, despite never having seen the translator source code beforehand.
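The abstract does not describe how the software control mechanism works. As a rough illustration only, one plausible shape for such a mechanism is a per-assertion abort counter that stops speculating on an assertion once its observed abort rate crosses a threshold. The class name `MisspeculationMonitor`, the thresholds, and the assertion identifiers below are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class MisspeculationMonitor:
    """Hypothetical per-assertion abort tracker (an assumption, not the paper's design)."""

    def __init__(self, min_samples=16, max_abort_rate=0.05):
        self.min_samples = min_samples        # executions required before judging
        self.max_abort_rate = max_abort_rate  # tolerated fraction of aborts
        self.executions = defaultdict(int)
        self.aborts = defaultdict(int)
        self.disabled = set()                 # assertions no longer speculated on

    def record(self, assertion_id, aborted):
        """Record one execution of a region guarded by `assertion_id`."""
        self.executions[assertion_id] += 1
        if aborted:
            self.aborts[assertion_id] += 1
        n = self.executions[assertion_id]
        if n >= self.min_samples and self.aborts[assertion_id] / n > self.max_abort_rate:
            # Too many rollbacks: a translator could regenerate the region
            # with this branch left as an ordinary conditional jump.
            self.disabled.add(assertion_id)

    def should_speculate(self, assertion_id):
        return assertion_id not in self.disabled


monitor = MisspeculationMonitor(min_samples=4, max_abort_rate=0.25)
for aborted in (True, True, False, False):
    monitor.record("branch_a", aborted)   # aborts half of the time
for _ in range(4):
    monitor.record("branch_b", False)     # never aborts
```

In this sketch, `branch_a` ends up disabled while `branch_b` stays eligible for speculation, which is the general behavior a reactive control scheme would be expected to show.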


      Reviews

      Wolfgang Schreiner

      New features in processor design strive to provide performance improvements, without unduly increasing hardware complexity, in a way that is effectively exploitable by compilers. One such feature is hardware atomicity, where a region of code can be marked as "atomic," such that its effect can eventually be either committed or rolled back. Consequently, the compiler may speculate when generating code for multiple possible execution paths: it may guess the most likely path, translate conditional jumps out of this path into assertions stating that the jumps are not taken, optimize the code along this path assuming that the assertions hold, and tag the result as atomic. If the guess is right, the optimization pays off; if the guess is wrong (that is, some assertion fails), a costly rollback has to be performed. The authors incorporate the idea into the Transmeta Efficeon, a processor that uses code morphing to translate x86 instructions into its internal instruction set, based on the very long instruction word (VLIW) principle. The processor's code-morphing software is modified to generate atomic regions that can be supported by the processor's capabilities for memory checkpointing; thus, the performance of the SPEC CPU2000 benchmarks could be improved by an average of three percent (and as much as nine percent). The improvements depend heavily on carefully monitoring misspeculations at runtime (in order to readjust misbehaving assertions) and on the compile-time optimization of redundant assertions. The results are very encouraging and may well find their way, in the future, into mainstream processor designs.

      Online Computing Reviews Service
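The commit-or-rollback behavior described in the review can be mimicked in software. The following is a minimal sketch, assuming a dictionary stands in for machine state and a deep copy stands in for the hardware checkpoint; `run_atomic`, `AssertionFailed`, `speculative_path`, and `fallback_path` are illustrative names, not part of the Transmeta code-morphing software.

```python
import copy


class AssertionFailed(Exception):
    """Raised when a branch converted into an assertion does not hold."""


def run_atomic(state, speculative_path, fallback_path):
    """Run the speculative path atomically; roll back and use the fallback on failure."""
    checkpoint = copy.deepcopy(state)   # software stand-in for a hardware checkpoint
    try:
        speculative_path(state)
        return "committed"              # every assertion held, so the effects are kept
    except AssertionFailed:
        state.clear()
        state.update(checkpoint)        # discard all speculative writes
        fallback_path(state)            # re-execute unoptimized code
        return "rolled back"


def speculative_path(state):
    # Optimized for the likely case x >= 0; the rare branch became an assertion.
    state["y"] = state["x"] * 2         # write performed before the check
    if state["x"] < 0:
        raise AssertionFailed("x < 0")
    state["z"] = state["y"] + 1


def fallback_path(state):
    # Original code that still handles both branch directions.
    if state["x"] < 0:
        state["z"] = 0
    else:
        state["y"] = state["x"] * 2
        state["z"] = state["y"] + 1


likely = {"x": 3}
unlikely = {"x": -1}
likely_result = run_atomic(likely, speculative_path, fallback_path)
unlikely_result = run_atomic(unlikely, speculative_path, fallback_path)
```

In the unlikely case, the speculative write to `y` is undone by the rollback, so the final state is the same as if only the fallback code had ever run. This is the property that lets an optimizer reorder and simplify code inside the region without preserving intermediate state for the off-path exits.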

      Published In

      ASPLOS XV: Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems
      March 2010
      422 pages
      ISBN: 9781605588391
      DOI: 10.1145/1736020
      • General Chair: James C. Hoe
      • Program Chair: Vikram S. Adve
      • ACM SIGARCH Computer Architecture News, Volume 38, Issue 1
        ASPLOS '10
        March 2010
        399 pages
        ISSN: 0163-5964
        DOI: 10.1145/1735970
      • ACM SIGPLAN Notices, Volume 45, Issue 3
        ASPLOS '10
        March 2010
        399 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/1735971
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. atomicity
      2. checkpoint
      3. dynamic translation
      4. optimization
      5. speculation

      Qualifiers

      • Research-article

      Conference

      ASPLOS '10

      Acceptance Rates

      ASPLOS XV Paper Acceptance Rate 32 of 181 submissions, 18%;
      Overall Acceptance Rate 535 of 2,713 submissions, 20%

      Article Metrics

      • Downloads (Last 12 months): 4
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 02 Oct 2024

      Cited By

      • (2021) Forerunner. Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, pages 570-587. DOI: 10.1145/3477132.3483564. Online publication date: 26-Oct-2021
      • (2019) NoMap: Speeding-Up JavaScript Using Hardware Transactional Memory. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 412-425. DOI: 10.1109/HPCA.2019.00054. Online publication date: Feb-2019
      • (2017) HW/SW co-designed processors: Challenges, design choices and a simulation infrastructure for evaluation. 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 185-194. DOI: 10.1109/ISPASS.2017.7975290. Online publication date: Apr-2017
      • (2016) PowerChop. ACM SIGARCH Computer Architecture News 44:3, pages 140-152. DOI: 10.1145/3007787.3001152. Online publication date: 18-Jun-2016
      • (2016) Assisting Static Compiler Vectorization with a Speculative Dynamic Vectorizer in an HW/SW Codesigned Environment. ACM Transactions on Computer Systems 33:4, pages 1-33. DOI: 10.1145/2807694. Online publication date: 4-Jan-2016
      • (2016) PowerChop. Proceedings of the 43rd International Symposium on Computer Architecture, pages 140-152. DOI: 10.1109/ISCA.2016.22. Online publication date: 18-Jun-2016
      • (2016) Quantitative characterization of the software layer of a HW/SW co-designed processor. 2016 IEEE International Symposium on Workload Characterization (IISWC), pages 1-10. DOI: 10.1109/IISWC.2016.7581274. Online publication date: Sep-2016
      • (2014) Speculative hardware/software co-designed floating-point multiply-add fusion. ACM SIGARCH Computer Architecture News 42:1, pages 623-638. DOI: 10.1145/2654822.2541978. Online publication date: 24-Feb-2014
      • (2014) Speculative hardware/software co-designed floating-point multiply-add fusion. ACM SIGPLAN Notices 49:4, pages 623-638. DOI: 10.1145/2644865.2541978. Online publication date: 24-Feb-2014
      • (2014) Efficient Power Gating of SIMD Accelerators Through Dynamic Selective Devectorization in an HW/SW Codesigned Environment. ACM Transactions on Architecture and Code Optimization 11:3, pages 1-23. DOI: 10.1145/2629681. Online publication date: 31-Jul-2014
