DOI: 10.1109/ISCA.2018.00058
research-article

Mobilizing the micro-ops: exploiting context sensitive decoding for security and energy efficiency

Published: 02 June 2018

Abstract

Modern instruction set decoders translate native instructions into internal micro-ops to simplify CPU design and improve instruction-level parallelism. In most known instances, however, this translation is static. This work proposes context-sensitive decoding, a technique that customizes the micro-op translation at microsecond or finer granularity, based on the current execution context and/or preset hardware events. While there are many potential applications, this work demonstrates its effectiveness with two use cases: 1) as a novel security defense that thwarts instruction/data cache-based side-channel attacks, demonstrated on commercial implementations of RSA and AES, and 2) as a power management technique that performs selective devectorization to enable efficient unit-level power gating.
For security, allowing execution to transition rapidly between translation modes defends against a variety of attacks, completely obfuscating code-dependent cache accesses while sacrificing only 5% in steady-state performance, orders of magnitude less than prior art. For energy, selectively disabling the vector units without disabling vector arithmetic reduces energy by 12.9% with minimal loss in performance. Both optimizations require no significant changes to the pipeline or the external ISA.
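
The idea of switching micro-op translations by context can be illustrated with a toy model. The sketch below is not the paper's design: the `Mode`, `MicroOp`, `decode`, and `simulate` names, the `vadd` opcode, and the four-lane vector width are all assumptions made for illustration. It shows only the general shape of selective devectorization, where the same external instruction decodes to one vector micro-op in the default mode and to several scalar micro-ops once an event switches the decoder into a devectorizing mode.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mode(Enum):
    NATIVE = auto()       # default, static translation
    DEVECTORIZE = auto()  # hypothetical mode: vector unit gated, emit scalar micro-ops


@dataclass(frozen=True)
class MicroOp:
    opcode: str
    dst: str
    srcs: tuple


VECTOR_LANES = 4  # assumed lane count for this toy model


def decode(instr, mode):
    """Translate one (opcode, dst, src1, src2) instruction into micro-ops."""
    opcode, dst, a, b = instr
    if opcode == "vadd" and mode is Mode.DEVECTORIZE:
        # Same architectural effect, expressed lane by lane for the scalar ALU.
        return [
            MicroOp("add", f"{dst}[{i}]", (f"{a}[{i}]", f"{b}[{i}]"))
            for i in range(VECTOR_LANES)
        ]
    return [MicroOp(opcode, dst, (a, b))]


def simulate(program, events):
    """Decode a program; `events` maps an instruction index to a new Mode."""
    mode = Mode.NATIVE
    trace = []
    for pc, instr in enumerate(program):
        mode = events.get(pc, mode)  # an event may switch the translation mode
        trace.extend(decode(instr, mode))
    return trace


program = [
    ("vadd", "v0", "v1", "v2"),
    ("add", "r0", "r1", "r2"),
    ("vadd", "v3", "v0", "v1"),
]
trace = simulate(program, {2: Mode.DEVECTORIZE})
print([u.opcode for u in trace])
# prints ['vadd', 'add', 'add', 'add', 'add', 'add']
```

The program itself never changes in this model; only the decoder's output does, which mirrors the abstract's claim that the external ISA is left untouched.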




        Published In

        ISCA '18: Proceedings of the 45th Annual International Symposium on Computer Architecture
        June 2018
        884 pages
        ISBN:9781538659847

        Publisher

        IEEE Press


        Author Tags

        1. microcode
        2. power gating
        3. security
        4. side channel

        Qualifiers

        • Research-article

        Conference

        ISCA '18

        Acceptance Rates

        Overall Acceptance Rate 543 of 3,203 submissions, 17%

        Cited By

        • (2021) FlexFilt: Towards Flexible Instruction Filtering for Security. Proceedings of the 37th Annual Computer Security Applications Conference, pp. 646-659. DOI: 10.1145/3485832.3488019. Online publication date: 6-Dec-2021
        • (2021) TAMA. ACM Transactions on Embedded Computing Systems 20:5, pp. 1-24. DOI: 10.1145/3462700. Online publication date: 9-Jul-2021
        • (2019) Context-Sensitive Fencing. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 395-410. DOI: 10.1145/3297858.3304060. Online publication date: 4-Apr-2019
        • (2019) CORF. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 701-714. DOI: 10.1145/3297858.3304026. Online publication date: 4-Apr-2019
        • (2019) Context-Sensitive Decoding. IEEE Micro 39:3, pp. 75-83. DOI: 10.1109/MM.2019.2910507. Online publication date: 1-May-2019
