Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3314221.3314601acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article
Public Access

A complete formal semantics of x86-64 user-level instruction set architecture

Published: 08 June 2019 Publication History

Abstract

We present the most complete and thoroughly tested formal semantics of x86-64 to date. Our semantics faithfully formalizes all the non-deprecated, sequential user-level instructions of the x86-64 Haswell instruction set architecture. This totals 3155 instruction variants, corresponding to 774 mnemonics. The semantics is fully executable and has been tested against more than 7,000 instruction-level test cases and the GCC torture test suite. This extensive testing paid off, revealing bugs in both the x86-64 reference manual and other existing semantics. We also illustrate potential applications of our semantics in different formal analyses, and discuss how it can be useful for processor verification.

Supplementary Material

WEBM File (p1133-dasgupta.webm)
MP4 File (3314221.3314601.mp4)
Video Presentation

References

[1]
2008. IEEE Std 754-2008 - IEEE Standard for Floating-Point Arithmetic. https://standards.ieee.org/findstds/standard/754-2008.html.
[2]
2017. Sail x86 ISA model. https://github.com/rems-project/sail/tree/sail2/x86. Last accessed: May 2, 2019.
[3]
2018. Angr: A powerful and user-friendly binary analysis platform! http://angr.io/. Last accessed: May 2, 2019.
[4]
2018. Bug Reported in Intel Developer Zone: Possible errors in instruction semantics. https://software.intel.com/en-us/forums/intel-isa-extensions/topic/773342. Last accessed: May 2, 2019.
[5]
2018. Bug Reported in Stoke: Modelling the behavior of flags which may or must take undef values. https://github.com/StanfordPL/stoke/issues/986. Last accessed: May 2, 2019.
[6]
2018. Bug Reported in Stoke: Semantic bugs. https://github.com/StanfordPL/stoke/issues/983. Last accessed: May 2, 2019.
[7]
2018. C Language Testsuites: C-torture version 8.1.0. https://gcc.gnu.org/onlinedocs/gccint/C-Tests.html. Last accessed: May 2, 2019.
[8]
2018. Eric Schkufza. Personal communication.
[9]
2018. Evaluable Strings Intermediate Language. https://radare.gitbooks.io/radare2book/content/disassembling/esil.html. Last accessed: May 2, 2019.
[10]
2018. GDB: The GNU Project Debugger. https://www.gnu.org/software/gdb/. Last accessed: May 2, 2019.
[11]
2018. Intel 64 and IA-32 Architectures Software Developer Manuals. https://software.intel.com/en-us/articles/intel-sdm. Published on October 12, 2016, updated May 18, 2018.
[12]
2018. MPFR Java Bindings. https://github.com/kframework/mpfr-java. Last accessed: May 2, 2019.
[13]
2018. Remill: Library for lifting of x86, amd64, and aarch64 machine code to LLVM bitcode. https://github.com/trailofbits/remill. Last accessed: May 2, 2019.
[14]
2018. The Coq Proof Assistant. https://coq.inria.fr/. Last accessed: May 2, 2019.
[15]
2018. The GNU MPFR Library. https://www.mpfr.org/. Last accessed: May 2, 2019.
[16]
2018. x86 and amd64 Instruction Reference (UnOfficial). http://www.felixcloutier.com/x86/. Last accessed: May 2, 2019.
[17]
2018. X86isa: Implemented-opcodes: Opcodes supported by the x86 model. http://www.cs.utexas.edu/users/moore/acl2/manuals/current/manual/index-seo.php/X86ISA____IMPLEMENTED-OPCODES. Last accessed: May 2, 2019.
[18]
2019. rmem: Executable concurrency models for ARMv8, RISC-V, Power, and x86. https://github.com/rems-project/rmem/. Last accessed: May 2, 2019.
[19]
A. Ahmed, F. Farahmandi, and P. Mishra. 2018. Directed test generation using concolic testing on RTL models. In 2018 Design, Automation Test in Europe Conference Exhibition (DATE). 1538s1543.
[20]
Sergi Alvarez. 2018. Radare2. https://rada.re/r/. Last accessed: May 2, 2019.
[21]
Alasdair Armstrong, Thomas Bauereiss, Brian Campbell, Alastair Reid, Kathryn E. Gray, Robert M. Norton, Prashanth Mundkur, Mark Wassell, Jon French, Christopher Pulte, Shaked Flur, Ian Stark, Neel Krishnaswami, and Peter Sewell. 2019. ISA Semantics for ARMv8-a, RISC-v, and CHERI-MIPS. Proc. ACM Program. Lang. 3, POPL, Article 71 (Jan. 2019), 31 pages.
[22]
Gogul Balakrishnan and Thomas Reps. 2010. WYSINWYX: What You See is Not What You eXecute. ACM Trans. Program. Lang. Syst. 32, 6, Article 23 (Aug. 2010), 84 pages.
[23]
Fabrice Bellard. 2005. QEMU, a Fast and Portable Dynamic Translator. In Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC '05). USENIX Association, Berkeley, CA, USA, 41-41. http://dl.acm.org/citation.cfm?id=1247360.1247401.
[24]
David Brumley, Ivan Jager, Thanassis Avgerinos, and Edward J. Schwartz. 2011. BAP: A Binary Analysis Platform. In Proceedings of the 23rd International Conference on Computer Aided Verification (CAV'11). Springer-Verlag, Berlin, Heidelberg, 463-469. http://dl.acm.org/citation.cfm?id=2032305.2032342.
[25]
Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI'08). USENIX Association, Berkeley, CA, USA, 209-224. http://dl.acm.org/citation.cfm?id=1855741.1855756.
[26]
Sandeep Dasgupta. 2018. Semantics of x86-64 in K. https://github.com/kframework/X86-64-semantics. Last accessed: May 2, 2019.
[27]
Sandeep Dasgupta. 2019. Defining semantics of instructions unsupported in Strata/Stoke. https://github.com/StanfordPL/stoke/pull/996. Last accessed: May 2, 2019.
[28]
Sandeep Dasgupta. 2019. Improving Stoke ability to debug a circuit. https://github.com/StanfordPL/stoke/pull/997. Last accessed: May 2, 2019.
[29]
Chucky Ellison and Grigore Roþu. 2012. An Executable Formal Semantics of C with Applications. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'12). ACM, 533-544.
[30]
Pedro Fonseca, Kaiyuan Zhang, Xi Wang, and Arvind Krishnamurthy. 2017. An Empirical Study on the Correctness of Formally Verified Distributed Systems. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys '17). ACM, New York, NY, USA, 328-343.
[31]
Harry D. Foster. 2015. Trends in Functional Verification: A 2014 Industry Study. In Proceedings of the 52nd Annual Design Automation Conference (DAC '15). ACM, New York, NY, USA, 48:1-48:6.
[32]
Shilpi Goel, Warren A. Hunt, and Matt Kaufmann. 2017. Engineering a Formal, Executable x86 ISA Simulator for Software Verification. Springer International Publishing, Cham, 173-209.
[33]
Shilpi Goel, Warren A. Hunt, Matt Kaufmann, and Soumava Ghosh. 2014. Simulation and Formal Verification of x86 Machine-Code Programs That Make System Calls. In Proceedings of the 14th Conference on Formal Methods in Computer-Aided Design (FMCAD '14). FMCAD Inc, Austin, TX, Article 18, 8 pages. http://dl.acm.org/citation.cfm?id=2682923.2682944.
[34]
Ronghui Gu, Zhong Shao, Hao Chen, Xiongnan Wu, Jieung Kim, Vilhelm Sjöberg, and David Costanzo. 2016. CertiKOS: An Extensible Architecture for Building Certified Concurrent OS Kernels. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 653-669. http://dl.acm.org/citation.cfm?id=3026877.3026928.
[35]
Niranjan Hasabnis and R. Sekar. 2016. Extracting Instruction Semantics via Symbolic Execution of Code Generators. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 301-313.
[36]
Niranjan Hasabnis and R. Sekar. 2016. Lifting Assembly to Intermediate Representation: A Novel Approach Leveraging Compilers. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '16). ACM, New York, NY, USA, 311-324.
[37]
Stefan Heule, Eric Schkufza, Rahul Sharma, and Alex Aiken. 2016. Stratified Synthesis: Automatically Learning the x86-64 Instruction Set. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '16). ACM, New York, NY, USA, 237-250.
[38]
Matt Kaufmann, J. Strother Moore, and Panagiotis Manolios. 2000. Computer-Aided Reasoning: An Approach. Kluwer Academic Publishers, Norwell, MA, USA.
[39]
Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO'04). Palo Alto, California.
[40]
Kevin P. Lawton. 1996. Bochs: A Portable PC Emulator for Unix/X. Linux J. 1996, 29es, Article 7 (Sept. 1996). http://dl.acm.org/citation.cfm?id=326350.326357.
[41]
Xavier Leroy. 2009. Formal Verification of a Realistic Compiler. Commun. ACM 52, 7 (July 2009), 107-115.
[42]
Junghee Lim and Thomas Reps. 2013. TSL: A System for Generating Abstract Interpreters and Its Application to Machine-Code Analysis. ACM Trans. Program. Lang. Syst. 35, 1, Article 4 (April 2013), 59 pages.
[43]
L. Liu and S. Vasudevan. 2011. Efficient validation input generation in RTL by hybridized source code analysis. In 2011 Design, Automation Test in Europe. 1-6.
[44]
Lingyi Liu and Shobha Vasudevan. 2014. Scaling Input Stimulus Generation Through Hybrid Static and Dynamic Analysis of RTL. ACM Trans. Des. Autom. Electron. Syst. 20, 1, Article 4 (Nov. 2014), 33 pages.
[45]
Lorenzo Martignoni, Stephen McCamant, Pongsin Poosankam, Dawn Song, and Petros Maniatis. 2012. Path-exploration Lifting: Hi-fi Tests for Lo-fi Emulators. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVII). ACM, New York, NY, USA, 337-348.
[46]
Greg Morrisett, Gang Tan, Joseph Tassarotti, Jean-Baptiste Tristan, and Edward Gan. 2012. RockSalt: better, faster, stronger SFI for the x86. PLDI: Programming Languages Design and Implementation (2012), 395-404.
[47]
Nicholas Nethercote and Julian Seward. 2003. Valgrind: A Program Supervision Framework. Electronic Notes in Theoretical Computer Science 89, 2 (2003), 44-66. RV '2003, Run-time Verification (Satellite Workshop of CAV '03).
[48]
Scott Owens, Susmit Sarkar, and Peter Sewell. 2009. A Better x86 Memory Model: X86-TSO. In Proceedings of the 22Nd International Conference on Theorem Proving in Higher Order Logics (TPHOLs '09). Springer-Verlag, Berlin, Heidelberg, 391-407.
[49]
Daejun Park, Yi Zhang, Manasvi Saxena, Philip Daian, and Grigore Roþu. 2018. A Formal Verification Tool for Ethereum VM Bytecode. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018). ACM, New York, NY, USA, 912-915.
[50]
Christopher Pulte, Shaked Flur, Will Deacon, Jon French, Susmit Sarkar, and Peter Sewell. 2017. Simplifying ARM Concurrency: Multicopy-atomic Axiomatic and Operational Models for ARMv8. Proc. ACM Program. Lang. 2, POPL, Article 19 (Dec. 2017), 29 pages.
[51]
Alastair Reid. 2016. ARM's Architecture Specification Language. https://alastairreid.github.io/specification_languages/. Last accessed: May 2, 2019.
[52]
Alastair Reid. 2017. Trustworthy specifications of ARM® v8-A and v8-M system level architecture. Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design, FMCAD 2016 (2017), 161-168.
[53]
Grigore Roþu and Andrei Þtef?nescu. 2012. Checking Reachability using Matching Logic. In Proceedings of the 27th Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA' 12). ACM, 555-574. https://doi.org/citation.cfm?doid=2384616.2384656.
[54]
Ian Roessle, Freek Verbeek, and Binoy Ravindran. 2019. Formally Verified Big Step Semantics out of x86-64 Binaries. In Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and Proofs (CPP 2019). ACM, New York, NY, USA, 181-195.
[55]
Grigore Roþu and Traian Florin Þerb?nut?. 2010. An Overview of the K Semantic Framework. Journal of Logic and Algebraic Programming 79, 6 (2010), 397-434.
[56]
Andrew Ruef and Artem Dinaburg. 2014. Static Translation of X86 Instruction Semantics to LLVM with McSema. REcon (2014). https://github.com/trailofbits/mcsema.
[57]
Susmit Sarkar, Peter Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge, Thomas Braibant, Magnus O. Myreen, and Jade Alglave. 2009. The Semantics of x86-CC Multiprocessor Machine Code. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '09). ACM, New York, NY, USA, 379-391.
[58]
Eric Schkufza, Rahul Sharma, and Alex Aiken. 2013. Stochastic Superoptimization. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '13). ACM, New York, NY, USA, 305-316.
[59]
Traian Florin Þerb?nuta, Andrei Arusoaie, David Lazar, Chucky Ellison, Dorel Lucanu, and Grigore Roþu. [n. d.]. The K Primer (version 3.2). Technical Report.
[60]
Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Audrey Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2016. SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis. (2016).
[61]
Venkatesh Srinivasan and Thomas Reps. 2015. Synthesis of machine code from semantics. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2015 (2015), 596-607.
[62]
Andrei Stef?nescu, Daejun Park, Shijiao Yuwen, Yilong Li, and Grigore Roþu. 2016. Semantics-based Program Verifiers for All Languages. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016). ACM, New York, NY, USA, 74-91.
[63]
Ken Thompson. 1984. Reflections on Trusting Trust. Commun. ACM 27, 8 (Aug. 1984), 761-763.
[64]
Nickolai Zeldovich, Silas Boyd-Wickizer, Eddie Kohler, and David Mazières. 2006. Making Information Flow Explicit in HiStar. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7 (OSDI '06). USENIX Association, Berkeley, CA, USA, 19-19. http://dl.acm.org/citation.cfm?id=1267308.1267327.

Cited By

View all
  • (2024)libLISA: Instruction Discovery and Analysis on x86-64Proceedings of the ACM on Programming Languages10.1145/36897238:OOPSLA2(333-361)Online publication date: 8-Oct-2024
  • (2024)Lightweight, Modular Verification for WebAssembly-to-Native Instruction SelectionProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624862(231-248)Online publication date: 27-Apr-2024
  • (2024)Formalizing x86-64 ISA in Isabelle/HOL: A Binary Semantics for eBPF JIT CorrectnessDependable Software Engineering. Theories, Tools, and Applications10.1007/978-981-96-0602-3_11(197-216)Online publication date: 25-Nov-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2019
1162 pages
ISBN:9781450367127
DOI:10.1145/3314221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2019

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. Formal Semantics
  2. ISA specification
  3. x86-64

Qualifiers

  • Research-article

Funding Sources

Conference

PLDI '19
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)447
  • Downloads (Last 6 weeks)49
Reflects downloads up to 22 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)libLISA: Instruction Discovery and Analysis on x86-64Proceedings of the ACM on Programming Languages10.1145/36897238:OOPSLA2(333-361)Online publication date: 8-Oct-2024
  • (2024)Lightweight, Modular Verification for WebAssembly-to-Native Instruction SelectionProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624862(231-248)Online publication date: 27-Apr-2024
  • (2024)Formalizing x86-64 ISA in Isabelle/HOL: A Binary Semantics for eBPF JIT CorrectnessDependable Software Engineering. Theories, Tools, and Applications10.1007/978-981-96-0602-3_11(197-216)Online publication date: 25-Nov-2024
  • (2024)Minuska: Towards a Formally Verified Programming Language FrameworkSoftware Engineering and Formal Methods10.1007/978-3-031-77382-2_12(200-214)Online publication date: 26-Nov-2024
  • (2023)Generating Proof Certificates for a Language-Agnostic Deductive Program VerifierProceedings of the ACM on Programming Languages10.1145/35860297:OOPSLA1(56-84)Online publication date: 6-Apr-2023
  • (2023)Galápagos: Developing Verified Low Level Cryptography on Heterogeneous HardwaresProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security10.1145/3576915.3616603(2113-2127)Online publication date: 15-Nov-2023
  • (2023)Formalizing, Verifying and Applying ISA Security Guarantees as Universal ContractsProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security10.1145/3576915.3616602(2083-2097)Online publication date: 15-Nov-2023
  • (2023)A High-Coverage and Efficient Instruction-Level Testing Approach for x86 ProcessorsIEEE Transactions on Computers10.1109/TC.2023.328876272:11(3203-3217)Online publication date: Nov-2023
  • (2023)QueryX: Symbolic Query on Decompiled Code for Finding Bugs in COTS Binaries2023 IEEE Symposium on Security and Privacy (SP)10.1109/SP46215.2023.10179314(3279-3295)Online publication date: May-2023
  • (2023)Towards End-to-End Verified TEEs via Verified Interface Conformance and Certified Compilers2023 IEEE 36th Computer Security Foundations Symposium (CSF)10.1109/CSF57540.2023.00021(324-339)Online publication date: Jul-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media