research-article

Open access

Lightweight, Modular Verification for WebAssembly-to-Native Instruction Selection

Authors:

Alexa VanHattum,

Monica Pardeshi,

Adrian Sampson,

Fraser BrownAuthors Info & Claims

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1

Pages 231 - 248

https://doi.org/10.1145/3617232.3624862

Published: 17 April 2024 Publication History

Abstract

Language-level guarantees---like module runtime isolation for WebAssembly (Wasm)---are only as strong as the compiler that produces a final, native-machine-specific executable. The process of lowering language-level constructions to ISA-specific instructions can introduce subtle bugs that violate security guarantees. In this paper, we present Crocus, a system for lightweight, modular verification of instruction-lowering rules within Cranelift, a production retargetable Wasm native code generator. We use Crocus to verify lowering rules that cover WebAssembly 1.0 support for integer operations in the ARM aarch64 backend. We show that Crocus can reproduce 3 known bugs (including a 9.9/10 severity CVE), identify 2 previously-unknown bugs and an underspecified compiler invariant, and help analyze the root causes of a new bug.

References

[1]

Alessandro Abate, Cristina David, Pascal Kesseli, Daniel Kroening, and Elizabeth Polgreen. Counterexample guided inductive synthesis modulo theories. In International Conference on Computer-Aided Verification (CAV), 2018.

[2]

Maaz Bin Safeer Ahmad, Alexander J. Root, Andrew Adams, Shoaib Kamil, and Alvin Cheung. Vector instruction selection for digital signal processors using program synthesis. In ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022.

Digital Library

[3]

Bytecode Alliance. ISLE language reference. https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/isle/docs/language-reference.md, 2023.

[4]

Alasdair Armstrong, Thomas Bauereiss, Brian Campbell, Alastair Reid, Kathryn E. Gray, Robert M. Norton, Prashanth Mundkur, Mark Wassell, Jon French, Christopher Pulte, Shaked Flur, Ian Stark, Neel Krishnaswami, and Peter Sewell. ISA semantics for ARMv8-A, RISC-V, and CHERI-MIPS. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2019.

Digital Library

[5]

Alasdair Armstrong, Brian Campbell, Ben Simner, Christopher Pulte, and Peter Sewell. Isla: Integrating full-scale ISA semantics and axiomatic concurrency models. In International Conference on Computer-Aided Verification (CAV), 2021.

Digital Library

[6]

Javier Cabrera Arteaga, Nicholas Fitzgerald, Martin Monperrus, and Benoit Baudry. Wasm-mutate: Fuzzing WebAssembly compilers with e-graphs. In E-Graph Research, Applications, Practices, and Human-factors Symposium, 2022. URL: https://www.jacarte.me/assets/pdf/wasm_mutate.pdf.

[7]

Vytautas Astrauskas, Peter Müller, Federico Poli, and Alexander J. Summers. Leveraging Rust types for modular specification and verification. In ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages and Applications (OOPSLA), 2019.

Digital Library

[8]

Marek Baranowski, Shaobo He, and Zvonimir Rakamaric. Verifying Rust programs with SMACK. 2018.

[9]

Mike Barnett, K. Rustan M. Leino, and Wolfram Schulte. The spec# programming system: An overview. In Construction and Analysis of Safe, Secure, and Interoperable Smart Devices, 2005.

Digital Library

[10]

Björn Roy Baron et al. Cranelift codegen backend for Rust, 2023. URL: https://github.com/bjorn3/rustc_codegen_cranelift.

[11]

Clark W. Barrett, Aaron Stump, and Cesare Tinelli. The SMT-LIB standard version 2.0. In Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (SMT), 2010. URL: https://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.0-r10.12.21.pdf.

[12]

Iulia Bastys, Maximilian Algehed, Alexander Sjösten, and Andrei Sabelfeld. Secwasm: Information flow control for WebAssembly. In Static Analysis, 2022.

Digital Library

[13]

Yves Bertot and Pierre Castéran. Interactive theorem proving and program development: Coq'Art: the calculus of inductive constructions. Springer Science & Business Media, 2013.

[14]

Jay Bosamiya, Wen Shih Lim, and Bryan Parno. Provably-safe multilingual software sandboxing using WebAssembly. In USENIX Security Symposium, 2022. URL: https://www.usenix.org/conference/usenixsecurity22/presentation/bosamiya.

[15]

Fraser Brown, John Renner, Andres Nötzli, Sorin Lerner, Hovav Shacham, and Deian Stefan. Towards a verified range analysis for JavaScript JITs. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2020.

Digital Library

[16]

Sebastian Buchwald, Andreas Fried, and Sebastian Hack. Synthesizing an instruction selection rule library from semantic specifications. In ACM/IEEE International Symposium on Code Generation and Optimization (CGO), 2018.

Digital Library

[17]

Bytecode Alliance. The Cranelift compiler. https://github.com/bytecodealliance/wasmtime/tree/main/cranelift, 2023.

[18]

Bytecode Alliance. Wasmtime: A fast and secure runtime for WebAssembly. https://wasmtime.dev, 2023.

[19]

R. G. Cattell. Automatic derivation of code generators from machine descriptions. ACM Transactions on Programming Languages and Systems (TOPLAS), 1980.

Digital Library

[20]

R G G Cattell. Formalization and Automatic Derivation of Code Generators. PhD thesis, Carnegie Mellon University, 1978. https://apps.dtic.mil/sti/pdfs/ADA058872.pdf.

Digital Library

[21]

J. Ceng, M. Hohenauer, R. Leupers, G. Ascheid, H. Meyr, and G. Braun. C compiler retargeting based on instruction semantics models. In Design, Automation & Test in Europe (DATE), 2005.

Digital Library

[22]

Alex Crichton. Data leakage between instances in the pooling allocator. https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-wh6w-3828-g9qf, November 2022.

[23]

Alex Crichton. Miscompilation of constant values in division on aarch64. https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-7f6x-jwh5-m9r4, July 2022.

[24]

Alex Crichton. Miscompilation of 'i8x16.swizzle' and 'select' with v128 inputs. https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-jqwc-c49r-4w2x, 2022.

[25]

Alex Crichton. Guest-controlled out-of-bounds read/write on x8664. https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-ff4p-7xrq-q5r8, 2023.

[26]

Ross Daly, Caleb Donovick, Jackson Melchert, Rajsekhar Setaluri, Nestan Tsiskaridze Bullock, Priyanka Raina, Clark Barrett, and Pat Hanrahan. Synthesizing instruction selection rewrite rules from RTL using SMT. In Formal Methods in Computer-Aided Design (FMCAD), 2022.

[27]

Sandeep Dasgupta, Daejun Park, Theodoros Kasampalis, Vikram S. Adve, and Grigore Roşu. A complete formal semantics of x86-64 user-level instruction set architecture. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2019.

Digital Library

[28]

Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2008.

Digital Library

[29]

Nachum Dershowitz and Jean-Pierre Jouannaud. Rewrite systems. In Handbook of Theoretical Computer Science, Volume B: Formal Models and Sematics, 1991.

[30]

João Dias and Norman Ramsey. Automatically generating instruction selectors using declarative machine descriptions. 2010.

Digital Library

[31]

Chris Fallin. Memory access due to code generation flaw in Cranelift module. https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-hpqh-2wqx-7qp5, May 2021.

[32]

Chris Fallin. RFC: Design of the ISLE instruction-selector DSL. https://github.com/bytecodealliance/rfcs/pull/15, August 2021.

[33]

Chris Fallin. Cranelift's instruction selector DSL, ISLE: Term-rewriting made practical. https://cfallin.org/blog/2023/01/20/cranelift-isle/, January 2023.

[34]

Anthony Fox, Magnus O Myreen, Yong Kiam Tan, and Ramana Kumar. Verified compilation of CakeML to multiple machine-code targets. 2017.

Digital Library

[35]

Go Authors. Go compiler backend lowering rules. https://github.com/golang/go/tree/master/src/cmd/compile/internal/ssa/_gen, 2023.

[36]

Andreas Haas, Andreas Rossberg, Derek L Schuff, Ben L Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and JF Bastien. Bringing the web up to speed with WebAssembly. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2017.

Digital Library

[37]

Rikard Hjort. Formally verifying WebAssembly with KWasm, 2020. URL: https://odr.chalmers.se/server/api/core/bitstreams/a06be182-a12e-46ce-94d3-cff7a5dc42ba/content.

[38]

C. A. R. Hoare. An axiomatic basis for computer programming. In Communications of the ACM (CACM), 1969.

Digital Library

[39]

Roger Hoover and Kenneth Zadeck. Generating machine specific optimizing compilers. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 1996.

Digital Library

[40]

Susmit Jha, Rhishikesh Limaye, and Sanjit A. Seshia. Beaver: Engineering an efficient SMT solver for bit-vector arithmetic. In Computer Aided Verification, 2009.

Digital Library

[41]

Evan Johnson, Evan Laufer, Zijie Zhao, Dan Gohman, Shravan Narayan, Stefan Savage, Deian Stefan, and Fraser Brown. WaVe: a verifiably secure WebAssembly sandboxing runtime. In IEEE Security and Privacy (Oakland), 2023.

[42]

Evan Johnson, David Thien, Yousef Alhessi, Shravan Narayan, Fraser Brown, Sorin Lerner, Tyler McMullen, Stefan Savage, and Deian Stefan. Trust but verify: SFI safety for native-compiled Wasm. 2021. URL: https://cseweb.ucsd.edu/~lerner/papers/wasm-sfi-ndss2021.pdf.

[43]

Kenton Varda. WebAssembly on Cloudflare workers. https://blog.cloudflare.com/webassembly-on-cloudflare-workers/, 2018.

[44]

Ramana Kumar, Magnus O. Myreen, Michael Norrish, and Scott Owens. CakeML: A verified implementation of ML. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2014.

Digital Library

[45]

Sudipta Kundu, Zachary Tatlock, and Sorin Lerner. Proving optimizations correct using parameterized program equivalence. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2009.

Digital Library

[46]

Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In ACM/IEEE International Symposium on Code Generation and Optimization (CGO), 2004.

[47]

Sorin Lerner, Todd Millstein, and Craig Chambers. Automatically proving the correctness of compiler optimizations. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2003.

Digital Library

[48]

Sorin Lerner, Todd Millstein, Erika Rice, and Craig Chambers. Automated soundness proofs for dataflow analyses and transformations via local rules. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2005.

Digital Library

[49]

Xavier Leroy. Formal verification of a realistic compiler. Communications of the ACM (CACM), 52(7):107--115, 2009.

Digital Library

[50]

Xavier Leroy. A formally verified compiler back-end. Journal of Automated Reasoning, 43(4):363--446, 2009.

Digital Library

[51]

Nuno P Lopes, Juneyoung Lee, Chung-Kil Hur, Zhengyang Liu, and John Regehr. Alive2: bounded translation validation for LLVM. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2021.

Digital Library

[52]

Nuno P Lopes, David Menendez, Santosh Nagarakatte, and John Regehr. Provably correct peephole optimizations with Alive. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2015.

Digital Library

[53]

Nuno P. Lopes and John Regehr. Future directions for optimizing compilers. 2018. URL: https://arxiv.org/pdf/1809.02161.pdf.

[54]

Alberto Martelli and Ugo Montanari. An efficient unification algorithm. In ACM Transactions on Programming Languages and Systems (TOPLAS), 1982.

Digital Library

[55]

Charith Mendis and Saman Amarasinghe. GoSLP: Globally optimized superword level parallelism framework. In ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages and Applications (OOPSLA), 2018.

Digital Library

[56]

Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. Chaff: Engineering an efficient SAT solver. In Proceedings of the 38th Annual Design Automation Conference, 2001.

Digital Library

[57]

Manasij Mukherjee, Pranav Kant, Zhengyang Liu, and John Regehr. Dataflow-based pruning for speeding up superoptimization. In ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages and Applications (OOPSLA), 2020.

Digital Library

[58]

Joshua Nelson. Using rustc_codegen_cranelift for debug builds. https://blog.rust-lang.org/inside-rust/2020/11/15/Using-rustc_codegen_cranelift.html, November 2020.

[59]

Luke Nelson, Jacob Van Geffen, Emina Torlak, and Xi Wang. Specification and verification in the field: Applying formal methods to BPF just-in-time compilers in the Linux kernel. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020.

Digital Library

[60]

Aina Niemetz, Mathias Preiner, Andrew Reynolds, Yoni Zohar, Clark Barrett, and Cesare Tinelli. Towards bit-width-independent proofs in SMT solvers. In International Conference on Automated Deduction (CADE), 2019.

Digital Library

[61]

Pat Hickey. Lucet takes WebAssembly beyond the browser | Fastly. https://www.fastly.com/blog/announcing-lucet-fastly-native-webassembly-compiler-runtime, 2019.

[62]

Jonathan Protzenko, Benjamin Beurdouche, Denis Merigoux, and Karthikeyan Bhargavan. Formally verified cryptographic web applications in WebAssembly. In IEEE Symposium on Security and Privacy (SP), 2019.

[63]

Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman P. Amarasinghe. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2013.

Digital Library

[64]

Alastair Reid, Luke Church, Shaked Flur, Sarah de Haas, Maritza Johnson, and Ben Laurie. Towards making formal methods normal: meeting developers where they are. 2020.

[65]

Gang Ren, Peng Wu, and David Padua. Optimizing data permutations for SIMD devices. page 118--131, 2006.

Digital Library

[66]

Andreas Rossberg. WebAssembly Specification Release 1.0. https://webassembly.github.io/JS-BigInt-integration/core/_download/WebAssembly.pdf, 2019.

[67]

Andreas Rossberg. WebAssembly Specification Release 2.0 Draft Draft 2023-04-08. https://webassembly.github.io/spec/core/_download/WebAssembly.pdf, 2023.

[68]

Gordon Stewart, Lennart Beringer, Santiago Cuellar, and Andrew W. Appel. Compositional CompCert. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2015.

Digital Library

[69]

Yong Kiam Tan, Magnus O Myreen, Ramana Kumar, Anthony Fox, Scott Owens, and Michael Norrish. The verified CakeML compiler backend. Journal of Functional Programming, 29, 2019. URL: https://cakeml.org/jfp19.pdf.

[70]

Alexa VanHattum, Rachit Nigam, Vincent T. Lee, James Bornholt, and Adrian Sampson. Vectorization for digital signal processors via equality saturation. In ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2021.

Digital Library

[71]

Vercel Inc. Using WebAssembly (Wasm) at the edge. https://vercel.com/docs/concepts/functions/edge-functions/wasm, 2023.

[72]

Eelco Visser, Zine-el-Abidine Benaissa, and Andrew Tolmach. Building program optimizers with rewriting strategies. In ACM International Conference on Functional Programming (ICFP), 1998.

Digital Library

[73]

Conrad Watt. Mechanising and verifying the WebAssembly specification. 2018.

Digital Library

[74]

Conrad Watt, John Renner, Natalie Popescu, Sunjay Cauligi, and Deian Stefan. CT-Wasm: Type-driven secure cryptography for the web ecosystem. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2019.

Digital Library

[75]

Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. Finding and understanding bugs in C compilers. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2011.

Digital Library

Index Terms

Lightweight, Modular Verification for WebAssembly-to-Native Instruction Selection
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Formal software verification
  2. Software notations and tools
    1. Compilers
      1. Source code generation

Recommendations

Integrated instruction selection and register allocation for compact code generation exploiting freeform mixing of 16- and 32-bit instructions
CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

For memory constrained embedded systems code size is at least as important as performance. One way of increasing code density is to exploit compact instruction formats, e.g. ARM Thumb, where the processor either operates in standard or compact ...
Generalized instruction selection using SSA-graphs
LCTES '08

Instruction selection is a well-studied compiler phase that translates the compiler's intermediate representation of programs to a sequence of target-dependent machine instructions optimizing for various compiler objectives (e.g. speed and space). Most ...
Generalized instruction selection using SSA-graphs
LCTES '08: Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems

Instruction selection is a well-studied compiler phase that translates the compiler's intermediate representation of programs to a sequence of target-dependent machine instructions optimizing for various compiler objectives (e.g. speed and space). Most ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1

April 2024

494 pages

ISBN:9798400703720

DOI:10.1145/3617232

General Chairs:
Nael Abu-Ghazaleh,
Rajiv Gupta,
Program Chairs:
Madan Musuvathi,
Dan Tsafrir

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 April 2024

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Funding Sources

NSF (National Science Foundation)

Conference

ASPLOS '24

Sponsor:

ASPLOS '24: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1

April 27 - May 1, 2024

CA, La Jolla, USA

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
451
Total Downloads

Downloads (Last 12 months)451
Downloads (Last 6 weeks)62

Reflects downloads up to 24 Nov 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents