2019 results sorted by ID
mUOV: Masking the Unbalanced Oil and Vinegar Digital Sigital Signature Scheme at First- and Higher-Order
Suparna Kundu, Quinten Norga, Uttam Kumar Ojha, Anindya Ganguly, Angshuman Karmakar, Ingrid Verbauwhede
Implementation
The National Institute for Standards and Technology (NIST) initiated a standardization procedure for additional digital signatures and recently announced round-2 candidates for the PQ additional digital signature schemes. The multivariate digital signature scheme Unbalanced Oil and Vinegar (UOV) is one of the oldest post-quantum schemes and has been selected by NIST for Round 2. Although UOV is mathematically secure, several side-channel attacks (SCA) have been shown on the UOV or UOV-based...
BatchZK: A Fully Pipelined GPU-Accelerated System for Batch Generation of Zero-Knowledge Proofs
Tao Lu, Yuxun Chen, Zonghui Wang, Xiaohang Wang, Wenzhi Chen, Jiaheng Zhang
Implementation
Zero-knowledge proof (ZKP) is a cryptographic primitive that enables one party to prove the validity of a statement to other parties without disclosing any secret information. With its widespread adoption in applications such as blockchain and verifiable machine learning, the demand for generating zero-knowledge proofs has increased dramatically. In recent years, considerable efforts have been directed toward developing GPU-accelerated systems for proof generation. However, these previous...
The LaZer Library: Lattice-Based Zero Knowledge and Succinct Proofs for Quantum-Safe Privacy
Vadim Lyubashevsky, Gregor Seiler, Patrick Steuer
Implementation
The hardness of lattice problems offers one of the most promising
security foundations for quantum-safe cryptography. Basic schemes
for public key encryption and digital signatures are already close to
standardization at NIST and several other standardization bodies,
and the research frontier has moved on to building primitives with
more advanced privacy features. At the core of many such primi-
tives are zero-knowledge proofs. In recent years, zero-knowledge
proofs for (and using)...
Scutum: Temporal Verification for Cross-Rollup Bridges via Goal-Driven Reduction
Yanju Chen, Juson Xia, Bo Wen, Kyle Charbonnet, Hongbo Wen, Hanzhi Liu, Yu Feng
Implementation
Scalability remains a key challenge for blockchain adoption. Rollups—especially zero-knowledge (ZK) and optimistic rollups—address this by processing transactions off-chain while maintaining Ethereum’s security, thus reducing gas fees and improving speeds. Cross-rollup bridges like Orbiter Finance enable seamless asset transfers across various Layer 2 (L2) rollups and between L2 and Layer 1 (L1) chains. However, the increasing reliance on these bridges raises significant security concerns,...
Classic McEliece Hardware Implementation with Enhanced Side-Channel and Fault Resistance
Peizhou Gan, Prasanna Ravi, Kamal Raj, Anubhab Baksi, Anupam Chattopadhyay
Implementation
In this work, we propose the first hardware implementation of Classic McEliece protected with countermeasures against Side-Channel Attacks (SCA) and Fault Injection Attacks (FIA). Classic Mceliece is one of the leading candidates for Key Encapsulation Mechanisms (KEMs) in the ongoing round 4 of the NIST standardization process for post-quantum cryptography. In particular, we implement a range of generic countermeasures against SCA and FIA, particularly protected the vulnerable operations...
OPTIMSM: FPGA hardware accelerator for Zero-Knowledge MSM
Xander Pottier, Thomas de Ruijter, Jonas Bertels, Wouter Legiest, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...
Improved ML-DSA Hardware Implementation With First Order Masking Countermeasure
Kamal Raj, Prasanna Ravi, Tee Kiah Chia, Anupam Chattopadhyay
Implementation
We present the protected hardware implementation of the Module-Lattice-Based Digital Signature Standard (ML-DSA). ML-DSA is an extension of Dilithium 3.1, which is the winner of the Post Quantum Cryptography (PQC) competition in the digital signature category. The proposed design is based on the existing high-performance Dilithium 3.1 design. We implemented existing Dilithium masking gadgets in hardware, which were only implemented in software. The masking gadgets are integrated with the...
Isogeny interpolation and the computation of isogenies from higher dimensional representations
David Jao, Jeanne Laflamme
Implementation
The Supersingular Isogeny Diffie-Hellman (SIDH) scheme is a public key cryptosystem that was submitted to the National Institute of Standards and Technology's competition for the standardization of post-quantum cryptography protocols. The private key in SIDH consists of an isogeny whose degree is a prime power. In July 2022, Castryck and Decru discovered an attack that completely breaks the scheme by recovering Bob's secret key, using isogenies between higher dimensional abelian varieties to...
Towards Explainable Side-Channel Leakage: Unveiling the Secrets of Microarchitecture
Ischa Stork, Vipul Arora, Łukasz Chmielewski, Ileana Buhan
Implementation
We explore the use of microbenchmarks, small assembly code snippets, to detect microarchitectural side-channel leakage in CPU implementations. Specifically, we investigate the effectiveness of microbenchmarks in diagnosing the predisposition to side-channel leaks in two commonly used RISC-V cores: Picorv32 and Ibex. We propose a new framework that involves diagnosing side-channel leaks, identifying leakage points, and constructing leakage profiles to understand the underlying causes. We...
Masking Gaussian Elimination at Arbitrary Order, with Application to Multivariate- and Code-Based PQC
Quinten Norga, Suparna Kundu, Uttam Kumar Ojha, Anindya Ganguly, Angshuman Karmakar, Ingrid Verbauwhede
Implementation
Digital signature schemes based on multivariate- and code-based hard problems are promising alternatives for lattice-based signature schemes, due to their smaller signature size. Hence, several candidates in the ongoing additional standardization for quantum secure digital signature (DS) schemes by the National Institute of Standards and Technology (NIST) rely on such alternate hard problems. Gaussian Elimination (GE) is a critical component in the signing procedure of these schemes. In this...
Push-Button Verification for BitVM Implementations
Hanzhi Liu, Jingyu Ke, Hongbo Wen, Luke Pearson, Robin Linus, Lukas George, Manish Bista, Hakan Karakuş, Domo, Junrui Liu, Yanju Chen, Yu Feng
Implementation
Bitcoin, while being the most prominent blockchain with the largest market capitalization, suffers from scalability and throughput limitations that impede the development of ecosystem projects like Bitcoin Decentralized Finance (BTCFi). Recent advancements in BitVM propose a promising Layer 2 (L2) solution to enhance Bitcoin's scalability by enabling complex computations off-chain with on-chain verification. However, Bitcoin's constrained programming environment—characterized by its...
PQNTRU: Acceleration of NTRU-based Schemes via Customized Post-Quantum Processor
Zewen Ye, Junhao Huang, Tianshun Huang, Yudan Bai, Jinze Li, Hao Zhang, Guangyan Li, Donglong Chen, Ray C.C. Cheung, Kejie Huang
Implementation
Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...
OpenNTT: An Automated Toolchain for Compiling High-Performance NTT Accelerators in FHE
Florian Krieger, Florian Hirner, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
Modern cryptographic techniques such as fully homomorphic encryption (FHE) have recently gained broad attention. Most of these cryptosystems rely on lattice problems wherein polynomial multiplication forms the computational bottleneck. A popular method to accelerate these polynomial multiplications is the Number-Theoretic Transformation (NTT). Recent works aim to improve the practical deployability of NTT and propose toolchains supporting the NTT hardware accelerator design processes....
Radical 2-isogenies and cryptographic hash functions in dimensions 1, 2 and 3
Sabrina Kunzweiler, Luciano Maino, Tomoki Moriya, Christophe Petit, Giacomo Pope, Damien Robert, Miha Stopar, Yan Bo Ti
Implementation
We provide explicit descriptions for radical 2-isogenies in dimensions
one, two and three using theta coordinates. These formulas allow us to efficiently
navigate in the corresponding isogeny graphs.
As an application of this, we implement different versions of the CGL hash func-
tion. Notably, the three-dimensional version is fastest, which demonstrates yet
another potential of using higher dimensional isogeny graphs in cryptography.
cuTraNTT: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for IoT Applications
Supriya Adhikary, Wai Kong Lee, Angshuman Karmakar, Yongwoo Lee, Seong Oun Hwang, Ramachandra Achar
Implementation
Large polynomial multiplication is one of the computational bottlenecks in fully homomorphic encryption implementations. Usually, these multiplications are implemented using the number-theoretic transformation to speed up the computation. State-of-the-art GPU-based implementation of fully homomorphic encryption computes the number theoretic transformation in two different kernels, due to the necessary synchronization between GPU blocks to ensure correctness in computation. This can be a...
Drifting Towards Better Error Probabilities in Fully Homomorphic Encryption Schemes
Olivier Bernard, Marc Joye, Nigel P. Smart, Michael Walter
Implementation
There are two security notions for FHE schemes the traditional notion of IND-CPA, and a more stringent notion of IND-CPA$^D$. The notions are equivalent if the FHE schemes are perfectly correct, however for schemes with negligible failure probability the FHE parameters needed to obtain IND-CPA$^D$ security can be much larger than those needed to obtain IND-CPA security. This paper uses the notion of ciphertext drift in order to understand the practical difference between IND-CPA and...
Computational Analysis of Plausibly Post-Quantum-Secure Recursive Arguments of Knowledge
Dustin Ray, Paulo L. Barreto
Implementation
With the recent standardization of post-quantum cryptographic algorithms, research efforts have largely remained centered on public key exchange and encryption schemes. Argument systems, which allow a party to efficiently argue the correctness of a computation, have received comparatively little attention regarding their quantum-resilient design. These computational integrity frameworks often rely on cryptographic assumptions, such as pairings or group operations, which are vulnerable to...
Overlapped Bootstrapping for FHEW/TFHE and Its Application to SHA3
Deokhwa Hong, Youngjin Choi, Yongwoo Lee, Young-Sik Kim
Implementation
Homomorphic Encryption (HE) enables operations on encrypted data without requiring decryption, thus allowing for secure handling of confidential data within smart contracts. Among the known HE schemes, FHEW and TFHE are particularly notable for use in smart contracts due to their lightweight nature and support for arbitrary logical gates. In contrast, other HE schemes often require several gigabytes of keys and are limited to supporting only addition and multiplication. As a result, there...
Multiplying Polynomials without Powerful Multiplication Instructions (Long Paper)
Vincent Hwang, YoungBeom Kim, Seog Chung Seo
Implementation
We improve the performance of lattice-based cryptosystems Dilithium on Cortex-M3 with expensive multiplications. Our contribution is two-fold: (i) We generalize Barrett multiplication and show that the resulting shape-independent modular multiplication performs comparably to long multiplication on some platforms without special hardware when precomputation is free. We call a modular multiplication “shape-independent” if its correctness and efficiency depend only on the magnitude of moduli...
SIMD-style Sorting of Integer Sequence in RLWE Ciphertext
Zijing Li, Hongbo Li, Zhengyang Wang
Implementation
This article discusses fully homomorphic encryption and homomorphic sorting. Homomorphic encryption is a special encryption technique that allows all kinds of operations to be performed on ciphertext, and the result is still decryptable, such that when decrypted, the result is the same as that obtained by performing the same operation on the plaintext. Homomorphic sorting is an important problem in homomorphic encryption. Currently, there has been a volume of work on homomorphic sorting. In...
Efficient Boolean-to-Arithmetic Mask Conversion in Hardware
Aein Rezaei Shahmirzadi, Michael Hutter
Implementation
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating...
Efficient Key-Switching for Word-Type FHE and GPU Acceleration
Shutong Jin, Zhen Gu, Guangyan Li, Donglong Chen, Çetin Kaya Koç, Ray C. C. Cheung, Wangchen Dai
Implementation
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...
General Functional Bootstrapping using CKKS
Andreea Alexandru, Andrey Kim, Yuriy Polyakov
Implementation
The Ducas-Micciancio (DM/FHEW) and Chilotti-Gama-Georgieva-Izabachène (CGGI/TFHE) cryptosystems provide a general privacy-preserving computation capability. These fully homomorphic encryption (FHE) cryptosystems can evaluate an arbitrary function expressed as a general look-up table (LUT) via the method of functional bootstrapping (also known as programmable bootstrapping). The main limitation of DM/CGGI functional bootstrapping is its efficiency because this procedure has to bootstrap every...
Block Ciphers in Idealized Models: Automated Proofs and New Security Results
Miguel Ambrona, Pooya Farshim, Patrick Harasser
Implementation
We develop and implement AlgoROM, a tool to systematically analyze the security of a wide class of symmetric primitives in idealized models of computation. The schemes that we consider are those that can be expressed over an alphabet consisting of XOR and function symbols for hash functions, permutations, or block ciphers.
We implement our framework in OCaml and apply it to a number of prominent constructions, which include the Luby–Rackoff (LR), key-alternating Feistel (KAF), and...
FLUENT: A Tool for Efficient Mixed-Protocol Semi-Private Function Evaluation
Daniel Günther, Joachim Schmidt, Thomas Schneider, Hossein Yalame
Implementation
In modern business to customer interactions, handling private or confidential data is essential. Private Function Evaluation (PFE) protocols ensure the privacy of both the customers' input data and the business' function evaluated on it which is often sensitive intellectual property (IP). However, fully hiding the function in PFE results in high performance overhead. Semi-Private Function Evaluation (SPFE) is a generalization of PFE to only partially hide the function, whereas specific...
Bit t-SNI Secure Multiplication Gadget for Inner Product Masking
John Gaspoz, Siemen Dhooghe
Implementation
Masking is a sound countermeasure to protect against differential power analysis. Since the work by Balasch et al. in ASIACRYPT 2012, inner product masking has been explored as an alternative to the well known Boolean masking. In CARDIS 2017, Poussier et al. showed that inner product masking achieves higher-order security versus Boolean masking, for the same shared size, in the bit-probing model. Wang et al. in TCHES 2020 verified the inner product masking's security order amplification in...
HEonGPU: a GPU-based Fully Homomorphic Encryption Library 1.0
Ali Şah Özcan, Erkay Savaş
Implementation
HEonGPU is a high-performance library designed to optimize Fully Homomorphic Encryption (FHE) operations on Graphics Processing Unit (GPU). By leveraging the parallel processing capac- ity of GPUs, HEonGPU significantly reduces the computational overhead typically associated with FHE by executing complex operation concurrently. This allows for faster execution of homomorphic computations on encrypted data, enabling real-time applications in privacy-preserving machine learn- ing and secure...
The SMAesH dataset
Gaëtan Cassiers, Charles Momin
Implementation
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
Optimized Software Implementation of Keccak, Kyber, and Dilithium on RV{32,64}IM{B}{V}
Jipeng Zhang, Yuxing Yan, Junhao Huang, Çetin Kaya Koç
Implementation
With the standardization of NIST post-quantum cryptographic (PQC) schemes, optimizing these PQC schemes across various platforms presents significant research value. While most existing software implementation efforts have concentrated on ARM platforms, research on PQC implementations utilizing various RISC-V instruction set architectures (ISAs) remains limited.
In light of this gap, this paper proposes comprehensive and efficient optimizations of Keccak, Kyber, and Dilithium on...
Practical Implementation of Pairing-Based zkSNARK in Bitcoin Script
Federico Barbacovi, Enrique Larraia, Paul Germouty, Wei Zhang
Implementation
Groth16 is a pairing-based zero-knowledge proof scheme that has a constant proof size and an efficient verification algorithm. Bitcoin Script is a stack-based low-level programming language that is used to lock and unlock bitcoins. In this paper, we present a practical implementation of the Groth16 verifier in Bitcoin Script deployable on the mainnet of a Bitcoin blockchain called BSV. Our result paves the way for a framework of verifiable computation on Bitcoin: a Groth16 proof is generated...
A Combined Design of 4-PLL-TRNG and 64-bit CDC-7-XPUF on a Zynq-7020 SoC
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
Marian: An Open Source RISC-V Processor with Zvk Vector Cryptography Extensions
Thomas Szymkowiak, Endrit Isufi, Markku-Juhani Saarinen
Implementation
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
Attestation Proof of Association – provability that attestation keys are bound to the same hardware and person
Eric Verheul
Implementation
We propose a wallet provider issued attestation called Wallet Trust Evidence (WTE) and three related specific instructions for the European Digital Identity (EUDI) Wallet cryptographic hardware, most notably the generation of a Proof of Association (PoA). These allow the EUDI Wallet providing verifiable assurance to third parties (issuers, relying parties) that attestation private keys are not only bound to conformant cryptographic hardware but also that they are bound to the same such...
32-bit and 64-bit CDC-7-XPUF Implementations on a Zynq-7020 SoC
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
Physically (or Physical) Unclonable Functions (PUFs) are basic and useful primitives in designing cryptographic systems. PUFs are designed to facilitate device authentication, secure boot, firmware integrity, and secure communications. To achieve these objectives, PUFs must exhibit both consistent repeatability and instance-specific randomness. The Arbiter PUF (APUF), recognized as the first silicon PUF, is capable of generating a substantial number of secret keys instantaneously based on...
Design and Implementation of a Fast, Platform-Adaptive, AIS-20/31 Compliant PLL-Based True Random Number Generator on a Zynq 7020 SoC FPGA
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
Phase-locked loops (PLLs) integrated within field-programmable gate arrays (FPGAs) or System-on-Chip FPGAs (SoCs) represent a promising approach for generating random numbers. Their widespread deployment, isolated functionality within these devices, and robust entropy, as demonstrated in prior studies, position PLL-based true random number generators (PLL-TRNGs) as highly viable solutions for this purpose. This study explicitly examines PLL-TRNG implementations using the ZC702 Rev1.1...
Scabbard: An Exploratory Study on Hardware Aware Design Choices of Learning with Rounding-based Key Encapsulation Mechanisms
Suparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid Verbauwhede
Implementation
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
$\Pi$-signHD: A New Structure for the SQIsign Family with Flexible Applicability
Kaizhan Lin, Weize Wang, Chang-An Zhao, Yunlei Zhao
Implementation
Digital signature is a fundamental cryptographic primitive and is widely used in the real world. Unfortunately, the current digital signature standards like EC-DSA and RSA are not quantum-resistant. Among post-quantum cryptography (PQC), isogeny-based signatures preserve some advantages of elliptic curve cryptosystems, particularly offering small signature sizes. Currently, SQIsign and its variants are the most promising isogeny-based digital signature schemes.
In this paper, we propose a...
A Recursive zk-based State Update System
Daniel Bloom, Sai Deng
Implementation
This paper introduces a ZKP (zero-knowledge proof) based state update system, where each block contains a SNARK proof aggregated from the user generated zkVM (zero knowledge virtual machine) proofs. It enables users to generate state update proofs in their local machines, contributing to a secure, decentralized verification process. Our main contribution in this paper, the recursive proofs system, addresses scalability by recursively verifying user proofs and aggregating them in a...
A Better Kyber Butterfly for FPGAs
Jonas Bertels, Quinten Norga, Ingrid Verbauwhede
Implementation
Kyber was selected by NIST as a Post-Quantum
Cryptography Key Encapsulation Mechanism standard. This
means that the industry now needs to transition and adopt
these new standards. One of the most demanding operations in
Kyber is the modular arithmetic, making it a suitable target for
optimization. This work offers a novel modular reduction design
with the lowest area on Xilinx FPGA platforms. This novel design,
through K-reduction and LUT-based reduction, utilizes 49 LUTs
and 1 DSP...
High-Throughput GPU Implementation of Dilithium Post-Quantum Digital Signature
Shiyu Shen, Hao Yang, Wangchen Dai, Hong Zhang, Zhe Liu, Yunlei Zhao
Implementation
Digital signatures are fundamental building blocks in various protocols to provide integrity and authenticity. The development of the quantum computing has raised concerns about the security guarantees afforded by classical signature schemes. CRYSTALS-Dilithium is an efficient post-quantum digital signature scheme based on lattice cryptography and has been selected as the primary algorithm for standardization by the National Institute of Standards and Technology. In this work, we present a...
Update to the Sca25519 Library: Mitigating Tearing-based Side-channel Attacks
Lukasz Chmielewski, Lubomír Hrbáček
Implementation
This short note describes an update to the sca25519 library, an ECC implementation computing the X25519 key-exchange protocol on the Arm Cortex-M4 microcontroller. The sca25519 software came with extensive mitigations against various side-channel and fault attacks and was, to our best knowledge, the first to claim affordable protection against multiple classes of attacks that are motivated by distinct real-world application scenarios.
This library is protected against various passive and...
Approach for High-Performance Random Number Generators for Critical Systems
Pascal Hammer, Veronika Krause, Tobias Probst, Jürgen Mottok
Implementation
In times of digitalization, the encryption and signing
of sensitive data is becoming increasingly important. These
cryptographic processes require large quantities of high-quality
random numbers. Which is why a high-performance random
number generator (RNG) is to be developed. For this purpose,
existing concepts of RNGs and application standards are first
analyzed. The proposed approach is to design a physical true
random number generator (PTRNG) with a high output of
random numbers....
SoK: Instruction Set Extensions for Cryptographers
Hao Cheng, Johann Großschädl, Ben Marshall, Daniel Page, Markku-Juhani O. Saarinen
Implementation
Framed within the general context of cyber-security, standard cryptographic constructions often represent an enabling technology for associated solutions. Alongside or in combination with their design, therefore, the implementation of such constructions is an important challenge: beyond delivering artefacts that are usable in practice, implementation can impact many quality metrics (such as efficiency and security) which determine fitness-for-purpose. A rich design space of implementation...
Point (de)compression for elliptic curves over highly $2$-adic finite fields
Dmitrii Koshelev
Implementation
This article addresses the issue of efficient and safe (de)compression of $\mathbb{F}_{\!q}$-points on an elliptic curve $E$ over a highly $2$-adic finite field $\mathbb{F}_{\!q}$ of characteristic $5$ or greater. The given issue was overlooked by cryptography experts, probably because, until recently, such fields were not in trend. Therefore, there was no difficulty (with rare exceptions) in finding a square $\mathbb{F}_{\!q}$-root. However, in our days, fields with large $2$-adicities have...
KpqClean Ver2: Comprehensive Benchmarking and Analysis of KpqC Algorithm Round 2 Submissions
Minjoo Sim, Siwoo Eum, Gyeongju Song, Minwoo Lee, Sangwon Kim, Minho Song, Hwajeong Seo
Implementation
From 2022, Korean Post-Quantum Cryptography (KpqC) Competition has been held. Among the Round 1 algorithms of KpqC, eight algorithms were selected in December 2023. To evaluate the algorithms, the performance is critical factor. However, the performance of the algorithms submitted to KpqC was evaluated in different development environments. Consequently, it is difficult to compare the performance of each algorithm fairly, because the measurements were not conducted in the identical...
FELIX (XGCD for FALCON): FPGA-based Scalable and Lightweight Accelerator for Large Integer Extended GCD
Sam Coulon, Tianyou Bao, Jiafeng Xie
Implementation
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
MSMAC: Accelerating Multi-Scalar Multiplication for Zero-Knowledge Proof
Pengcheng Qiu, Guiming Wu, Tingqiang Chu, Changzheng Wei, Runzhou Luo, Ying Yan, Wei Wang, Hui Zhang
Implementation
Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
Beyond the Whitepaper: Where BFT Consensus Protocols Meet Reality
David Wong, Denis Kolegov, Ivan Mikushin
Implementation
This paper presents a collection of lessons learned from analyzing the real-world security of various Byzantine Fault Tolerant (BFT) consensus protocol implementations. Drawing upon our experience as a team of security experts who have both developed and audited BFT systems, including BA★, HotStuff variants, Paxos variants, and DAG-based algorithms like Narwhal and Bullshark, we identify and analyze a variety of security vulnerabilities discovered in the translation of theoretical protocols...
Efficient Differentially Private Set Intersection
Xinyu Peng, Yufei Wang, Weiran Liu, Liqiang Peng, Feng Han, Zhen Gu, Jianling Sun, Yuan Hong
Implementation
Private Set Intersection (PSI) enables a sender and a receiver to jointly compute the intersection of their sets without disclosing other information about items not in the intersection. However, in many cases of joint data analysis, it is not just the items outside the intersection that are sensitive but the items within it. To protect such sensitive information, prior work presents a Differentially Private version of PSI (DPSI) based on a circuit-PSI using Fully Homomorphic Encryption....
Optimizing Big Integer Multiplication on Bitcoin: Introducing w-windowed Approach
Dmytro Zakharov, Oleksandr Kurbatov, Manish Bista, Belove Bist
Implementation
A crucial component of any zero-knowledge system is operations with finite fields. This, in turn, leads to the implementation of the fundamental operation: multiplying two big integers. In the realm of Bitcoin, this problem gets revisited, as Bitcoin utilizes its own stack-based and not Turing-complete scripting system called Bitcoin Script. Inspired by Elliptic Curve scalar multiplication, this paper introduces the $w$-windowed method for multiplying two numbers. We outperform...
A short-list of pairing-friendly curves resistant to the Special TNFS algorithm at the 192-bit security level
Diego F. Aranha, Georgios Fotiadis, Aurore Guillevic
Implementation
For more than two decades, pairings have been a fundamental tool for designing elegant cryptosystems, varying from digital signature schemes to more complex privacy-preserving constructions. However, the advancement of quantum computing threatens to undermine public-key cryptography. Concretely, it is widely accepted that a future large-scale quantum computer would be capable to break any public-key cryptosystem used today, rendering today's public-key cryptography obsolete and mandating the...
Quantum Implementation and Analysis of ARIA
Yujin Oh, Kyungbae Jang, Yujin Yang, Hwajeong Seo
Implementation
The progression of quantum computing is considered a potential threat to traditional cryptography system, highlighting the significance of post-quantum security in cryptographic systems. Regarding symmetric key encryption, the Grover algorithm can approximately halve the search complexity. Despite the absence of fully operational quantum computers at present, the necessity of assessing the security of symmetric key encryption against quantum computing continues to grow. In this paper, we...
Depth Optimized Quantum Circuits for HIGHT and LEA
Kyungbae Jang, Yujin Oh, Minwoo Lee, Dukyoung Kim, Hwajeong Seo
Implementation
Quantum computers can model and solve several problems that have posed challenges for classical super computers, leveraging their natural quantum mechanical characteristics. A large-scale quantum computer is poised to significantly reduce security strength in cryptography. In this context, extensive research has been conducted on quantum cryptanalysis. In this paper, we present optimized quantum circuits for Korean block ciphers, HIGHT and LEA. Our quantum circuits for HIGHT and LEA...
Efficient Layered Circuit for Verification of SHA3 Merkle Tree
Changchang Ding, Zheming Fu
Implementation
We present an efficient layered circuit design for SHA3-256 Merkle tree verification, suitable for a GKR proof system, that achieves logarithmic verification and proof size. We demonstrate how to compute the predicate functions for our circuit in $O(\log n)$ time to ensure logarithmic verification and provide GKR benchmarks for our circuit.
Applying Post-Quantum Cryptography Algorithms to a DLT-Based CBDC Infrastructure: Comparative and Feasibility Analysis
Daniel de Haro Moraes, Joao Paulo Aragao Pereira, Bruno Estolano Grossi, Gustavo Mirapalheta, George Marcel Monteiro Arcuri Smetana, Wesley Rodrigues, Courtnay Nery Guimarães Jr., Bruno Domingues, Fábio Saito, Marcos Simplício
Implementation
This article presents an innovative project for a Central Bank Digital Currency (CBDC) infrastructure. Focusing on security and reliability, the proposed architecture: (1) employs post-quantum cryptography (PQC) algorithms for long-term security, even against attackers with access to cryptographically-relevant quantum computers; (2) can be integrated with a Trusted Execution Environment (TEE) to safeguard the confidentiality of transaction contents as they are processed by third-parties; and...
A fast heuristic for mapping Boolean circuits to functional bootstrapping
Sergiu Carpov
Implementation
Functional bootstrapping in FHE schemes such as FHEW and TFHE allows the evaluation of a function on an encrypted message, in addition to noise reduction.
Implementing programs that directly use functional bootstrapping is challenging and error-prone.
In this paper, we propose a heuristic that automatically maps Boolean circuits to functional bootstrapping instructions.
Unlike other approaches, our method does not limit the encrypted data plaintext space to a power-of-two size, allowing...
Prover - Toward More Efficient Formal Verification of Masking in Probing Model
Feng Zhou, Hua Chen, Limin Fan
Implementation
In recent years, formal verification has emerged as a crucial method for assessing security against Side-Channel attacks of masked implementations, owing to its remarkable versatility and high degree of automation. However, formal verification still faces technical bottlenecks in balancing accuracy and efficiency, thereby limiting its scalability. Former tools like maskVerif and CocoAlma are very efficient but they face accuracy issues when verifying schemes that utilize properties of...
ECO-CRYSTALS: Efficient Cryptography CRYSTALS on Standard RISC-V ISA
Xinyi Ji, Jiankuo Dong, Junhao Huang, Zhijian Yuan, Wangchen Dai, Fu Xiao, Jingqiang Lin
Implementation
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
Efficient Implementation of Super-optimal Pairings on Curves with Small Prime Fields at the 192-bit Security Level
Jianming Lin, Chang-An Zhao, Yuhao Zheng
Implementation
For many pairing-based cryptographic protocols such as Direct Anonymous Attestation (DAA) schemes, the arithmetic on the first pairing subgroup $\mathbb{G}_1$ is more fundamental. Such operations heavily depend on the sizes of prime fields. At the 192-bit security level, Gasnier and Guillevic presented a curve named GG22D7-457 with CM-discriminant $D = 7$ and embedding degree $k = 22$. Compared to other well-known pairing-friendly curves at the same security level, the curve GG22D7-457 has...
Hardware Implementation and Security Analysis of Local-Masked NTT for CRYSTALS-Kyber
Rafael Carrera Rodriguez, Emanuele Valea, Florent Bruguier, Pascal Benoit
Implementation
The rapid evolution of post-quantum cryptography, spurred by standardization efforts such as those led by NIST, has highlighted the prominence of lattice-based cryptography, notably exemplified by CRYSTALS-Kyber. However, concerns persist regarding the security of cryptographic implementations, particularly in the face of Side-Channel Attacks (SCA). The usage of operations like the Number Theoretic
Transform (NTT) in CRYSTALS-Kyber introduces vulnerabilities to SCA, especially single-trace...
Towards ML-KEM & ML-DSA on OpenTitan
Amin Abdulrahman, Felix Oberhansl, Hoang Nguyen Hien Pham, Jade Philipoom, Peter Schwabe, Tobias Stelzer, Andreas Zankl
Implementation
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA - the two primary algorithms selected by NIST for standardization - in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
Fast computation of 2-isogenies in dimension 4 and cryptographic applications
Pierrick Dartois
Implementation
Dimension 4 isogenies have first been introduced in cryptography for the cryptanalysis of Supersingular Isogeny Diffie-Hellman (SIDH) and have been used constructively in several schemes, including SQIsignHD, a derivative of SQIsign isogeny based signature scheme. Unlike in dimensions 2 and 3, we can no longer rely on the Jacobian model and its derivatives to compute isogenies. In dimension 4 (and higher), we can only use theta-models. Previous works by Romain Cosset, David Lubicz and Damien...
Improved High-Order Masked Generation of Masking Vector and Rejection Sampling in Dilithium
Jean-Sébastien Coron, François Gérard, Tancrède Lepoint, Matthias Trannoy, Rina Zeitoun
Implementation
In this work, we introduce enhanced high-order masking techniques tailored for Dilithium, the post-quantum signature scheme recently standardized by NIST. We improve the masked generation of the masking vector $\vec{y}$, based on a fast Boolean-to-arithmetic conversion modulo $q$. We also describe an optimized gadget for the high-order masked rejection sampling, with a complexity independent from the size of the modulus $q$. We prove the security of our gadgets in the classical ISW...
Jolt-b: recursion friendly Jolt with basefold commitment
Hang Su, Qi Yang, Zhenfei Zhang
Implementation
The authors of Jolt [AST24] pioneered a unique method for creating zero-knowledge virtual machines, known as the lookup singularity. This technique extensively uses lookup tables to create virtual machine circuits. Despite Jolt’s performance being twice as efficient as the previous state-of-the-art1 , there is potential for further enhancement.
The initial release of Jolt uses Spartan [Set20] and Hyrax [WTs+ 18] as their backend, leading to two constraints. First, Hyrax employs Pedersen...
Finding Bugs and Features Using Cryptographically-Informed Functional Testing
Giacomo Fenzi, Jan Gilcher, Fernando Virdia
Implementation
In 2018, Mouha et al. (IEEE Trans. Reliability, 2018) performed a post-mortem investigation of the correctness of reference implementations submitted to the SHA3 competition run by NIST, finding previously unidentified bugs in a significant portion of them, including two of the five finalists. Their innovative approach allowed them to identify the presence of such bugs in a black-box manner, by searching for counterexamples to expected cryptographic properties of the implementations under...
Implementation and Performance Evaluation of Elliptic Curve Cryptography over SECP256R1 on STM32 Microprocessor
Onur İşler
Implementation
The use of Internet of Things (IoT) devices in embedded systems has become increasingly popular with advancing technologies. These devices become vulnerable to cyber attacks as they gain popularity. The cryptographic operations performed for the purpose of protection against cyber attacks are crucial to yield fast results in open networks and not slow down network traffic. Therefore, to enhance communication security, studies have been conducted in the literature on using asymmetric...
A Fast and Efficient SIKE Co-Design: Coarse-Grained Reconfigurable Accelerators with Custom RISC-V Microcontroller on FPGA
Jing Tian, Bo Wu, Lang Feng, Haochen Zhang, Zhongfeng Wang
Implementation
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
Phase Modulation Side Channels: Jittery JTAG for On-Chip Voltage Measurements
Colin O'Flynn
Implementation
Measuring the fluctuations of the clock phase of a target was identified as a leakage source on early electromagnetic side-channel investigations. Despite this, only recently was directly measuring the clock phase (or jitter) of digital signals from a target connected to being a source of exploitable leakage. As the phase of a clock output will be related to signal propagation delay through the target, and this propagation delay is related to voltage, this means that most digital devices...
Masked Vector Sampling for HQC
Maxime Spyropoulos, David Vigilant, Fabrice Perion, Renaud Pacalet, Laurent Sauvage
Implementation
Anticipating the advent of large quantum computers, NIST started a worldwide competition in 2016 aiming to define the next cryptographic standards. HQC is one of these post-quantum schemes still in contention, with three others already standardized. In 2022, Guo et al. introduced a timing attack that exploited an inconsistency in HQC rejection sampling function to recover its secret key in 866,000 calls to an oracle. The authors of HQC updated its specification by applying an algorithm to...
PolyFHEmus: Rethinking Multiplication in Fully Homomorphic Encryption
Charles Gouert, Nektarios Georgios Tsoutsos
Implementation
Homomorphic encryption is a powerful technology that solves key privacy concerns in cloud computing by enabling computation on encrypted data. However, it has not seen widespread adoption due to prohibitively high latencies. In this article, we identify polynomial multiplication as a bottleneck and investigate alternative algorithms to accelerate encrypted computing.
LEA Block Cipher in Rust Language: Trade-off between Memory Safety and Performance
Sangwon Kim, Siwoo Eum, Minho Song, Hwajeong Seo
Implementation
Cryptography implementations of block cipher have been written in C language due to its strong features on system-friendly features. However, the C language is prone to memory safety issues, such as buffer overflows and memory leaks. On the other hand, Rust, novel system programming language, provides strict compile-time memory safety guarantees through its ownership model. This paper presents the implementation of LEA block cipher in Rust language, demonstrating features to prevent...
Quantum Implementation of LSH
Yujin Oh, Kyungbae Jang, Hwajeong Seo
Implementation
As quantum computing progresses, the assessment of cryptographic algorithm resilience against quantum attack gains significance interests in the field of cryptanalysis. Consequently, this paper implements the depth-optimized quantum circuit of Korean hash function (i.e., LSH) and estimates its quantum attack cost in quantum circuits. By utilizing an optimized quantum adder and employing parallelization techniques, the proposed quantum circuit achieves a 78.8\% improvement in full depth and a...
A More Compact AES, and More
Dag Arne Osvik, David Canright
Implementation
We reduce the number of bit operations required to implement AES to a new minimum, and also compute improvements to elements of some other ciphers. Exploring the algebra of AES allows choices of basis and streamlining of the nonlinear parts. We also compute a more efficient implementation of the linear part of each round. Similar computational optimizations apply to other cryptographic matrices and S-boxes. This work may be incorporated into a hardware AES implementation using minimal...
Protecting cryptographic code against Spectre-RSB
Santiago Arranz Olmos, Gilles Barthe, Chitchanok Chuengsatiansup, Benjamin Grégoire, Vincent Laporte, Tiago Oliveira, Peter Schwabe, Yuval Yarom, Zhiyuan Zhang
Implementation
It is fundamental that executing cryptographic software must not leak secrets through side-channels. For software-visible side-channels, it was long believed that "constant-time" programming would be sufficient as a systematic countermeasure. However, this belief was shattered in 2018 by attacks exploiting speculative execution—so called Spectre attacks.
Recent work shows that language support suffices to protect cryptographic code with minimal overhead against one class of such attacks,...
Optimized Computation of the Jacobi Symbol
Jonas Lindstrøm, Kostas Kryptos Chalkias
Implementation
The Jacobi Symbol is an essential primitive in cryptographic applications such as primality testing, integer factorization, and various encryption schemes. By exploring the interdependencies among modular reductions within the algorithmic loop, we have developed a refined method that significantly enhances computational efficiency. Our optimized algorithm, implemented in the Rust language, achieves a performance increase of 72% over conventional textbook methods and is twice as fast as the...
KyberSlash: Exploiting secret-dependent division timings in Kyber implementations
Daniel J. Bernstein, Karthikeyan Bhargavan, Shivam Bhasin, Anupam Chattopadhyay, Tee Kiah Chia, Matthias J. Kannwischer, Franziskus Kiefer, Thales Paiva, Prasanna Ravi, Goutam Tamvada
Implementation
This paper presents KyberSlash1 and KyberSlash2 – two timing vulnerabilities in several implementations (including the official reference code) of the Kyber Post-Quantum Key Encapsulation Mechanism, currently undergoing standardization as ML-KEM. We demonstrate the exploitability of both KyberSlash1 and KyberSlash2 on two popular platforms: the Raspberry Pi 2 (Arm Cortex-A7) and the Arm Cortex-M4 microprocessor. Kyber secret keys are reliably recovered within minutes for KyberSlash2 and a...
GRASP: Accelerating Hash-based PQC Performance on GPU Parallel Architecture
Yijing Ning, Jiankuo Dong, Jingqiang Lin, Fangyu Zheng, Yu Fu, Zhenjiang Dong, Fu Xiao
Implementation
$SPHINCS^+$, one of the Post-Quantum Cryptography Digital Signature Algorithms (PQC-DSA) selected by NIST in the third round, features very short public and private key lengths but faces significant performance challenges compared to other post-quantum cryptographic schemes, limiting its suitability for real-world applications. To address these challenges, we propose the GPU-based paRallel Accelerated $SPHINCS^+$ (GRASP), which leverages GPU technology to enhance the efficiency of...
Polynomial sharings on two secrets: Buy one, get one free
Paula Arnold, Sebastian Berndt, Thomas Eisenbarth, Maximilian Orlt
Implementation
While passive side-channel attacks and active fault attacks have been studied intensively in the last few decades, strong attackers combining these attacks have only been studied relatively recently. Due to its simplicity, most countermeasures against passive attacks are based on additive sharing. Unfortunately, extending these countermeasures against faults often leads to quite a significant performance penalty, either due to the use of expensive cryptographic operations or a large number...
Accelerating pairings on BW10 and BW14 Curves
Senegue Gomez Nyamsi, Laurian Guimagang Azebaze, Emmanuel Fouotsa
Implementation
Since the advent of pairing based cryptography, many researchers have developed several techniques and variants of pairings to optimise the speed of pairing computations. The selection of the elliptic curve for a given pairing based protocol is crucial for operations in the first and second pairing groups of points of the elliptic curve and for many cryptographic schemes. A new variant of superoptimal pairing was proposed in 2023, namely x-superoptimal pairing on curves with odd prime...
Grafting: Complementing RNS in CKKS
Jung Hee Cheon, Hyeongmin Choe, Minsik Kang, Jaehyung Kim
Implementation
The RNS variant of the CKKS scheme (SAC 2018) is widely implemented due to its computational efficiency. However, the current optimized implementations of the RNS-CKKS scheme have a limitation when choosing the ciphertext modulus. It requires the scale factors to be approximately equal to a factor (or a product of factors) of the ciphertext modulus. This restriction causes inefficiency when the scale factor is not close to the power of the machine's word size, wasting the machine's...
Side-Channel and Fault Resistant ASCON Implementation: A Detailed Hardware Evaluation (Extended Version)
Aneesh Kandi, Anubhab Baksi, Peizhou Gan, Sylvain Guilley, Tomáš Gerlich, Jakub Breier, Anupam Chattopadhyay, Ritu Ranjan Shrivastwa, Zdeněk Martinásek, Shivam Bhasin
Implementation
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
A Note on (2, 2)-isogenies via Theta Coordinates
Jianming Lin, Saiyu Wang, Chang-An Zhao
Implementation
In this paper, we revisit the algorithm for computing chains of $(2, 2)$-isogenies between products of elliptic curves via theta coordinates proposed by Dartois et al. For each fundamental block of this algorithm, we provide a explicit inversion-free version. Besides, we exploit a novel technique of $x$-only ladder to speed up the computation of gluing isogeny. Finally, we present a mixed optimal strategy, which combines the inversion-elimination tool with the original methods together...
Consolidated Linear Masking (CLM): Generalized Randomized Isomorphic Representations, Powerful Degrees of Freedom and Low(er)-cost
Itamar Levi, Osnat Keren
Implementation
Masking is a widely adopted countermeasure against side-channel analysis (SCA) that protects cryptographic implementations from information leakage. However, current masking schemes often incur significant overhead in terms of electronic cost. RAMBAM, a recently proposed masking technique that fits elegantly with the AES algorithm, offers ultra-low latency/area by utilizing redundant representations of finite field elements. This paper presents a comprehensive generalization of RAMBAM and...
Time Sharing - A Novel Approach to Low-Latency Masking
Dilip Kumar S. V., Siemen Dhooghe, Josep Balasch, Benedikt Gierlichs, Ingrid Verbauwhede
Implementation
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitch-extended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools...
SoK: Model Reverse Engineering Threats for Neural Network Hardware
Seetal Potluri, Farinaz Koushanfar
Implementation
There has been significant progress over the past seven years in model reverse engineering (RE) for neural network (NN) hardware. Although there has been systematization of knowledge (SoK) in an overall sense, however, the treatment from the hardware perspective has been far from adequate. To bridge this gap, this paper systematically categorizes the types of NN hardware used prevalently by the industry/academia, and also the model RE attacks/defenses published in each category. Further, we...
Approximate CRT-Based Gadget Decomposition and Application to TFHE Blind Rotation
Olivier Bernard, Marc Joye
Implementation
One of the main issues to deal with for fully homomorphic encryption is the noise growth when operating on ciphertexts. To some extent, this can be controlled thanks to a so-called gadget decomposition. A gadget decomposition typically relies on radix- or CRT-based representations to split elements as vectors of smaller chunks whose inner products with the corresponding gadget vector rebuilds (an approximation of) the original elements. Radix-based gadget decompositions present the advantage...
Glitch-Stopping Circuits: Hardware Secure Masking without Registers
Zhenda Zhang, Svetla Nikova, Ventzislav Nikov
Implementation
Masking is one of the most popular countermeasures to protect implementations against power and electromagnetic side channel attacks, because it offers provable security. Masking has been shown secure against d-threshold probing adversaries by Ishai et al. at CRYPTO'03, but this adversary's model doesn't consider any physical hardware defaults and thus such masking schemes were shown to be still vulnerable when implemented as hardware circuits. To addressed these limitations glitch-extended...
Analyzing and Benchmarking ZK-Rollups
Stefanos Chaliasos, Itamar Reif, Adrià Torralba-Agell, Jens Ernstberger, Assimakis Kattis, Benjamin Livshits
Implementation
As blockchain technology continues to transform the realm of digital transactions, scalability has emerged as a critical issue. This challenge has spurred the creation of innovative solutions, particularly Layer 2 scalability techniques like rollups. Among these, ZK-Rollups are notable for employing Zero-Knowledge Proofs to facilitate prompt on-chain transaction verification, thereby improving scalability and efficiency without sacrificing security. Nevertheless, the intrinsic complexity of...
Secret Key Recovery in a Global-Scale End-to-End Encryption System
Graeme Connell, Vivian Fang, Rolfe Schmidt, Emma Dauterman, Raluca Ada Popa
Implementation
End-to-end encrypted messaging applications ensure that an attacker cannot read a user's message history without their decryption keys. While this provides strong privacy, it creates a usability problem: if a user loses their devices and cannot access their decryption keys, they can no longer access their account. To solve this usability problem, users should be able to back up their account information with the messaging provider. For privacy, this backup should be encrypted and the...
HAWKEYE – Recovering Symmetric Cryptography From Hardware Circuits
Gregor Leander, Christof Paar, Julian Speith, Lukas Stennes
Implementation
We present the first comprehensive approach for detecting and analyzing symmetric cryptographic primitives in gate-level descriptions of hardware. To capture both ASICs and FPGAs, we model the hardware as a directed graph, where gates become nodes and wires become edges. For modern chips, those graphs can easily consist of hundreds of thousands of nodes. More abstractly, we find subgraphs corresponding to cryptographic primitives in a potentially huge graph, the sea-of-gates, describing an...
On the parallelization of square-root Vélu's formulas
Jorge Chávez-Saab, Odalis Ortega, Amalia Pizarro-Madariaga
Implementation
A primary challenge in isogeny-based cryptography lies in the substantial computational cost associated to computing and evaluating prime-degree isogenies. This computation traditionally relied on Vélu's formulas, an approach with time complexity linear in the degree but which was further enhanced by Bernstein, De Feo, Leroux, and Smith to a square-root complexity. The improved square-root Vélu's formulas exhibit a degree of parallelizability that has not been exploited in major...
INDIANA - Verifying (Random) Probing Security through Indistinguishability Analysis
Christof Beierle, Jakob Feldtkeller, Anna Guinet, Tim Güneysu, Gregor Leander, Jan Richter-Brockmann, Pascal Sasdrich
Implementation
Despite masking being a prevalent protection against passive side-channel attacks, implementing it securely in hardware remains a manual, challenging, and error-prone process.
This paper introduces INDIANA, a comprehensive security verification tool for hardware masking. It provides a hardware verification framework, enabling a complete analysis of simulation-based security in the glitch-extended probing model, with cycle-accurate estimations for leakage probabilities in the random...
KHAN Encryption Algorithm: Leveraging Full Reptend Primes
Ayaz Khan
Implementation
The Keyed Hashing and Asymmetric Nonce (KHAN) encryption algorithm is a novel cryptographic scheme that utilizes the unique properties of full reptend prime numbers. This paper details the algorithm, its theoretical foundations, and the rigorous proofs of its security properties. By leveraging the characteristics of cyclic sequences derived from full reptend primes, KHAN provides robust encryption with high resistance to cryptanalytic attacks.
Elliptic Curve Cryptography for the masses: Simple and fast finite field arithmetic
Michael Scott
Implementation
Shaped prime moduli are often considered for use in elliptic curve and isogeny-based cryptography to allow for faster modular reduction. Here we focus on the most common choices for shaped primes that have been suggested, that is pseudo-Mersenne, generalized Mersenne and Montgomery-friendly primes. We consider how best to to exploit these shapes for maximum efficiency, and provide an open source tool to automatically generate, test and time working high-level language finite-field code....
The Ouroboros of ZK: Why Verifying the Verifier Unlocks Longer-Term ZK Innovation
Denis Firsov, Benjamin Livshits
Implementation
Verifying the verifier in the context of zero-knowledge proof is an essential part of ensuring the long-term integrity of the zero-knowledge ecosystem. This is vital for both zero-knowledge rollups and also other industrial applications of ZK. In addition to further minimizing the required trust and reducing the trusted computing base (TCB), having a verified verifier opens the door to decentralized proof generation by potentially untrusted parties. We outline a research program and justify...
Formal Definition and Verification for Combined Random Fault and Random Probing Security
Sonia Belaid, Jakob Feldtkeller, Tim Güneysu, Anna Guinet, Jan Richter-Brockmann, Matthieu Rivain, Pascal Sasdrich, Abdul Rahman Taleb
Implementation
In our highly digitalized world, an adversary is not constrained to purely digital attacks but can monitor or influence the physical execution environment of a target computing device. Such side-channel or fault-injection analysis poses a significant threat to otherwise secure cryptographic implementations. Hence, it is important to consider additional adversarial capabilities when analyzing the security of cryptographic implementations besides the default black-box model. For side-channel...
Efficient Second-Order Masked Software Implementations of Ascon in Theory and Practice
Barbara Gigerl, Florian Mendel, Martin Schläffer, Robert Primas
Implementation
In this paper, we present efficient protected software implementations of the authenticated cipher Ascon, the recently announced winner of the NIST standardization process for lightweight cryptography.
Our implementations target theoretical and practical security against second-order power analysis attacks.
First, we propose an efficient second-order extension of a previously presented first-order masking of the Keccak S-box that does not require online randomness.
The extension...
Speeding Up Multi-Scalar Multiplications for Pairing-Based zkSNARKs
Xinxin Fan, Veronika Kuchta, Francesco Sica, Lei Xu
Implementation
Multi-scalar multiplication (MSM) is one of the core components of many zero-knowledge proof systems, and a primary performance bottleneck for proof generation in these schemes. One major strategy to accelerate MSM is utilizing precomputation. Several algorithms (e.g., Pippenger and BGMW) and their variants have been proposed in this direction. In this paper, we revisit the recent precomputation-based MSM calculation method proposed by Luo, Fu and Gong at CHES 2023 and generalize their...
Scaling Lattice Sieves across Multiple Machines
Martin R. Albrecht, Joe Rowell
Implementation
Lattice sieves are algorithms for finding short vectors in lattices. We present an implementation of two such sieves – known as “BGJ1” and “BDGL” in the literature – that scales across multiple servers (with varying success). This class of algorithms requires exponential memory which had put into question their ability to scale across sieving nodes. We discuss our architecture and optimisations and report experimental evidence of the efficiency of our approach.
An NVMe-based Secure Computing Platform with FPGA-based TFHE Accelerator
Yoshihiro Ohba, Tomoya Sanuki, Claude Gravel, Kentaro Mihara
Implementation
In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...
The National Institute for Standards and Technology (NIST) initiated a standardization procedure for additional digital signatures and recently announced round-2 candidates for the PQ additional digital signature schemes. The multivariate digital signature scheme Unbalanced Oil and Vinegar (UOV) is one of the oldest post-quantum schemes and has been selected by NIST for Round 2. Although UOV is mathematically secure, several side-channel attacks (SCA) have been shown on the UOV or UOV-based...
Zero-knowledge proof (ZKP) is a cryptographic primitive that enables one party to prove the validity of a statement to other parties without disclosing any secret information. With its widespread adoption in applications such as blockchain and verifiable machine learning, the demand for generating zero-knowledge proofs has increased dramatically. In recent years, considerable efforts have been directed toward developing GPU-accelerated systems for proof generation. However, these previous...
The hardness of lattice problems offers one of the most promising security foundations for quantum-safe cryptography. Basic schemes for public key encryption and digital signatures are already close to standardization at NIST and several other standardization bodies, and the research frontier has moved on to building primitives with more advanced privacy features. At the core of many such primi- tives are zero-knowledge proofs. In recent years, zero-knowledge proofs for (and using)...
Scalability remains a key challenge for blockchain adoption. Rollups—especially zero-knowledge (ZK) and optimistic rollups—address this by processing transactions off-chain while maintaining Ethereum’s security, thus reducing gas fees and improving speeds. Cross-rollup bridges like Orbiter Finance enable seamless asset transfers across various Layer 2 (L2) rollups and between L2 and Layer 1 (L1) chains. However, the increasing reliance on these bridges raises significant security concerns,...
In this work, we propose the first hardware implementation of Classic McEliece protected with countermeasures against Side-Channel Attacks (SCA) and Fault Injection Attacks (FIA). Classic Mceliece is one of the leading candidates for Key Encapsulation Mechanisms (KEMs) in the ongoing round 4 of the NIST standardization process for post-quantum cryptography. In particular, we implement a range of generic countermeasures against SCA and FIA, particularly protected the vulnerable operations...
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...
We present the protected hardware implementation of the Module-Lattice-Based Digital Signature Standard (ML-DSA). ML-DSA is an extension of Dilithium 3.1, which is the winner of the Post Quantum Cryptography (PQC) competition in the digital signature category. The proposed design is based on the existing high-performance Dilithium 3.1 design. We implemented existing Dilithium masking gadgets in hardware, which were only implemented in software. The masking gadgets are integrated with the...
The Supersingular Isogeny Diffie-Hellman (SIDH) scheme is a public key cryptosystem that was submitted to the National Institute of Standards and Technology's competition for the standardization of post-quantum cryptography protocols. The private key in SIDH consists of an isogeny whose degree is a prime power. In July 2022, Castryck and Decru discovered an attack that completely breaks the scheme by recovering Bob's secret key, using isogenies between higher dimensional abelian varieties to...
We explore the use of microbenchmarks, small assembly code snippets, to detect microarchitectural side-channel leakage in CPU implementations. Specifically, we investigate the effectiveness of microbenchmarks in diagnosing the predisposition to side-channel leaks in two commonly used RISC-V cores: Picorv32 and Ibex. We propose a new framework that involves diagnosing side-channel leaks, identifying leakage points, and constructing leakage profiles to understand the underlying causes. We...
Digital signature schemes based on multivariate- and code-based hard problems are promising alternatives for lattice-based signature schemes, due to their smaller signature size. Hence, several candidates in the ongoing additional standardization for quantum secure digital signature (DS) schemes by the National Institute of Standards and Technology (NIST) rely on such alternate hard problems. Gaussian Elimination (GE) is a critical component in the signing procedure of these schemes. In this...
Bitcoin, while being the most prominent blockchain with the largest market capitalization, suffers from scalability and throughput limitations that impede the development of ecosystem projects like Bitcoin Decentralized Finance (BTCFi). Recent advancements in BitVM propose a promising Layer 2 (L2) solution to enhance Bitcoin's scalability by enabling complex computations off-chain with on-chain verification. However, Bitcoin's constrained programming environment—characterized by its...
Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...
Modern cryptographic techniques such as fully homomorphic encryption (FHE) have recently gained broad attention. Most of these cryptosystems rely on lattice problems wherein polynomial multiplication forms the computational bottleneck. A popular method to accelerate these polynomial multiplications is the Number-Theoretic Transformation (NTT). Recent works aim to improve the practical deployability of NTT and propose toolchains supporting the NTT hardware accelerator design processes....
We provide explicit descriptions for radical 2-isogenies in dimensions one, two and three using theta coordinates. These formulas allow us to efficiently navigate in the corresponding isogeny graphs. As an application of this, we implement different versions of the CGL hash func- tion. Notably, the three-dimensional version is fastest, which demonstrates yet another potential of using higher dimensional isogeny graphs in cryptography.
Large polynomial multiplication is one of the computational bottlenecks in fully homomorphic encryption implementations. Usually, these multiplications are implemented using the number-theoretic transformation to speed up the computation. State-of-the-art GPU-based implementation of fully homomorphic encryption computes the number theoretic transformation in two different kernels, due to the necessary synchronization between GPU blocks to ensure correctness in computation. This can be a...
There are two security notions for FHE schemes the traditional notion of IND-CPA, and a more stringent notion of IND-CPA$^D$. The notions are equivalent if the FHE schemes are perfectly correct, however for schemes with negligible failure probability the FHE parameters needed to obtain IND-CPA$^D$ security can be much larger than those needed to obtain IND-CPA security. This paper uses the notion of ciphertext drift in order to understand the practical difference between IND-CPA and...
With the recent standardization of post-quantum cryptographic algorithms, research efforts have largely remained centered on public key exchange and encryption schemes. Argument systems, which allow a party to efficiently argue the correctness of a computation, have received comparatively little attention regarding their quantum-resilient design. These computational integrity frameworks often rely on cryptographic assumptions, such as pairings or group operations, which are vulnerable to...
Homomorphic Encryption (HE) enables operations on encrypted data without requiring decryption, thus allowing for secure handling of confidential data within smart contracts. Among the known HE schemes, FHEW and TFHE are particularly notable for use in smart contracts due to their lightweight nature and support for arbitrary logical gates. In contrast, other HE schemes often require several gigabytes of keys and are limited to supporting only addition and multiplication. As a result, there...
We improve the performance of lattice-based cryptosystems Dilithium on Cortex-M3 with expensive multiplications. Our contribution is two-fold: (i) We generalize Barrett multiplication and show that the resulting shape-independent modular multiplication performs comparably to long multiplication on some platforms without special hardware when precomputation is free. We call a modular multiplication “shape-independent” if its correctness and efficiency depend only on the magnitude of moduli...
This article discusses fully homomorphic encryption and homomorphic sorting. Homomorphic encryption is a special encryption technique that allows all kinds of operations to be performed on ciphertext, and the result is still decryptable, such that when decrypted, the result is the same as that obtained by performing the same operation on the plaintext. Homomorphic sorting is an important problem in homomorphic encryption. Currently, there has been a volume of work on homomorphic sorting. In...
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating...
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...
The Ducas-Micciancio (DM/FHEW) and Chilotti-Gama-Georgieva-Izabachène (CGGI/TFHE) cryptosystems provide a general privacy-preserving computation capability. These fully homomorphic encryption (FHE) cryptosystems can evaluate an arbitrary function expressed as a general look-up table (LUT) via the method of functional bootstrapping (also known as programmable bootstrapping). The main limitation of DM/CGGI functional bootstrapping is its efficiency because this procedure has to bootstrap every...
We develop and implement AlgoROM, a tool to systematically analyze the security of a wide class of symmetric primitives in idealized models of computation. The schemes that we consider are those that can be expressed over an alphabet consisting of XOR and function symbols for hash functions, permutations, or block ciphers. We implement our framework in OCaml and apply it to a number of prominent constructions, which include the Luby–Rackoff (LR), key-alternating Feistel (KAF), and...
In modern business to customer interactions, handling private or confidential data is essential. Private Function Evaluation (PFE) protocols ensure the privacy of both the customers' input data and the business' function evaluated on it which is often sensitive intellectual property (IP). However, fully hiding the function in PFE results in high performance overhead. Semi-Private Function Evaluation (SPFE) is a generalization of PFE to only partially hide the function, whereas specific...
Masking is a sound countermeasure to protect against differential power analysis. Since the work by Balasch et al. in ASIACRYPT 2012, inner product masking has been explored as an alternative to the well known Boolean masking. In CARDIS 2017, Poussier et al. showed that inner product masking achieves higher-order security versus Boolean masking, for the same shared size, in the bit-probing model. Wang et al. in TCHES 2020 verified the inner product masking's security order amplification in...
HEonGPU is a high-performance library designed to optimize Fully Homomorphic Encryption (FHE) operations on Graphics Processing Unit (GPU). By leveraging the parallel processing capac- ity of GPUs, HEonGPU significantly reduces the computational overhead typically associated with FHE by executing complex operation concurrently. This allows for faster execution of homomorphic computations on encrypted data, enabling real-time applications in privacy-preserving machine learn- ing and secure...
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
With the standardization of NIST post-quantum cryptographic (PQC) schemes, optimizing these PQC schemes across various platforms presents significant research value. While most existing software implementation efforts have concentrated on ARM platforms, research on PQC implementations utilizing various RISC-V instruction set architectures (ISAs) remains limited. In light of this gap, this paper proposes comprehensive and efficient optimizations of Keccak, Kyber, and Dilithium on...
Groth16 is a pairing-based zero-knowledge proof scheme that has a constant proof size and an efficient verification algorithm. Bitcoin Script is a stack-based low-level programming language that is used to lock and unlock bitcoins. In this paper, we present a practical implementation of the Groth16 verifier in Bitcoin Script deployable on the mainnet of a Bitcoin blockchain called BSV. Our result paves the way for a framework of verifiable computation on Bitcoin: a Groth16 proof is generated...
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
We propose a wallet provider issued attestation called Wallet Trust Evidence (WTE) and three related specific instructions for the European Digital Identity (EUDI) Wallet cryptographic hardware, most notably the generation of a Proof of Association (PoA). These allow the EUDI Wallet providing verifiable assurance to third parties (issuers, relying parties) that attestation private keys are not only bound to conformant cryptographic hardware but also that they are bound to the same such...
Physically (or Physical) Unclonable Functions (PUFs) are basic and useful primitives in designing cryptographic systems. PUFs are designed to facilitate device authentication, secure boot, firmware integrity, and secure communications. To achieve these objectives, PUFs must exhibit both consistent repeatability and instance-specific randomness. The Arbiter PUF (APUF), recognized as the first silicon PUF, is capable of generating a substantial number of secret keys instantaneously based on...
Phase-locked loops (PLLs) integrated within field-programmable gate arrays (FPGAs) or System-on-Chip FPGAs (SoCs) represent a promising approach for generating random numbers. Their widespread deployment, isolated functionality within these devices, and robust entropy, as demonstrated in prior studies, position PLL-based true random number generators (PLL-TRNGs) as highly viable solutions for this purpose. This study explicitly examines PLL-TRNG implementations using the ZC702 Rev1.1...
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
Digital signature is a fundamental cryptographic primitive and is widely used in the real world. Unfortunately, the current digital signature standards like EC-DSA and RSA are not quantum-resistant. Among post-quantum cryptography (PQC), isogeny-based signatures preserve some advantages of elliptic curve cryptosystems, particularly offering small signature sizes. Currently, SQIsign and its variants are the most promising isogeny-based digital signature schemes. In this paper, we propose a...
This paper introduces a ZKP (zero-knowledge proof) based state update system, where each block contains a SNARK proof aggregated from the user generated zkVM (zero knowledge virtual machine) proofs. It enables users to generate state update proofs in their local machines, contributing to a secure, decentralized verification process. Our main contribution in this paper, the recursive proofs system, addresses scalability by recursively verifying user proofs and aggregating them in a...
Kyber was selected by NIST as a Post-Quantum Cryptography Key Encapsulation Mechanism standard. This means that the industry now needs to transition and adopt these new standards. One of the most demanding operations in Kyber is the modular arithmetic, making it a suitable target for optimization. This work offers a novel modular reduction design with the lowest area on Xilinx FPGA platforms. This novel design, through K-reduction and LUT-based reduction, utilizes 49 LUTs and 1 DSP...
Digital signatures are fundamental building blocks in various protocols to provide integrity and authenticity. The development of the quantum computing has raised concerns about the security guarantees afforded by classical signature schemes. CRYSTALS-Dilithium is an efficient post-quantum digital signature scheme based on lattice cryptography and has been selected as the primary algorithm for standardization by the National Institute of Standards and Technology. In this work, we present a...
This short note describes an update to the sca25519 library, an ECC implementation computing the X25519 key-exchange protocol on the Arm Cortex-M4 microcontroller. The sca25519 software came with extensive mitigations against various side-channel and fault attacks and was, to our best knowledge, the first to claim affordable protection against multiple classes of attacks that are motivated by distinct real-world application scenarios. This library is protected against various passive and...
In times of digitalization, the encryption and signing of sensitive data is becoming increasingly important. These cryptographic processes require large quantities of high-quality random numbers. Which is why a high-performance random number generator (RNG) is to be developed. For this purpose, existing concepts of RNGs and application standards are first analyzed. The proposed approach is to design a physical true random number generator (PTRNG) with a high output of random numbers....
Framed within the general context of cyber-security, standard cryptographic constructions often represent an enabling technology for associated solutions. Alongside or in combination with their design, therefore, the implementation of such constructions is an important challenge: beyond delivering artefacts that are usable in practice, implementation can impact many quality metrics (such as efficiency and security) which determine fitness-for-purpose. A rich design space of implementation...
This article addresses the issue of efficient and safe (de)compression of $\mathbb{F}_{\!q}$-points on an elliptic curve $E$ over a highly $2$-adic finite field $\mathbb{F}_{\!q}$ of characteristic $5$ or greater. The given issue was overlooked by cryptography experts, probably because, until recently, such fields were not in trend. Therefore, there was no difficulty (with rare exceptions) in finding a square $\mathbb{F}_{\!q}$-root. However, in our days, fields with large $2$-adicities have...
From 2022, Korean Post-Quantum Cryptography (KpqC) Competition has been held. Among the Round 1 algorithms of KpqC, eight algorithms were selected in December 2023. To evaluate the algorithms, the performance is critical factor. However, the performance of the algorithms submitted to KpqC was evaluated in different development environments. Consequently, it is difficult to compare the performance of each algorithm fairly, because the measurements were not conducted in the identical...
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
This paper presents a collection of lessons learned from analyzing the real-world security of various Byzantine Fault Tolerant (BFT) consensus protocol implementations. Drawing upon our experience as a team of security experts who have both developed and audited BFT systems, including BA★, HotStuff variants, Paxos variants, and DAG-based algorithms like Narwhal and Bullshark, we identify and analyze a variety of security vulnerabilities discovered in the translation of theoretical protocols...
Private Set Intersection (PSI) enables a sender and a receiver to jointly compute the intersection of their sets without disclosing other information about items not in the intersection. However, in many cases of joint data analysis, it is not just the items outside the intersection that are sensitive but the items within it. To protect such sensitive information, prior work presents a Differentially Private version of PSI (DPSI) based on a circuit-PSI using Fully Homomorphic Encryption....
A crucial component of any zero-knowledge system is operations with finite fields. This, in turn, leads to the implementation of the fundamental operation: multiplying two big integers. In the realm of Bitcoin, this problem gets revisited, as Bitcoin utilizes its own stack-based and not Turing-complete scripting system called Bitcoin Script. Inspired by Elliptic Curve scalar multiplication, this paper introduces the $w$-windowed method for multiplying two numbers. We outperform...
For more than two decades, pairings have been a fundamental tool for designing elegant cryptosystems, varying from digital signature schemes to more complex privacy-preserving constructions. However, the advancement of quantum computing threatens to undermine public-key cryptography. Concretely, it is widely accepted that a future large-scale quantum computer would be capable to break any public-key cryptosystem used today, rendering today's public-key cryptography obsolete and mandating the...
The progression of quantum computing is considered a potential threat to traditional cryptography system, highlighting the significance of post-quantum security in cryptographic systems. Regarding symmetric key encryption, the Grover algorithm can approximately halve the search complexity. Despite the absence of fully operational quantum computers at present, the necessity of assessing the security of symmetric key encryption against quantum computing continues to grow. In this paper, we...
Quantum computers can model and solve several problems that have posed challenges for classical super computers, leveraging their natural quantum mechanical characteristics. A large-scale quantum computer is poised to significantly reduce security strength in cryptography. In this context, extensive research has been conducted on quantum cryptanalysis. In this paper, we present optimized quantum circuits for Korean block ciphers, HIGHT and LEA. Our quantum circuits for HIGHT and LEA...
We present an efficient layered circuit design for SHA3-256 Merkle tree verification, suitable for a GKR proof system, that achieves logarithmic verification and proof size. We demonstrate how to compute the predicate functions for our circuit in $O(\log n)$ time to ensure logarithmic verification and provide GKR benchmarks for our circuit.
This article presents an innovative project for a Central Bank Digital Currency (CBDC) infrastructure. Focusing on security and reliability, the proposed architecture: (1) employs post-quantum cryptography (PQC) algorithms for long-term security, even against attackers with access to cryptographically-relevant quantum computers; (2) can be integrated with a Trusted Execution Environment (TEE) to safeguard the confidentiality of transaction contents as they are processed by third-parties; and...
Functional bootstrapping in FHE schemes such as FHEW and TFHE allows the evaluation of a function on an encrypted message, in addition to noise reduction. Implementing programs that directly use functional bootstrapping is challenging and error-prone. In this paper, we propose a heuristic that automatically maps Boolean circuits to functional bootstrapping instructions. Unlike other approaches, our method does not limit the encrypted data plaintext space to a power-of-two size, allowing...
In recent years, formal verification has emerged as a crucial method for assessing security against Side-Channel attacks of masked implementations, owing to its remarkable versatility and high degree of automation. However, formal verification still faces technical bottlenecks in balancing accuracy and efficiency, thereby limiting its scalability. Former tools like maskVerif and CocoAlma are very efficient but they face accuracy issues when verifying schemes that utilize properties of...
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
For many pairing-based cryptographic protocols such as Direct Anonymous Attestation (DAA) schemes, the arithmetic on the first pairing subgroup $\mathbb{G}_1$ is more fundamental. Such operations heavily depend on the sizes of prime fields. At the 192-bit security level, Gasnier and Guillevic presented a curve named GG22D7-457 with CM-discriminant $D = 7$ and embedding degree $k = 22$. Compared to other well-known pairing-friendly curves at the same security level, the curve GG22D7-457 has...
The rapid evolution of post-quantum cryptography, spurred by standardization efforts such as those led by NIST, has highlighted the prominence of lattice-based cryptography, notably exemplified by CRYSTALS-Kyber. However, concerns persist regarding the security of cryptographic implementations, particularly in the face of Side-Channel Attacks (SCA). The usage of operations like the Number Theoretic Transform (NTT) in CRYSTALS-Kyber introduces vulnerabilities to SCA, especially single-trace...
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA - the two primary algorithms selected by NIST for standardization - in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
Dimension 4 isogenies have first been introduced in cryptography for the cryptanalysis of Supersingular Isogeny Diffie-Hellman (SIDH) and have been used constructively in several schemes, including SQIsignHD, a derivative of SQIsign isogeny based signature scheme. Unlike in dimensions 2 and 3, we can no longer rely on the Jacobian model and its derivatives to compute isogenies. In dimension 4 (and higher), we can only use theta-models. Previous works by Romain Cosset, David Lubicz and Damien...
In this work, we introduce enhanced high-order masking techniques tailored for Dilithium, the post-quantum signature scheme recently standardized by NIST. We improve the masked generation of the masking vector $\vec{y}$, based on a fast Boolean-to-arithmetic conversion modulo $q$. We also describe an optimized gadget for the high-order masked rejection sampling, with a complexity independent from the size of the modulus $q$. We prove the security of our gadgets in the classical ISW...
The authors of Jolt [AST24] pioneered a unique method for creating zero-knowledge virtual machines, known as the lookup singularity. This technique extensively uses lookup tables to create virtual machine circuits. Despite Jolt’s performance being twice as efficient as the previous state-of-the-art1 , there is potential for further enhancement. The initial release of Jolt uses Spartan [Set20] and Hyrax [WTs+ 18] as their backend, leading to two constraints. First, Hyrax employs Pedersen...
In 2018, Mouha et al. (IEEE Trans. Reliability, 2018) performed a post-mortem investigation of the correctness of reference implementations submitted to the SHA3 competition run by NIST, finding previously unidentified bugs in a significant portion of them, including two of the five finalists. Their innovative approach allowed them to identify the presence of such bugs in a black-box manner, by searching for counterexamples to expected cryptographic properties of the implementations under...
The use of Internet of Things (IoT) devices in embedded systems has become increasingly popular with advancing technologies. These devices become vulnerable to cyber attacks as they gain popularity. The cryptographic operations performed for the purpose of protection against cyber attacks are crucial to yield fast results in open networks and not slow down network traffic. Therefore, to enhance communication security, studies have been conducted in the literature on using asymmetric...
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
Measuring the fluctuations of the clock phase of a target was identified as a leakage source on early electromagnetic side-channel investigations. Despite this, only recently was directly measuring the clock phase (or jitter) of digital signals from a target connected to being a source of exploitable leakage. As the phase of a clock output will be related to signal propagation delay through the target, and this propagation delay is related to voltage, this means that most digital devices...
Anticipating the advent of large quantum computers, NIST started a worldwide competition in 2016 aiming to define the next cryptographic standards. HQC is one of these post-quantum schemes still in contention, with three others already standardized. In 2022, Guo et al. introduced a timing attack that exploited an inconsistency in HQC rejection sampling function to recover its secret key in 866,000 calls to an oracle. The authors of HQC updated its specification by applying an algorithm to...
Homomorphic encryption is a powerful technology that solves key privacy concerns in cloud computing by enabling computation on encrypted data. However, it has not seen widespread adoption due to prohibitively high latencies. In this article, we identify polynomial multiplication as a bottleneck and investigate alternative algorithms to accelerate encrypted computing.
Cryptography implementations of block cipher have been written in C language due to its strong features on system-friendly features. However, the C language is prone to memory safety issues, such as buffer overflows and memory leaks. On the other hand, Rust, novel system programming language, provides strict compile-time memory safety guarantees through its ownership model. This paper presents the implementation of LEA block cipher in Rust language, demonstrating features to prevent...
As quantum computing progresses, the assessment of cryptographic algorithm resilience against quantum attack gains significance interests in the field of cryptanalysis. Consequently, this paper implements the depth-optimized quantum circuit of Korean hash function (i.e., LSH) and estimates its quantum attack cost in quantum circuits. By utilizing an optimized quantum adder and employing parallelization techniques, the proposed quantum circuit achieves a 78.8\% improvement in full depth and a...
We reduce the number of bit operations required to implement AES to a new minimum, and also compute improvements to elements of some other ciphers. Exploring the algebra of AES allows choices of basis and streamlining of the nonlinear parts. We also compute a more efficient implementation of the linear part of each round. Similar computational optimizations apply to other cryptographic matrices and S-boxes. This work may be incorporated into a hardware AES implementation using minimal...
It is fundamental that executing cryptographic software must not leak secrets through side-channels. For software-visible side-channels, it was long believed that "constant-time" programming would be sufficient as a systematic countermeasure. However, this belief was shattered in 2018 by attacks exploiting speculative execution—so called Spectre attacks. Recent work shows that language support suffices to protect cryptographic code with minimal overhead against one class of such attacks,...
The Jacobi Symbol is an essential primitive in cryptographic applications such as primality testing, integer factorization, and various encryption schemes. By exploring the interdependencies among modular reductions within the algorithmic loop, we have developed a refined method that significantly enhances computational efficiency. Our optimized algorithm, implemented in the Rust language, achieves a performance increase of 72% over conventional textbook methods and is twice as fast as the...
This paper presents KyberSlash1 and KyberSlash2 – two timing vulnerabilities in several implementations (including the official reference code) of the Kyber Post-Quantum Key Encapsulation Mechanism, currently undergoing standardization as ML-KEM. We demonstrate the exploitability of both KyberSlash1 and KyberSlash2 on two popular platforms: the Raspberry Pi 2 (Arm Cortex-A7) and the Arm Cortex-M4 microprocessor. Kyber secret keys are reliably recovered within minutes for KyberSlash2 and a...
$SPHINCS^+$, one of the Post-Quantum Cryptography Digital Signature Algorithms (PQC-DSA) selected by NIST in the third round, features very short public and private key lengths but faces significant performance challenges compared to other post-quantum cryptographic schemes, limiting its suitability for real-world applications. To address these challenges, we propose the GPU-based paRallel Accelerated $SPHINCS^+$ (GRASP), which leverages GPU technology to enhance the efficiency of...
While passive side-channel attacks and active fault attacks have been studied intensively in the last few decades, strong attackers combining these attacks have only been studied relatively recently. Due to its simplicity, most countermeasures against passive attacks are based on additive sharing. Unfortunately, extending these countermeasures against faults often leads to quite a significant performance penalty, either due to the use of expensive cryptographic operations or a large number...
Since the advent of pairing based cryptography, many researchers have developed several techniques and variants of pairings to optimise the speed of pairing computations. The selection of the elliptic curve for a given pairing based protocol is crucial for operations in the first and second pairing groups of points of the elliptic curve and for many cryptographic schemes. A new variant of superoptimal pairing was proposed in 2023, namely x-superoptimal pairing on curves with odd prime...
The RNS variant of the CKKS scheme (SAC 2018) is widely implemented due to its computational efficiency. However, the current optimized implementations of the RNS-CKKS scheme have a limitation when choosing the ciphertext modulus. It requires the scale factors to be approximately equal to a factor (or a product of factors) of the ciphertext modulus. This restriction causes inefficiency when the scale factor is not close to the power of the machine's word size, wasting the machine's...
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
In this paper, we revisit the algorithm for computing chains of $(2, 2)$-isogenies between products of elliptic curves via theta coordinates proposed by Dartois et al. For each fundamental block of this algorithm, we provide a explicit inversion-free version. Besides, we exploit a novel technique of $x$-only ladder to speed up the computation of gluing isogeny. Finally, we present a mixed optimal strategy, which combines the inversion-elimination tool with the original methods together...
Masking is a widely adopted countermeasure against side-channel analysis (SCA) that protects cryptographic implementations from information leakage. However, current masking schemes often incur significant overhead in terms of electronic cost. RAMBAM, a recently proposed masking technique that fits elegantly with the AES algorithm, offers ultra-low latency/area by utilizing redundant representations of finite field elements. This paper presents a comprehensive generalization of RAMBAM and...
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitch-extended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools...
There has been significant progress over the past seven years in model reverse engineering (RE) for neural network (NN) hardware. Although there has been systematization of knowledge (SoK) in an overall sense, however, the treatment from the hardware perspective has been far from adequate. To bridge this gap, this paper systematically categorizes the types of NN hardware used prevalently by the industry/academia, and also the model RE attacks/defenses published in each category. Further, we...
One of the main issues to deal with for fully homomorphic encryption is the noise growth when operating on ciphertexts. To some extent, this can be controlled thanks to a so-called gadget decomposition. A gadget decomposition typically relies on radix- or CRT-based representations to split elements as vectors of smaller chunks whose inner products with the corresponding gadget vector rebuilds (an approximation of) the original elements. Radix-based gadget decompositions present the advantage...
Masking is one of the most popular countermeasures to protect implementations against power and electromagnetic side channel attacks, because it offers provable security. Masking has been shown secure against d-threshold probing adversaries by Ishai et al. at CRYPTO'03, but this adversary's model doesn't consider any physical hardware defaults and thus such masking schemes were shown to be still vulnerable when implemented as hardware circuits. To addressed these limitations glitch-extended...
As blockchain technology continues to transform the realm of digital transactions, scalability has emerged as a critical issue. This challenge has spurred the creation of innovative solutions, particularly Layer 2 scalability techniques like rollups. Among these, ZK-Rollups are notable for employing Zero-Knowledge Proofs to facilitate prompt on-chain transaction verification, thereby improving scalability and efficiency without sacrificing security. Nevertheless, the intrinsic complexity of...
End-to-end encrypted messaging applications ensure that an attacker cannot read a user's message history without their decryption keys. While this provides strong privacy, it creates a usability problem: if a user loses their devices and cannot access their decryption keys, they can no longer access their account. To solve this usability problem, users should be able to back up their account information with the messaging provider. For privacy, this backup should be encrypted and the...
We present the first comprehensive approach for detecting and analyzing symmetric cryptographic primitives in gate-level descriptions of hardware. To capture both ASICs and FPGAs, we model the hardware as a directed graph, where gates become nodes and wires become edges. For modern chips, those graphs can easily consist of hundreds of thousands of nodes. More abstractly, we find subgraphs corresponding to cryptographic primitives in a potentially huge graph, the sea-of-gates, describing an...
A primary challenge in isogeny-based cryptography lies in the substantial computational cost associated to computing and evaluating prime-degree isogenies. This computation traditionally relied on Vélu's formulas, an approach with time complexity linear in the degree but which was further enhanced by Bernstein, De Feo, Leroux, and Smith to a square-root complexity. The improved square-root Vélu's formulas exhibit a degree of parallelizability that has not been exploited in major...
Despite masking being a prevalent protection against passive side-channel attacks, implementing it securely in hardware remains a manual, challenging, and error-prone process. This paper introduces INDIANA, a comprehensive security verification tool for hardware masking. It provides a hardware verification framework, enabling a complete analysis of simulation-based security in the glitch-extended probing model, with cycle-accurate estimations for leakage probabilities in the random...
The Keyed Hashing and Asymmetric Nonce (KHAN) encryption algorithm is a novel cryptographic scheme that utilizes the unique properties of full reptend prime numbers. This paper details the algorithm, its theoretical foundations, and the rigorous proofs of its security properties. By leveraging the characteristics of cyclic sequences derived from full reptend primes, KHAN provides robust encryption with high resistance to cryptanalytic attacks.
Shaped prime moduli are often considered for use in elliptic curve and isogeny-based cryptography to allow for faster modular reduction. Here we focus on the most common choices for shaped primes that have been suggested, that is pseudo-Mersenne, generalized Mersenne and Montgomery-friendly primes. We consider how best to to exploit these shapes for maximum efficiency, and provide an open source tool to automatically generate, test and time working high-level language finite-field code....
Verifying the verifier in the context of zero-knowledge proof is an essential part of ensuring the long-term integrity of the zero-knowledge ecosystem. This is vital for both zero-knowledge rollups and also other industrial applications of ZK. In addition to further minimizing the required trust and reducing the trusted computing base (TCB), having a verified verifier opens the door to decentralized proof generation by potentially untrusted parties. We outline a research program and justify...
In our highly digitalized world, an adversary is not constrained to purely digital attacks but can monitor or influence the physical execution environment of a target computing device. Such side-channel or fault-injection analysis poses a significant threat to otherwise secure cryptographic implementations. Hence, it is important to consider additional adversarial capabilities when analyzing the security of cryptographic implementations besides the default black-box model. For side-channel...
In this paper, we present efficient protected software implementations of the authenticated cipher Ascon, the recently announced winner of the NIST standardization process for lightweight cryptography. Our implementations target theoretical and practical security against second-order power analysis attacks. First, we propose an efficient second-order extension of a previously presented first-order masking of the Keccak S-box that does not require online randomness. The extension...
Multi-scalar multiplication (MSM) is one of the core components of many zero-knowledge proof systems, and a primary performance bottleneck for proof generation in these schemes. One major strategy to accelerate MSM is utilizing precomputation. Several algorithms (e.g., Pippenger and BGMW) and their variants have been proposed in this direction. In this paper, we revisit the recent precomputation-based MSM calculation method proposed by Luo, Fu and Gong at CHES 2023 and generalize their...
Lattice sieves are algorithms for finding short vectors in lattices. We present an implementation of two such sieves – known as “BGJ1” and “BDGL” in the literature – that scales across multiple servers (with varying success). This class of algorithms requires exponential memory which had put into question their ability to scale across sieving nodes. We discuss our architecture and optimisations and report experimental evidence of the efficiency of our approach.
In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...