143 results sorted by ID
Efficient Modular Multiplication Hardware for Number Theoretic Transform on FPGA
Tolun Tosun, Selim Kırbıyık, Emre Koçer, Erkay Savaş, Ersin Alaybeyoğlu
Implementation
In this paper, we present a comprehensive analysis of various modular multiplication methods for Number Theoretic Transform (NTT) on FPGA. NTT is a critical and time-intensive component of Fully Homomorphic Encryption (FHE) applications while modular multiplication consumes a significant portion of the design resources in an NTT implementation. We study the existing modular reduction approaches from the literature, and implement particular methods on FPGA. Specifically Word-Level Montgomery...
PQNTRU: Acceleration of NTRU-based Schemes via Customized Post-Quantum Processor
Zewen Ye, Junhao Huang, Tianshun Huang, Yudan Bai, Jinze Li, Hao Zhang, Guangyan Li, Donglong Chen, Ray C.C. Cheung, Kejie Huang
Implementation
Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...
OpenNTT: An Automated Toolchain for Compiling High-Performance NTT Accelerators in FHE
Florian Krieger, Florian Hirner, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
Modern cryptographic techniques such as fully homomorphic encryption (FHE) have recently gained broad attention. Most of these cryptosystems rely on lattice problems wherein polynomial multiplication forms the computational bottleneck. A popular method to accelerate these polynomial multiplications is the Number-Theoretic Transformation (NTT). Recent works aim to improve the practical deployability of NTT and propose toolchains supporting the NTT hardware accelerator design processes....
cuTraNTT: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for IoT Applications
Supriya Adhikary, Wai Kong Lee, Angshuman Karmakar, Yongwoo Lee, Seong Oun Hwang, Ramachandra Achar
Implementation
Large polynomial multiplication is one of the computational bottlenecks in fully homomorphic encryption implementations. Usually, these multiplications are implemented using the number-theoretic transformation to speed up the computation. State-of-the-art GPU-based implementation of fully homomorphic encryption computes the number theoretic transformation in two different kernels, due to the necessary synchronization between GPU blocks to ensure correctness in computation. This can be a...
Efficient Key-Switching for Word-Type FHE and GPU Acceleration
Shutong Jin, Zhen Gu, Guangyan Li, Donglong Chen, Çetin Kaya Koç, Ray C. C. Cheung, Wangchen Dai
Implementation
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...
Scabbard: An Exploratory Study on Hardware Aware Design Choices of Learning with Rounding-based Key Encapsulation Mechanisms
Suparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid Verbauwhede
Implementation
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
ECO-CRYSTALS: Efficient Cryptography CRYSTALS on Standard RISC-V ISA
Xinyi Ji, Jiankuo Dong, Junhao Huang, Zhijian Yuan, Wangchen Dai, Fu Xiao, Jingqiang Lin
Implementation
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
Hardware Implementation and Security Analysis of Local-Masked NTT for CRYSTALS-Kyber
Rafael Carrera Rodriguez, Emanuele Valea, Florent Bruguier, Pascal Benoit
Implementation
The rapid evolution of post-quantum cryptography, spurred by standardization efforts such as those led by NIST, has highlighted the prominence of lattice-based cryptography, notably exemplified by CRYSTALS-Kyber. However, concerns persist regarding the security of cryptographic implementations, particularly in the face of Side-Channel Attacks (SCA). The usage of operations like the Number Theoretic
Transform (NTT) in CRYSTALS-Kyber introduces vulnerabilities to SCA, especially single-trace...
Approximate CRT-Based Gadget Decomposition and Application to TFHE Blind Rotation
Olivier Bernard, Marc Joye
Implementation
One of the main issues to deal with for fully homomorphic encryption is the noise growth when operating on ciphertexts. To some extent, this can be controlled thanks to a so-called gadget decomposition. A gadget decomposition typically relies on radix- or CRT-based representations to split elements as vectors of smaller chunks whose inner products with the corresponding gadget vector rebuilds (an approximation of) the original elements. Radix-based gadget decompositions present the advantage...
A Fault-Resistant NTT by Polynomial Evaluation and Interpolation
Sven Bauer, Fabrizio De Santis, Kristjane Koleci, Anita Aghaie
In computer arithmetic operations, the Number Theoretic
Transform (NTT) plays a significant role in the efficient implementation
of cyclic and nega-cyclic convolutions with the application of multiplying
large integers and large degree polynomials. Multiplying polynomials is
a common operation in lattice-based cryptography. Hence, the NTT is a
core component of several lattice-based cryptographic algorithms. Two
well-known examples are the key encapsulation mechanism Kyber and
the...
A Complete Beginner Guide to the Number Theoretic Transform (NTT)
Ardianto Satriawan, Rella Mareta, Hanho Lee
Foundations
The Number Theoretic Transform (NTT) is a powerful mathematical tool that has become increasingly important in developing Post Quantum Cryptography (PQC) and Homomorphic Encryption (HE). Its ability to efficiently calculate polynomial multiplication using the convolution theorem with a quasi-linear complexity $O(n \log{n})$ instead of $O(n^2)$ when implemented with Fast Fourier Transform-style algorithms has made it a key component in modern cryptography. FFT-style NTT algorithm or fast-NTT...
Number-Theoretic Transform Architecture for Fully Homomorphic Encryption from Hypercube Topology
Jingwei Hu, Yuhong Fang, Wangchen Dai
Implementation
This paper introduces a high-performance and scalable hardware architecture designed for the Number-Theoretic Transform (NTT), a fundamental component extensively utilized in lattice-based encryption and fully homomorphic encryption schemes.
The underlying rationale behind this research is to harness the advantages of the hypercube topology. This topology serves to significantly diminish the volume of data exchanges required during each iteration of the NTT, reducing it to a complexity of...
Exploring the Advantages and Challenges of Fermat NTT in FHE Acceleration
Andrey Kim, Ahmet Can Mert, Anisha Mukherjee, Aikata Aikata, Maxim Deryabin, Sunmin Kwon, HyungChul Kang, Sujoy Sinha Roy
Implementation
Recognizing the importance of a fast and resource-efficient polynomial multiplication in homomorphic encryption, in this paper, we design a multiplier-less number theoretic transform using a Fermat number as an auxiliary modulus. To make this algorithm scalable with the degree of polynomial, we apply a univariate to multivariate polynomial ring transformation.
We develop an accelerator architecture for fully homomorphic encryption using these algorithmic techniques for efficient...
Hardware Acceleration of the Prime-Factor and Rader NTT for BGV Fully Homomorphic Encryption
David Du Pont, Jonas Bertels, Furkan Turan, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
2024/208
Last updated: 2024-05-08
Asymmetric Cryptography from Number Theoretic Transformations
Samuel Lavery
Public-key cryptography
In this work, we introduce a family of asymmetric cryptographic functions based on dynamic number theoretic transformations with multiple rounds of modular arithmetic to enhance diffusion and difficulty of inversion. This function acts as a basic cryptographic building block for a novel communication-efficient zero-knowledge crypto-system. The system as defined exhibits partial homomorphism and behaves as an additive positive accumulator. By using a novel technique to constructively embed...
A Novel Power Analysis Attack against CRYSTALS-Dilithium Implementation
Yong Liu, Yuejun Liu, Yongbin Zhou, Yiwen Gao, Zehua Qiao, Huaxin Wang
Attacks and cryptanalysis
Post-Quantum Cryptography (PQC) was proposed due to the potential threats quantum computer attacks against conventional public key cryptosystems, and four PQC algorithms besides CRYSTALS-Dilithium (Dilithium for short) have so far been selected for NIST standardization. However, the selected algorithms are still vulnerable to side-channel attacks in practice, and their physical security need to be further evaluated.
This study introduces two efficient power analysis attacks, the optimized...
2024/079
Last updated: 2024-01-23
On Modular Algorithms and Butterfly Operations in Number Theoretic Transform
Yanze Yang, Yiran Jia, Guangwu Xu
Implementation
Number theoretic transform (NTT) has been a very useful tool in computations for number theory, algebra and cryptography.
Its performance affects some post-quantum
cryptosystems. In this paper, we discuss the butterfly operation of NTT. This basic module of NTT requires heavy modular
arithmetics. Montgomery reduction is commonly used in this setting. Recently several variants of Montgomery have been proposed
for the purpose of speeding up NTT. We observe that the Chinese remainder...
Hints from Hertz: Dynamic Frequency Scaling Side-Channel Analysis of Number Theoretic Transform in Lattice-Based KEMs
Tianrun Yu, Chi Cheng, Zilong Yang, Yingchen Wang, Yanbin Pan, Jian Weng
Attacks and cryptanalysis
Number Theoretic Transform (NTT) has been widely used in accelerating computations in lattice-based cryptography. However, attackers can potentially launch power analysis targeting NTT because it is usually the most time-consuming part of the implementation. This extended time frame provides a natural window of opportunity for attackers. In this paper, we investigate the first CPU frequency leakage (Hertzbleed-like) attacks against NTT in lattice-based KEMs. Our key observation is that...
Elastic MSM: A Fast, Elastic and Modular Preprocessing Technique for Multi-Scalar Multiplication Algorithm on GPUs
Xudong Zhu, Haoqi He, Zhengbang Yang, Yi Deng, Lutan Zhao, Rui Hou
Implementation
Zero-knowledge proof (ZKP) is a cryptographic primitive that enables a prover to convince a verifier that a statement is true, without revealing any other information beyond the correctness of the statement itself. Due to its powerful capabilities, its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been widely deployed in various privacy preserving applications such as cryptocurrencies and verifiable computation. Although...
Barrett Multiplication for Dilithium on Embedded Devices
Vincent Hwang, YoungBeom Kim, Seog Chung Seo
Implementation
We optimize the number-theoretic transforms (NTTs) in Dilithium — a digital signature scheme recently standardized by the National Institute of Standards and Technology (NIST) — on Cortex-M3 and 8-bit AVR. The core novelty is the exploration of micro-architectural insights for modular multiplications. Recent work [Becker, Hwang, Kannwischer, Yang and Yang, Volume 2022 (1), Transactions on Cryptographic Hardware and Embedded Systems, 2022] found a correspondence between Montgomery and Barrett...
The Splitting Field of $Y^n-2$, Two-Variable NTT and Lattice-Based Cryptography
Wenzhe Yang
Foundations
The splitting field $F$ of the polynomial $Y^n-2$ is an extension over $\mathbb{Q}$ generated by $\zeta_n=\exp(2 \pi \sqrt{-1} /n)$ and $\sqrt[n]{2}$. In this paper, we lay the foundation for applying the Order-LWE in the integral ring $\mathcal{R}=\mathbb{Z}[\zeta_n, \sqrt[n]{2}]$ to cryptographic uses when $n$ is a power-of-two integer. We explicitly compute the Galois group $\text{Gal}\left(F/\mathbb{Q} \right)$ and the canonical embedding of $F$, based on which we study the properties of...
When NTT Meets SIS: Efficient Side-channel Attacks on Dilithium and Kyber
Zehua Qiao, Yuejun Liu, Yongbin Zhou, Mingyao Shao, Shuo Sun
Attacks and cryptanalysis
In 2022, NIST selected Kyber and Dilithium as post-quantum cryptographic standard algorithms. The Number Theoretic Transformation (NTT) algorithm, which facilitates polynomial multiplication, has become a primary target for side-channel attacks. In this work, we embed the NTT transformation matrix in Dilithium and Kyber into the SIS search problem, and further, we propose a divide and conquer strategy for dimensionality reduction of the SIS problem by utilizing the properties of NTT, and...
Accelerating Polynomial Multiplication for RLWE using Pipelined FFT
Neil Thanawala, Hamid Nejatollahi, Nikil Dutt
Implementation
The evolution of quantum algorithms threatens to break public key cryptography in polynomial time. The development of quantum-resistant algorithms for the post-quantum era has seen a significant growth in the field of
post quantum cryptography (PQC). Polynomial multiplication is the core of
Ring Learning with Error (RLWE) lattice based cryptography (LBC) which
is one of the most promising PQC candidates. In this work, we present the
design of fast and energy-efficient pipelined Number...
The NTT and residues of a polynomial modulo factors of $X^{2^d} + 1$
Sahil Sharma
Implementation
The Number Theoretic Transform (NTT) plays a central role in efficient implementations of cryptographic primitives selected for Post Quantum Cryptography. Although it certainly exists, academic papers that cite the NTT omit the connection between the NTT and residues of a polynomial modulo factors of $X^{2^d} + 1$ and mention only the final expressions of what the NTT computes. This short paper establishes that connection and, in doing so, elucidates key aspects of computing the NTT. Based...
A Lattice Attack on CRYSTALS-Kyber with Correlation Power Analysis
Yen-Ting Kuo, Atsushi Takayasu
Attacks and cryptanalysis
CRYSTALS-Kyber is a key-encapsulation mechanism, whose security is based on the hardness of solving the learning-with-errors (LWE) problem over module lattices. As in its specification, Kyber prescribes the usage of the Number Theoretic Transform (NTT) for efficient polynomial multiplication. Side-channel assisted attacks against Post-Quantum Cryptography (PQC) algorithms like Kyber remain a concern in the ongoing standardization process of quantum-computer-resistant cryptosystems. Among the...
Designing Efficient and Flexible NTT Accelerators
Ahmet MALAL
Implementation
The Number Theoretic Transform (NTT) is a powerful mathematical tool with a wide range of applications in various fields, including signal processing, cryptography, and error correction codes. In recent years, there has been a growing interest in efficiently implementing the NTT on hardware platforms for lattice-based cryptography within the context of NIST's Post-Quantum Cryptography (PQC) competition. The implementation of NTT in cryptography stands as a pivotal advancement,...
TMVP-based Polynomial Convolution for Saber and Sable on GPU using CUDA-cores and Tensor-cores
Muhammad Asfand Hafeez, Wai-Kong Lee, Angshuman Karmakar, Seong Oun Hwang
Implementation
Recently proposed lattice-based cryptography algorithms can be used to protect the IoT communication against the threat from quantum computers, but they are computationally heavy. In particular, polynomial multiplication is one of the most time-consuming operations in lattice-based cryptography. To achieve efficient implementation, the Number Theoretic Transform (NTT) algorithm is an ideal choice, but it has certain limitations on the parameters, which not all lattice-based schemes can...
PQ.V.ALU.E: Post-Quantum RISC-V Custom ALU Extensions on Dilithium and Kyber
Konstantina Miteloudi, Joppe Bos, Olivier Bronchain, Björn Fay, Joost Renes
Implementation
This paper explores the challenges and potential solutions of implementing the recommended upcoming post-quantum cryptography standards (the CRYSTALS-Dilithium and CRYSTALS-Kyber algorithms) on resource constrained devices.
The high computational cost of polynomial operations, fundamental to cryptography based on ideal lattices, presents significant challenges in an efficient implementation.
This paper proposes a hardware/software co-design strategy using RISC-V extensions to optimize...
Two Algorithms for Fast GPU Implementation of NTT
Ali Şah Özcan, Erkay Savaş
Implementation
The number theoretic transform (NTT) permits a very efficient method to perform multiplication of very large degree polynomials, which is the most time-consuming operation in fully homomorphic encryption (FHE) schemes and a class of non-interactive succinct zero-knowledge proof systems such as zk-SNARK. Efficient modular arithmetic plays an important role in the performance of NTT, and therefore it is studied extensively. The access pattern to the memory, on the other hand, may play much...
HI-Kyber: A novel high-performance implementation scheme of Kyber based on GPU
Xinyi Ji, Jiankuo Dong, Pinchang Zhang, Deng Tonggui, Hua Jiafeng, Fu Xiao
Implementation
CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography uses complex discrete alogarithm problems on lattices to build secure encryption and decryption systems to resist attacks from quantum computing. Performance is an important bottleneck affecting the promotion of post quantum...
REED: Chiplet-Based Accelerator for Fully Homomorphic Encryption
Aikata Aikata, Ahmet Can Mert, Sunmin Kwon, Maxim Deryabin, Sujoy Sinha Roy
Implementation
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and has many applications. However, its practical implementation faces massive computation and memory overheads. To address this bottleneck, several Application-Specific Integrated Circuit (ASIC) FHE accelerators have been proposed. All these prior works put every component needed for FHE onto one chip (monolithic), hence offering high performance. However, they suffer from practical problems associated with...
Leaking Secrets in Homomorphic Encryption with Side-Channel Attacks
Furkan Aydin, Aydin Aysu
Homomorphic encryption (HE) allows computing encrypted data in the ciphertext domain without knowing the encryption key. It is possible, however, to break fully homomorphic encryption (FHE) algorithms by using side channels. This article demonstrates side-channel leakages of the Microsoft SEAL HE library. The proposed attack can steal encryption keys during the key generation phase by abusing the leakage of ternary value assignments that occurs during the number theoretic transform (NTT)...
PQC Cloudization: Rapid Prototyping of Scalable NTT/INTT Architecture to Accelerate Kyber
Mojtaba Bisheh-Niasar, Daniel Lo, Anjana Parthasarathy, Blake Pelton, Bharat Pillilli, Bryan Kelly
Public-key cryptography
The advent of quantum computers poses a serious challenge to the security of cloud infrastructures and services, as they can potentially break the existing public-key cryptosystems, such as Rivest–Shamir–Adleman (RSA) and Elliptic Curve Cryptography (ECC). Even though the gap between today’s quantum computers and the threats they pose to current public-key cryptography is large, the cloud landscape should act proactively and initiate the transition to the post-quantum era as early as...
How to Recover a Secret with O(n) Additions
Benny Applebaum, Oded Nir, Benny Pinkas
Foundations
Threshold cryptography is typically based on the idea of secret-sharing a private-key $s\in F$ ``in the exponent'' of some cryptographic group $G$, or more generally, encoding $s$ in some linearly homomorphic domain. In each invocation of the threshold system (e.g., for signing or decrypting) an ``encoding'' of the secret is being recovered and so the complexity, measured as the number of group multiplications over $G$, is equal to the number of $F$-additions that are needed to reconstruct...
A Fast RLWE-Based IPFE Library and its Application to Privacy-Preserving Biometric Authentication
Supriya Adhikary, Angshuman Karmakar
Public-key cryptography
With the increased use of data and communication through the internet and the abundant misuse of personal data by many organizations, people are more sensitive about their privacy. Privacy-preserving computation is becoming increasingly important in this era. Functional encryption allows a user to evaluate a function on encrypted data without revealing sensitive information. Most implementations of functional encryption schemes are too time-consuming for practical use. Mera et al. first...
PIE: $p$-adic Encoding for High-Precision Arithmetic in Homomorphic Encryption
Luke Harmon, Gaetan Delavignette, Arnab Roy, David Silva
Cryptographic protocols
A large part of current research in homomorphic encryption (HE) aims towards making HE practical for real-world applications. In any practical HE, an important issue is to convert the application data (type) to the data type suitable for the HE.
The main purpose of this work is to investigate an efficient HE-compatible encoding method that is generic, and can be easily adapted to apply to the HE schemes over integers or polynomials.
$p$-adic number theory provides a way to transform...
Efficient Accelerator for NTT-based Polynomial Multiplication
Raziyeh Salarifard, Hadi Soleimany
Implementation
The Number Theoretic Transform (NTT) is used to efficiently execute polynomial multiplication. It has become an important part of lattice-based post-quantum methods and the subsequent generation of standard cryptographic systems. However, implementing post-quantum schemes is challenging
since they rely on intricate structures. This paper demonstrates how to develop a high-speed NTT multiplier highly optimized
for FPGAs with few logical resources. We describe a novel architecture for NTT...
A 334µW 0.158mm2 ASIC for Post-Quantum Key-Encapsulation Mechanism Saber with Low-latency Striding Toom-Cook Multiplication Extended Version
Archisman Ghosh, Jose Maria Bermudo Mera, Angshuman Karmakar, Debayan Das, Santosh Ghosh, Ingrid Verbauwhede, Shreyas Sen
Public-key cryptography
The hard mathematical problems that assure the security of our current public-key cryptography (RSA, ECC) are broken if and when a quantum computer appears rendering them ineffective for use in the quantum era. Lattice based cryptography is a novel approach to public key cryptography, of which the mathematical investigation (so far) resists attacks from quantum computers. By choosing a module learning with errors (MLWE) algorithm as the next standard, National Institute of Standard \&...
Breaking DPA-protected Kyber via the pair-pointwise multiplication
Estuardo Alpirez Bock, Gustavo Banegas, Chris Brzuska, Łukasz Chmielewski, Kirthivaasan Puniamurthy, Milan Šorf
Attacks and cryptanalysis
We introduce a novel template attack for secret key recovery in Kyber, leveraging side-channel information from polynomial multiplication during decapsulation. Conceptually, our attack exploits that Kyber's incomplete number-theoretic transform (NTT) causes each secret coefficient to be used multiple times, unlike when performing a complete NTT.
Our attack is a single trace \emph{known} ciphertext attack that avoids machine-learning techniques and instead relies on correlation-matching...
Algorithmic Views of Vectorized Polynomial Multipliers for NTRU and NTRU Prime (Long Paper)
Han-Ting Chen, Yi-Hua Chung, Vincent Hwang, Chi-Ting Liu, Bo-Yin Yang
Implementation
This paper explores the design space of vector-optimized polynomial multiplications in the lattice-based key-encapsulation mechanisms NTRU and NTRU Prime. Since NTRU and NTRU Prime do not support straightforward applications of number– theoretic transforms, the state-of-the-art vector code either resorted to Toom–Cook, or introduced various techniques for coefficient ring extensions. All these techniques lead to a large number of small-degree polynomial multiplications, which is the...
Kavach: Lightweight masking techniques for polynomial arithmetic in lattice-based cryptography
Aikata Aikata, Andrea Basso, Gaetan Cassiers, Ahmet Can Mert, Sujoy Sinha Roy
Public-key cryptography
Lattice-based cryptography has laid the foundation of various modern-day cryptosystems that cater to several applications, including post-quantum cryptography. For structured lattice-based schemes, polynomial arithmetic is a fundamental part. In several instances, the performance optimizations come from implementing compact multipliers due to the small range of the secret polynomial coefficients. However, this optimization does not easily translate to side-channel protected implementations...
Towards A Correct-by-Construction FHE Model
Zhenkun Yang, Wen Wang, Jeremy Casas, Pasquale Cocchini, Jin Yang
Implementation
This paper presents a correct-by-construction method of designing an FHE model based on the automated program verifier Dafny. We model FHE operations from the ground up, including fundamentals like GCD, coprimality, Montgomery multiplications, and polynomial operations, etc., and higher level optimizations such as Residue Number System (RNS) and Number Theoretic Transform (NTT). The fully formally verified FHE model serves as a reference design for both software stack development and...
Proteus: A Pipelined NTT Architecture Generator
Florian Hirner, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
Number Theoretic Transform (NTT) is a fundamental building block in emerging cryptographic constructions like fully homomorphic encryption, post-quantum cryptography and zero-knowledge proof. In this work, we introduce Proteus, an open-source parametric hardware to generate pipelined architectures for the NTT. For a given parameter set including the polynomial degree and size of the coefficient modulus, Proteus can generate Radix-2 NTT architectures using Single-path Delay Feedback (SDF) and...
Faster Amortized FHEW bootstrapping using Ring Automorphisms
Gabrielle De Micheli, Duhyeong Kim, Daniele Micciancio, Adam Suhl
Public-key cryptography
Amortized bootstrapping offers a way to simultaneously refresh many ciphertexts of a fully homomorphic encryption scheme, at a total cost comparable to that of refreshing a single ciphertext. An amortization method for FHEW-style cryptosystems was first proposed by (Micciancio and Sorrell, ICALP 2018), who showed that the amortized cost of bootstrapping n FHEW-style ciphertexts can be reduced from $O(n)$ basic cryptographic operations to just $O(n^{\epsilon})$, for any constant...
Verification of the (1–δ)-Correctness Proof of CRYSTALS-KYBER with Number Theoretic Transform
Katharina Kreuzer
Public-key cryptography
This paper describes a formalization of the specification and the algorithm of the cryptographic scheme CRYSTALS-KYBER as well as the verification of its (1 − δ)-correctness proof. During the formalization, a problem in the correctness proof was uncovered. In order to amend this
issue, a necessary property on the modulus parameter of the CRYSTALS-KYBER algorithm was introduced. This property is already implicitly fulfilled by the structure of the modulus prime used in the number theoretic...
CycloneNTT: An NTT/FFT Architecture Using Quasi-Streaming of Large Datasets on DDR- and HBM-based FPGA Platforms
Kaveh Aasaraai, Emanuele Cesena, Rahul Maganti, Nicolas Stalder, Javier Varela, Kevin Bowers
Implementation
Number-Theoretic-Transform (NTT) is a variation of Fast-Fourier-Transform (FFT) on finite fields. NTT is being increasingly used in blockchain and zero-knowledge proof applications. Although FFT and NTT are widely studied for FPGA implementation, we believe CycloneNTT is the first to solve this problem for large data sets ($\ge2^{24}$, 64-bit numbers) that would not fit in the on-chip RAM. CycloneNTT uses a state-of-the-art butterfly network and maps the dataflow to hybrid FIFOs composed of...
cuXCMP: CUDA-Accelerated Private Comparison Based on Homomorphic Encryption
Hao Yang, Shiyu Shen, Zhe Liu, Yunlei Zhao
Applications
Private comparison schemes constructed on homomorphic encryption offer the noninteractive, output expressive and parallelizable features, and have advantages in communication bandwidth and performance. In this paper, we propose cuXCMP, which allows negative and float inputs, offers fully output expressive feature, and is more extensible and practical compared to XCMP (AsiaCCS 2018). Meanwhile, we introduce several memory-centric optimizations of the constant term extraction kernel tailored for...
Peregrine: Toward Fastest FALCON Based on GPV Framework
Eun-Young Seo, Young-Sik Kim, Joon-Woo Lee, Jong-Seon No
Public-key cryptography
FALCON and Crystals-Dilithium are the digital signatures algorithms selected as NIST PQC standards at the end of the third round. FALCON has the advantage of the shortest size of the combined public key and signature but has the disadvantage of the relatively long signing time. Since FALCON algorithm is faithfully designed based on theoretical security analysis, the implementation of the algorithms is quite complex and needs considerable complexity. In order to implement the FALCON...
Circuit Privacy for FHEW/TFHE-Style Fully Homomorphic Encryption in Practice
Kamil Kluczniak
Public-key cryptography
A fully homomorphic encryption (FHE) scheme allows a client to encrypt and delegate its data to a server that performs computation on the encrypted data that the client can then decrypt. While FHE gives confidentiality to clients' data, it does not protect the server's input and computation. Nevertheless, FHE schemes are still helpful in building delegation protocols that reduce communication complexity, as the ciphertext's size is independent of the size of the computation performed on...
Breaking and Protecting the Crystal: Side-Channel Analysis of Dilithium in Hardware
Hauke Steffen, Georg Land, Lucie Kogelheide, Tim Güneysu
Attacks and cryptanalysis
The lattice-based CRYSTALS-Dilithium signature scheme has been selected for standardization by the NIST. As part of the selection process, a large number of implementations for platforms like x86, ARM Cortex-M4, or – on the hardware side – Xilinx Artix-7 have been presented and discussed by experts. While software implementations have been subject to side-channel analysis with several attacks being published, an analysis of Dilithium hardware implementations and their peculiarities has not...
cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs
Tao Lu, Chengkun Wei, Ruijing Yu, Chaochao Chen, Wenjing Fang, Lei Wang, Zeke Wang, Wenzhi Chen
Implementation
Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, zkSNARK like Groth16 has a high overhead on its proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL),...
Fast and Clean: Auditable high-performance assembly via constraint solving
Amin Abdulrahman, Hanno Becker, Matthias J. Kannwischer, Fabien Klein
Implementation
Handwritten assembly is a widely used tool in the development of high-performance cryptography: By providing full control over instruction selection, instruction scheduling, and register allocation, highest performance can be unlocked. On the flip side, developing handwritten assembly is not only time-consuming, but the artifacts produced also tend to be difficult to review and maintain – threatening their suitability for use in practice.
In this work, we present SLOTHY (Super (Lazy)...
2022/1267
Last updated: 2022-11-20
High-precision Leveled Homomorphic Encryption with Batching
Long Nie, ShaoWen Yao, Jing Liu
Foundations
In most homomorphic encryption schemes based on the RLWE, the native plaintexts are represented as polynomials in a ring $Z_t[x]/x^N+1$ where $t$ is a plaintext modulus and $x^N+1$ is a cyclotomic polynomial with degree power of two. An encoding scheme should be used to transform some natural data types(such as integers and rational numbers) into polynomials in the ring. After a homomorphic computation on the polynomial is finished, the decoding procedure is invoked to obtain the result....
Homomorphic Encryption on GPU
Ali Şah Özcan, Can Ayduman, Enes Recep Türkoğlu, Erkay Savaş
Implementation
Homomorphic encryption (HE) is a cryptosystem that allows secure processing of encrypted data. One of the most popular HE schemes is the Brakerski-Fan-Vercauteren (BFV), which supports somewhat (SWHE) and fully homomorphic encryption (FHE). Since overly involved arithmetic operations of HE schemes are amenable to concurrent computation, GPU devices can be instrumental in facilitating the practical use of HE in real world applications thanks to their superior parallel processing capacity....
Liberating TFHE: Programmable Bootstrapping with General Quotient Polynomials
Marc Joye, Michael Walter
Public-key cryptography
All known instantiations for fully homomorphic encryption (FHE) produce noisy ciphertexts and rely on a technique called bootstrapping to reduce the noise so as to enable an arbitrary number of homomorphic operations. Bootstrapping is the main performance bottleneck and arguably the biggest obstacle to widespread adoption of FHE. Among the FHE schemes, TFHE and its variations present the appealing property of having a bootstrapping procedure---as well as its extension to programmable ...
Automatic Certified Verification of Cryptographic Programs with COQCRYPTOLINE
Ming-Hsien Tsai, Yu-Fu Fu, Xiaomu Shi, Jiaxiang Liu, Bow-Yaw Wang, Bo-Yin Yang
Implementation
COQCRYPTOLINE is an automatic certified verification tool for cryptographic programs. It is built on OCAML programs extracted from algorithms fully certified in COQ with SS- REFLECT. Similar to other automatic tools, COQCRYPTO- LINE calls external decision procedures during verification. To ensure correctness, all answers from external decision procedures are validated by certified certificate checkers in COQCRYPTOLINE. We evaluate COQCRYPTOLINE on cryp- tographic programs from BITCOIN,...
Fiddling the Twiddle Constants - Fault Injection Analysis of the Number Theoretic Transform
Prasanna Ravi, Bolin Yang, Shivam Bhasin, Fan Zhang, Anupam Chattopadhyay
Attacks and cryptanalysis
In this work, we present the first fault injection analysis of the Number Theoretic Transform (NTT). The NTT is an integral computation unit, widely used for polynomial multiplication in several structured lattice-based key encapsulation mechanisms (KEMs) and digital signature schemes. We identify a critical single fault vulnerability in the NTT, which severely reduces the entropy of its output. This in turn enables us to perform a wide-range of attacks applicable to lattice-based KEMs as...
Radix-3 NTT-Based Polynomial Multiplication for Lattice-Based Cryptography
Chenar Abdulla Hassan, Oğuz Yayla
Public-key cryptography
The lattice-based cryptography is considered a strong candidate amongst many other proposed quantum-safe schemes for the currently deployed asymmetric cryptosystems that do not seem to stay secure when quantum computers come into play. Lattice-based algorithms possess a time-consuming operation of polynomial multiplication. As it is relatively the highest time-consuming operation in lattice-based cryptosystems, one can obtain fast polynomial multiplication by using number theoretic...
CUDA-Accelerated RNS Multiplication in Word-Wise Homomorphic Encryption Schemes
Shiyu Shen, Hao Yang, Yu Liu, Zhe Liu, Yunlei Zhao
Implementation
Homomorphic encryption (HE), which allows computation over encrypted data, has often been used to preserve privacy. However, the computationally heavy nature and complexity of network topologies make the deployment of HE schemes in the Internet of Things (IoT) scenario difficult. In this work, we propose CARM, the first optimized GPU implementation that covers BGV, BFV and CKKS, targeting for accelerating homomorphic multiplication using GPU in heterogeneous IoT systems. We offer...
Compact and Efficient KEMs over NTRU Lattices
Zhichuang Liang, Boyue Fang, Jieyu Zheng, Yunlei Zhao
Public-key cryptography
The NTRU lattice is a promising candidate to construct practical cryptosystems, in particular key encapsulation mechanism (KEM), resistant to quantum computing attacks. Nevertheless, there are still some inherent obstacles to NTRU-based KEM schemes in having integrated performance, taking security, bandwidth, error probability, and computational efficiency {as a whole}, that is as good as and even better than their \{R,M\}LWE-based counterparts. In this work, we solve this problem by...
Adapting Belief Propagation to Counter Shuffling of NTTs
Julius Hermelink, Silvan Streit, Emanuele Strieder, Katharina Thieme
Public-key cryptography
The Number Theoretic Transform (NTT) is a major building block in recently introduced lattice based post-quantum (PQ) cryptography.
The NTT was target of a number of recently proposed Belief Propagation (BP)-based Side Channel Attacks (SCAs). Ravi et al. have recently proposed a number of countermeasures mitigating these attacks.
In 2021, Hamburg et al. presented a chosen-ciphertext enabled SCA improving noise-resistance, which we use as a starting point to state our findings.
We...
cuFE: High Performance Privacy Preserving Support Vector Machine with Inner-Product Functional Encryption
KyungHyun Han, Wai-Kong Lee, Angshuman Karmakar, Jose Maria Bermudo Mera, Seong Oun Hwang
Public-key cryptography
Privacy preservation is a sensitive issue in our modern society. It is becoming increasingly important in many applications in this ever-growing and highly connected digital era. Functional encryption is a computation on encrypted data paradigm that allows users to retrieve the evaluation of a function on encrypted data without revealing the data, thus effectively protecting users' privacy. However, existing functional encryption implementations are still very time-consuming for practical...
Side-Channel Analysis of Lattice-Based Post-Quantum Cryptography: Exploiting Polynomial Multiplication
Catinca Mujdei, Arthur Beckers, Jose Maria Bermudo Mera, Angshuman Karmakar, Lennert Wouters, Ingrid Verbauwhede
Polynomial multiplication algorithms such as Toom-Cook and the Number Theoretic Transform are fundamental building blocks for lattice-based post-quantum cryptography. In this work, we present correlation power analysis-based side-channel analysis methodologies targeting every polynomial multiplication strategy for all lattice-based post-quantum key encapsulation mechanisms in the final round of the NIST post-quantum standardization procedure. We perform practical experiments on real...
Efficient Multiplication of Somewhat Small Integers using Number-Theoretic Transforms
Hanno Becker, Vincent Hwang, Matthias J. Kannwischer, Lorenz Panny, Bo-Yin Yang
Implementation
Conventional wisdom purports that FFT-based integer multiplication methods (such as the Schönhage-Strassen algorithm) begin to compete with Karatsuba and Toom-Cook only for integers of several tens of thousands of bits. In this work, we challenge this belief, leveraging recent advances in the implementation of number-theoretic transforms (NTT) stimulated by their use in post-quantum cryptography. We report on implementations of NTT-based integer arithmetic on two Arm Cortex-M CPUs on...
Dilithium for Memory Constrained Devices
Joppe W. Bos, Joost Renes, Amber Sprenkels
Implementation
We investigate the use of the Dilithium post-quantum digital signature scheme on memory-constrained systems. Reference and optimized implementations of Dilithium in the benchmarking framework pqm4 (Cortex-M4) require 50 – 100 KiB of memory, demonstrating the significant challenge to use Dilithium on small IoT platforms. We show that compressing polynomials, using an alternative number theoretic transform, and falling back to the schoolbook method for certain multiplications reduces the...
Faster NTRU on ARM Cortex-M4 with TMVP-based multiplication
Irem Keskinkurt Paksoy, Murat Cenk
Applications
The Number Theoretic Transform (NTT), Toom-Cook, and
Karatsuba are the most commonly used algorithms for implementing
lattice-based ?nalists of the NIST PQC competition. In this paper, we
propose Toeplitz matrix-vector product (TMVP) based algorithms for
multiplication for all parameter sets of NTRU. We implement the pro-
posed algorithms on ARM Cortex-M4. The results show that TMVP-
based multiplication algorithms using the four-way TMVP formula are
more e?cient for NTRU. Our algorithms...
High-Performance Hardware Implementation of Lattice-Based Digital Signatures
Luke Beckwith, Duc Tri Nguyen, Kris Gaj
Implementation
Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. As a part of this evaluation,...
Faster Kyber and Dilithium on the Cortex-M4
Amin Abdulrahman, Vincent Hwang, Matthias J. Kannwischer, Amber Sprenkels
Implementation
This paper presents faster implementations of the lattice-based schemes Dilithium and Kyber on the Cortex-M4. Dilithium is one of the three signature finalists in the NIST post-quantum project (NIST PQC), while Kyber is one of the four key-encapsulation mechanism (KEM) finalists.
Our optimizations affect the core polynomial arithmetic using the number-theoretic transform (NTT) of both schemes. Our main contributions are threefold: We present a faster signed Barrett reduction for Kyber,...
Public Key Compression and Fast Polynomial Multiplication for NTRU using the Corrected Hybridized NTT-Karatsuba Method
Rohon Kundu, Alessandro de Piccoli, Andrea Visconti
Public-key cryptography
NTRU is a lattice-based public-key cryptosystem that has been selected as one of the Round III finalists at the NIST Post-Quantum Cryptography Standardization. Compressing the key sizes to increase efficiency has been a long-standing open question for lattice-based cryptosystems. In this paper we provide a solution to three seemingly opposite demands for NTRU cryptosystem: compress the key size, increase the security level, optimize performance by implementing fast polynomial...
Performance of Hierarchical Transforms in Homomorphic Encryption: A case study on Logistic Regression inference
Pedro Geraldo M. R. Alves, Jheyne N. Ortiz, Diego F. Aranha
Implementation
Recent works challenged the Number-Theoretic Transform (NTT) as the most efficient method for
polynomial multiplication in GPU implementations of Fully Homomorphic Encryption schemes such
as CKKS and BFV. In particular, these works argue that the Discrete Galois Transform (DGT) is a
better candidate for this particular case. However, these claims were never rigorously validated, and
only intuition was used to argue in favor of each transform. This work brings some light on the dis-
cussion...
Timing leakage analysis of non-constant-time NTT implementations with Harvey butterflies
Nir Drucker, Tomer Pelleg
Implementation
Harvey butterflies and their variants are core primitives in many optimized number-theoretic transform (NTT) implementations, such as those used by the HElib and SEAL homomorphic encryption libraries. However, these butterflies are not constant-time algorithms and may leak secret data when incorrectly implemented. Luckily for SEAL and HElib, the compilers optimize the code to run in constant-time. We claim that relying on the compiler is risky and demonstrate how a simple code modification...
Lattice-based Signatures with Tight Adaptive Corruptions and More
Jiaxin Pan, Benedikt Wagner
Public-key cryptography
We construct the first tightly secure signature schemes in the multi-user setting with adaptive corruptions from lattices. In stark contrast to the previous tight constructions whose security is solely based on number-theoretic assumptions, our schemes are based on the Learning with Errors (LWE) assumption which is supposed to be post-quantum secure. The security of our scheme is independent of the numbers of users and signing queries, and it is in the non-programmable random oracle model....
Roulette: A Diverse Family of Feasible Fault Attacks on Masked Kyber
Jeroen Delvaux
Implementation
At Indocrypt 2021, Hermelink, Pessl, and Pöppelmann presented a fault attack against Kyber in which a system of linear inequalities over the private key is generated and solved. The attack requires a laser and is, understandably, demonstrated with simulations—not actual equipment. We facilitate and diversify the attack in four ways, thereby admitting cheaper and more forgiving fault-injection setups. Firstly, the attack surface is enlarged: originally, the two input operands of the...
Time-memory Trade-offs for Saber+ on Memory-constrained RISC-V
Jipeng Zhang, Junhao Huang, Zhe Liu, Sujoy Sinha Roy
Public-key cryptography
Saber is a module-lattice-based key encapsulation scheme that has been selected as a finalist in the NIST Post-Quantum Cryptography Standardization Project. As Saber computes on considerably large matrices and vectors of polynomials, its efficient implementation on memory-constrained IoT devices is very challenging. In this paper, we present an implementation of Saber with a minor tweak to the original Saber protocol for achieving reduced memory consumption and better performance. We call...
NTT software optimization using an extended Harvey butterfly
Jonathan Bradbury, Nir Drucker, Marius Hillenbrand
Implementation
Software implementations of the number-theoretic transform (NTT) method often leverage Harvey’s butterfly to gain speedups. This is the case in cryptographic libraries such as IBM’s HElib, Microsoft’s SEAL, and Intel’s HEXL, which provide optimized implementations of fully homomorphic encryption schemes or their primitives.
We extend the Harvey butterfly to the radix-4 case for primes in the range [2^31, 2^52). This enables us to use the vector multiply sum logical (VMSL) instruction, which...
APAS: Application-Specific Accelerators for RLWE-based Homomorphic Linear Transformations
Song Bian, Dur E Shahwar Kundi, Kazuma Hirozawa, Weiqiang Liu, Takashi Sato
Applications
Recently, the application of multi-party secure computing schemes based on homomorphic encryption in the field of machine learning attracts attentions across the research fields. Previous studies have demonstrated that secure protocols adopting packed additive homomorphic encryption (PAHE) schemes based on the ring learning with errors (RLWE) problem exhibit significant practical merits, and are particularly promising in enabling efficient secure inference in machine-learning-as-a-service...
Non-Malleable Vector Commitments via Local Equivocability
Lior Rotem, Gil Segev
Vector commitments (VCs), enabling to commit to a vector and locally reveal any of its entries, play a key role in a variety of both classic and recently-evolving applications. However, security notions for VCs have so far focused on passive attacks, and non-malleability notions considering active attacks have not been explored. Moreover, existing frameworks that may enable to capture the non-malleability of VCs seem either too weak (non-malleable non-interactive commitments that do not...
Online Linear Extractors for Independent Sources
Yevgeniy Dodis, Siyao Guo, Noah Stephens-Davidowitz, Zhiye Xie
Foundations
In this work, we characterize online linear extractors. In other words, given a matrix $A \in \mathbb{F}_2^{n \times n}$, we study the convergence of the iterated process $\mathbf{S} \leftarrow A\mathbf{S} \oplus \mathbf{X} $, where $\mathbf{X} \sim D$ is repeatedly sampled independently from some fixed (but unknown) distribution $D$ with (min)-entropy at least $k$. Here, we think of $\mathbf{S} \in \{0,1\}^n$ as the state of an online extractor, and $\mathbf{X} \in \{0,1\}^n$ as its...
Polynomial multiplication on embedded vector architectures
Hanno Becker, Jose Maria Bermudo Mera, Angshuman Karmakar, Joseph Yiu, Ingrid Verbauwhede
Public-key cryptography
High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms is an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those...
Multi-moduli NTTs for Saber on Cortex-M3 and Cortex-M4
Amin Abdulrahman, Jiun-Peng Chen, Yu-Jia Chen, Vincent Hwang, Matthias J. Kannwischer, Bo-Yin Yang
Public-key cryptography
The U.S. National Institute of Standards and Technology (NIST) has designated ARM microcontrollers as an important benchmarking platform for its Post-Quantum Cryptography standardization process (NISTPQC). In view of this, we explore the design space of the NISTPQC finalist Saber on the Cortex-M4 and its close relation, the Cortex-M3. In the process, we investigate various optimization strategies and memory-time tradeoffs for number-theoretic transforms (NTTs). Recent work by [Chung et al.,...
Chosen Ciphertext k-Trace Attacks on Masked CCA2 Secure Kyber
Mike Hamburg, Julius Hermelink, Robert Primas, Simona Samardjiska, Thomas Schamberger, Silvan Streit, Emanuele Strieder, Christine van Vredendaal
Public-key cryptography
Single-trace attacks are a considerable threat to implementations of classic public-key schemes, and their implications on newer lattice-based schemes are still not well understood.
Two recent works have presented successful single-trace attacks targeting the Number Theoretic Transform (NTT), which is at the heart of many lattice-based schemes.
However, these attacks either require a quite powerful side-channel adversary or are restricted to specific scenarios such as the encryption of...
Bounded Collusion ABE for TMs from IBE
Rishab Goyal, Ridwan Syed, Brent Waters
Public-key cryptography
We give an attribute-based encryption system for Turing Machines that is provably secure assuming only the existence of identity-based encryption (IBE) for large identity spaces. Currently, IBE is known to be realizable from most mainstream number theoretic assumptions that imply public key cryptography including factoring, the search Diffie-Hellman assumption, and the Learning with Errors assumption.
Our core construction provides security against an attacker that makes a single key query...
Attribute-Based Conditional Proxy Re-Encryption in the Standard Model under LWE
Xiaojian Liang, Jian Weng, Anjia Yang, Lisha Yao, Zike Jiang, Zhenghao Wu
Public-key cryptography
Attribute-based conditional proxy re-encryption (AB-CPRE) allows delegators to carry out attribute-based control on the delegation of decryption by setting policies and attribute vectors. The fine-grained control of AB-CPRE makes it suitable for a variety of applications, such as cloud storage and distributed file systems. However, all existing AB-CPRE schemes are constructed under classical number-theoretic assumptions, which are vulnerable to quantum cryptoanalysis. Therefore, we propose...
High-Speed NTT-based Polynomial Multiplication Accelerator for CRYSTALS-Kyber Post-Quantum Cryptography
Mojtaba Bisheh-Niasar, Reza Azarderakhsh, Mehran Mozaffari-Kermani
Implementation
This paper demonstrates an architecture for accelerating the polynomial multiplication using number theoretic transform (NTT). Kyber is one of the finalists in the third round of the NIST post-quantum cryptography standardization process. Simultaneously, the performance of NTT execution is its main challenge, requiring large memory and complex memory access pattern. In this paper, an efficient NTT architecture is presented to improve the respective computation time. We propose several...
Kyber on ARM64: Compact Implementations of Kyber on 64-bit ARM Cortex-A Processors
Pakize Sanal, Emrah Karagoz, Hwajeong Seo, Reza Azarderakhsh, Mehran Mozaffari-Kermani
Implementation
Public-key cryptography based on the lattice problem is efficient and believed to be secure in a post-quantum era. In this paper, we introduce carefully optimized implementations of Kyber encryption schemes for 64-bit ARM Cortex-A processors. Our research contribution includes several optimizations for Number Theoretic Transform (NTT), noise sampling, and AES accelerator based symmetric function implementations. The proposed Kyber512 implementation on ARM64 improved previous works by 1.72×,...
Masked Accelerators and Instruction Set Extensions for Post-Quantum Cryptography
Tim Fritzmann, Michiel Van Beirendonck, Debapriya Basu Roy, Patrick Karl, Thomas Schamberger, Ingrid Verbauwhede, Georg Sigl
Public-key cryptography
Side-channel attacks can break mathematically secure cryptographic systems leading to a major concern in applied cryptography. While the cryptanalysis and security evaluation of Post-Quantum Cryptography (PQC) have already received an increasing research effort, a cost analysis of efficient side-channel countermeasures is still lacking. In this work, we propose a masked HW/SW codesign of the NIST PQC finalists Kyber and Saber, suitable for their different characteristics. Among others, we...
Intel HEXL: Accelerating Homomorphic Encryption with Intel AVX512-IFMA52
Fabian Boemer, Sejun Kim, Gelila Seifu, Fillipe D. M. de Souza, Vinodh Gopal
Implementation
Modern implementations of homomorphic encryption (HE) rely heavily on polynomial arithmetic over a finite field. This is particularly true of the CKKS, BFV, and BGV HE schemes. Two of the biggest performance bottlenecks in HE primitives and applications are polynomial modular multiplication and the forward and inverse number- theoretic transform (NTT). Here, we introduce Intel® Homomorphic Encryption Acceleration Library (Intel® HEXL), a C++ library which provides optimized implementations...
TensorCrypto
Wai-Kong Lee, Hwajeong Seo, Zhenfei Zhang, Seongoun Hwang
Implementation
Tensor core is a specially designed hardware included in new NVIDIA GPU chips, aimed at accelerating deep learning applications. With the introduction of tensor core, the matrix multiplication at low precision can be computed much faster than using conventional integer and floating point units in NVIDIA GPU. In the past, applications of tensor core were mainly restricted to machine learning and mixed precision scientific computing. In this paper, we show that for the first time, tensor core...
Efficient Number Theoretic Transform Implementation on GPU for Homomorphic Encryption
Ozgun Ozerk, Can Elgezen, Ahmet Can Mert, Erdinc Ozturk, Erkay Savas
Implementation
Lattice-based cryptography forms the mathematical basis for homomorphic encryption, which allows computation directly on encrypted data. Homomorphic encryption enables privacy-preserving applications such as secure cloud computing; yet, its practical applications suffer from the high computational complexity of homomorphic operations. Fast implementations of the homomorphic encryption schemes heavily depend on efficient polynomial arithmetic; multiplication of very large degree polynomials...
Implementing CRYSTALS-Dilithium Signature Scheme on FPGAs
Sara Ricci, Lukas Malina, Petr Jedlicka, David Smekal, Jan Hajny, Petr Cibik, Patrik Dobias
Implementation
In July 2020, the lattice-based CRYSTALS-Dilithium digital signature scheme has been chosen as one of the three third-round finalists in the post-quantum cryptography standardization process by the National Institute of Standards and Technology (NIST). In this work, we present the first Very High Speed Integrated Circuit Hardware Description Language (VHDL) implementation of the CRYSTALS-Dilithium signature scheme for Field-Programmable Gate Arrays (FPGAs). Due to our parallelization-based...
NTT Multiplication for NTT-unfriendly Rings
Chi-Ming Marvin Chung, Vincent Hwang, Matthias J. Kannwischer, Gregor Seiler, Cheng-Jhih Shih, Bo-Yin Yang
Implementation
In this paper, we show how multiplication for polynomial rings used in the NIST PQC finalists Saber and NTRU can be efficiently implemented using the Number-theoretic transform (NTT). We obtain superior performance compared to the previous state of the art implementations using Toom–Cook multiplication on both NIST’s primary software optimization targets AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: On the Cortex-M4, we use 32-bit NTT-based polynomial...
When to Barrett reduce in the inverse NTT
Bas Westerbaan
Implementation
We show that lazily Barrett reducing when computing the inverse number theoretic transform (NTT) is optimal.
TMVP-based Multiplication for Polynomial Quotient Rings and Application to Saber on ARM Cortex-M4
İrem Keskinkurt Paksoy, Murat Cenk
Implementation
Lattice-based NIST PQC finalists need efficient multiplication in $\mathbb{Z}_q[x]/(f(x))$. Multiplication in this ring can be performed very efficiently via number theoretic transform (NTT) as done in CRYSTALS-Kyber if the parameters of the scheme allow it. If NTT is not supported, other multiplication algorithms must be employed. For example, if the modulus $q$ of the scheme is a power of two, as in Saber and NTRU, then NTT can not be used directly. In this case, Karatsuba and Toom-Cook...
On Configurable SCA Countermeasures Against Single Trace Attacks for the NTT - A Performance Evaluation Study over Kyber and Dilithium on the ARM Cortex-M4
Prasanna Ravi, Romain Poussier, Shivam Bhasin, Anupam Chattopadhyay
Implementation
The Number Theoretic Transform (NTT) is a critical sub-block used in several structured lattice-based schemes, including Kyber and Dilithium, which are finalist candidates in the NIST's standardization process for post-quantum cryptography. The NTT was shown to be susceptible to single trace side-channel attacks by Primas et al. in CHES 2017 and Pessl et al. in Latincrypt 2019 who demonstrated full key recovery from single traces on the ARM Cortex-M4 microcontroller. However, the cost of...
CryptoPIM: In-memory Acceleration for Lattice-based Cryptographic Hardware
Hamid Nejatollahi, Saransh Gupta, Mohsen Imani, Tajana Simunic Rosing, Rosario Cammarota, Nikil Dutt
Implementation
Quantum computers promise to solve hard mathematical problems such as integer factorization and discrete logarithms in polynomial time, making standardized public-key cryptography (such as digital signature and key agreement) insecure. Lattice-Based Cryptography (LBC) is a promising post-quantum public-key cryptographic protocol that could replace standardized public-key cryptography, thanks to the inherent post-quantum resistant properties, efficiency, and versatility. A key mathematical...
Time-memory trade-off in Toom-Cook multiplication: an application to module-lattice based cryptography
Jose Maria Bermudo Mera, Angshuman Karmakar, Ingrid Verbauwhede
Public-key cryptography
Since the introduction of the ring-learning with errors problem, the number theoretic transform (NTT) based polynomial multiplication algorithm has been studied extensively. Due to its faster quasilinear time complexity, it has been the preferred choice of cryptographers to realize ring-learning with errors cryptographic schemes. Compared to NTT, Toom-Cook or Karatsuba based polynomial multiplication algorithms, though being known for a long time, still have a fledgling presence in the...
Cortex-M4 Optimizations for \{R,M\}LWE Schemes
Erdem Alkim, Yusuf Alper Bilgin, Murat Cenk, François Gérard
Public-key cryptography
This paper proposes various optimizations for lattice-based key-encapsulation mechanisms (KEM) using the Number Theoretic Transform (NTT) on the popular ARM Cortex-M4 microcontroller. Improvements come in the form of a faster code using more efficient modular reductions, small polynomial multiplications and more aggressive layer merging in the NTT but also reduced stack usage. We test those optimizations in software implementations of Kyber and NewHope, both round 2 candidates in the NIST...
Exploring Energy Efficient Quantum-resistant Signal Processing Using Array Processors
Hamid Nejatollahi, Sina Shahhosseini, Rosario Cammarota, Nikil Dutt
Public-key cryptography
Quantum computers threaten to compromise public-key cryptography schemes such as DSA and ECDSA in polynomial time, which poses an imminent threat to secure signal processing. The cryptography community has responded with the
development and standardization of post-quantum cryptography (PQC) algorithms, a class of public-key algorithms based on hard problems that no known quantum algorithms can solve in polynomial time. Ring learning with error (RLWE) lattice-
based cryptographic (LBC)...
Sapphire: A Configurable Crypto-Processor for Post-Quantum Lattice-based Protocols (Extended Version)
Utsav Banerjee, Tenzin S. Ukyab, Anantha P. Chandrakasan
Implementation
Public key cryptography protocols, such as RSA and elliptic curve cryptography, will be rendered insecure by Shor’s algorithm when large-scale quantum computers are built. Cryptographers are working on quantum-resistant algorithms, and lattice-based cryptography has emerged as a prime candidate. However, high computational complexity of these algorithms makes it challenging to implement lattice-based protocols on low-power embedded devices. To address this challenge, we present Sapphire – a...
When NTT Meets Karatsuba: Preprocess-then-NTT Technique Revisited
Yiming Zhu, Zhen Liu, Yanbin Pan
Public-key cryptography
The Number Theoretic Transform (NTT) technique is widely used in the implementation of the cryptographic schemes based on the Ring Learning With Errors problem(RLWE), since it provides efficient algorithm for multiplication of polynomials over the finite field. However, to employ NTT, the finite field is required to have some special root of unity, such as $n$-th root, which makes the module $q$ in RLWE big since we need $q\equiv 1\mod 2n$ to ensure such root exits. At Inscrypt 2018, Zhou...
In this paper, we present a comprehensive analysis of various modular multiplication methods for Number Theoretic Transform (NTT) on FPGA. NTT is a critical and time-intensive component of Fully Homomorphic Encryption (FHE) applications while modular multiplication consumes a significant portion of the design resources in an NTT implementation. We study the existing modular reduction approaches from the literature, and implement particular methods on FPGA. Specifically Word-Level Montgomery...
Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...
Modern cryptographic techniques such as fully homomorphic encryption (FHE) have recently gained broad attention. Most of these cryptosystems rely on lattice problems wherein polynomial multiplication forms the computational bottleneck. A popular method to accelerate these polynomial multiplications is the Number-Theoretic Transformation (NTT). Recent works aim to improve the practical deployability of NTT and propose toolchains supporting the NTT hardware accelerator design processes....
Large polynomial multiplication is one of the computational bottlenecks in fully homomorphic encryption implementations. Usually, these multiplications are implemented using the number-theoretic transformation to speed up the computation. State-of-the-art GPU-based implementation of fully homomorphic encryption computes the number theoretic transformation in two different kernels, due to the necessary synchronization between GPU blocks to ensure correctness in computation. This can be a...
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
The rapid evolution of post-quantum cryptography, spurred by standardization efforts such as those led by NIST, has highlighted the prominence of lattice-based cryptography, notably exemplified by CRYSTALS-Kyber. However, concerns persist regarding the security of cryptographic implementations, particularly in the face of Side-Channel Attacks (SCA). The usage of operations like the Number Theoretic Transform (NTT) in CRYSTALS-Kyber introduces vulnerabilities to SCA, especially single-trace...
One of the main issues to deal with for fully homomorphic encryption is the noise growth when operating on ciphertexts. To some extent, this can be controlled thanks to a so-called gadget decomposition. A gadget decomposition typically relies on radix- or CRT-based representations to split elements as vectors of smaller chunks whose inner products with the corresponding gadget vector rebuilds (an approximation of) the original elements. Radix-based gadget decompositions present the advantage...
In computer arithmetic operations, the Number Theoretic Transform (NTT) plays a significant role in the efficient implementation of cyclic and nega-cyclic convolutions with the application of multiplying large integers and large degree polynomials. Multiplying polynomials is a common operation in lattice-based cryptography. Hence, the NTT is a core component of several lattice-based cryptographic algorithms. Two well-known examples are the key encapsulation mechanism Kyber and the...
The Number Theoretic Transform (NTT) is a powerful mathematical tool that has become increasingly important in developing Post Quantum Cryptography (PQC) and Homomorphic Encryption (HE). Its ability to efficiently calculate polynomial multiplication using the convolution theorem with a quasi-linear complexity $O(n \log{n})$ instead of $O(n^2)$ when implemented with Fast Fourier Transform-style algorithms has made it a key component in modern cryptography. FFT-style NTT algorithm or fast-NTT...
This paper introduces a high-performance and scalable hardware architecture designed for the Number-Theoretic Transform (NTT), a fundamental component extensively utilized in lattice-based encryption and fully homomorphic encryption schemes. The underlying rationale behind this research is to harness the advantages of the hypercube topology. This topology serves to significantly diminish the volume of data exchanges required during each iteration of the NTT, reducing it to a complexity of...
Recognizing the importance of a fast and resource-efficient polynomial multiplication in homomorphic encryption, in this paper, we design a multiplier-less number theoretic transform using a Fermat number as an auxiliary modulus. To make this algorithm scalable with the degree of polynomial, we apply a univariate to multivariate polynomial ring transformation. We develop an accelerator architecture for fully homomorphic encryption using these algorithmic techniques for efficient...
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
In this work, we introduce a family of asymmetric cryptographic functions based on dynamic number theoretic transformations with multiple rounds of modular arithmetic to enhance diffusion and difficulty of inversion. This function acts as a basic cryptographic building block for a novel communication-efficient zero-knowledge crypto-system. The system as defined exhibits partial homomorphism and behaves as an additive positive accumulator. By using a novel technique to constructively embed...
Post-Quantum Cryptography (PQC) was proposed due to the potential threats quantum computer attacks against conventional public key cryptosystems, and four PQC algorithms besides CRYSTALS-Dilithium (Dilithium for short) have so far been selected for NIST standardization. However, the selected algorithms are still vulnerable to side-channel attacks in practice, and their physical security need to be further evaluated. This study introduces two efficient power analysis attacks, the optimized...
Number theoretic transform (NTT) has been a very useful tool in computations for number theory, algebra and cryptography. Its performance affects some post-quantum cryptosystems. In this paper, we discuss the butterfly operation of NTT. This basic module of NTT requires heavy modular arithmetics. Montgomery reduction is commonly used in this setting. Recently several variants of Montgomery have been proposed for the purpose of speeding up NTT. We observe that the Chinese remainder...
Number Theoretic Transform (NTT) has been widely used in accelerating computations in lattice-based cryptography. However, attackers can potentially launch power analysis targeting NTT because it is usually the most time-consuming part of the implementation. This extended time frame provides a natural window of opportunity for attackers. In this paper, we investigate the first CPU frequency leakage (Hertzbleed-like) attacks against NTT in lattice-based KEMs. Our key observation is that...
Zero-knowledge proof (ZKP) is a cryptographic primitive that enables a prover to convince a verifier that a statement is true, without revealing any other information beyond the correctness of the statement itself. Due to its powerful capabilities, its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been widely deployed in various privacy preserving applications such as cryptocurrencies and verifiable computation. Although...
We optimize the number-theoretic transforms (NTTs) in Dilithium — a digital signature scheme recently standardized by the National Institute of Standards and Technology (NIST) — on Cortex-M3 and 8-bit AVR. The core novelty is the exploration of micro-architectural insights for modular multiplications. Recent work [Becker, Hwang, Kannwischer, Yang and Yang, Volume 2022 (1), Transactions on Cryptographic Hardware and Embedded Systems, 2022] found a correspondence between Montgomery and Barrett...
The splitting field $F$ of the polynomial $Y^n-2$ is an extension over $\mathbb{Q}$ generated by $\zeta_n=\exp(2 \pi \sqrt{-1} /n)$ and $\sqrt[n]{2}$. In this paper, we lay the foundation for applying the Order-LWE in the integral ring $\mathcal{R}=\mathbb{Z}[\zeta_n, \sqrt[n]{2}]$ to cryptographic uses when $n$ is a power-of-two integer. We explicitly compute the Galois group $\text{Gal}\left(F/\mathbb{Q} \right)$ and the canonical embedding of $F$, based on which we study the properties of...
In 2022, NIST selected Kyber and Dilithium as post-quantum cryptographic standard algorithms. The Number Theoretic Transformation (NTT) algorithm, which facilitates polynomial multiplication, has become a primary target for side-channel attacks. In this work, we embed the NTT transformation matrix in Dilithium and Kyber into the SIS search problem, and further, we propose a divide and conquer strategy for dimensionality reduction of the SIS problem by utilizing the properties of NTT, and...
The evolution of quantum algorithms threatens to break public key cryptography in polynomial time. The development of quantum-resistant algorithms for the post-quantum era has seen a significant growth in the field of post quantum cryptography (PQC). Polynomial multiplication is the core of Ring Learning with Error (RLWE) lattice based cryptography (LBC) which is one of the most promising PQC candidates. In this work, we present the design of fast and energy-efficient pipelined Number...
The Number Theoretic Transform (NTT) plays a central role in efficient implementations of cryptographic primitives selected for Post Quantum Cryptography. Although it certainly exists, academic papers that cite the NTT omit the connection between the NTT and residues of a polynomial modulo factors of $X^{2^d} + 1$ and mention only the final expressions of what the NTT computes. This short paper establishes that connection and, in doing so, elucidates key aspects of computing the NTT. Based...
CRYSTALS-Kyber is a key-encapsulation mechanism, whose security is based on the hardness of solving the learning-with-errors (LWE) problem over module lattices. As in its specification, Kyber prescribes the usage of the Number Theoretic Transform (NTT) for efficient polynomial multiplication. Side-channel assisted attacks against Post-Quantum Cryptography (PQC) algorithms like Kyber remain a concern in the ongoing standardization process of quantum-computer-resistant cryptosystems. Among the...
The Number Theoretic Transform (NTT) is a powerful mathematical tool with a wide range of applications in various fields, including signal processing, cryptography, and error correction codes. In recent years, there has been a growing interest in efficiently implementing the NTT on hardware platforms for lattice-based cryptography within the context of NIST's Post-Quantum Cryptography (PQC) competition. The implementation of NTT in cryptography stands as a pivotal advancement,...
Recently proposed lattice-based cryptography algorithms can be used to protect the IoT communication against the threat from quantum computers, but they are computationally heavy. In particular, polynomial multiplication is one of the most time-consuming operations in lattice-based cryptography. To achieve efficient implementation, the Number Theoretic Transform (NTT) algorithm is an ideal choice, but it has certain limitations on the parameters, which not all lattice-based schemes can...
This paper explores the challenges and potential solutions of implementing the recommended upcoming post-quantum cryptography standards (the CRYSTALS-Dilithium and CRYSTALS-Kyber algorithms) on resource constrained devices. The high computational cost of polynomial operations, fundamental to cryptography based on ideal lattices, presents significant challenges in an efficient implementation. This paper proposes a hardware/software co-design strategy using RISC-V extensions to optimize...
The number theoretic transform (NTT) permits a very efficient method to perform multiplication of very large degree polynomials, which is the most time-consuming operation in fully homomorphic encryption (FHE) schemes and a class of non-interactive succinct zero-knowledge proof systems such as zk-SNARK. Efficient modular arithmetic plays an important role in the performance of NTT, and therefore it is studied extensively. The access pattern to the memory, on the other hand, may play much...
CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography uses complex discrete alogarithm problems on lattices to build secure encryption and decryption systems to resist attacks from quantum computing. Performance is an important bottleneck affecting the promotion of post quantum...
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and has many applications. However, its practical implementation faces massive computation and memory overheads. To address this bottleneck, several Application-Specific Integrated Circuit (ASIC) FHE accelerators have been proposed. All these prior works put every component needed for FHE onto one chip (monolithic), hence offering high performance. However, they suffer from practical problems associated with...
Homomorphic encryption (HE) allows computing encrypted data in the ciphertext domain without knowing the encryption key. It is possible, however, to break fully homomorphic encryption (FHE) algorithms by using side channels. This article demonstrates side-channel leakages of the Microsoft SEAL HE library. The proposed attack can steal encryption keys during the key generation phase by abusing the leakage of ternary value assignments that occurs during the number theoretic transform (NTT)...
The advent of quantum computers poses a serious challenge to the security of cloud infrastructures and services, as they can potentially break the existing public-key cryptosystems, such as Rivest–Shamir–Adleman (RSA) and Elliptic Curve Cryptography (ECC). Even though the gap between today’s quantum computers and the threats they pose to current public-key cryptography is large, the cloud landscape should act proactively and initiate the transition to the post-quantum era as early as...
Threshold cryptography is typically based on the idea of secret-sharing a private-key $s\in F$ ``in the exponent'' of some cryptographic group $G$, or more generally, encoding $s$ in some linearly homomorphic domain. In each invocation of the threshold system (e.g., for signing or decrypting) an ``encoding'' of the secret is being recovered and so the complexity, measured as the number of group multiplications over $G$, is equal to the number of $F$-additions that are needed to reconstruct...
With the increased use of data and communication through the internet and the abundant misuse of personal data by many organizations, people are more sensitive about their privacy. Privacy-preserving computation is becoming increasingly important in this era. Functional encryption allows a user to evaluate a function on encrypted data without revealing sensitive information. Most implementations of functional encryption schemes are too time-consuming for practical use. Mera et al. first...
A large part of current research in homomorphic encryption (HE) aims towards making HE practical for real-world applications. In any practical HE, an important issue is to convert the application data (type) to the data type suitable for the HE. The main purpose of this work is to investigate an efficient HE-compatible encoding method that is generic, and can be easily adapted to apply to the HE schemes over integers or polynomials. $p$-adic number theory provides a way to transform...
The Number Theoretic Transform (NTT) is used to efficiently execute polynomial multiplication. It has become an important part of lattice-based post-quantum methods and the subsequent generation of standard cryptographic systems. However, implementing post-quantum schemes is challenging since they rely on intricate structures. This paper demonstrates how to develop a high-speed NTT multiplier highly optimized for FPGAs with few logical resources. We describe a novel architecture for NTT...
The hard mathematical problems that assure the security of our current public-key cryptography (RSA, ECC) are broken if and when a quantum computer appears rendering them ineffective for use in the quantum era. Lattice based cryptography is a novel approach to public key cryptography, of which the mathematical investigation (so far) resists attacks from quantum computers. By choosing a module learning with errors (MLWE) algorithm as the next standard, National Institute of Standard \&...
We introduce a novel template attack for secret key recovery in Kyber, leveraging side-channel information from polynomial multiplication during decapsulation. Conceptually, our attack exploits that Kyber's incomplete number-theoretic transform (NTT) causes each secret coefficient to be used multiple times, unlike when performing a complete NTT. Our attack is a single trace \emph{known} ciphertext attack that avoids machine-learning techniques and instead relies on correlation-matching...
This paper explores the design space of vector-optimized polynomial multiplications in the lattice-based key-encapsulation mechanisms NTRU and NTRU Prime. Since NTRU and NTRU Prime do not support straightforward applications of number– theoretic transforms, the state-of-the-art vector code either resorted to Toom–Cook, or introduced various techniques for coefficient ring extensions. All these techniques lead to a large number of small-degree polynomial multiplications, which is the...
Lattice-based cryptography has laid the foundation of various modern-day cryptosystems that cater to several applications, including post-quantum cryptography. For structured lattice-based schemes, polynomial arithmetic is a fundamental part. In several instances, the performance optimizations come from implementing compact multipliers due to the small range of the secret polynomial coefficients. However, this optimization does not easily translate to side-channel protected implementations...
This paper presents a correct-by-construction method of designing an FHE model based on the automated program verifier Dafny. We model FHE operations from the ground up, including fundamentals like GCD, coprimality, Montgomery multiplications, and polynomial operations, etc., and higher level optimizations such as Residue Number System (RNS) and Number Theoretic Transform (NTT). The fully formally verified FHE model serves as a reference design for both software stack development and...
Number Theoretic Transform (NTT) is a fundamental building block in emerging cryptographic constructions like fully homomorphic encryption, post-quantum cryptography and zero-knowledge proof. In this work, we introduce Proteus, an open-source parametric hardware to generate pipelined architectures for the NTT. For a given parameter set including the polynomial degree and size of the coefficient modulus, Proteus can generate Radix-2 NTT architectures using Single-path Delay Feedback (SDF) and...
Amortized bootstrapping offers a way to simultaneously refresh many ciphertexts of a fully homomorphic encryption scheme, at a total cost comparable to that of refreshing a single ciphertext. An amortization method for FHEW-style cryptosystems was first proposed by (Micciancio and Sorrell, ICALP 2018), who showed that the amortized cost of bootstrapping n FHEW-style ciphertexts can be reduced from $O(n)$ basic cryptographic operations to just $O(n^{\epsilon})$, for any constant...
This paper describes a formalization of the specification and the algorithm of the cryptographic scheme CRYSTALS-KYBER as well as the verification of its (1 − δ)-correctness proof. During the formalization, a problem in the correctness proof was uncovered. In order to amend this issue, a necessary property on the modulus parameter of the CRYSTALS-KYBER algorithm was introduced. This property is already implicitly fulfilled by the structure of the modulus prime used in the number theoretic...
Number-Theoretic-Transform (NTT) is a variation of Fast-Fourier-Transform (FFT) on finite fields. NTT is being increasingly used in blockchain and zero-knowledge proof applications. Although FFT and NTT are widely studied for FPGA implementation, we believe CycloneNTT is the first to solve this problem for large data sets ($\ge2^{24}$, 64-bit numbers) that would not fit in the on-chip RAM. CycloneNTT uses a state-of-the-art butterfly network and maps the dataflow to hybrid FIFOs composed of...
Private comparison schemes constructed on homomorphic encryption offer the noninteractive, output expressive and parallelizable features, and have advantages in communication bandwidth and performance. In this paper, we propose cuXCMP, which allows negative and float inputs, offers fully output expressive feature, and is more extensible and practical compared to XCMP (AsiaCCS 2018). Meanwhile, we introduce several memory-centric optimizations of the constant term extraction kernel tailored for...
FALCON and Crystals-Dilithium are the digital signatures algorithms selected as NIST PQC standards at the end of the third round. FALCON has the advantage of the shortest size of the combined public key and signature but has the disadvantage of the relatively long signing time. Since FALCON algorithm is faithfully designed based on theoretical security analysis, the implementation of the algorithms is quite complex and needs considerable complexity. In order to implement the FALCON...
A fully homomorphic encryption (FHE) scheme allows a client to encrypt and delegate its data to a server that performs computation on the encrypted data that the client can then decrypt. While FHE gives confidentiality to clients' data, it does not protect the server's input and computation. Nevertheless, FHE schemes are still helpful in building delegation protocols that reduce communication complexity, as the ciphertext's size is independent of the size of the computation performed on...
The lattice-based CRYSTALS-Dilithium signature scheme has been selected for standardization by the NIST. As part of the selection process, a large number of implementations for platforms like x86, ARM Cortex-M4, or – on the hardware side – Xilinx Artix-7 have been presented and discussed by experts. While software implementations have been subject to side-channel analysis with several attacks being published, an analysis of Dilithium hardware implementations and their peculiarities has not...
Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, zkSNARK like Groth16 has a high overhead on its proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL),...
Handwritten assembly is a widely used tool in the development of high-performance cryptography: By providing full control over instruction selection, instruction scheduling, and register allocation, highest performance can be unlocked. On the flip side, developing handwritten assembly is not only time-consuming, but the artifacts produced also tend to be difficult to review and maintain – threatening their suitability for use in practice. In this work, we present SLOTHY (Super (Lazy)...
In most homomorphic encryption schemes based on the RLWE, the native plaintexts are represented as polynomials in a ring $Z_t[x]/x^N+1$ where $t$ is a plaintext modulus and $x^N+1$ is a cyclotomic polynomial with degree power of two. An encoding scheme should be used to transform some natural data types(such as integers and rational numbers) into polynomials in the ring. After a homomorphic computation on the polynomial is finished, the decoding procedure is invoked to obtain the result....
Homomorphic encryption (HE) is a cryptosystem that allows secure processing of encrypted data. One of the most popular HE schemes is the Brakerski-Fan-Vercauteren (BFV), which supports somewhat (SWHE) and fully homomorphic encryption (FHE). Since overly involved arithmetic operations of HE schemes are amenable to concurrent computation, GPU devices can be instrumental in facilitating the practical use of HE in real world applications thanks to their superior parallel processing capacity....
All known instantiations for fully homomorphic encryption (FHE) produce noisy ciphertexts and rely on a technique called bootstrapping to reduce the noise so as to enable an arbitrary number of homomorphic operations. Bootstrapping is the main performance bottleneck and arguably the biggest obstacle to widespread adoption of FHE. Among the FHE schemes, TFHE and its variations present the appealing property of having a bootstrapping procedure---as well as its extension to programmable ...
COQCRYPTOLINE is an automatic certified verification tool for cryptographic programs. It is built on OCAML programs extracted from algorithms fully certified in COQ with SS- REFLECT. Similar to other automatic tools, COQCRYPTO- LINE calls external decision procedures during verification. To ensure correctness, all answers from external decision procedures are validated by certified certificate checkers in COQCRYPTOLINE. We evaluate COQCRYPTOLINE on cryp- tographic programs from BITCOIN,...
In this work, we present the first fault injection analysis of the Number Theoretic Transform (NTT). The NTT is an integral computation unit, widely used for polynomial multiplication in several structured lattice-based key encapsulation mechanisms (KEMs) and digital signature schemes. We identify a critical single fault vulnerability in the NTT, which severely reduces the entropy of its output. This in turn enables us to perform a wide-range of attacks applicable to lattice-based KEMs as...
The lattice-based cryptography is considered a strong candidate amongst many other proposed quantum-safe schemes for the currently deployed asymmetric cryptosystems that do not seem to stay secure when quantum computers come into play. Lattice-based algorithms possess a time-consuming operation of polynomial multiplication. As it is relatively the highest time-consuming operation in lattice-based cryptosystems, one can obtain fast polynomial multiplication by using number theoretic...
Homomorphic encryption (HE), which allows computation over encrypted data, has often been used to preserve privacy. However, the computationally heavy nature and complexity of network topologies make the deployment of HE schemes in the Internet of Things (IoT) scenario difficult. In this work, we propose CARM, the first optimized GPU implementation that covers BGV, BFV and CKKS, targeting for accelerating homomorphic multiplication using GPU in heterogeneous IoT systems. We offer...
The NTRU lattice is a promising candidate to construct practical cryptosystems, in particular key encapsulation mechanism (KEM), resistant to quantum computing attacks. Nevertheless, there are still some inherent obstacles to NTRU-based KEM schemes in having integrated performance, taking security, bandwidth, error probability, and computational efficiency {as a whole}, that is as good as and even better than their \{R,M\}LWE-based counterparts. In this work, we solve this problem by...
The Number Theoretic Transform (NTT) is a major building block in recently introduced lattice based post-quantum (PQ) cryptography. The NTT was target of a number of recently proposed Belief Propagation (BP)-based Side Channel Attacks (SCAs). Ravi et al. have recently proposed a number of countermeasures mitigating these attacks. In 2021, Hamburg et al. presented a chosen-ciphertext enabled SCA improving noise-resistance, which we use as a starting point to state our findings. We...
Privacy preservation is a sensitive issue in our modern society. It is becoming increasingly important in many applications in this ever-growing and highly connected digital era. Functional encryption is a computation on encrypted data paradigm that allows users to retrieve the evaluation of a function on encrypted data without revealing the data, thus effectively protecting users' privacy. However, existing functional encryption implementations are still very time-consuming for practical...
Polynomial multiplication algorithms such as Toom-Cook and the Number Theoretic Transform are fundamental building blocks for lattice-based post-quantum cryptography. In this work, we present correlation power analysis-based side-channel analysis methodologies targeting every polynomial multiplication strategy for all lattice-based post-quantum key encapsulation mechanisms in the final round of the NIST post-quantum standardization procedure. We perform practical experiments on real...
Conventional wisdom purports that FFT-based integer multiplication methods (such as the Schönhage-Strassen algorithm) begin to compete with Karatsuba and Toom-Cook only for integers of several tens of thousands of bits. In this work, we challenge this belief, leveraging recent advances in the implementation of number-theoretic transforms (NTT) stimulated by their use in post-quantum cryptography. We report on implementations of NTT-based integer arithmetic on two Arm Cortex-M CPUs on...
We investigate the use of the Dilithium post-quantum digital signature scheme on memory-constrained systems. Reference and optimized implementations of Dilithium in the benchmarking framework pqm4 (Cortex-M4) require 50 – 100 KiB of memory, demonstrating the significant challenge to use Dilithium on small IoT platforms. We show that compressing polynomials, using an alternative number theoretic transform, and falling back to the schoolbook method for certain multiplications reduces the...
The Number Theoretic Transform (NTT), Toom-Cook, and Karatsuba are the most commonly used algorithms for implementing lattice-based ?nalists of the NIST PQC competition. In this paper, we propose Toeplitz matrix-vector product (TMVP) based algorithms for multiplication for all parameter sets of NTRU. We implement the pro- posed algorithms on ARM Cortex-M4. The results show that TMVP- based multiplication algorithms using the four-way TMVP formula are more e?cient for NTRU. Our algorithms...
Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. As a part of this evaluation,...
This paper presents faster implementations of the lattice-based schemes Dilithium and Kyber on the Cortex-M4. Dilithium is one of the three signature finalists in the NIST post-quantum project (NIST PQC), while Kyber is one of the four key-encapsulation mechanism (KEM) finalists. Our optimizations affect the core polynomial arithmetic using the number-theoretic transform (NTT) of both schemes. Our main contributions are threefold: We present a faster signed Barrett reduction for Kyber,...
NTRU is a lattice-based public-key cryptosystem that has been selected as one of the Round III finalists at the NIST Post-Quantum Cryptography Standardization. Compressing the key sizes to increase efficiency has been a long-standing open question for lattice-based cryptosystems. In this paper we provide a solution to three seemingly opposite demands for NTRU cryptosystem: compress the key size, increase the security level, optimize performance by implementing fast polynomial...
Recent works challenged the Number-Theoretic Transform (NTT) as the most efficient method for polynomial multiplication in GPU implementations of Fully Homomorphic Encryption schemes such as CKKS and BFV. In particular, these works argue that the Discrete Galois Transform (DGT) is a better candidate for this particular case. However, these claims were never rigorously validated, and only intuition was used to argue in favor of each transform. This work brings some light on the dis- cussion...
Harvey butterflies and their variants are core primitives in many optimized number-theoretic transform (NTT) implementations, such as those used by the HElib and SEAL homomorphic encryption libraries. However, these butterflies are not constant-time algorithms and may leak secret data when incorrectly implemented. Luckily for SEAL and HElib, the compilers optimize the code to run in constant-time. We claim that relying on the compiler is risky and demonstrate how a simple code modification...
We construct the first tightly secure signature schemes in the multi-user setting with adaptive corruptions from lattices. In stark contrast to the previous tight constructions whose security is solely based on number-theoretic assumptions, our schemes are based on the Learning with Errors (LWE) assumption which is supposed to be post-quantum secure. The security of our scheme is independent of the numbers of users and signing queries, and it is in the non-programmable random oracle model....
At Indocrypt 2021, Hermelink, Pessl, and Pöppelmann presented a fault attack against Kyber in which a system of linear inequalities over the private key is generated and solved. The attack requires a laser and is, understandably, demonstrated with simulations—not actual equipment. We facilitate and diversify the attack in four ways, thereby admitting cheaper and more forgiving fault-injection setups. Firstly, the attack surface is enlarged: originally, the two input operands of the...
Saber is a module-lattice-based key encapsulation scheme that has been selected as a finalist in the NIST Post-Quantum Cryptography Standardization Project. As Saber computes on considerably large matrices and vectors of polynomials, its efficient implementation on memory-constrained IoT devices is very challenging. In this paper, we present an implementation of Saber with a minor tweak to the original Saber protocol for achieving reduced memory consumption and better performance. We call...
Software implementations of the number-theoretic transform (NTT) method often leverage Harvey’s butterfly to gain speedups. This is the case in cryptographic libraries such as IBM’s HElib, Microsoft’s SEAL, and Intel’s HEXL, which provide optimized implementations of fully homomorphic encryption schemes or their primitives. We extend the Harvey butterfly to the radix-4 case for primes in the range [2^31, 2^52). This enables us to use the vector multiply sum logical (VMSL) instruction, which...
Recently, the application of multi-party secure computing schemes based on homomorphic encryption in the field of machine learning attracts attentions across the research fields. Previous studies have demonstrated that secure protocols adopting packed additive homomorphic encryption (PAHE) schemes based on the ring learning with errors (RLWE) problem exhibit significant practical merits, and are particularly promising in enabling efficient secure inference in machine-learning-as-a-service...
Vector commitments (VCs), enabling to commit to a vector and locally reveal any of its entries, play a key role in a variety of both classic and recently-evolving applications. However, security notions for VCs have so far focused on passive attacks, and non-malleability notions considering active attacks have not been explored. Moreover, existing frameworks that may enable to capture the non-malleability of VCs seem either too weak (non-malleable non-interactive commitments that do not...
In this work, we characterize online linear extractors. In other words, given a matrix $A \in \mathbb{F}_2^{n \times n}$, we study the convergence of the iterated process $\mathbf{S} \leftarrow A\mathbf{S} \oplus \mathbf{X} $, where $\mathbf{X} \sim D$ is repeatedly sampled independently from some fixed (but unknown) distribution $D$ with (min)-entropy at least $k$. Here, we think of $\mathbf{S} \in \{0,1\}^n$ as the state of an online extractor, and $\mathbf{X} \in \{0,1\}^n$ as its...
High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms is an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those...
The U.S. National Institute of Standards and Technology (NIST) has designated ARM microcontrollers as an important benchmarking platform for its Post-Quantum Cryptography standardization process (NISTPQC). In view of this, we explore the design space of the NISTPQC finalist Saber on the Cortex-M4 and its close relation, the Cortex-M3. In the process, we investigate various optimization strategies and memory-time tradeoffs for number-theoretic transforms (NTTs). Recent work by [Chung et al.,...
Single-trace attacks are a considerable threat to implementations of classic public-key schemes, and their implications on newer lattice-based schemes are still not well understood. Two recent works have presented successful single-trace attacks targeting the Number Theoretic Transform (NTT), which is at the heart of many lattice-based schemes. However, these attacks either require a quite powerful side-channel adversary or are restricted to specific scenarios such as the encryption of...
We give an attribute-based encryption system for Turing Machines that is provably secure assuming only the existence of identity-based encryption (IBE) for large identity spaces. Currently, IBE is known to be realizable from most mainstream number theoretic assumptions that imply public key cryptography including factoring, the search Diffie-Hellman assumption, and the Learning with Errors assumption. Our core construction provides security against an attacker that makes a single key query...
Attribute-based conditional proxy re-encryption (AB-CPRE) allows delegators to carry out attribute-based control on the delegation of decryption by setting policies and attribute vectors. The fine-grained control of AB-CPRE makes it suitable for a variety of applications, such as cloud storage and distributed file systems. However, all existing AB-CPRE schemes are constructed under classical number-theoretic assumptions, which are vulnerable to quantum cryptoanalysis. Therefore, we propose...
This paper demonstrates an architecture for accelerating the polynomial multiplication using number theoretic transform (NTT). Kyber is one of the finalists in the third round of the NIST post-quantum cryptography standardization process. Simultaneously, the performance of NTT execution is its main challenge, requiring large memory and complex memory access pattern. In this paper, an efficient NTT architecture is presented to improve the respective computation time. We propose several...
Public-key cryptography based on the lattice problem is efficient and believed to be secure in a post-quantum era. In this paper, we introduce carefully optimized implementations of Kyber encryption schemes for 64-bit ARM Cortex-A processors. Our research contribution includes several optimizations for Number Theoretic Transform (NTT), noise sampling, and AES accelerator based symmetric function implementations. The proposed Kyber512 implementation on ARM64 improved previous works by 1.72×,...
Side-channel attacks can break mathematically secure cryptographic systems leading to a major concern in applied cryptography. While the cryptanalysis and security evaluation of Post-Quantum Cryptography (PQC) have already received an increasing research effort, a cost analysis of efficient side-channel countermeasures is still lacking. In this work, we propose a masked HW/SW codesign of the NIST PQC finalists Kyber and Saber, suitable for their different characteristics. Among others, we...
Modern implementations of homomorphic encryption (HE) rely heavily on polynomial arithmetic over a finite field. This is particularly true of the CKKS, BFV, and BGV HE schemes. Two of the biggest performance bottlenecks in HE primitives and applications are polynomial modular multiplication and the forward and inverse number- theoretic transform (NTT). Here, we introduce Intel® Homomorphic Encryption Acceleration Library (Intel® HEXL), a C++ library which provides optimized implementations...
Tensor core is a specially designed hardware included in new NVIDIA GPU chips, aimed at accelerating deep learning applications. With the introduction of tensor core, the matrix multiplication at low precision can be computed much faster than using conventional integer and floating point units in NVIDIA GPU. In the past, applications of tensor core were mainly restricted to machine learning and mixed precision scientific computing. In this paper, we show that for the first time, tensor core...
Lattice-based cryptography forms the mathematical basis for homomorphic encryption, which allows computation directly on encrypted data. Homomorphic encryption enables privacy-preserving applications such as secure cloud computing; yet, its practical applications suffer from the high computational complexity of homomorphic operations. Fast implementations of the homomorphic encryption schemes heavily depend on efficient polynomial arithmetic; multiplication of very large degree polynomials...
In July 2020, the lattice-based CRYSTALS-Dilithium digital signature scheme has been chosen as one of the three third-round finalists in the post-quantum cryptography standardization process by the National Institute of Standards and Technology (NIST). In this work, we present the first Very High Speed Integrated Circuit Hardware Description Language (VHDL) implementation of the CRYSTALS-Dilithium signature scheme for Field-Programmable Gate Arrays (FPGAs). Due to our parallelization-based...
In this paper, we show how multiplication for polynomial rings used in the NIST PQC finalists Saber and NTRU can be efficiently implemented using the Number-theoretic transform (NTT). We obtain superior performance compared to the previous state of the art implementations using Toom–Cook multiplication on both NIST’s primary software optimization targets AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: On the Cortex-M4, we use 32-bit NTT-based polynomial...
We show that lazily Barrett reducing when computing the inverse number theoretic transform (NTT) is optimal.
Lattice-based NIST PQC finalists need efficient multiplication in $\mathbb{Z}_q[x]/(f(x))$. Multiplication in this ring can be performed very efficiently via number theoretic transform (NTT) as done in CRYSTALS-Kyber if the parameters of the scheme allow it. If NTT is not supported, other multiplication algorithms must be employed. For example, if the modulus $q$ of the scheme is a power of two, as in Saber and NTRU, then NTT can not be used directly. In this case, Karatsuba and Toom-Cook...
The Number Theoretic Transform (NTT) is a critical sub-block used in several structured lattice-based schemes, including Kyber and Dilithium, which are finalist candidates in the NIST's standardization process for post-quantum cryptography. The NTT was shown to be susceptible to single trace side-channel attacks by Primas et al. in CHES 2017 and Pessl et al. in Latincrypt 2019 who demonstrated full key recovery from single traces on the ARM Cortex-M4 microcontroller. However, the cost of...
Quantum computers promise to solve hard mathematical problems such as integer factorization and discrete logarithms in polynomial time, making standardized public-key cryptography (such as digital signature and key agreement) insecure. Lattice-Based Cryptography (LBC) is a promising post-quantum public-key cryptographic protocol that could replace standardized public-key cryptography, thanks to the inherent post-quantum resistant properties, efficiency, and versatility. A key mathematical...
Since the introduction of the ring-learning with errors problem, the number theoretic transform (NTT) based polynomial multiplication algorithm has been studied extensively. Due to its faster quasilinear time complexity, it has been the preferred choice of cryptographers to realize ring-learning with errors cryptographic schemes. Compared to NTT, Toom-Cook or Karatsuba based polynomial multiplication algorithms, though being known for a long time, still have a fledgling presence in the...
This paper proposes various optimizations for lattice-based key-encapsulation mechanisms (KEM) using the Number Theoretic Transform (NTT) on the popular ARM Cortex-M4 microcontroller. Improvements come in the form of a faster code using more efficient modular reductions, small polynomial multiplications and more aggressive layer merging in the NTT but also reduced stack usage. We test those optimizations in software implementations of Kyber and NewHope, both round 2 candidates in the NIST...
Quantum computers threaten to compromise public-key cryptography schemes such as DSA and ECDSA in polynomial time, which poses an imminent threat to secure signal processing. The cryptography community has responded with the development and standardization of post-quantum cryptography (PQC) algorithms, a class of public-key algorithms based on hard problems that no known quantum algorithms can solve in polynomial time. Ring learning with error (RLWE) lattice- based cryptographic (LBC)...
Public key cryptography protocols, such as RSA and elliptic curve cryptography, will be rendered insecure by Shor’s algorithm when large-scale quantum computers are built. Cryptographers are working on quantum-resistant algorithms, and lattice-based cryptography has emerged as a prime candidate. However, high computational complexity of these algorithms makes it challenging to implement lattice-based protocols on low-power embedded devices. To address this challenge, we present Sapphire – a...
The Number Theoretic Transform (NTT) technique is widely used in the implementation of the cryptographic schemes based on the Ring Learning With Errors problem(RLWE), since it provides efficient algorithm for multiplication of polynomials over the finite field. However, to employ NTT, the finite field is required to have some special root of unity, such as $n$-th root, which makes the module $q$ in RLWE big since we need $q\equiv 1\mod 2n$ to ensure such root exits. At Inscrypt 2018, Zhou...