55 results sorted by ID
Possible spell-corrected query: cpu optimization
THOR: Secure Transformer Inference with Homomorphic Encryption
Jungho Moon, Dongwoo Yoo, Xiaoqian Jiang, Miran Kim
Cryptographic protocols
As language models are increasingly deployed in cloud environments, privacy concerns have become a significant issue. To address this, we design THOR, a secure inference framework for transformer models on encrypted data. Specifically, we first propose new fast matrix multiplication algorithms based on diagonal-major order encoding and extend them to parallel matrix computation through the compact ciphertext packing technique. Second, we design efficient protocols for secure computations of...
OPTIMSM: FPGA hardware accelerator for Zero-Knowledge MSM
Xander Pottier, Thomas de Ruijter, Jonas Bertels, Wouter Legiest, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...
cuTraNTT: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for IoT Applications
Supriya Adhikary, Wai Kong Lee, Angshuman Karmakar, Yongwoo Lee, Seong Oun Hwang, Ramachandra Achar
Implementation
Large polynomial multiplication is one of the computational bottlenecks in fully homomorphic encryption implementations. Usually, these multiplications are implemented using the number-theoretic transformation to speed up the computation. State-of-the-art GPU-based implementation of fully homomorphic encryption computes the number theoretic transformation in two different kernels, due to the necessary synchronization between GPU blocks to ensure correctness in computation. This can be a...
Efficient Key-Switching for Word-Type FHE and GPU Acceleration
Shutong Jin, Zhen Gu, Guangyan Li, Donglong Chen, Çetin Kaya Koç, Ray C. C. Cheung, Wangchen Dai
Implementation
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...
HEonGPU: a GPU-based Fully Homomorphic Encryption Library 1.0
Ali Şah Özcan, Erkay Savaş
Implementation
HEonGPU is a high-performance library designed to optimize Fully Homomorphic Encryption (FHE) operations on Graphics Processing Unit (GPU). By leveraging the parallel processing capac- ity of GPUs, HEonGPU significantly reduces the computational overhead typically associated with FHE by executing complex operation concurrently. This allows for faster execution of homomorphic computations on encrypted data, enabling real-time applications in privacy-preserving machine learn- ing and secure...
Eva: Efficient IVC-Based Authentication of Lossy-Encoded Videos
Chengru Zhang, Xiao Yang, David Oswald, Mark Ryan, Philipp Jovanovic
Applications
With the increasing spread of fake videos for misinformation, proving the provenance of an edited video (without revealing the original one) becomes critical. To this end, we introduce Eva, the first cryptographic protocol for authenticating lossy-encoded videos. Compared to previous cryptographic methods for image authentication, Eva supports significantly larger amounts of data that undergo complex transformations during encoding. We achieve this by decomposing repetitive and manageable...
High-Throughput GPU Implementation of Dilithium Post-Quantum Digital Signature
Shiyu Shen, Hao Yang, Wangchen Dai, Hong Zhang, Zhe Liu, Yunlei Zhao
Implementation
Digital signatures are fundamental building blocks in various protocols to provide integrity and authenticity. The development of the quantum computing has raised concerns about the security guarantees afforded by classical signature schemes. CRYSTALS-Dilithium is an efficient post-quantum digital signature scheme based on lattice cryptography and has been selected as the primary algorithm for standardization by the National Institute of Standards and Technology. In this work, we present a...
MSMAC: Accelerating Multi-Scalar Multiplication for Zero-Knowledge Proof
Pengcheng Qiu, Guiming Wu, Tingqiang Chu, Changzheng Wei, Runzhou Luo, Ying Yan, Wei Wang, Hui Zhang
Implementation
Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
GRASP: Accelerating Hash-based PQC Performance on GPU Parallel Architecture
Yijing Ning, Jiankuo Dong, Jingqiang Lin, Fangyu Zheng, Yu Fu, Zhenjiang Dong, Fu Xiao
Implementation
$SPHINCS^+$, one of the Post-Quantum Cryptography Digital Signature Algorithms (PQC-DSA) selected by NIST in the third round, features very short public and private key lengths but faces significant performance challenges compared to other post-quantum cryptographic schemes, limiting its suitability for real-world applications. To address these challenges, we propose the GPU-based paRallel Accelerated $SPHINCS^+$ (GRASP), which leverages GPU technology to enhance the efficiency of...
An NVMe-based Secure Computing Platform with FPGA-based TFHE Accelerator
Yoshihiro Ohba, Tomoya Sanuki, Claude Gravel, Kentaro Mihara
Implementation
In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...
BGJ15 Revisited: Sieving with Streamed Memory Access
Ziyu Zhao, Jintai Ding, Bo-Yin Yang
Implementation
The focus of this paper is to tackle the issue of memory access within sieving algorithms for lattice problems. We have conducted an in-depth analysis of an optimized BGJ sieve (Becker-Gama-Joux 2015), and our findings suggest that its inherent structure is significantly more memory-efficient compared to the asymptotically fastest BDGL sieve (Becker-Ducas-Gama-Laarhoven 2016). Specifically, it necessitates merely $2^{0.2075n + o(n)}$ streamed (non-random) main memory accesses for the...
Secure Transformer Inference Made Non-interactive
Jiawen Zhang, Xinpeng Yang, Lipeng He, Kejia Chen, Wen-jie Lu, Yinghao Wang, Xiaoyang Hou, Jian Liu, Kui Ren, Xiaohu Yang
Cryptographic protocols
Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server.
In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference. The protocol requires the client to engage in just one round of communication with the server during the whole inference...
Data Privacy Made Easy: Enhancing Applications with Homomorphic Encryption
Charles Gouert, Nektarios Georgios Tsoutsos
Applications
Homomorphic encryption is a powerful privacy-preserving technology that is notoriously difficult to configure and use, even for experts. The key difficulties include restrictive programming models of homomorphic schemes and choosing suitable parameters for an application. In this tutorial, we outline methodologies to solve these issues and allow for conversion of any application to the encrypted domain using both leveled and fully homomorphic encryption.
The first approach, called...
Elastic MSM: A Fast, Elastic and Modular Preprocessing Technique for Multi-Scalar Multiplication Algorithm on GPUs
Xudong Zhu, Haoqi He, Zhengbang Yang, Yi Deng, Lutan Zhao, Rui Hou
Implementation
Zero-knowledge proof (ZKP) is a cryptographic primitive that enables a prover to convince a verifier that a statement is true, without revealing any other information beyond the correctness of the statement itself. Due to its powerful capabilities, its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been widely deployed in various privacy preserving applications such as cryptocurrencies and verifiable computation. Although...
cuML-DSA: Optimized Signing Procedure and Server-Oriented GPU Design for ML-DSA
Shiyu Shen, Hao Yang, Wenqian Li, Yunlei Zhao
Implementation
The threat posed by quantum computing has precipitated an urgent need for post-quantum cryptography. Recently, the post-quantum digital signature draft FIPS 204 has been published, delineating the details of the ML-DSA, which is derived from the CRYSTALS-Dilithium. Despite these advancements, server environments, especially those equipped with GPU devices necessitating high-throughput signing, remain entrenched in classical schemes. A conspicuous void exists in the realm of GPU...
Leveraging GPU in Homomorphic Encryption: Framework Design and Analysis of BFV Variants
Shiyu Shen, Hao Yang, Wangchen Dai, Lu Zhou, Zhe Liu, Yunlei Zhao
Implementation
Homomorphic Encryption (HE) enhances data security by facilitating computations on encrypted data, opening new paths for privacy-focused computations. The Brakerski-Fan-Vercauteren (BFV) scheme, a promising HE scheme, raises considerable performance challenges. Graphics Processing Units (GPUs), with considerable parallel processing abilities, have emerged as an effective solution.
In this work, we present an in-depth study focusing on accelerating and comparing BFV variants on GPUs,...
XNET: A Real-Time Unified Secure Inference Framework Using Homomorphic Encryption
Hao Yang, Shiyu Shen, Siyang Jiang, Lu Zhou, Wangchen Dai, Yunlei Zhao
Applications
Homomorphic Encryption (HE) presents a promising solution to securing neural networks for Machine Learning as a Service (MLaaS). Despite its potential, the real-time applicability of current HE-based solutions remains a challenge, and the diversity in network structures often results in inefficient implementations and maintenance. To address these issues, we introduce a unified and compact network structure for real-time inference in convolutional neural networks based on HE. We further...
Two Algorithms for Fast GPU Implementation of NTT
Ali Şah Özcan, Erkay Savaş
Implementation
The number theoretic transform (NTT) permits a very efficient method to perform multiplication of very large degree polynomials, which is the most time-consuming operation in fully homomorphic encryption (FHE) schemes and a class of non-interactive succinct zero-knowledge proof systems such as zk-SNARK. Efficient modular arithmetic plays an important role in the performance of NTT, and therefore it is studied extensively. The access pattern to the memory, on the other hand, may play much...
HI-Kyber: A novel high-performance implementation scheme of Kyber based on GPU
Xinyi Ji, Jiankuo Dong, Pinchang Zhang, Deng Tonggui, Hua Jiafeng, Fu Xiao
Implementation
CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography uses complex discrete alogarithm problems on lattices to build secure encryption and decryption systems to resist attacks from quantum computing. Performance is an important bottleneck affecting the promotion of post quantum...
Analysis of Parallel Implementation of Pilsung Block Cipher On Graphics Processing Unit
Siwoo Eum, Hyunjun Kim, Minho Song, Hwajeong Seo
Implementation
This paper focuses on the GPU implementation of the Pilsung block cipher used in the Red Star 3.0 operating system developed in North Korea. The Pilsung block cipher is designed based on AES. One notable feature of the Pilsung block cipher is that the table calculations required for encryption take longer than the encryption process itself. This paper emphasizes the parallel implementation of the Pilsung block cipher by leveraging the parallel processing capabilities of GPUs and evaluates...
Falkor: Federated Learning Secure Aggregation Powered by AES-CTR GPU Implementation
Mariya Georgieva Belorgey, Sofia Dandjee, Nicolas Gama, Dimitar Jetchev, Dmitry Mikushin
Cryptographic protocols
We propose a novel protocol, Falkor, for secure aggregation for Federated Learning in the multi-server scenario based on masking of local models via a stream cipher based on AES in counter mode and accelerated by GPUs running on the aggregating servers. The protocol is resilient to client dropout and has reduced clients/servers communication cost by a factor equal to the number of aggregating servers (compared to the naïve baseline method). It scales simultaneously in the two major...
High Throughput Lattice-based Signatures on GPUs: Comparing Falcon and Mitaka
Wai-Kong Lee, Raymond K. Zhao, Ron Steinfeld, Amin Sakzad, Seong Oun Hwang
Implementation
The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting effort to optimize the implementation of Falcon on various hardware architectures for practical applications. Recently, Mitaka was...
Orca: FSS-based Secure Training and Inference with GPUs
Neha Jawalkar, Kanav Gupta, Arkaprava Basu, Nishanth Chandran, Divya Gupta, Rahul Sharma
Cryptographic protocols
Secure Two-party Computation (2PC) allows two parties to compute any function on their private inputs without revealing their inputs to each other. In the offline/online model for 2PC, correlated randomness that is independent of all inputs to the computation, is generated in a preprocessing (offline) phase and this randomness is then utilized in the online phase once the inputs to the parties become available. Most 2PC works focus on optimizing the online time as this overhead lies on the...
Phantom: A CUDA-Accelerated Word-Wise Homomorphic Encryption Library
Hao Yang, Shiyu Shen, Wangchen Dai, Lu Zhou, Zhe Liu, Yunlei Zhao
Implementation
Homomorphic encryption (HE) is a promising technique for privacy-preserving computations, especially the word-wise HE schemes that allow batching. However, the high computational overhead hinders the deployment of HE in real-word applications. GPUs are often used to accelerate execution, but a comprehensive performance comparison of different schemes on the same platform is still missing.
In this work, we fill this gap by implementing three word-wise HE schemes BGV, BFV, and CKKS on GPU,...
cuXCMP: CUDA-Accelerated Private Comparison Based on Homomorphic Encryption
Hao Yang, Shiyu Shen, Zhe Liu, Yunlei Zhao
Applications
Private comparison schemes constructed on homomorphic encryption offer the noninteractive, output expressive and parallelizable features, and have advantages in communication bandwidth and performance. In this paper, we propose cuXCMP, which allows negative and float inputs, offers fully output expressive feature, and is more extensible and practical compared to XCMP (AsiaCCS 2018). Meanwhile, we introduce several memory-centric optimizations of the constant term extraction kernel tailored for...
Parallel Isogeny Path Finding with Limited Memory
Emanuele Bellini, Jorge Chavez-Saab, Jesús-Javier Chi-Domínguez, Andre Esser, Sorina Ionica, Luis Rivera-Zamarripa, Francisco Rodríguez-Henríquez, Monika Trimoska, Floyd Zweydinger
Attacks and cryptanalysis
The security guarantees of most isogeny-based protocols rely on the computational hardness of finding an isogeny between two supersingular isogenous curves defined over a prime field $\mathbb{F}_q$ with $q$ a power of a large prime $p$. In most scenarios, the isogeny is known to be of degree $\ell^e$ for some small prime $\ell$. We call this problem the Supersingular Fixed-Degree Isogeny Path (SIPFD) problem.
It is believed that the most general version of SIPFD is not solvable faster than...
Homomorphic Encryption on GPU
Ali Şah Özcan, Can Ayduman, Enes Recep Türkoğlu, Erkay Savaş
Implementation
Homomorphic encryption (HE) is a cryptosystem that allows secure processing of encrypted data. One of the most popular HE schemes is the Brakerski-Fan-Vercauteren (BFV), which supports somewhat (SWHE) and fully homomorphic encryption (FHE). Since overly involved arithmetic operations of HE schemes are amenable to concurrent computation, GPU devices can be instrumental in facilitating the practical use of HE in real world applications thanks to their superior parallel processing capacity....
PipeMSM: Hardware Acceleration for Multi-Scalar Multiplication
Charles. F. Xavier
Foundations
Multi-Scalar Multiplication (MSM) is a fundamental computational problem. Interest in this problem was recently prompted by its application to ZK-SNARKs, where it often turns out to be the main computational bottleneck.
In this paper we set forth a pipelined design for computing MSM. Our design is based on a novel algorithmic approach and hardware-specific optimizations. At the core, we rely on a modular multiplication technique which we deem to be of independent interest.
We implemented...
CUDA-Accelerated RNS Multiplication in Word-Wise Homomorphic Encryption Schemes
Shiyu Shen, Hao Yang, Yu Liu, Zhe Liu, Yunlei Zhao
Implementation
Homomorphic encryption (HE), which allows computation over encrypted data, has often been used to preserve privacy. However, the computationally heavy nature and complexity of network topologies make the deployment of HE schemes in the Internet of Things (IoT) scenario difficult. In this work, we propose CARM, the first optimized GPU implementation that covers BGV, BFV and CKKS, targeting for accelerating homomorphic multiplication using GPU in heterogeneous IoT systems. We offer...
DPCrypto: Acceleration of Post-quantum Cryptographic Algorithms using Dot-Product Instruction on GPUs
Wai-Kong Lee, Hwajeong Seo, Seong Oun Hwang, Angshuman Karmakar, Jose Maria Bermudo Mera, Ramachandra Achar
Implementation
Dot-product is a widely used operation in many machine learning and scientific computing algorithms. Recently, NVIDIA has introduced dot-product instructions (DP2A and DP4A) in modern GPU architectures, with the aim of accelerating machine learning and scientific computing applications. These dot-product instructions allow the computation of multiply-and-add instructions in a clock cycle, effectively achieving higher throughput compared to conventional 32-bit integer units. In this paper,...
REDsec: Running Encrypted Discretized Neural Networks in Seconds
Lars Folkerts, Charles Gouert, Nektarios Georgios Tsoutsos
Applications
Machine learning as a service (MLaaS) has risen to become a prominent technology due to the large development time, amount of data, hardware costs, and level of expertise required to develop a machine learning model. However, privacy concerns prevent the adoption of MLaaS for applications with sensitive data. A promising privacy preserving solution is to use fully homomorphic encryption (FHE) to perform the ML computations. Recent advancements have lowered computational costs by several...
Optimization of Advanced Encryption Standard on Graphics Processing Units
Cihangir Tezcan
Secret-key cryptography
Graphics processing units (GPUs) are specially designed for parallel applications and perform parallel operations much faster than central processing units (CPUs). In this work, we focus on the performance of the Advanced Encryption Standard (AES) on GPUs. We present optimizations which remove bank conflicts in shared memory accesses and provide 878.6 Gbps throughput for AES-128 encryption on an RTX 2070 Super, which is equivalent to 4.1 Gbps per Watt. Our optimizations provide more than...
CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU
Sijun Tan, Brian Knott, Yuan Tian, David J. Wu
Cryptographic protocols
We introduce CryptGPU, a system for privacy-preserving machine learning that implements all operations on the GPU (graphics processing unit). Just as GPUs played a pivotal role in the success of modern deep learning, they are also essential for realizing scalable privacy-preserving deep learning. In this work, we start by introducing a new interface to losslessly embed cryptographic operations over secret-shared values (in a discrete domain) into floating-point operations that can be...
Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs
Wonkyung Jung, Sangpyo Kim, Jung Ho Ahn, Jung Hee Cheon, Younho Lee
Implementation
Fully Homomorphic encryption (FHE) has been gaining popularity as an emerging way of enabling an unlimited number of operations on the encrypted message without decryption. A major drawback of FHE is its high computational cost. Especially, a bootstrapping that refreshes the noise accumulated through consequent FHE operations on the ciphertext is even taking minutes. This significantly limits the practical use of FHE in numerous real applications.
By exploiting massive parallelism available...
Let’s Take it Offline: Boosting Brute-Force Attacks on iPhone’s User Authentication through SCA
Oleksiy Lisovets, David Knichel, Thorben Moos, Amir Moradi
Implementation
In recent years, smartphones have become an increasingly important
storage facility for personal sensitive data ranging from photos and credentials up
to financial and medical records like credit cards and person’s diseases. Trivially,
it is critical to secure this information and only provide access to the genuine and
authenticated user. Smartphone vendors have already taken exceptional care to
protect user data by the means of various software and hardware security features
like code...
Efficient Number Theoretic Transform Implementation on GPU for Homomorphic Encryption
Ozgun Ozerk, Can Elgezen, Ahmet Can Mert, Erdinc Ozturk, Erkay Savas
Implementation
Lattice-based cryptography forms the mathematical basis for homomorphic encryption, which allows computation directly on encrypted data. Homomorphic encryption enables privacy-preserving applications such as secure cloud computing; yet, its practical applications suffer from the high computational complexity of homomorphic operations. Fast implementations of the homomorphic encryption schemes heavily depend on efficient polynomial arithmetic; multiplication of very large degree polynomials...
Revisiting ECM on GPUs
Jonas Wloka, Jan Richter-Brockmann, Colin Stahlke, Thorsten Kleinjung, Christine Priplata, Tim Güneysu
Public-key cryptography
Modern public-key cryptography is a crucial part of our contemporary life where a secure communication channel with another party is needed. With the advance of more powerful computing architectures – especially Graphics Processing Units (GPUs) – traditional approaches like RSA and Diffie-Hellman schemes are more and more in danger of being broken.
We present a highly optimized implementation of Lenstra’s ECM algorithm customized for GPUs. Our implementation uses state-of-the-art...
Algorithmic Acceleration of B/FV-like Somewhat Homomorphic Encryption for Compute-Enabled RAM
Jonathan Takeshita, Dayane Reis, Ting Gong, Michael Niemier, X. Sharon Hu, Taeho Jung
Implementation
Somewhat Homomorphic Encryption (SHE) allows arbitrary computation with nite multiplicative depths to be performed on encrypted data, but its overhead is high due to memory transfer incurred by large ciphertexts. Recent research has recognized the shortcomings of general-purpose computing for high-performance SHE, and has begun to pioneer the use of hardware-based SHE acceleration with hardware including FPGAs, GPUs, and Compute-Enabled RAM (CE-RAM). CERAM is well-suited for SHE, as it is...
Optimized Voronoi-based algorithms for parallel shortest vector computations
Artur Mariano, Filipe Cabeleira, Gabriel Falcao, Luís Paulo Santos
Implementation
This paper addresses V ̈oronoi cell-based algorithms, specifically the ”Relevant Vectors” algorithm, used to solve the Shortest Vector Problem, a fundamental challenge in lattice-based cryptanalysis. Several optimizations are proposed to reduce the execution time of the original algorithm. It is also shown that the algorithm is highly suited for parallel execution on
both CPUs and GPUs. The proposed optimizations are based on pruning, i.e., avoiding computations that will not,
with high...
Automated enumeration of block cipher differentials: An optimized branch-and-bound GPU framework
Wei-Zhu Yeoh, Je Sen Teh, Jiageng Chen
Secret-key cryptography
Block ciphers are prevalent in various security protocols used daily such as TLS, OpenPGP, and SSH. Their primary purpose is the protection of user data, both in transit and at rest. One of the de facto methods to evaluate block cipher security is differential cryptanalysis. Differential cryptanalysis observes the propagation of input patterns (input differences) through the cipher to produce output patterns (output differences). This probabilistic propagation is known as a differential; the...
Side-channel Attacks with Multi-thread Mixed Leakage
Yiwen Gao, Yongbin Zhou
Side-channel attacks are one of the greatest practical threats to security-related applications, because they are capable of breaking ciphers that are assumed to be mathematically secure. Lots of studies have been devoted to power or electro-magnetic (EM) analysis against desktop CPUs, mobile CPUs (including ARM, MSP, AVR, etc) and FPGAs, but rarely targeted modern GPUs. Modern GPUs feature their special and specific single instruction multiple threads (SIMT) execution fashion, which makes...
Understanding Optimizations and Measuring Performances of PBKDF2
Andrea Francesco Iuorio, Andrea Visconti
Implementation
Password-based Key Derivation Functions (KDFs) are used to generate secure keys of arbitrary length implemented in many security-related systems.
The strength of these KDFs is the ability to provide countermeasures against brute-force/dictionary attacks.
One of the most implemented KDF is PBKDF2. In order to slow attackers down,
PBKDF2 uses a salt and introduces computational intensive operations based on an iterated pseudo-random function.
Since passwords are widely used to protect personal...
Implementation and Performance Evaluation of RNS Variants of the BFV Homomorphic Encryption Scheme
Ahmad Al Badawi, Yuriy Polyakov, Khin Mi Mi Aung, Bharadwaj Veeravalli, Kurt Rohloff
Implementation
Homomorphic encryption is an emerging form of encryption that provides the ability to compute on encrypted data without ever decrypting them. Potential applications include aggregating sensitive encrypted data on a cloud environment and computing on the data in the cloud without compromising data privacy. There have been several recent advances resulting in new homomorphic encryption schemes and optimized variants. We implement and evaluate the performance of two optimized variants, namely...
2016/553
Last updated: 2016-07-11
Storage Efficient Substring Searchable Symmetric Encryption
Iraklis Leontiadis, Ming Li
We address the problem of substring searchable encryption. A single user produces a big stream of data and later
on wants to learn the positions in the string that some patterns occur. Although current techniques exploit auxiliary data structures
to achieve efficient substring search on the server side, the cost at the user side may be prohibitive. We revisit the work of substring
searchable encryption in order to reduce the storage cost of auxiliary data structures. Our solution entails...
Efficient High-Speed WPA2 Brute Force Attacks using Scalable Low-Cost FPGA Clustering
Markus Kammerstetter, Markus Muellner, Daniel Burian, Christian Kudera, Wolfgang Kastner
WPA2-Personal is widely used to protect Wi-Fi networks against illicit access.
While attackers typically use GPUs to speed up the discovery of weak network passwords, attacking random passwords is considered to quickly become infeasible with increasing password length.
Professional attackers may thus turn to commercial high-end FPGA-based cluster solutions to significantly increase the speed of those attacks.
Well known manufacturers such as Elcomsoft have succeeded in creating...
SecureMed: Secure Medical Computation using GPU-Accelerated Homomorphic Encryption Scheme
Alhassan Khedr, Glenn Gulak
Sharing the medical records of individuals among healthcare providers and researchers around the world can accelerate advances in medical research. While the idea seems increasingly practical due to cloud data services, maintaining patient privacy is of paramount importance. Standard encryption algorithms help protect sensitive data from outside attackers but they cannot be used to compute on this sensitive data while being encrypted. Homomorphic Encryption (HE) presents a very useful tool...
Freestart collision for full SHA-1
Marc Stevens, Pierre Karpman, Thomas Peyrin
This article presents an explicit freestart colliding pair for SHA-1, i.e. a collision for its internal compression function. This is the first practical break of the full SHA-1, reaching all 80 out of 80 steps.
Only 10 days of computation on a 64-GPU cluster were necessary to perform this attack, for a cost of approximately $2^{57.5}$ calls to the compression function of SHA-1.
This work builds on a continuous series of cryptanalytic advancements on SHA-1 since the theoretical collision...
cuHE: A Homomorphic Encryption Accelerator Library
Wei Dai, Berk Sunar
Implementation
We introduce a CUDA GPU library to accelerate evaluations with homomorphic schemes defined over polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory minimization techniques, memory and thread scheduling and low level CUDA hand-tuned assembly optimizations to take full advantage of the mass parallelism and high memory bandwidth GPUs offer. The arithmetic functions constructed to handle very large polynomial operands using...
Optimizing MAKWA on GPU and CPU
Thomas Pornin
Secret-key cryptography
We present here optimized implementations of the MAKWA password hashing
function on an AMD Radeon HD 7990 GPU, and compare its efficiency with an Intel
i7 4770K CPU for systematic dictionary attacks. We find that the GPU seems to
get more hashing done for a given budget, but not by a large amount (the GPU is less
than twice as efficient as the CPU). Raising the MAKWA modulus size to 4096 bits,
instead of the default 2048 bits, should restore the balance in favour of the CPU. We
also find...
Practical Free-Start Collision Attacks on 76-step SHA-1
Pierre Karpman, Thomas Peyrin, Marc Stevens
Secret-key cryptography
In this paper we analyze the security of the compression function of SHA-1 against collision attacks, or equivalently free-start collisions on the hash function. While a lot of work has been dedicated to the analysis of SHA-1 in the past decade, this is the first time that free-start collisions have been considered for this function.
We exploit the additional freedom provided by this model by using a new start-from-the-middle approach in combination with improvements on the cryptanalysis...
Accelerating Somewhat Homomorphic Evaluation using FPGAs
Erdi̇̀nç Öztürk, Yarkın Doröz, Berk Sunar, Erkay Savaş
Implementation
After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The...
SHIELD: Scalable Homomorphic Implementation of Encrypted Data-Classifiers
Alhassan Khedr, Glenn Gulak, Vinod Vaikuntanathan
Homomorphic encryption (HE) systems enable computations on encrypted data, without decrypting and without knowledge of the secret key. In this work, we describe an optimized Ring Learning With Errors (RLWE) based implementation of a variant of the HE system recently proposed by Gentry, Sahai and Waters (GSW). Although this system was widely believed to be less efficient than its contemporaries, we demonstrate quite the opposite behavior for a large class of applications.
We first highlight...
Optimized GPU Implementation and Performance Analysis of HC Series of Stream Ciphers
Ayesha Khalid, Deblin Bagchi, Goutam Paul, Anupam Chattopadhyay
Secret-key cryptography
The ease of programming offered by the CUDA programming model attracted a lot of programmers to try the platform for acceleration of many non-graphics applications. Cryptography, being no exception, also found its share of exploration efforts, especially block ciphers. In this contribution we present a detailed walk-through of effective mapping of HC-128 and HC-256 stream ciphers on GPUs. Due to inherent inter-S-Box dependencies, intra-S-Box dependencies and a high number of memory accesses...
ECM at Work
Joppe W. Bos, Thorsten Kleinjung
The performance of the elliptic curve method (ECM) for integer factorization plays an important role in the security assessment of RSA-based protocols as a cofactorization tool inside the number field sieve. The efficient arithmetic for Edwards curves found an application by speeding up ECM. We propose techniques based on generating and combining addition-subtracting chains to optimize Edwards ECM in terms of both performance and memory requirements. This makes our approach very suitable for...
ECC2K-130 on NVIDIA GPUs
Daniel J. Bernstein, Hsieh-Chung Chen, Chen-Mou Cheng, Tanja Lange, Ruben Niederhagen, Peter Schwabe, Bo-Yin Yang
Implementation
A major cryptanalytic computation is currently underway on multiple platforms, including standard CPUs, FPGAs, PlayStations and GPUs, to break the Certicom ECC2K-130 challenge. This challenge is to compute an elliptic-curve discrete logarithm on a Koblitz curve over F_2^131 . Optimizations have reduced the cost of the computation to approximately 2^77 bit operations in 2^61 iterations.
GPUs are not designed for fast binary-field arithmetic; they are designed for highly vectorizable...
As language models are increasingly deployed in cloud environments, privacy concerns have become a significant issue. To address this, we design THOR, a secure inference framework for transformer models on encrypted data. Specifically, we first propose new fast matrix multiplication algorithms based on diagonal-major order encoding and extend them to parallel matrix computation through the compact ciphertext packing technique. Second, we design efficient protocols for secure computations of...
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...
Large polynomial multiplication is one of the computational bottlenecks in fully homomorphic encryption implementations. Usually, these multiplications are implemented using the number-theoretic transformation to speed up the computation. State-of-the-art GPU-based implementation of fully homomorphic encryption computes the number theoretic transformation in two different kernels, due to the necessary synchronization between GPU blocks to ensure correctness in computation. This can be a...
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...
HEonGPU is a high-performance library designed to optimize Fully Homomorphic Encryption (FHE) operations on Graphics Processing Unit (GPU). By leveraging the parallel processing capac- ity of GPUs, HEonGPU significantly reduces the computational overhead typically associated with FHE by executing complex operation concurrently. This allows for faster execution of homomorphic computations on encrypted data, enabling real-time applications in privacy-preserving machine learn- ing and secure...
With the increasing spread of fake videos for misinformation, proving the provenance of an edited video (without revealing the original one) becomes critical. To this end, we introduce Eva, the first cryptographic protocol for authenticating lossy-encoded videos. Compared to previous cryptographic methods for image authentication, Eva supports significantly larger amounts of data that undergo complex transformations during encoding. We achieve this by decomposing repetitive and manageable...
Digital signatures are fundamental building blocks in various protocols to provide integrity and authenticity. The development of the quantum computing has raised concerns about the security guarantees afforded by classical signature schemes. CRYSTALS-Dilithium is an efficient post-quantum digital signature scheme based on lattice cryptography and has been selected as the primary algorithm for standardization by the National Institute of Standards and Technology. In this work, we present a...
Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
$SPHINCS^+$, one of the Post-Quantum Cryptography Digital Signature Algorithms (PQC-DSA) selected by NIST in the third round, features very short public and private key lengths but faces significant performance challenges compared to other post-quantum cryptographic schemes, limiting its suitability for real-world applications. To address these challenges, we propose the GPU-based paRallel Accelerated $SPHINCS^+$ (GRASP), which leverages GPU technology to enhance the efficiency of...
In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...
The focus of this paper is to tackle the issue of memory access within sieving algorithms for lattice problems. We have conducted an in-depth analysis of an optimized BGJ sieve (Becker-Gama-Joux 2015), and our findings suggest that its inherent structure is significantly more memory-efficient compared to the asymptotically fastest BDGL sieve (Becker-Ducas-Gama-Laarhoven 2016). Specifically, it necessitates merely $2^{0.2075n + o(n)}$ streamed (non-random) main memory accesses for the...
Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server. In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference. The protocol requires the client to engage in just one round of communication with the server during the whole inference...
Homomorphic encryption is a powerful privacy-preserving technology that is notoriously difficult to configure and use, even for experts. The key difficulties include restrictive programming models of homomorphic schemes and choosing suitable parameters for an application. In this tutorial, we outline methodologies to solve these issues and allow for conversion of any application to the encrypted domain using both leveled and fully homomorphic encryption. The first approach, called...
Zero-knowledge proof (ZKP) is a cryptographic primitive that enables a prover to convince a verifier that a statement is true, without revealing any other information beyond the correctness of the statement itself. Due to its powerful capabilities, its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been widely deployed in various privacy preserving applications such as cryptocurrencies and verifiable computation. Although...
The threat posed by quantum computing has precipitated an urgent need for post-quantum cryptography. Recently, the post-quantum digital signature draft FIPS 204 has been published, delineating the details of the ML-DSA, which is derived from the CRYSTALS-Dilithium. Despite these advancements, server environments, especially those equipped with GPU devices necessitating high-throughput signing, remain entrenched in classical schemes. A conspicuous void exists in the realm of GPU...
Homomorphic Encryption (HE) enhances data security by facilitating computations on encrypted data, opening new paths for privacy-focused computations. The Brakerski-Fan-Vercauteren (BFV) scheme, a promising HE scheme, raises considerable performance challenges. Graphics Processing Units (GPUs), with considerable parallel processing abilities, have emerged as an effective solution. In this work, we present an in-depth study focusing on accelerating and comparing BFV variants on GPUs,...
Homomorphic Encryption (HE) presents a promising solution to securing neural networks for Machine Learning as a Service (MLaaS). Despite its potential, the real-time applicability of current HE-based solutions remains a challenge, and the diversity in network structures often results in inefficient implementations and maintenance. To address these issues, we introduce a unified and compact network structure for real-time inference in convolutional neural networks based on HE. We further...
The number theoretic transform (NTT) permits a very efficient method to perform multiplication of very large degree polynomials, which is the most time-consuming operation in fully homomorphic encryption (FHE) schemes and a class of non-interactive succinct zero-knowledge proof systems such as zk-SNARK. Efficient modular arithmetic plays an important role in the performance of NTT, and therefore it is studied extensively. The access pattern to the memory, on the other hand, may play much...
CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography uses complex discrete alogarithm problems on lattices to build secure encryption and decryption systems to resist attacks from quantum computing. Performance is an important bottleneck affecting the promotion of post quantum...
This paper focuses on the GPU implementation of the Pilsung block cipher used in the Red Star 3.0 operating system developed in North Korea. The Pilsung block cipher is designed based on AES. One notable feature of the Pilsung block cipher is that the table calculations required for encryption take longer than the encryption process itself. This paper emphasizes the parallel implementation of the Pilsung block cipher by leveraging the parallel processing capabilities of GPUs and evaluates...
We propose a novel protocol, Falkor, for secure aggregation for Federated Learning in the multi-server scenario based on masking of local models via a stream cipher based on AES in counter mode and accelerated by GPUs running on the aggregating servers. The protocol is resilient to client dropout and has reduced clients/servers communication cost by a factor equal to the number of aggregating servers (compared to the naïve baseline method). It scales simultaneously in the two major...
The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting effort to optimize the implementation of Falcon on various hardware architectures for practical applications. Recently, Mitaka was...
Secure Two-party Computation (2PC) allows two parties to compute any function on their private inputs without revealing their inputs to each other. In the offline/online model for 2PC, correlated randomness that is independent of all inputs to the computation, is generated in a preprocessing (offline) phase and this randomness is then utilized in the online phase once the inputs to the parties become available. Most 2PC works focus on optimizing the online time as this overhead lies on the...
Homomorphic encryption (HE) is a promising technique for privacy-preserving computations, especially the word-wise HE schemes that allow batching. However, the high computational overhead hinders the deployment of HE in real-word applications. GPUs are often used to accelerate execution, but a comprehensive performance comparison of different schemes on the same platform is still missing. In this work, we fill this gap by implementing three word-wise HE schemes BGV, BFV, and CKKS on GPU,...
Private comparison schemes constructed on homomorphic encryption offer the noninteractive, output expressive and parallelizable features, and have advantages in communication bandwidth and performance. In this paper, we propose cuXCMP, which allows negative and float inputs, offers fully output expressive feature, and is more extensible and practical compared to XCMP (AsiaCCS 2018). Meanwhile, we introduce several memory-centric optimizations of the constant term extraction kernel tailored for...
The security guarantees of most isogeny-based protocols rely on the computational hardness of finding an isogeny between two supersingular isogenous curves defined over a prime field $\mathbb{F}_q$ with $q$ a power of a large prime $p$. In most scenarios, the isogeny is known to be of degree $\ell^e$ for some small prime $\ell$. We call this problem the Supersingular Fixed-Degree Isogeny Path (SIPFD) problem. It is believed that the most general version of SIPFD is not solvable faster than...
Homomorphic encryption (HE) is a cryptosystem that allows secure processing of encrypted data. One of the most popular HE schemes is the Brakerski-Fan-Vercauteren (BFV), which supports somewhat (SWHE) and fully homomorphic encryption (FHE). Since overly involved arithmetic operations of HE schemes are amenable to concurrent computation, GPU devices can be instrumental in facilitating the practical use of HE in real world applications thanks to their superior parallel processing capacity....
Multi-Scalar Multiplication (MSM) is a fundamental computational problem. Interest in this problem was recently prompted by its application to ZK-SNARKs, where it often turns out to be the main computational bottleneck. In this paper we set forth a pipelined design for computing MSM. Our design is based on a novel algorithmic approach and hardware-specific optimizations. At the core, we rely on a modular multiplication technique which we deem to be of independent interest. We implemented...
Homomorphic encryption (HE), which allows computation over encrypted data, has often been used to preserve privacy. However, the computationally heavy nature and complexity of network topologies make the deployment of HE schemes in the Internet of Things (IoT) scenario difficult. In this work, we propose CARM, the first optimized GPU implementation that covers BGV, BFV and CKKS, targeting for accelerating homomorphic multiplication using GPU in heterogeneous IoT systems. We offer...
Dot-product is a widely used operation in many machine learning and scientific computing algorithms. Recently, NVIDIA has introduced dot-product instructions (DP2A and DP4A) in modern GPU architectures, with the aim of accelerating machine learning and scientific computing applications. These dot-product instructions allow the computation of multiply-and-add instructions in a clock cycle, effectively achieving higher throughput compared to conventional 32-bit integer units. In this paper,...
Machine learning as a service (MLaaS) has risen to become a prominent technology due to the large development time, amount of data, hardware costs, and level of expertise required to develop a machine learning model. However, privacy concerns prevent the adoption of MLaaS for applications with sensitive data. A promising privacy preserving solution is to use fully homomorphic encryption (FHE) to perform the ML computations. Recent advancements have lowered computational costs by several...
Graphics processing units (GPUs) are specially designed for parallel applications and perform parallel operations much faster than central processing units (CPUs). In this work, we focus on the performance of the Advanced Encryption Standard (AES) on GPUs. We present optimizations which remove bank conflicts in shared memory accesses and provide 878.6 Gbps throughput for AES-128 encryption on an RTX 2070 Super, which is equivalent to 4.1 Gbps per Watt. Our optimizations provide more than...
We introduce CryptGPU, a system for privacy-preserving machine learning that implements all operations on the GPU (graphics processing unit). Just as GPUs played a pivotal role in the success of modern deep learning, they are also essential for realizing scalable privacy-preserving deep learning. In this work, we start by introducing a new interface to losslessly embed cryptographic operations over secret-shared values (in a discrete domain) into floating-point operations that can be...
Fully Homomorphic encryption (FHE) has been gaining popularity as an emerging way of enabling an unlimited number of operations on the encrypted message without decryption. A major drawback of FHE is its high computational cost. Especially, a bootstrapping that refreshes the noise accumulated through consequent FHE operations on the ciphertext is even taking minutes. This significantly limits the practical use of FHE in numerous real applications. By exploiting massive parallelism available...
In recent years, smartphones have become an increasingly important storage facility for personal sensitive data ranging from photos and credentials up to financial and medical records like credit cards and person’s diseases. Trivially, it is critical to secure this information and only provide access to the genuine and authenticated user. Smartphone vendors have already taken exceptional care to protect user data by the means of various software and hardware security features like code...
Lattice-based cryptography forms the mathematical basis for homomorphic encryption, which allows computation directly on encrypted data. Homomorphic encryption enables privacy-preserving applications such as secure cloud computing; yet, its practical applications suffer from the high computational complexity of homomorphic operations. Fast implementations of the homomorphic encryption schemes heavily depend on efficient polynomial arithmetic; multiplication of very large degree polynomials...
Modern public-key cryptography is a crucial part of our contemporary life where a secure communication channel with another party is needed. With the advance of more powerful computing architectures – especially Graphics Processing Units (GPUs) – traditional approaches like RSA and Diffie-Hellman schemes are more and more in danger of being broken. We present a highly optimized implementation of Lenstra’s ECM algorithm customized for GPUs. Our implementation uses state-of-the-art...
Somewhat Homomorphic Encryption (SHE) allows arbitrary computation with nite multiplicative depths to be performed on encrypted data, but its overhead is high due to memory transfer incurred by large ciphertexts. Recent research has recognized the shortcomings of general-purpose computing for high-performance SHE, and has begun to pioneer the use of hardware-based SHE acceleration with hardware including FPGAs, GPUs, and Compute-Enabled RAM (CE-RAM). CERAM is well-suited for SHE, as it is...
This paper addresses V ̈oronoi cell-based algorithms, specifically the ”Relevant Vectors” algorithm, used to solve the Shortest Vector Problem, a fundamental challenge in lattice-based cryptanalysis. Several optimizations are proposed to reduce the execution time of the original algorithm. It is also shown that the algorithm is highly suited for parallel execution on both CPUs and GPUs. The proposed optimizations are based on pruning, i.e., avoiding computations that will not, with high...
Block ciphers are prevalent in various security protocols used daily such as TLS, OpenPGP, and SSH. Their primary purpose is the protection of user data, both in transit and at rest. One of the de facto methods to evaluate block cipher security is differential cryptanalysis. Differential cryptanalysis observes the propagation of input patterns (input differences) through the cipher to produce output patterns (output differences). This probabilistic propagation is known as a differential; the...
Side-channel attacks are one of the greatest practical threats to security-related applications, because they are capable of breaking ciphers that are assumed to be mathematically secure. Lots of studies have been devoted to power or electro-magnetic (EM) analysis against desktop CPUs, mobile CPUs (including ARM, MSP, AVR, etc) and FPGAs, but rarely targeted modern GPUs. Modern GPUs feature their special and specific single instruction multiple threads (SIMT) execution fashion, which makes...
Password-based Key Derivation Functions (KDFs) are used to generate secure keys of arbitrary length implemented in many security-related systems. The strength of these KDFs is the ability to provide countermeasures against brute-force/dictionary attacks. One of the most implemented KDF is PBKDF2. In order to slow attackers down, PBKDF2 uses a salt and introduces computational intensive operations based on an iterated pseudo-random function. Since passwords are widely used to protect personal...
Homomorphic encryption is an emerging form of encryption that provides the ability to compute on encrypted data without ever decrypting them. Potential applications include aggregating sensitive encrypted data on a cloud environment and computing on the data in the cloud without compromising data privacy. There have been several recent advances resulting in new homomorphic encryption schemes and optimized variants. We implement and evaluate the performance of two optimized variants, namely...
We address the problem of substring searchable encryption. A single user produces a big stream of data and later on wants to learn the positions in the string that some patterns occur. Although current techniques exploit auxiliary data structures to achieve efficient substring search on the server side, the cost at the user side may be prohibitive. We revisit the work of substring searchable encryption in order to reduce the storage cost of auxiliary data structures. Our solution entails...
WPA2-Personal is widely used to protect Wi-Fi networks against illicit access. While attackers typically use GPUs to speed up the discovery of weak network passwords, attacking random passwords is considered to quickly become infeasible with increasing password length. Professional attackers may thus turn to commercial high-end FPGA-based cluster solutions to significantly increase the speed of those attacks. Well known manufacturers such as Elcomsoft have succeeded in creating...
Sharing the medical records of individuals among healthcare providers and researchers around the world can accelerate advances in medical research. While the idea seems increasingly practical due to cloud data services, maintaining patient privacy is of paramount importance. Standard encryption algorithms help protect sensitive data from outside attackers but they cannot be used to compute on this sensitive data while being encrypted. Homomorphic Encryption (HE) presents a very useful tool...
This article presents an explicit freestart colliding pair for SHA-1, i.e. a collision for its internal compression function. This is the first practical break of the full SHA-1, reaching all 80 out of 80 steps. Only 10 days of computation on a 64-GPU cluster were necessary to perform this attack, for a cost of approximately $2^{57.5}$ calls to the compression function of SHA-1. This work builds on a continuous series of cryptanalytic advancements on SHA-1 since the theoretical collision...
We introduce a CUDA GPU library to accelerate evaluations with homomorphic schemes defined over polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory minimization techniques, memory and thread scheduling and low level CUDA hand-tuned assembly optimizations to take full advantage of the mass parallelism and high memory bandwidth GPUs offer. The arithmetic functions constructed to handle very large polynomial operands using...
We present here optimized implementations of the MAKWA password hashing function on an AMD Radeon HD 7990 GPU, and compare its efficiency with an Intel i7 4770K CPU for systematic dictionary attacks. We find that the GPU seems to get more hashing done for a given budget, but not by a large amount (the GPU is less than twice as efficient as the CPU). Raising the MAKWA modulus size to 4096 bits, instead of the default 2048 bits, should restore the balance in favour of the CPU. We also find...
In this paper we analyze the security of the compression function of SHA-1 against collision attacks, or equivalently free-start collisions on the hash function. While a lot of work has been dedicated to the analysis of SHA-1 in the past decade, this is the first time that free-start collisions have been considered for this function. We exploit the additional freedom provided by this model by using a new start-from-the-middle approach in combination with improvements on the cryptanalysis...
After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The...
Homomorphic encryption (HE) systems enable computations on encrypted data, without decrypting and without knowledge of the secret key. In this work, we describe an optimized Ring Learning With Errors (RLWE) based implementation of a variant of the HE system recently proposed by Gentry, Sahai and Waters (GSW). Although this system was widely believed to be less efficient than its contemporaries, we demonstrate quite the opposite behavior for a large class of applications. We first highlight...
The ease of programming offered by the CUDA programming model attracted a lot of programmers to try the platform for acceleration of many non-graphics applications. Cryptography, being no exception, also found its share of exploration efforts, especially block ciphers. In this contribution we present a detailed walk-through of effective mapping of HC-128 and HC-256 stream ciphers on GPUs. Due to inherent inter-S-Box dependencies, intra-S-Box dependencies and a high number of memory accesses...
The performance of the elliptic curve method (ECM) for integer factorization plays an important role in the security assessment of RSA-based protocols as a cofactorization tool inside the number field sieve. The efficient arithmetic for Edwards curves found an application by speeding up ECM. We propose techniques based on generating and combining addition-subtracting chains to optimize Edwards ECM in terms of both performance and memory requirements. This makes our approach very suitable for...
A major cryptanalytic computation is currently underway on multiple platforms, including standard CPUs, FPGAs, PlayStations and GPUs, to break the Certicom ECC2K-130 challenge. This challenge is to compute an elliptic-curve discrete logarithm on a Koblitz curve over F_2^131 . Optimizations have reduced the cost of the computation to approximately 2^77 bit operations in 2^61 iterations. GPUs are not designed for fast binary-field arithmetic; they are designed for highly vectorizable...