Search results

2024/1881 (PDF) Last updated: 2024-11-19

THOR: Secure Transformer Inference with Homomorphic Encryption

Jungho Moon, Dongwoo Yoo, Xiaoqian Jiang, Miran Kim

Cryptographic protocols

As language models are increasingly deployed in cloud environments, privacy concerns have become a significant issue. To address this, we design THOR, a secure inference framework for transformer models on encrypted data. Specifically, we first propose new fast matrix multiplication algorithms based on diagonal-major order encoding and extend them to parallel matrix computation through the compact ciphertext packing technique. Second, we design efficient protocols for secure computations of...

2024/1827 (PDF) Last updated: 2024-11-07

OPTIMSM: FPGA hardware accelerator for Zero-Knowledge MSM

Xander Pottier, Thomas de Ruijter, Jonas Bertels, Wouter Legiest, Michiel Van Beirendonck, Ingrid Verbauwhede

Implementation

The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...

2024/1729 (PDF) Last updated: 2024-10-22

cuTraNTT: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for IoT Applications

Supriya Adhikary, Wai Kong Lee, Angshuman Karmakar, Yongwoo Lee, Seong Oun Hwang, Ramachandra Achar

Implementation

Large polynomial multiplication is one of the computational bottlenecks in fully homomorphic encryption implementations. Usually, these multiplications are implemented using the number-theoretic transformation to speed up the computation. State-of-the-art GPU-based implementation of fully homomorphic encryption computes the number theoretic transformation in two different kernels, due to the necessary synchronization between GPU blocks to ensure correctness in computation. This can be a...

2024/1629 (PDF) Last updated: 2024-10-11

Efficient Key-Switching for Word-Type FHE and GPU Acceleration

Shutong Jin, Zhen Gu, Guangyan Li, Donglong Chen, Çetin Kaya Koç, Ray C. C. Cheung, Wangchen Dai

Implementation

Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...

2024/1543 (PDF) Last updated: 2024-10-02

HEonGPU: a GPU-based Fully Homomorphic Encryption Library 1.0

Ali Şah Özcan, Erkay Savaş

Implementation

HEonGPU is a high-performance library designed to optimize Fully Homomorphic Encryption (FHE) operations on Graphics Processing Units (GPUs). By leveraging the parallel processing capacity of GPUs, HEonGPU significantly reduces the computational overhead typically associated with FHE by executing complex operations concurrently. This allows for faster execution of homomorphic computations on encrypted data, enabling real-time applications in privacy-preserving machine learning and secure...

2024/1436 (PDF) Last updated: 2024-09-13

Eva: Efficient IVC-Based Authentication of Lossy-Encoded Videos

Chengru Zhang, Xiao Yang, David Oswald, Mark Ryan, Philipp Jovanovic

Applications

With the increasing spread of fake videos for misinformation, proving the provenance of an edited video (without revealing the original one) becomes critical. To this end, we introduce Eva, the first cryptographic protocol for authenticating lossy-encoded videos. Compared to previous cryptographic methods for image authentication, Eva supports significantly larger amounts of data that undergo complex transformations during encoding. We achieve this by decomposing repetitive and manageable...

2024/1365 (PDF) Last updated: 2024-08-30

High-Throughput GPU Implementation of Dilithium Post-Quantum Digital Signature

Shiyu Shen, Hao Yang, Wangchen Dai, Hong Zhang, Zhe Liu, Yunlei Zhao

Implementation

Digital signatures are fundamental building blocks in various protocols to provide integrity and authenticity. The development of quantum computing has raised concerns about the security guarantees afforded by classical signature schemes. CRYSTALS-Dilithium is an efficient post-quantum digital signature scheme based on lattice cryptography and has been selected as the primary algorithm for standardization by the National Institute of Standards and Technology. In this work, we present a...

2024/1246 (PDF) Last updated: 2024-08-06

MSMAC: Accelerating Multi-Scalar Multiplication for Zero-Knowledge Proof

Pengcheng Qiu, Guiming Wu, Tingqiang Chu, Changzheng Wei, Runzhou Luo, Ying Yan, Wei Wang, Hui Zhang

Implementation

Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...

2024/1030 (PDF) Last updated: 2024-06-26

GRASP: Accelerating Hash-based PQC Performance on GPU Parallel Architecture

Yijing Ning, Jiankuo Dong, Jingqiang Lin, Fangyu Zheng, Yu Fu, Zhenjiang Dong, Fu Xiao

Implementation

$SPHINCS^+$, one of the Post-Quantum Cryptography Digital Signature Algorithms (PQC-DSA) selected by NIST in the third round, features very short public and private key lengths but faces significant performance challenges compared to other post-quantum cryptographic schemes, limiting its suitability for real-world applications. To address these challenges, we propose the GPU-based paRallel Accelerated $SPHINCS^+$ (GRASP), which leverages GPU technology to enhance the efficiency of...

2024/744 (PDF) Last updated: 2024-08-28

An NVMe-based Secure Computing Platform with FPGA-based TFHE Accelerator

Yoshihiro Ohba, Tomoya Sanuki, Claude Gravel, Kentaro Mihara

Implementation

In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...

2024/739 (PDF) Last updated: 2024-05-15

BGJ15 Revisited: Sieving with Streamed Memory Access

Ziyu Zhao, Jintai Ding, Bo-Yin Yang

Implementation

The focus of this paper is to tackle the issue of memory access within sieving algorithms for lattice problems. We have conducted an in-depth analysis of an optimized BGJ sieve (Becker-Gama-Joux 2015), and our findings suggest that its inherent structure is significantly more memory-efficient compared to the asymptotically fastest BDGL sieve (Becker-Ducas-Gama-Laarhoven 2016). Specifically, it necessitates merely $2^{0.2075n + o(n)}$ streamed (non-random) main memory accesses for the...

2024/136 (PDF) Last updated: 2024-09-16

Secure Transformer Inference Made Non-interactive

Jiawen Zhang, Xinpeng Yang, Lipeng He, Kejia Chen, Wen-jie Lu, Yinghao Wang, Xiaoyang Hou, Jian Liu, Kui Ren, Xiaohu Yang

Cryptographic protocols

Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server. In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference. The protocol requires the client to engage in just one round of communication with the server during the whole inference...

2024/118 (PDF) Last updated: 2024-01-26

Data Privacy Made Easy: Enhancing Applications with Homomorphic Encryption

Charles Gouert, Nektarios Georgios Tsoutsos

Applications

Homomorphic encryption is a powerful privacy-preserving technology that is notoriously difficult to configure and use, even for experts. The key difficulties include restrictive programming models of homomorphic schemes and choosing suitable parameters for an application. In this tutorial, we outline methodologies to solve these issues and allow for conversion of any application to the encrypted domain using both leveled and fully homomorphic encryption. The first approach, called...

2024/057 (PDF) Last updated: 2024-08-16

Elastic MSM: A Fast, Elastic and Modular Preprocessing Technique for Multi-Scalar Multiplication Algorithm on GPUs

Xudong Zhu, Haoqi He, Zhengbang Yang, Yi Deng, Lutan Zhao, Rui Hou

Implementation

Zero-knowledge proof (ZKP) is a cryptographic primitive that enables a prover to convince a verifier that a statement is true, without revealing any other information beyond the correctness of the statement itself. Due to its powerful capabilities, its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been widely deployed in various privacy preserving applications such as cryptocurrencies and verifiable computation. Although...

2023/1522 (PDF) Last updated: 2023-10-06

cuML-DSA: Optimized Signing Procedure and Server-Oriented GPU Design for ML-DSA

Shiyu Shen, Hao Yang, Wenqian Li, Yunlei Zhao

Implementation

The threat posed by quantum computing has precipitated an urgent need for post-quantum cryptography. Recently, the post-quantum digital signature draft FIPS 204 has been published, delineating the details of the ML-DSA, which is derived from the CRYSTALS-Dilithium. Despite these advancements, server environments, especially those equipped with GPU devices necessitating high-throughput signing, remain entrenched in classical schemes. A conspicuous void exists in the realm of GPU...

2023/1429 (PDF) Last updated: 2023-09-21

Leveraging GPU in Homomorphic Encryption: Framework Design and Analysis of BFV Variants

Shiyu Shen, Hao Yang, Wangchen Dai, Lu Zhou, Zhe Liu, Yunlei Zhao

Implementation

Homomorphic Encryption (HE) enhances data security by facilitating computations on encrypted data, opening new paths for privacy-focused computations. The Brakerski-Fan-Vercauteren (BFV) scheme, a promising HE scheme, raises considerable performance challenges. Graphics Processing Units (GPUs), with considerable parallel processing abilities, have emerged as an effective solution. In this work, we present an in-depth study focusing on accelerating and comparing BFV variants on GPUs,...

2023/1428 (PDF) Last updated: 2023-09-21

XNET: A Real-Time Unified Secure Inference Framework Using Homomorphic Encryption

Hao Yang, Shiyu Shen, Siyang Jiang, Lu Zhou, Wangchen Dai, Yunlei Zhao

Applications

Homomorphic Encryption (HE) presents a promising solution to securing neural networks for Machine Learning as a Service (MLaaS). Despite its potential, the real-time applicability of current HE-based solutions remains a challenge, and the diversity in network structures often results in inefficient implementations and maintenance. To address these issues, we introduce a unified and compact network structure for real-time inference in convolutional neural networks based on HE. We further...

2023/1410 (PDF) Last updated: 2023-10-06

Two Algorithms for Fast GPU Implementation of NTT

Ali Şah Özcan, Erkay Savaş

Implementation

The number theoretic transform (NTT) permits a very efficient method to perform multiplication of very large degree polynomials, which is the most time-consuming operation in fully homomorphic encryption (FHE) schemes and a class of non-interactive succinct zero-knowledge proof systems such as zk-SNARK. Efficient modular arithmetic plays an important role in the performance of NTT, and therefore it is studied extensively. The access pattern to the memory, on the other hand, may play much...
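As a rough illustration of the NTT-based polynomial multiplication this abstract refers to, here is a minimal CPU-side sketch in Python. It is not the paper's GPU algorithm. The modulus 998244353, the generator 3, and the function names `ntt` and `poly_mul` are assumptions chosen for illustration.

```python
# Illustrative parameters (assumptions, not from the paper):
# P is an NTT-friendly prime, P - 1 = 2^23 * 7 * 17, and G = 3 generates Z_P^*.
P = 998244353
G = 3


def ntt(a, invert=False):
    """Recursive radix-2 Cooley-Tukey NTT over Z_P; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    # Primitive n-th root of unity (or its inverse for the inverse transform).
    root = pow(G, (P - 1) // n, P)
    if invert:
        root = pow(root, P - 2, P)
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % P
        out[k] = (even[k] + t) % P
        out[k + n // 2] = (even[k] - t) % P
        w = w * root % P
    return out


def poly_mul(f, g):
    """Multiply two polynomials (coefficient lists, lowest degree first) modulo P."""
    result_len = len(f) + len(g) - 1
    size = 1
    while size < result_len:
        size *= 2
    fa = ntt(f + [0] * (size - len(f)))
    ga = ntt(g + [0] * (size - len(g)))
    # Pointwise product in the transform domain equals cyclic convolution.
    prod = ntt([x * y % P for x, y in zip(fa, ga)], invert=True)
    inv_size = pow(size, P - 2, P)  # the inverse transform needs a 1/size scaling
    return [c * inv_size % P for c in prod[:result_len]]
```

For example, `poly_mul([1, 2, 3], [4, 5])` multiplies 1 + 2x + 3x^2 by 4 + 5x. A GPU implementation would typically replace the recursion with an iterative, in-place butterfly schedule, which is where memory access patterns become relevant.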

2023/1194 (PDF) Last updated: 2023-08-06

HI-Kyber: A novel high-performance implementation scheme of Kyber based on GPU

Xinyi Ji, Jiankuo Dong, Pinchang Zhang, Deng Tonggui, Hua Jiafeng, Fu Xiao

Implementation

CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography uses complex discrete logarithm problems on lattices to build secure encryption and decryption systems to resist attacks from quantum computing. Performance is an important bottleneck affecting the promotion of post quantum...

2023/1149 (PDF) Last updated: 2023-07-25

Analysis of Parallel Implementation of Pilsung Block Cipher On Graphics Processing Unit

Siwoo Eum, Hyunjun Kim, Minho Song, Hwajeong Seo

Implementation

This paper focuses on the GPU implementation of the Pilsung block cipher used in the Red Star 3.0 operating system developed in North Korea. The Pilsung block cipher is designed based on AES. One notable feature of the Pilsung block cipher is that the table calculations required for encryption take longer than the encryption process itself. This paper emphasizes the parallel implementation of the Pilsung block cipher by leveraging the parallel processing capabilities of GPUs and evaluates...

2023/804 (PDF) Last updated: 2023-06-01

Falkor: Federated Learning Secure Aggregation Powered by AES-CTR GPU Implementation

Mariya Georgieva Belorgey, Sofia Dandjee, Nicolas Gama, Dimitar Jetchev, Dmitry Mikushin

Cryptographic protocols

We propose a novel protocol, Falkor, for secure aggregation for Federated Learning in the multi-server scenario based on masking of local models via a stream cipher based on AES in counter mode and accelerated by GPUs running on the aggregating servers. The protocol is resilient to client dropout and has reduced clients/servers communication cost by a factor equal to the number of aggregating servers (compared to the naïve baseline method). It scales simultaneously in the two major...

2023/399 (PDF) Last updated: 2023-03-21

High Throughput Lattice-based Signatures on GPUs: Comparing Falcon and Mitaka

Wai-Kong Lee, Raymond K. Zhao, Ron Steinfeld, Amin Sakzad, Seong Oun Hwang

Implementation

The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting effort to optimize the implementation of Falcon on various hardware architectures for practical applications. Recently, Mitaka was...

2023/206 (PDF) Last updated: 2024-05-10

Orca: FSS-based Secure Training and Inference with GPUs

Neha Jawalkar, Kanav Gupta, Arkaprava Basu, Nishanth Chandran, Divya Gupta, Rahul Sharma

Cryptographic protocols

Secure Two-party Computation (2PC) allows two parties to compute any function on their private inputs without revealing their inputs to each other. In the offline/online model for 2PC, correlated randomness that is independent of all inputs to the computation, is generated in a preprocessing (offline) phase and this randomness is then utilized in the online phase once the inputs to the parties become available. Most 2PC works focus on optimizing the online time as this overhead lies on the...

2023/049 (PDF) Last updated: 2024-02-19

Phantom: A CUDA-Accelerated Word-Wise Homomorphic Encryption Library

Hao Yang, Shiyu Shen, Wangchen Dai, Lu Zhou, Zhe Liu, Yunlei Zhao

Implementation

Homomorphic encryption (HE) is a promising technique for privacy-preserving computations, especially the word-wise HE schemes that allow batching. However, the high computational overhead hinders the deployment of HE in real-world applications. GPUs are often used to accelerate execution, but a comprehensive performance comparison of different schemes on the same platform is still missing. In this work, we fill this gap by implementing three word-wise HE schemes BGV, BFV, and CKKS on GPU,...

2022/1621 (PDF) Last updated: 2022-11-21

cuXCMP: CUDA-Accelerated Private Comparison Based on Homomorphic Encryption

Hao Yang, Shiyu Shen, Zhe Liu, Yunlei Zhao

Applications

Private comparison schemes constructed on homomorphic encryption offer the noninteractive, output expressive and parallelizable features, and have advantages in communication bandwidth and performance. In this paper, we propose cuXCMP, which allows negative and float inputs, offers fully output expressive feature, and is more extensible and practical compared to XCMP (AsiaCCS 2018). Meanwhile, we introduce several memory-centric optimizations of the constant term extraction kernel tailored for...

2022/1464 (PDF) Last updated: 2022-10-26

Parallel Isogeny Path Finding with Limited Memory

Emanuele Bellini, Jorge Chavez-Saab, Jesús-Javier Chi-Domínguez, Andre Esser, Sorina Ionica, Luis Rivera-Zamarripa, Francisco Rodríguez-Henríquez, Monika Trimoska, Floyd Zweydinger

Attacks and cryptanalysis

The security guarantees of most isogeny-based protocols rely on the computational hardness of finding an isogeny between two supersingular isogenous curves defined over a prime field $\mathbb{F}_q$ with $q$ a power of a large prime $p$. In most scenarios, the isogeny is known to be of degree $\ell^e$ for some small prime $\ell$. We call this problem the Supersingular Fixed-Degree Isogeny Path (SIPFD) problem. It is believed that the most general version of SIPFD is not solvable faster than...

2022/1222 (PDF) Last updated: 2022-11-17

Homomorphic Encryption on GPU

Ali Şah Özcan, Can Ayduman, Enes Recep Türkoğlu, Erkay Savaş

Implementation

Homomorphic encryption (HE) is a cryptosystem that allows secure processing of encrypted data. One of the most popular HE schemes is the Brakerski-Fan-Vercauteren (BFV), which supports somewhat (SWHE) and fully homomorphic encryption (FHE). Since overly involved arithmetic operations of HE schemes are amenable to concurrent computation, GPU devices can be instrumental in facilitating the practical use of HE in real world applications thanks to their superior parallel processing capacity....

2022/999 (PDF) Last updated: 2022-08-03

PipeMSM: Hardware Acceleration for Multi-Scalar Multiplication

Charles. F. Xavier

Foundations

Multi-Scalar Multiplication (MSM) is a fundamental computational problem. Interest in this problem was recently prompted by its application to ZK-SNARKs, where it often turns out to be the main computational bottleneck. In this paper we set forth a pipelined design for computing MSM. Our design is based on a novel algorithmic approach and hardware-specific optimizations. At the core, we rely on a modular multiplication technique which we deem to be of independent interest. We implemented...

2022/633 (PDF) Last updated: 2022-05-23

CUDA-Accelerated RNS Multiplication in Word-Wise Homomorphic Encryption Schemes

Shiyu Shen, Hao Yang, Yu Liu, Zhe Liu, Yunlei Zhao

Implementation

Homomorphic encryption (HE), which allows computation over encrypted data, has often been used to preserve privacy. However, the computationally heavy nature and complexity of network topologies make the deployment of HE schemes in the Internet of Things (IoT) scenario difficult. In this work, we propose CARM, the first optimized GPU implementation that covers BGV, BFV and CKKS, targeting for accelerating homomorphic multiplication using GPU in heterogeneous IoT systems. We offer...

2021/1389 (PDF) Last updated: 2022-06-13

DPCrypto: Acceleration of Post-quantum Cryptographic Algorithms using Dot-Product Instruction on GPUs

Wai-Kong Lee, Hwajeong Seo, Seong Oun Hwang, Angshuman Karmakar, Jose Maria Bermudo Mera, Ramachandra Achar

Implementation

Dot-product is a widely used operation in many machine learning and scientific computing algorithms. Recently, NVIDIA has introduced dot-product instructions (DP2A and DP4A) in modern GPU architectures, with the aim of accelerating machine learning and scientific computing applications. These dot-product instructions allow the computation of multiply-and-add instructions in a clock cycle, effectively achieving higher throughput compared to conventional 32-bit integer units. In this paper,...

2021/1100 (PDF) Last updated: 2022-10-25

REDsec: Running Encrypted Discretized Neural Networks in Seconds

Lars Folkerts, Charles Gouert, Nektarios Georgios Tsoutsos

Applications

Machine learning as a service (MLaaS) has risen to become a prominent technology due to the large development time, amount of data, hardware costs, and level of expertise required to develop a machine learning model. However, privacy concerns prevent the adoption of MLaaS for applications with sensitive data. A promising privacy preserving solution is to use fully homomorphic encryption (FHE) to perform the ML computations. Recent advancements have lowered computational costs by several...

2021/646 (PDF) Last updated: 2021-05-20

Optimization of Advanced Encryption Standard on Graphics Processing Units

Cihangir Tezcan

Secret-key cryptography

Graphics processing units (GPUs) are specially designed for parallel applications and perform parallel operations much faster than central processing units (CPUs). In this work, we focus on the performance of the Advanced Encryption Standard (AES) on GPUs. We present optimizations which remove bank conflicts in shared memory accesses and provide 878.6 Gbps throughput for AES-128 encryption on an RTX 2070 Super, which is equivalent to 4.1 Gbps per Watt. Our optimizations provide more than...

2021/533 (PDF) Last updated: 2021-04-23

CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU

Sijun Tan, Brian Knott, Yuan Tian, David J. Wu

Cryptographic protocols

We introduce CryptGPU, a system for privacy-preserving machine learning that implements all operations on the GPU (graphics processing unit). Just as GPUs played a pivotal role in the success of modern deep learning, they are also essential for realizing scalable privacy-preserving deep learning. In this work, we start by introducing a new interface to losslessly embed cryptographic operations over secret-shared values (in a discrete domain) into floating-point operations that can be...

2021/508 (PDF) Last updated: 2021-04-23

Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs

Wonkyung Jung, Sangpyo Kim, Jung Ho Ahn, Jung Hee Cheon, Younho Lee

Implementation

Fully homomorphic encryption (FHE) has been gaining popularity as an emerging way of enabling an unlimited number of operations on the encrypted message without decryption. A major drawback of FHE is its high computational cost. In particular, bootstrapping, which refreshes the noise accumulated through consecutive FHE operations on the ciphertext, can take minutes. This significantly limits the practical use of FHE in numerous real applications. By exploiting massive parallelism available...

2021/460 (PDF) Last updated: 2021-04-09

Let’s Take it Offline: Boosting Brute-Force Attacks on iPhone’s User Authentication through SCA

Oleksiy Lisovets, David Knichel, Thorben Moos, Amir Moradi

Implementation

In recent years, smartphones have become an increasingly important storage facility for personal sensitive data ranging from photos and credentials up to financial and medical records like credit cards and person’s diseases. Trivially, it is critical to secure this information and only provide access to the genuine and authenticated user. Smartphone vendors have already taken exceptional care to protect user data by the means of various software and hardware security features like code...

2021/124 (PDF) Last updated: 2021-02-05

Efficient Number Theoretic Transform Implementation on GPU for Homomorphic Encryption

Ozgun Ozerk, Can Elgezen, Ahmet Can Mert, Erdinc Ozturk, Erkay Savas

Implementation

Lattice-based cryptography forms the mathematical basis for homomorphic encryption, which allows computation directly on encrypted data. Homomorphic encryption enables privacy-preserving applications such as secure cloud computing; yet, its practical applications suffer from the high computational complexity of homomorphic operations. Fast implementations of the homomorphic encryption schemes heavily depend on efficient polynomial arithmetic; multiplication of very large degree polynomials...

2020/1265 (PDF) Last updated: 2020-10-14

Revisiting ECM on GPUs

Jonas Wloka, Jan Richter-Brockmann, Colin Stahlke, Thorsten Kleinjung, Christine Priplata, Tim Güneysu

Public-key cryptography

Modern public-key cryptography is a crucial part of our contemporary life where a secure communication channel with another party is needed. With the advance of more powerful computing architectures – especially Graphics Processing Units (GPUs) – traditional approaches like RSA and Diffie-Hellman schemes are more and more in danger of being broken. We present a highly optimized implementation of Lenstra’s ECM algorithm customized for GPUs. Our implementation uses state-of-the-art...

2020/1223 (PDF) Last updated: 2021-05-17

Algorithmic Acceleration of B/FV-like Somewhat Homomorphic Encryption for Compute-Enabled RAM

Jonathan Takeshita, Dayane Reis, Ting Gong, Michael Niemier, X. Sharon Hu, Taeho Jung

Implementation

Somewhat Homomorphic Encryption (SHE) allows arbitrary computation with finite multiplicative depths to be performed on encrypted data, but its overhead is high due to memory transfer incurred by large ciphertexts. Recent research has recognized the shortcomings of general-purpose computing for high-performance SHE, and has begun to pioneer the use of hardware-based SHE acceleration with hardware including FPGAs, GPUs, and Compute-Enabled RAM (CE-RAM). CE-RAM is well-suited for SHE, as it is...

2020/1124 (PDF) Last updated: 2020-09-21

Optimized Voronoi-based algorithms for parallel shortest vector computations

Artur Mariano, Filipe Cabeleira, Gabriel Falcao, Luís Paulo Santos

Implementation

This paper addresses Voronoi cell-based algorithms, specifically the “Relevant Vectors” algorithm, used to solve the Shortest Vector Problem, a fundamental challenge in lattice-based cryptanalysis. Several optimizations are proposed to reduce the execution time of the original algorithm. It is also shown that the algorithm is highly suited for parallel execution on both CPUs and GPUs. The proposed optimizations are based on pruning, i.e., avoiding computations that will not, with high...

2020/1056 (PDF) Last updated: 2022-01-20

Automated enumeration of block cipher differentials: An optimized branch-and-bound GPU framework

Wei-Zhu Yeoh, Je Sen Teh, Jiageng Chen

Secret-key cryptography

Block ciphers are prevalent in various security protocols used daily such as TLS, OpenPGP, and SSH. Their primary purpose is the protection of user data, both in transit and at rest. One of the de facto methods to evaluate block cipher security is differential cryptanalysis. Differential cryptanalysis observes the propagation of input patterns (input differences) through the cipher to produce output patterns (output differences). This probabilistic propagation is known as a differential; the...

2020/1047 (PDF) Last updated: 2020-09-21

Side-channel Attacks with Multi-thread Mixed Leakage

Yiwen Gao, Yongbin Zhou

Side-channel attacks are one of the greatest practical threats to security-related applications, because they are capable of breaking ciphers that are assumed to be mathematically secure. Lots of studies have been devoted to power or electro-magnetic (EM) analysis against desktop CPUs, mobile CPUs (including ARM, MSP, AVR, etc) and FPGAs, but rarely targeted modern GPUs. Modern GPUs feature their special and specific single instruction multiple threads (SIMT) execution fashion, which makes...

2019/161 (PDF) Last updated: 2019-02-20

Understanding Optimizations and Measuring Performances of PBKDF2

Andrea Francesco Iuorio, Andrea Visconti

Implementation

Password-based Key Derivation Functions (KDFs) are used to generate secure keys of arbitrary length and are implemented in many security-related systems. The strength of these KDFs is the ability to provide countermeasures against brute-force/dictionary attacks. One of the most widely implemented KDFs is PBKDF2. In order to slow attackers down, PBKDF2 uses a salt and introduces computationally intensive operations based on an iterated pseudo-random function. Since passwords are widely used to protect personal...
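To make the "salt plus iterated pseudo-random function" structure concrete, the following Python sketch computes a single PBKDF2 output block by hand, using HMAC-SHA256 as the PRF. It is a minimal illustration and not the optimized implementation the paper studies. The function name `pbkdf2_sha256_one_block` is hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
import hmac


def pbkdf2_sha256_one_block(password: bytes, salt: bytes, iterations: int) -> bytes:
    """Compute the first 32-byte PBKDF2 block T_1 = U_1 xor U_2 xor ... xor U_c."""
    # U_1 = PRF(password, salt || INT(1)), with the block index as a 4-byte big-endian integer.
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha256).digest()
    acc = int.from_bytes(u, "big")
    # Each further iteration feeds the previous PRF output back into the PRF,
    # so the chain cannot be parallelized for a single password guess.
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha256).digest()
        acc ^= int.from_bytes(u, "big")
    return acc.to_bytes(32, "big")
```

For a 32-byte derived key the result should match the standard library call `hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)`. The iteration count is the knob that scales the cost of each password guess.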

2018/589 (PDF) Last updated: 2019-03-06

Implementation and Performance Evaluation of RNS Variants of the BFV Homomorphic Encryption Scheme

Ahmad Al Badawi, Yuriy Polyakov, Khin Mi Mi Aung, Bharadwaj Veeravalli, Kurt Rohloff

Implementation

Homomorphic encryption is an emerging form of encryption that provides the ability to compute on encrypted data without ever decrypting them. Potential applications include aggregating sensitive encrypted data on a cloud environment and computing on the data in the cloud without compromising data privacy. There have been several recent advances resulting in new homomorphic encryption schemes and optimized variants. We implement and evaluate the performance of two optimized variants, namely...

2016/553 Last updated: 2016-07-11

Storage Efficient Substring Searchable Symmetric Encryption

Iraklis Leontiadis, Ming Li

We address the problem of substring searchable encryption. A single user produces a big stream of data and later on wants to learn the positions in the string at which some patterns occur. Although current techniques exploit auxiliary data structures to achieve efficient substring search on the server side, the cost at the user side may be prohibitive. We revisit the work of substring searchable encryption in order to reduce the storage cost of auxiliary data structures. Our solution entails...

2016/547 (PDF) Last updated: 2016-06-02

Efficient High-Speed WPA2 Brute Force Attacks using Scalable Low-Cost FPGA Clustering

Markus Kammerstetter, Markus Muellner, Daniel Burian, Christian Kudera, Wolfgang Kastner

WPA2-Personal is widely used to protect Wi-Fi networks against illicit access. While attackers typically use GPUs to speed up the discovery of weak network passwords, attacking random passwords is considered to quickly become infeasible with increasing password length. Professional attackers may thus turn to commercial high-end FPGA-based cluster solutions to significantly increase the speed of those attacks. Well known manufacturers such as Elcomsoft have succeeded in creating...

2016/445 (PDF) Last updated: 2017-11-20

SecureMed: Secure Medical Computation using GPU-Accelerated Homomorphic Encryption Scheme

Alhassan Khedr, Glenn Gulak

Sharing the medical records of individuals among healthcare providers and researchers around the world can accelerate advances in medical research. While the idea seems increasingly practical due to cloud data services, maintaining patient privacy is of paramount importance. Standard encryption algorithms help protect sensitive data from outside attackers but they cannot be used to compute on this sensitive data while being encrypted. Homomorphic Encryption (HE) presents a very useful tool...

2015/967 (PDF) Last updated: 2016-02-22

Freestart collision for full SHA-1

Marc Stevens, Pierre Karpman, Thomas Peyrin

This article presents an explicit freestart colliding pair for SHA-1, i.e. a collision for its internal compression function. This is the first practical break of the full SHA-1, reaching all 80 out of 80 steps. Only 10 days of computation on a 64-GPU cluster were necessary to perform this attack, for a cost of approximately $2^{57.5}$ calls to the compression function of SHA-1. This work builds on a continuous series of cryptanalytic advancements on SHA-1 since the theoretical collision...

2015/818 (PDF) Last updated: 2015-08-18

cuHE: A Homomorphic Encryption Accelerator Library

Wei Dai, Berk Sunar

Implementation

We introduce a CUDA GPU library to accelerate evaluations with homomorphic schemes defined over polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory minimization techniques, memory and thread scheduling and low level CUDA hand-tuned assembly optimizations to take full advantage of the mass parallelism and high memory bandwidth GPUs offer. The arithmetic functions constructed to handle very large polynomial operands using...

2015/678 (PDF) Last updated: 2015-07-06

Optimizing MAKWA on GPU and CPU

Thomas Pornin

Secret-key cryptography

We present here optimized implementations of the MAKWA password hashing function on an AMD Radeon HD 7990 GPU, and compare its efficiency with an Intel i7 4770K CPU for systematic dictionary attacks. We find that the GPU seems to get more hashing done for a given budget, but not by a large amount (the GPU is less than twice as efficient as the CPU). Raising the MAKWA modulus size to 4096 bits, instead of the default 2048 bits, should restore the balance in favour of the CPU. We also find...

2015/530 (PDF) Last updated: 2015-06-05

Practical Free-Start Collision Attacks on 76-step SHA-1

Pierre Karpman, Thomas Peyrin, Marc Stevens

Secret-key cryptography

In this paper we analyze the security of the compression function of SHA-1 against collision attacks, or equivalently free-start collisions on the hash function. While a lot of work has been dedicated to the analysis of SHA-1 in the past decade, this is the first time that free-start collisions have been considered for this function. We exploit the additional freedom provided by this model by using a new start-from-the-middle approach in combination with improvements on the cryptanalysis...

2015/294 (PDF) Last updated: 2015-04-01

Accelerating Somewhat Homomorphic Evaluation using FPGAs

Erdinç Öztürk, Yarkın Doröz, Berk Sunar, Erkay Savaş

Implementation

After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The...

2014/838 (PDF) Last updated: 2016-04-28

SHIELD: Scalable Homomorphic Implementation of Encrypted Data-Classifiers

Alhassan Khedr, Glenn Gulak, Vinod Vaikuntanathan

Homomorphic encryption (HE) systems enable computations on encrypted data, without decrypting and without knowledge of the secret key. In this work, we describe an optimized Ring Learning With Errors (RLWE) based implementation of a variant of the HE system recently proposed by Gentry, Sahai and Waters (GSW). Although this system was widely believed to be less efficient than its contemporaries, we demonstrate quite the opposite behavior for a large class of applications. We first highlight...

2013/059 (PDF) Last updated: 2013-02-06

Optimized GPU Implementation and Performance Analysis of HC Series of Stream Ciphers

Ayesha Khalid, Deblin Bagchi, Goutam Paul, Anupam Chattopadhyay

Secret-key cryptography

The ease of programming offered by the CUDA programming model attracted a lot of programmers to try the platform for acceleration of many non-graphics applications. Cryptography, being no exception, also found its share of exploration efforts, especially block ciphers. In this contribution we present a detailed walk-through of effective mapping of HC-128 and HC-256 stream ciphers on GPUs. Due to inherent inter-S-Box dependencies, intra-S-Box dependencies and a high number of memory accesses...

2012/089 (PDF) Last updated: 2012-09-07

ECM at Work

Joppe W. Bos, Thorsten Kleinjung

The performance of the elliptic curve method (ECM) for integer factorization plays an important role in the security assessment of RSA-based protocols as a cofactorization tool inside the number field sieve. The efficient arithmetic for Edwards curves found an application by speeding up ECM. We propose techniques based on generating and combining addition-subtraction chains to optimize Edwards ECM in terms of both performance and memory requirements. This makes our approach very suitable for...

2012/002 (PDF) Last updated: 2012-01-02

ECC2K-130 on NVIDIA GPUs

Daniel J. Bernstein, Hsieh-Chung Chen, Chen-Mou Cheng, Tanja Lange, Ruben Niederhagen, Peter Schwabe, Bo-Yin Yang

Implementation

A major cryptanalytic computation is currently underway on multiple platforms, including standard CPUs, FPGAs, PlayStations and GPUs, to break the Certicom ECC2K-130 challenge. This challenge is to compute an elliptic-curve discrete logarithm on a Koblitz curve over $\mathbb{F}_{2^{131}}$. Optimizations have reduced the cost of the computation to approximately $2^{77}$ bit operations in $2^{61}$ iterations. GPUs are not designed for fast binary-field arithmetic; they are designed for highly vectorizable...

55 results sorted by ID