cuXCMP: CUDA-Accelerated Private Comparison Based on Homomorphic Encryption

Paper 2022/1621

Hao Yang, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Shiyu Shen, School of Computer Science, Fudan University, Shanghai, China

Zhe Liu, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Yunlei Zhao, School of Computer Science, Fudan University, Shanghai, China; State Key Laboratory of Cryptology, Beijing, China

Abstract

Private comparison schemes built on homomorphic encryption are non-interactive, output-expressive, and parallelizable, and have advantages in communication bandwidth and performance. In this paper, we propose cuXCMP, which accepts negative and floating-point inputs, is fully output-expressive, and is more extensible and practical than XCMP (AsiaCCS 2018). We also introduce several memory-centric optimizations of the constant term extraction kernel tailored for CUDA-enabled GPUs. First, we make full use of shared memory and present compact GPU implementations of NTT and INTT that each run in a single thread block. Second, we fuse multiple kernels into one AKS kernel, which performs the automorphism and key switching operations, and reduce the grid dimension for better resource usage, data access rate, and synchronization. Third, we precisely measure the I/O latency and choose an appropriate number of CUDA streams so that independent operations execute concurrently, yielding a constant term extraction kernel (CTX) with perfect latency hiding. Combining these approaches, we bring the overall execution time to an optimal level, and the speedup ratio increases with the number of comparisons. For one comparison, we speed up AKS by 23.71×, CTX by 15.58×, and the whole scheme by 1.83× over the C baseline (resp., 18.29×, 11.75×, and 1.42× over the AVX512 baseline). For 32 comparisons, our CTX and scheme implementations outperform the C (resp., AVX512) baselines by 112.00× and 1.99× (resp., 81.53× and 1.51×).
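The abstract names a constant term extraction kernel (CTX) built from automorphisms and key switching (AKS) but does not spell out the underlying math. A common way to isolate the constant term of a polynomial in Z_q[X]/(X^N + 1) is a trace-style fold: log2(N) rounds of f <- f + f(X^k) with k = N/2^j + 1. The sketch below assumes, without confirmation from the abstract, that CTX follows this pattern. It works on plaintext coefficient vectors only, omits key switching, the NTT domain, and all GPU details, and the function names `automorphism` and `extract_constant_term` are hypothetical.

```python
# Hypothetical plaintext sketch, NOT the cuXCMP implementation: trace-style
# constant-term extraction in R_q = Z_q[X]/(X^N + 1), with N a power of two
# and q an odd modulus. On ciphertexts, every automorphism below would be
# followed by a key switch (the step the paper fuses into "AKS"); that part,
# the NTT domain, and all GPU details are omitted.


def automorphism(f, k, q):
    """Map f(X) to f(X^k) for odd k, reducing with X^N = -1."""
    n = len(f)
    g = [0] * n
    for i, c in enumerate(f):
        j = (i * k) % (2 * n)  # exponents are periodic mod 2N since X^(2N) = 1
        if j < n:
            g[j] = (g[j] + c) % q
        else:  # X^j = -X^(j - N)
            g[j - n] = (g[j - n] - c) % q
    return g


def extract_constant_term(f, q):
    """Return [f_0, 0, ..., 0] (mod q) using log2(N) fold rounds."""
    n = len(f)
    assert n >= 2 and n & (n - 1) == 0, "N must be a power of two"
    m = n
    while m >= 2:
        # Round with k = m + 1: a surviving term X^i with i = (N/m)*t maps to
        # (-1)^t X^i, so odd-t terms cancel and even-t terms double.
        rotated = automorphism(f, m + 1, q)
        f = [(a + b) % q for a, b in zip(f, rotated)]
        m //= 2
    # Only N * f_0 survives; undo the factor N (pow(n, -1, q) needs Python 3.8+).
    n_inv = pow(n, -1, q)
    return [(c * n_inv) % q for c in f]


if __name__ == "__main__":
    print(extract_constant_term([5, 3, 7, 11, 2, 8, 1, 4], 97))
    # prints [5, 0, 0, 0, 0, 0, 0, 0]
```

Each round costs one automorphism and one addition, so an encrypted version of this fold would need log2(N) automorphism-plus-key-switch steps. That repeated pairing is plausibly why fusing the two operations into a single kernel pays off for CTX.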

Metadata

Available format(s): PDF
Category: Applications
Publication info: Preprint.
Keywords: Private comparison, Homomorphic encryption, GPU optimization, Number theoretic transform, Key switching
Contact author(s): crypto @ d4rk dev; shenshiyu21 @ m fudan edu cn; zhe liu @ nuaa edu cn; ylzhao @ fudan edu cn
History: 2022-11-21: approved; 2022-11-21: received
Short URL: https://ia.cr/2022/1621
License: CC BY

BibTeX

@misc{cryptoeprint:2022/1621,
      author = {Hao Yang and Shiyu Shen and Zhe Liu and Yunlei Zhao},
      title = {{cuXCMP}: {CUDA}-Accelerated Private Comparison Based on Homomorphic Encryption},
      howpublished = {Cryptology {ePrint} Archive, Paper 2022/1621},
      year = {2022},
      url = {https://eprint.iacr.org/2022/1621}
}