research-article

CUDA Leaks: A Detailed Hack for CUDA and a (Partial) Fix

Authors: Roberto Di Pietro, Flavio Lombardi, Antonio Villani

ACM Transactions on Embedded Computing Systems (TECS), Volume 15, Issue 1

Article No.: 15, Pages 1-25

https://doi.org/10.1145/2801153

Published: 13 January 2016

Abstract

Graphics processing units (GPUs) are increasingly common on desktops, servers, and embedded platforms. In this article, we report on new security issues related to CUDA, the most widespread platform for GPU computing. In particular, we provide details and proofs of concept for novel vulnerabilities that affect CUDA architectures, and we show how these vulnerabilities can be exploited to cause severe information leakage. As a case study, we experimentally demonstrate the exploitation of one of these vulnerabilities against a GPU implementation of the AES encryption algorithm. Finally, we suggest software patches and alternative approaches to address the presented vulnerabilities.
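The abstract does not describe the leakage mechanism in detail. One simple way to picture this class of leak is an allocator that hands a previously used buffer to a new owner without clearing it. A later read of that buffer, made before any write, then returns whatever the previous owner stored. The Python sketch below is a CPU-side toy model built on that assumption. It is not CUDA code, and it is not the authors' exploit or patch. `ToyDevicePool`, `scrub_on_free`, and `simulate` are hypothetical names introduced only for illustration. The sketch also shows scrub-on-free, one possible mitigation.

```python
from typing import List


class ToyDevicePool:
    """Hypothetical fixed-block allocator standing in for GPU global memory.

    This is a toy model, not the CUDA allocator. Freed blocks are pushed onto
    a LIFO free list. Unless scrub_on_free is set, their previous contents stay
    in place and are visible to the next owner that reads without writing.
    """

    def __init__(self, num_blocks: int, block_size: int, scrub_on_free: bool = False) -> None:
        self.block_size = block_size
        self.memory = bytearray(num_blocks * block_size)  # starts zeroed
        # Reversed so that pop() hands out block 0 first.
        self.free_blocks: List[int] = list(range(num_blocks - 1, -1, -1))
        self.scrub_on_free = scrub_on_free

    def _span(self, block: int) -> slice:
        start = block * self.block_size
        return slice(start, start + self.block_size)

    def alloc(self) -> int:
        """Return a free block index; the most recently freed block is reused first."""
        return self.free_blocks.pop()

    def write(self, block: int, data: bytes) -> None:
        if len(data) > self.block_size:
            raise ValueError("data does not fit in one block")
        start = block * self.block_size
        self.memory[start:start + len(data)] = data

    def read(self, block: int) -> bytes:
        """Read the whole block, including bytes this owner never wrote."""
        return bytes(self.memory[self._span(block)])

    def free(self, block: int) -> None:
        if self.scrub_on_free:
            # Mitigation: overwrite the block with zeros before it can be reused.
            self.memory[self._span(block)] = bytes(self.block_size)
        self.free_blocks.append(block)


def simulate(scrub_on_free: bool) -> bytes:
    """Victim writes a secret and frees it; attacker allocates and reads without initializing."""
    pool = ToyDevicePool(num_blocks=4, block_size=16, scrub_on_free=scrub_on_free)

    victim = pool.alloc()
    pool.write(victim, b"SECRET-AES-KEY!!")  # 16-byte stand-in for key material
    pool.free(victim)  # the victim releases the buffer without clearing it itself

    attacker = pool.alloc()  # receives the same block back (LIFO reuse)
    return pool.read(attacker)  # no write first: returns whatever is left over
```

With `simulate(False)` the attacker reads the victim's 16-byte secret. With `simulate(True)` it reads only zeros. Zeroing on release costs extra memory traffic in exchange for confidentiality. It is also only a partial fix: a process that crashes or is killed never reaches its scrub step. In real CUDA code, an analogous step would be clearing a device buffer (for example with `cudaMemset`) before `cudaFree`. This is offered as an illustration, not as the article's proposed patch.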

Cited By

Wu X, Tian D, Kim C (2023). Building GPU TEEs using CPU Secure Enclaves with GEVisor. Proceedings of the 2023 ACM Symposium on Cloud Computing, 249-264. Online publication date: 30-Oct-2023.
https://dl.acm.org/doi/10.1145/3620678.3624659
Zhu J, Hou R, Meng D, Malka M, Kolodner H, Bellosa F, Gabel M (2022). TACC. Proceedings of the 15th ACM International Conference on Systems and Storage, 58-71. Online publication date: 6-Jun-2022.
https://dl.acm.org/doi/10.1145/3534056.3534943
Park H, Lin F, Falsafi B, Ferdman M, Lu S, Wenisch T (2022). GPUReplay: a 50-KB GPU stack for client ML. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 157-170. Online publication date: 28-Feb-2022.
https://dl.acm.org/doi/10.1145/3503222.3507754
Information

Published In

ACM Transactions on Embedded Computing Systems Volume 15, Issue 1

February 2016

530 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/2872313

Editor:
Sandeep K. Shukla
Indian Institute of Technology, India

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 13 January 2016

Accepted: 01 July 2015

Revised: 01 March 2015

Received: 01 September 2014

Published in TECS Volume 15, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Prevention of and Fight against Crime Programme of the European Union, European Commission, Directorate-General Home Affairs
European Antitrust Forensic IT Tools project (ref. HOME/2012/ISEC/FP/C2/4000003977)

Bibliometrics

Article Metrics

Total Citations: 45

Total Downloads: 987

Downloads (Last 12 months): 114

Downloads (Last 6 weeks): 16

Reflects downloads up to 19 Nov 2024

Citations

Ma H, Tian J, Gao D, Jia C (2022). On the Effectiveness of Using Graphics Interrupt as a Side Channel for User Behavior Snooping. IEEE Transactions on Dependable and Secure Computing 19:5, 3257-3270. Online publication date: 1-Sep-2022.
https://doi.org/10.1109/TDSC.2021.3091159
Zhan Z, Zhang Z, Liang S, Yao F, Koutsoukos X (2022). Graphics Peeping Unit: Exploiting EM Side-Channel Information of GPUs to Eavesdrop on Your Neighbors. 2022 IEEE Symposium on Security and Privacy (SP), 1440-1457. Online publication date: May-2022.
https://doi.org/10.1109/SP46214.2022.9833773
Jiang J, Qi J, Shen T, Chen X, Zhao S, Wang S, Chen L, Zhang G, Luo X, Cui H, Hardavellas N, Campanoni S, Grot B, Karpuzcu U (2022). Cronus: Fault-Isolated, Secure and High-Performance Heterogeneous Computing for Trusted Execution Environment. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, 124-143. Online publication date: 1-Oct-2022.
https://dl.acm.org/doi/10.1109/MICRO56248.2022.00019
Piccolboni L, Giri D, Carloni L (2022). Accelerators & Security: The Socket Approach. IEEE Computer Architecture Letters 21:2, 65-68. Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.1109/LCA.2022.3179947
Zhang C, Hou R (2022). LAK: A Low-Overhead Lock-and-Key Based Schema for GPU Memory Safety. 2022 IEEE 40th International Conference on Computer Design (ICCD), 705-713. Online publication date: Oct-2022.
https://doi.org/10.1109/ICCD56317.2022.00108
Saeed I (2021). Why Cs departments should consider offering CUDA as a standalone course. Journal of Computing Sciences in Colleges 36:4, 51-58. Online publication date: 12-Jan-2021.
https://dl.acm.org/doi/10.5555/3447286.3447293
Di B, Sun J, Chen H, Li D (2021). Efficient Buffer Overflow Detection on GPU. IEEE Transactions on Parallel and Distributed Systems 32:5, 1161-1177. Online publication date: 1-May-2021.
https://doi.org/10.1109/TPDS.2020.3042965