DOI: 10.5555/3314872.3314900

Decoding CUDA binary

Published: 16 February 2019

Abstract

NVIDIA's software does not offer translation of assembly code to binary for its GPUs, since the specifications are closed-source. This work fills that gap. We develop a systematic method of decoding the Instruction Set Architectures (ISAs) of NVIDIA's GPUs and generating assemblers for different generations of GPUs. Our framework enables cross-architecture binary analysis and transformation. Making the ISA accessible in this manner opens up a world of opportunities for developers and researchers, enabling numerous optimizations and explorations that are unachievable at the source-code level. Our infrastructure has already been adopted in, and has benefited, important applications including performance tuning, binary instrumentation, resource allocation, and memory protection.

Published In

CGO 2019: Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization
February 2019
286 pages
ISBN:9781728114361

Publisher

IEEE Press

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Author Tags

  1. CUDA
  2. Code Generation
  3. Code Translation and Transformation
  4. GPU
  5. Instruction Set Architecture (ISA)

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%
