Abstract

NVIDIA is defining a High-Performance Computing system architecture called Cloud Native Supercomputing to provide bare-metal system performance with security isolation and functional offload capabilities. Cloud Native Supercomputing delivers a cloud-based user experience while maintaining the performance and scalability that are unique to supercomputing facilities. This new set of capabilities is driven by the need to accommodate new scientific workflows that combine traditional simulation with experimental data from the edge and with AI, data analytics, and visualization frameworks in an integrated and even real-time fashion. These new workflows stress the system management, security, and non-computational functions of traditional cloud or supercomputing facilities. Specifically, they involve data from untrusted (or non-local) sources, user experiences that range from Jupyter notebooks and interactive jobs to Gordon Bell-class capacity batch runs, and I/O patterns that are unique to the emerging mix of in silico and live data sources. To achieve these objectives, we introduce a new architectural component called the Data Processing Unit (DPU), which in early embodiments is a system-on-a-chip (SoC) that includes an InfiniBand (IB) and Ethernet network adapter, programmable Arm cores, memory, PCI switches, and custom accelerators. The BlueField-1 and BlueField-2 devices are NVIDIA’s first DPU instances. This paper describes the architecture of cloud native supercomputing systems that use DPUs for isolation and acceleration, along with the system services provided by the DPU. These services provide enhanced security through isolation, file-system management capabilities, monitoring, and offloaded support for communication libraries.

Author information

Corresponding author

Correspondence to Richard Graham.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Shainer, G. et al. (2022). NVIDIA’s Cloud Native Supercomputing. In: Nichols, J., et al. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation. SMC 2021. Communications in Computer and Information Science, vol 1512. Springer, Cham. https://doi.org/10.1007/978-3-030-96498-6_20

  • DOI: https://doi.org/10.1007/978-3-030-96498-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96497-9

  • Online ISBN: 978-3-030-96498-6

  • eBook Packages: Computer Science (R0)
