Software Defined Infrastructure for Operational Numerical Weather Prediction

  • Conference paper
Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI (SMC 2020)

Abstract

In 2015, CSCS and the Swiss national weather and climate service (MeteoSwiss) deployed the first GPU-accelerated HPC system for numerical weather prediction (NWP), which has been in operation since spring 2016. As part of lifecycle management, a system with eight times the performance, able to support an upgraded model, had to be developed at constant cost. This new system is scheduled to go into operation later in 2020. In recent years, the performance of viable GPUs at a given price has not increased sufficiently. With a fixed budget envelope, the traditional design for operational NWP, with two fully redundant and self-contained systems, was no longer viable for supporting operations of the 2020–2024 model. We have solved this challenge with a software-defined infrastructure concept borrowed from cloud infrastructure technologies, designing a single system with built-in redundancies that meets the reliability requirements with only 1.5× the number of (expensive) compute nodes needed for operational NWP. Specifically, the concept of network tenants is introduced to define a production tenant, a failover/research-and-development (R&D) tenant, and a system test-and-development tenant. Moreover, operational resiliency metrics are ensured via transparent migration of components, similar to cloud environments but with subtle differences that ensure bare-metal performance and scaling of MeteoSwiss simulations. In this paper, we describe the process for designing and operating a cloud-technology-driven, high-availability operational HPC service in a cost-effective manner.

Acknowledgments

We would like to thank the GridTools developer team for developing the infrastructure software that enables COSMO to run efficiently on GPUs. Additionally, we would like to thank Felix Thaler and Hannes Vogt for their dedication in porting COSMO to use the GridTools libraries. This work has been partially funded by the PASC program in Switzerland.

Author information

Corresponding author

Correspondence to Sadaf R. Alam.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Alam, S.R. et al. (2020). Software Defined Infrastructure for Operational Numerical Weather Prediction. In: Nichols, J., Verastegui, B., Maccabe, A., Hernandez, O., Parete-Koon, S., Ahearn, T. (eds) Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI. SMC 2020. Communications in Computer and Information Science, vol 1315. Springer, Cham. https://doi.org/10.1007/978-3-030-63393-6_20

  • DOI: https://doi.org/10.1007/978-3-030-63393-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63392-9

  • Online ISBN: 978-3-030-63393-6

  • eBook Packages: Computer Science, Computer Science (R0)
