Methodology for Design and Implementation an Efficient HPC Cluster

  • Conference paper
High Performance Computing (CARLA 2020)

Abstract

For years, HPC clusters have been built by obtaining the source code of each tool that makes up the infrastructure services, then configuring and compiling it. Each administrator, drawing on their own experience and knowledge, makes a series of design decisions to implement a cluster they consider efficient, installing base tools such as NTP, NFS, a workload manager (such as SLURM), and LDAP, among others. To reduce the time this process takes, several open-source initiatives have emerged, such as Rocks, which allows rapid deployment of an HPC cluster but offers little configuration flexibility. OpenHPC emerges as an alternative that provides the necessary tools in a software repository and, once they are installed, allows the same flexibility of customization and adaptation as if they had been installed in the typical way. OpenHPC provides these standardized tools in order to spread best practices in building and managing HPC data centers. Unlike Rocks, however, OpenHPC requires the platform to be designed in advance, including the network infrastructure, the storage services, and the tools to be deployed, which in turn requires the administrator to have prior knowledge of each of them. The objective of this paper is to present the fundamental basis for implementing an efficient cluster with OpenHPC, not as a technical installation guide but as a series of steps in a methodology used by the Supercomputación y Cálculo Científico (SC3) laboratory.

Notes

  1. GpUs Advanced computiNg Environment.

  2. https://uptimeinstitute.com/

  3. https://www.intel.com/content/www/us/en/high-performance-computing-fabrics/omni-path-architecture-performance-overview.html

  4. https://en.wikipedia.org/wiki/Tcl

  5. https://en.wikipedia.org/wiki/Lua_(programming_language)

  6. https://lmod.readthedocs.io/en/latest/050_lua_modulefiles.html#lua-modulefile-functions-label

  7. https://lmod.readthedocs.io/en/latest/015_writing_modules.html

  8. https://spack.readthedocs.io/en/latest/

  9. https://easybuild.readthedocs.io/en/latest/

  10. https://clustershell.readthedocs.io/en/latest/

  11. https://github.com/duncs/clusterssh

  12. https://github.com/dun/munge

  13. https://slurm.schedmd.com/SLUG19/Priority_and_Fair_Trees.pdf

  14. https://slurm.schedmd.com/SLUG19/cgroups_and_pam_slurm_adopt.pdf

  15. https://www.netlib.org/benchmark/hpl

  16. https://www.top500.org/lists/2019/11/

Author information

Corresponding authors

Correspondence to L. A. Torres or Carlos J. Barrios.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Torres, L.A., Barrios, C.J. (2021). Methodology for Design and Implementation an Efficient HPC Cluster. In: Nesmachnow, S., Castro, H., Tchernykh, A. (eds) High Performance Computing. CARLA 2020. Communications in Computer and Information Science, vol 1327. Springer, Cham. https://doi.org/10.1007/978-3-030-68035-0_6

  • DOI: https://doi.org/10.1007/978-3-030-68035-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68034-3

  • Online ISBN: 978-3-030-68035-0

  • eBook Packages: Computer Science, Computer Science (R0)
