FFMK: An HPC OS Based on the L4Re Microkernel

  • Chapter
Operating Systems for Supercomputers and High Performance Computing

Part of the book series: High-Performance Computing Series (HPC, volume 1)

Abstract

The German research project FFMK aims to build a new HPC operating system platform that addresses hardware and software challenges posed by future exascale systems. These challenges include massively increased parallelism (e.g., nodes and cores), overcoming performance variability, and most likely higher failure rates due to significantly increased component counts. We also expect more complex applications and the need to manage system resources in a more dynamic way than on contemporary HPC platforms, which assign resources to applications statically. The project combines and adapts existing system-software building blocks that have already matured and proven themselves in other areas. At the lowest level, the architecture is based on a microkernel to provide an extremely lightweight and fast execution environment that leaves as many resources as possible to applications. An instance of the microkernel controls each compute node, but it is complemented by a virtualized Linux kernel that provides device drivers, compatibility with existing HPC infrastructure, and rich support for programming models and HPC runtimes such as MPI. Above the level of individual nodes, the system architecture includes distributed performance and health monitoring services as well as fault-tolerant information dissemination algorithms that enable failure handling and dynamic load management. In this chapter, we will give an overview of the overall architecture of the FFMK operating system platform. However, the focus will be on the microkernel and how it integrates with Linux to form a multi-kernel operating system architecture.

Notes

  1. Deutsche Forschungsgemeinschaft (DFG).

References

  • Andersen, E. (2010). μClibc. https://uclibc.org.

  • Barak, A., Drezner, Z., Levy, E., Lieber, M., & Shiloh, A. (2015). Resilient gossip algorithms for collecting online management information in exascale clusters. Concurrency and Computation: Practice and Experience, 27(17), 4797–4818.

  • Beckman, P. et al. (2015). Argo: An exascale operating system. http://www.argo-osr.org/. Accessed 20 Nov 2015.

  • Döbel, B., & Härtig, H. (2014). Can we put concurrency back into redundant multithreading? Proceedings of the 14th International Conference on Embedded Software, EMSOFT 2014 (pp. 19:1–19:10). USA: ACM.

  • Döbel, B., Härtig, H., & Engel, M. (2012). Operating system support for redundant multithreading. Proceedings of the Tenth ACM International Conference on Embedded Software EMSOFT 2012 (pp. 83–92). USA: ACM.

  • FFMK. FFMK Project Website. https://ffmk.tudos.org. Accessed 01 Feb 2018.

  • Fu, H., Liao, J., Yang, J., Wang, L., Song, Z., Huang, X., et al. (2016). The Sunway TaihuLight supercomputer: system and applications. Science China Information Sciences, 59(7), 072001.

  • Gerofi, B., Takagi, M., Hori, A., Nakamura, G., Shirasawa, T., & Ishikawa, Y. (2016). On the scalability, performance isolation and device driver transparency of the IHK/McKernel hybrid lightweight kernel. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 1041–1050).

  • Graham, R. L., Woodall, T. S., & Squyres, J. M. (2005). Open MPI: A flexible high performance MPI. Proceedings, 6th Annual International Conference on Parallel Processing and Applied Mathematics. Poland: Poznan.

  • Härtig, H., & Roitzsch, M. (2006). Ten Years of Research on L4-Based Real-Time. Proceedings of the Eighth Real-Time Linux Workshop. China: Lanzhou.

  • Härtig, H., Hohmuth, M., Liedtke, J., Schönberg, S., & Wolter, J. (1997). The performance of μ-kernel-based systems. SOSP 1997: Proceedings of the sixteenth ACM symposium on Operating systems principles (pp. 66–77). USA: ACM Press.

  • Hoefler, T., Schneider, T., & Lumsdaine, A. (2010). Characterizing the influence of system noise on large-scale applications by simulation. Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010. USA: IEEE Computer Society.

  • Lackorzynski, A., & Warg, A. (2009). Taming subsystems: capabilities as universal resource access control in L4. IIES 2009: Proceedings of the Second Workshop on Isolation and Integration in Embedded Systems (pp. 25–30). USA: ACM.

  • Lackorzynski, A., Weinhold, C., & Härtig, H. (2016a). Combining predictable execution with full-featured commodity systems. Proceedings of OSPERT2016, the 12th Annual Workshop on Operating Systems Platforms for Embedded Real-Time Applications OSPERT 2016 (pp. 31–36).

  • Lackorzynski, A., Weinhold, C., & Härtig, H. (2016b). Decoupled: Low-effort noise-free execution on commodity system. Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2016. USA: ACM.

  • Lackorzynski, A., Weinhold, C., & Härtig, H. (2017). Predictable low-latency interrupt response with general-purpose systems. Proceedings of OSPERT2017, the 13th Annual Workshop on Operating Systems Platforms for Embedded Real-Time Applications OSPERT 2017 (pp. 19–24).

  • Lawrence Livermore National Laboratory. The FTQ/FWQ Benchmark.

  • Levy, E., Barak, A., Shiloh, A., Lieber, M., Weinhold, C., & Härtig, H. (2014). Overhead of a decentralized gossip algorithm on the performance of HPC applications. Proceedings of the ROSS 2014 (pp. 10:1–10:7). New York: ACM.

  • Lieber, M., Grützun, V., Wolke, R., Müller, M. S., & Nagel, W. E. (2012). Highly scalable dynamic load balancing in the atmospheric modeling system COSMO-SPECS+FD4. Proceedings of the PARA 2010 (Vol. 7133, pp. 131–141). Berlin: Springer.

  • Liedtke, J. (1995). On micro-kernel construction. SOSP 1995: Proceedings of the fifteenth ACM symposium on Operating systems principles (pp. 237–250). USA: ACM Press.

  • microHPC. microHPC Project Website. https://microhpc.tudos.org. Accessed 01 Feb 2018.

  • mvapichweb. MVAPICH: MPI over InfiniBand. http://mvapich.cse.ohio-state.edu/. Accessed 29 Jan 2017.

  • Reussner, R., Sanders, P., & Larsson Träff, J. (2002). SKaMPI: a comprehensive benchmark for public benchmarking of MPI (pp. 10:55–10:65).

  • Seelam, S., Fong, L., Tantawi, A., Lewars, J., Divirgilio, J., & Gildea, K. (2010). Extreme scale computing: Modeling the impact of system noise in multicore clustered systems. 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS).

  • Singaravelu, L., Pu, C., Härtig, H., & Helmuth, C. (2006). Reducing TCB complexity for security-sensitive applications: three case studies. Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, EuroSys 2006 (pp. 161–174). USA: ACM.

  • The CP2K Developers Group. Open source molecular dynamics. http://www.cp2k.org/. Accessed 20 Nov 2015.

  • Weinhold, C., & Härtig, H. (2011). jVPFS: adding robustness to a secure stacked file system with untrusted local storage components. Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference, USENIXATC 2011 (p. 32). USA: USENIX Association.

  • Weinhold, C., & Härtig, H. (2008). VPFS: building a virtual private file system with a small trusted computing base. Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008, Eurosys 2008 (pp. 81–93). USA: ACM.

  • Weinhold, C., Lackorzynski, A., Bierbaum, J., Küttler, M., Planeta, M., Härtig, H., et al. (2016). FFMK: A fast and fault-tolerant microkernel-based system for exascale computing. Software for Exascale Computing—SPPEXA 2013–2015 (Vol. 113, pp. 405–426).

  • XtreemFS. XtreemFS - a cloud file system. http://www.xtreemfs.org. Accessed 16 May 2018.

Acknowledgements

We would like to thank the German priority program 1648 “Software for Exascale Computing” for supporting the project FFMK (FFMK 2019), the ESF-funded project microHPC (microHPC 2019), and the cluster of excellence “Center for Advancing Electronics Dresden” (cfaed). We also acknowledge the Jülich Supercomputing Centre, the Gauss Centre for Supercomputing, and the John von Neumann Institute for Computing for providing compute time on the JUQUEEN and JURECA supercomputers. We would also like to deeply thank TU Dresden’s ZIH for allowing us bare metal access to nodes of their Taurus system, as well as all our fellow researchers in the FFMK project for their advice, contributions, and friendly collaboration.

Author information

Corresponding author

Correspondence to Carsten Weinhold.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Weinhold, C., Lackorzynski, A., Härtig, H. (2019). FFMK: An HPC OS Based on the L4Re Microkernel. In: Gerofi, B., Ishikawa, Y., Riesen, R., Wisniewski, R.W. (eds) Operating Systems for Supercomputers and High Performance Computing. High-Performance Computing Series, vol 1. Springer, Singapore. https://doi.org/10.1007/978-981-13-6624-6_19

  • DOI: https://doi.org/10.1007/978-981-13-6624-6_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6623-9

  • Online ISBN: 978-981-13-6624-6

  • eBook Packages: Computer Science (R0)
