DOI: 10.1145/2491661.2481427

Hobbes: composition and virtualization as the foundations of an extreme-scale OS/R

Published: 10 June 2013

Abstract

This paper describes our vision for Hobbes, an operating system and runtime (OS/R) framework for extreme-scale systems. The Hobbes design explicitly supports application composition, which is emerging as a key approach for addressing the scalability and power concerns anticipated with coming extreme-scale architectures. We make use of virtualization technologies to provide the flexibility to support application components' requirements for different node-level operating systems and runtimes, as well as different mappings of the components onto the hardware. We describe the architecture of the Hobbes OS/R and how we will address the cross-cutting concerns of power/energy, scheduling at massive levels of parallelism, and resilience. We also outline how the "users" of the OS/R (programming models, applications, and tools) influence the design.
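The composition model sketched in the abstract — components that declare the node-level OS/R they need, with the framework mapping them onto virtualized partitions — can be illustrated with a minimal sketch. Everything below (the `Component` type, the `osr` field, the one-enclave-per-OS/R grouping policy) is a hypothetical illustration, not the Hobbes interface:

```python
# Hypothetical illustration (not the Hobbes API): each application component
# declares the node-level OS/R it requires, and a composition step groups the
# components into virtualized enclaves, one enclave per required OS/R.
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str    # e.g. a simulation or an in-situ analytics kernel
    osr: str     # node-level OS/R this component requires (assumed label)
    nodes: int   # number of nodes requested for the component

def compose(components):
    """Group components by required OS/R; each group becomes one enclave."""
    enclaves = defaultdict(list)
    for c in components:
        enclaves[c.osr].append(c)
    return dict(enclaves)

# A composed application: a large simulation on a lightweight kernel,
# with coupled analytics and visualization components on a Linux stack.
app = [
    Component("simulation", osr="lightweight-kernel", nodes=1024),
    Component("analytics",  osr="linux",              nodes=64),
    Component("viz",        osr="linux",              nodes=16),
]
enclaves = compose(app)
```

The point of the sketch is only the separation of concerns: components state requirements, and the OS/R layer — here a trivial grouping function — decides the mapping onto hardware partitions.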




Published In

ROSS '13: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
June 2013
75 pages
ISBN:9781450321464
DOI:10.1145/2491661

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. application composition
  2. operating system
  3. supercomputing
  4. virtualization

Qualifiers

  • Research-article

Conference

ICS'13

Acceptance Rates

ROSS '13 Paper Acceptance Rate: 9 of 18 submissions, 50%
Overall Acceptance Rate: 58 of 169 submissions, 34%

Article Metrics

  • Downloads (Last 12 months): 6
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 19 Nov 2024

Cited By

  • (2021) A Performance-Stable NUMA Management Scheme for Linux-Based HPC Systems. IEEE Access, 9:52987-53002. DOI: 10.1109/ACCESS.2021.3069991. Online publication date: 2021.
  • (2020) Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources. The International Journal of High Performance Computing Applications. DOI: 10.1177/1094342020913628. Online publication date: 27-Mar-2020.
  • (2019) A New Age: An Overview of Multi-kernels. In Operating Systems for Supercomputers and High Performance Computing, pages 223-226. DOI: 10.1007/978-981-13-6624-6_13. Online publication date: 16-Oct-2019.
  • (2018) Performance & Energy Tradeoffs for Dependent Distributed Applications Under System-wide Power Caps. Proceedings of the 47th International Conference on Parallel Processing, pages 1-11. DOI: 10.1145/3225058.3225098. Online publication date: 13-Aug-2018.
  • (2018) PicoDriver. Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, pages 2-13. DOI: 10.1145/3208040.3208060. Online publication date: 11-Jun-2018.
  • (2018) Performance and Scalability of Lightweight Multi-kernel Based Operating Systems. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 116-125. DOI: 10.1109/IPDPS.2018.00022. Online publication date: May 2018.
  • (2018) Non-clairvoyant online scheduling of synchronized jobs on virtual clusters. The Journal of Supercomputing, 74(6):2353-2384. DOI: 10.1007/s11227-018-2262-4. Online publication date: 1-Jun-2018.
  • (2017) Toward Full Specialization of the HPC Software Stack. Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2017), pages 1-8. DOI: 10.1145/3095770.3095777. Online publication date: 27-Jun-2017.
  • (2017) UNITY. Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2017), pages 1-8. DOI: 10.1145/3095770.3095776. Online publication date: 27-Jun-2017.
  • (2017) Enabling Diverse Software Stacks on Supercomputers Using High Performance Virtual Clusters. 2017 IEEE International Conference on Cluster Computing (CLUSTER), pages 310-321. DOI: 10.1109/CLUSTER.2017.92. Online publication date: Sep 2017.
