DOI: 10.1145/1366230.1366248
poster

Effective runtime scalability metric to measure productivity in high performance computing systems

Published: 05 May 2008

Abstract

Current high performance computing systems all rely on parallel processing techniques to achieve high performance. As parallel computer systems scale up, the new generation of high performance computers puts more emphasis on "high productivity" [1] than on the "high performance" emphasized in the past. These new systems must not only meet the traditional requirements for computing performance, but also address ongoing technical challenges in high-end computing, such as energy consumption and reliability.
Energy consumption increases dramatically as a computer system scales up [2]. High energy consumption means high maintenance cost and low system stability. For example, the peak power consumption of the Earth Simulator and BlueGene/L is 18 MW and 1.6 MW, respectively. As for reliability, as the complexity of a computer system increases, its mean time between failures (MTBF) becomes significantly shorter than what many current high performance computing applications require [3], as is the case for BlueGene/L. Therefore, energy optimization and fault tolerance techniques should be introduced into computer systems to achieve low energy consumption and high reliability.
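The shrinking MTBF can be illustrated with a common first-order model. The sketch below assumes that node failures are independent and exponentially distributed, so that system MTBF is roughly node MTBF divided by node count. This model, the function name, and the sample numbers are assumptions for illustration and are not taken from the paper.

```python
def system_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """Approximate system MTBF under independent, exponential node failures.

    Assumption (not from the paper): failure rates add up across nodes,
    so the system MTBF is the node MTBF divided by the node count.
    """
    if node_mtbf_hours <= 0:
        raise ValueError("node_mtbf_hours must be positive")
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return node_mtbf_hours / num_nodes


if __name__ == "__main__":
    # Hypothetical node MTBF of 100,000 hours at several system sizes.
    for nodes in (1, 1_000, 100_000):
        print(nodes, system_mtbf_hours(100_000.0, nodes))
```

Under this model, a job that needs days of uninterrupted execution cannot rely on failure-free runs once the node count is large, which is why fault tolerance overhead enters the scalability picture.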
To improve the productivity of high performance computing systems, we first need a proper way to measure it. Unfortunately, traditional measurement models cannot evaluate system productivity comprehensively and effectively [4]. To address this issue, this paper proposes an effective scalability metric for high performance computing systems based on Gustafson's speedup law. The metric balances the runtime productivity factors of computing performance, energy consumption, and reliability. Our work makes the following three contributions.
First, to measure the scalability of an energy-optimized parallel program, we should consider not only whether the program's computing performance is scalable, but also whether its energy consumption increases smoothly as computing performance scales up. We therefore propose an energy-smoothed scalability metric based on a new definition of energy efficiency, which reflects the effect of energy consumption on runtime performance. This metric can be used to measure whether energy consumption increases smoothly as the computer system scales up.
Second, when evaluating the scalability of parallel programs, we should consider the effect of fault tolerance overhead on program performance. We therefore propose a reliability-assured scalability metric based on a new definition of reliable efficiency, which reflects the effect of fault tolerance overhead on runtime performance. This metric can be used to measure whether performance remains scalable, once fault tolerance overhead is included, as the computer system scales up.
Third, building on the analyses above, we propose a synthetic scalability metric, which measures whether a system remains energy-smoothed and reliability-assured as it scales up. The metric simultaneously measures multiple productivity factors: computing performance, energy consumption, and reliability.
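The metrics above are described as being based on Gustafson's speedup law [4]. As background, the sketch below computes the standard scaled speedup S(N) = N - s(N - 1), where s is the serial fraction of run time measured on the parallel system. It shows only the baseline law; the paper's energy and reliability extensions are not given in this abstract and are not reproduced here. The function names are illustrative.

```python
def gustafson_speedup(num_procs: int, serial_fraction: float) -> float:
    """Scaled speedup S(N) = N - s * (N - 1) from Gustafson's law.

    serial_fraction (s) is the share of run time spent in serial code,
    measured on the parallel system.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    return num_procs - serial_fraction * (num_procs - 1)


def scaled_efficiency(num_procs: int, serial_fraction: float) -> float:
    """Scaled speedup per processor; 1.0 means perfectly linear scaling."""
    return gustafson_speedup(num_procs, serial_fraction) / num_procs
```

For example, with 64 processors and a serial fraction of 0.05, the scaled speedup is 64 - 0.05 * 63 = 60.85. A scalability metric built on this law would then ask how that figure changes once energy or fault tolerance costs are charged against it.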
We demonstrate the metric by applying it to several well-known energy optimization and fault tolerance techniques. Case studies indicate that our model is more effective for two tasks: first, measuring the scalability of high performance computing systems by quantifying the effect of runtime factors, including computing performance, energy consumption, and reliability, on scalability; and second, suggesting how to maintain and improve the scalability of high performance computing systems and guiding the selection of energy optimization and fault tolerance techniques to achieve high scalability.
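The abstract does not name the fault tolerance techniques used in the case studies. As one illustration of how such overhead can be quantified, the sketch below uses periodic checkpointing with Young's first-order approximation for the checkpoint interval. Choosing checkpointing, the function names, and the sample values are all assumptions here and should not be read as the paper's method. The model ignores restart time and assumes the checkpoint cost is small relative to the MTBF.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's approximation for the checkpoint interval: sqrt(2 * C * M).

    checkpoint_cost (C) and mtbf (M) must use the same time unit.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order share of run time lost to fault tolerance.

    C / interval is time spent writing checkpoints; interval / (2 * M)
    is the expected rework after a failure (half an interval on average).
    """
    if interval <= 0 or checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("all arguments must be positive")
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    # Hypothetical values: 2-minute checkpoints, MTBF shrinking with scale.
    for mtbf_minutes in (10_000.0, 1_000.0, 100.0):
        tau = young_interval(2.0, mtbf_minutes)
        print(mtbf_minutes, tau, waste_fraction(tau, 2.0, mtbf_minutes))
```

In this model the lost fraction at the chosen interval is sqrt(2C / M), so it grows as the MTBF shrinks with system size. That growth is the kind of effect a reliability-aware scalability metric needs to capture.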

References

[1] Kepner. HPC Productivity: An Overarching View. International Journal of High Performance Computing Applications, 18(4), Nov. 2004.
[2] Feng, R. Ge, K. W. Cameron. Power and Energy Profiling of Scientific Applications on Distributed Systems. In Proceedings of IPDPS'05, Denver, CO, April 2005.
[3] Yang, et al. The Fault Tolerant Parallel Algorithm: the Parallel Recomputing Based Failure Recovery. In Proceedings of PACT'07, Brasov, Romania, Sept. 2007.
[4] Gustafson. Reevaluating Amdahl's Law. Communications of the ACM, 31(5): 532-533, 1988.

Cited By

  • (2014) A Model to Assist the Maintenance vs. Replacement Decision in Information Systems. Software Design and Development, pp. 1461-1480. DOI: 10.4018/978-1-4666-4301-7.ch071. Online publication date: 2014.
  • (2012) A Model to Assist the Maintenance vs. Replacement Decision in Information Systems. Measuring Organizational Information Systems Success, pp. 137-157. DOI: 10.4018/978-1-4666-0170-3.ch008. Online publication date: 2012.

Published In

CF '08: Proceedings of the 5th conference on Computing frontiers
May 2008
334 pages
ISBN: 9781605580777
DOI: 10.1145/1366230
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computing performance
  2. energy consumption
  3. high performance computing system
  4. productivity
  5. reliability
  6. scalability metric

Qualifiers

  • Poster

Conference

CF '08
CF '08: Computing Frontiers Conference
May 5 - 7, 2008
Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%
