poster

Effective runtime scalability metric to measure productivity in high performance computing systems

Authors: Jing Du, Xuejun Yang, Zhiyuan Wang

CF '08: Proceedings of the 5th conference on Computing frontiers

Pages 107 - 108

https://doi.org/10.1145/1366230.1366248

Published: 05 May 2008

Abstract

Current high performance computing systems all rely on parallel processing to achieve high performance. As parallel computer systems scale up, the new generation of high performance computers places more emphasis on "high productivity" [1] than on "high performance" as in the past. These new systems must not only meet the traditional requirements for computing performance but also address ongoing technical challenges in high-end computing, such as energy consumption and reliability.

Energy consumption increases dramatically as a computer system scales up [2]. High energy consumption means high maintenance cost and low system stability. For example, the peak power consumption of the Earth Simulator and BlueGene/L is 18 MW and 1.6 MW, respectively. As for reliability, as the complexity of a computer system increases, its mean time between failures (MTBF) becomes significantly shorter than what many current high performance computing applications require [3]; BlueGene/L is one such system. Therefore, energy optimization and fault tolerance techniques should be introduced into computer systems to achieve low energy consumption and high reliability.

To improve the productivity of high performance computing systems, we need a proper way to measure it. Unfortunately, traditional measurement models cannot evaluate system productivity comprehensively and effectively [4]. To address this issue, this paper proposes an effective scalability metric for high performance computing systems based on Gustafson's speedup law. The metric balances the runtime productivity factors of computing performance, energy consumption and reliability. Our work makes the following three contributions.

First, to measure the scalability of an energy-optimized parallel program, we should consider not only whether the program's computing performance is scalable, but also whether energy consumption increases smoothly as computing performance scales up. Therefore, we propose an energy-smoothed scalability metric based on a new definition of energy efficiency, which reflects the effect of energy consumption on runtime performance. This metric can be used to measure whether energy consumption increases smoothly as the computer system scales up.

Second, when evaluating the scalability of parallel programs, we should consider the effect of fault tolerance overhead on program performance. Therefore, we propose a reliability-assured scalability metric based on a new definition of reliable efficiency, which reflects the effect of fault tolerance overhead on runtime performance. This metric can be used to measure whether performance remains scalable, once fault tolerance overhead is included, as the computer system scales up.

Third, based on the analyses above, we propose a synthetic scalability metric, which measures whether systems are energy-smoothed and reliability-assured as they scale up. The metric simultaneously measures multiple productivity factors: computing performance, energy consumption and reliability.
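For readers unfamiliar with the baseline, the following is a minimal sketch of Gustafson's scaled speedup, the law on which the proposed metric is based. It shows only the classical formula S(N) = N - s(N - 1), where s is the serial fraction of the run time on the parallel system. The function names are illustrative assumptions, and the sketch does not reproduce the energy or reliability terms of the paper's metric, which the abstract does not define.

```python
def gustafson_speedup(n_procs: int, serial_fraction: float) -> float:
    """Classical Gustafson scaled speedup: S(N) = N - s * (N - 1).

    serial_fraction (s) is the fraction of the parallel run time spent in
    serial code. The name and signature are illustrative, not from the paper.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    return n_procs - serial_fraction * (n_procs - 1)


def scaled_efficiency(n_procs: int, serial_fraction: float) -> float:
    """Scaled speedup per processor; 1.0 means perfect scaling."""
    return gustafson_speedup(n_procs, serial_fraction) / n_procs


if __name__ == "__main__":
    # Tabulate speedup and efficiency for a 5% serial fraction (example value).
    for n in (1, 16, 64, 256):
        s = gustafson_speedup(n, 0.05)
        e = scaled_efficiency(n, 0.05)
        print(f"N={n:4d}  speedup={s:8.2f}  efficiency={e:.3f}")
```

Under this law the scaled efficiency falls only slowly as processors are added, which is why it is a natural starting point for a scalability metric that then has to account for additional runtime costs.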

We demonstrate the metric by applying it to some well-known energy optimization and fault tolerance techniques. Case studies indicate that our model handles two problems more effectively: first, measuring the scalability of high performance computing systems by quantifying the effect of the runtime factors (computing performance, energy consumption and reliability) on scalability; second, providing suggestions on how to maintain and improve the scalability of high performance computing systems, and guiding the selection of energy optimization and fault tolerance techniques to achieve high scalability.
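To illustrate the kind of comparison the case studies describe, here is a hypothetical sketch of how fault tolerance overhead could erode scaled efficiency as a system grows. It is not the paper's definition of reliable efficiency: the model simply assumes that a fixed fraction of wall-clock time is lost to fault tolerance (for example, checkpointing) and scales the Gustafson efficiency by the remaining useful fraction. All names and the overhead values are assumptions made for illustration.

```python
def scaled_speedup(n_procs: int, serial_fraction: float) -> float:
    """Classical Gustafson scaled speedup: S(N) = N - s * (N - 1)."""
    return n_procs - serial_fraction * (n_procs - 1)


def efficiency_with_overhead(n_procs: int, serial_fraction: float,
                             overhead_fraction: float) -> float:
    """Hypothetical efficiency when overhead_fraction of the run time is
    spent on fault tolerance instead of useful work.

    This is an illustrative model only, not the metric defined in the paper.
    """
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must lie in [0, 1)")
    useful = 1.0 - overhead_fraction
    return useful * scaled_speedup(n_procs, serial_fraction) / n_procs


if __name__ == "__main__":
    # Made-up overheads that grow with system size, for illustration only.
    for n, overhead in [(64, 0.02), (256, 0.05), (1024, 0.12)]:
        e = efficiency_with_overhead(n, 0.05, overhead)
        print(f"N={n:5d}  overhead={overhead:.2f}  efficiency={e:.3f}")
```

A technique whose overhead grows slowly with system size keeps this efficiency nearly flat, which is the informal sense in which a fault tolerance technique can be said to preserve scalability.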

References

[1] Kepner. HPC Productivity: An Overarching View. International Journal of High Performance Computing Applications, 18(4), Nov. 2004.

[2] Feng, R. Ge, K. W. Cameron. Power and Energy Profiling of Scientific Applications on Distributed Systems. In Proceedings of IPDPS'05, Denver, CA, April 2005.

[3] Yang, et al. The Fault Tolerant Parallel Algorithm: the Parallel Recomputing Based Failure Recovery. In Proceedings of PACT'07, Brasov, Romania, Sept. 2007.

[4] Gustafson. Reevaluating Amdahl's law. Communications of the ACM, 31(5): 532-533, 1988.

Cited By

Pusatli O, Regan B (2014). A Model to Assist the Maintenance vs. Replacement Decision in Information Systems. Software Design and Development, pp. 1461-1480. https://doi.org/10.4018/978-1-4666-4301-7.ch071

Pusatli O, Regan B (2012). A Model to Assist the Maintenance vs. Replacement Decision in Information Systems. Measuring Organizational Information Systems Success, pp. 137-157. https://doi.org/10.4018/978-1-4666-0170-3.ch008


Published In

CF '08: Proceedings of the 5th conference on Computing frontiers

May 2008

334 pages

ISBN: 9781605580777

DOI: 10.1145/1366230

General Chair: Alex Ramirez (UPC, Spain)

Program Chairs: Gianfranco Biliardi (University of Padova, Italy), Michael Gschwind (IBM TJ Watson Research Center, USA)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Poster

Conference

CF '08: Computing Frontiers Conference

May 5 - 7, 2008

Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%
