Research Article
DOI: 10.1145/3095770.3095778

The Effect of Asymmetric Performance on Asynchronous Task Based Runtimes

Published: 27 June 2017

Abstract

It is generally accepted that future supercomputing workloads will consist of application compositions made up of coupled simulations as well as in-situ analytics. While these components have commonly been deployed using a space-shared configuration to minimize cross-workload interference, it is likely that not all of the workload components will require the full processing capacity of the CPU cores they are running on. For instance, an analytics workload often does not need to run continuously and is not generally considered to have the same priority as the simulation codes. In a space-shared configuration, this arrangement leads to wasted resources due to periodically idle CPUs, which are generally unusable by traditional bulk synchronous parallel (BSP) applications. As a result, many have started to reconsider task-based runtimes owing to their ability to dynamically utilize available CPU resources. While the dynamic behavior of task-based runtimes has historically been targeted at application-induced load imbalances, the same basic situation arises from the asymmetric performance that results from time sharing a CPU with other workloads. Many have assumed that task-based runtimes can adapt to these new environments without significant modification. In this paper, we present a preliminary set of experiments that measure how well asynchronous task-based runtimes respond to load imbalances caused by the asymmetric performance of time-shared CPUs. Our work focuses on a set of experiments using benchmarks running on both Charm++ and HPX-5 in the presence of a competing workload. The results show that while these runtimes are better suited to handling such scenarios than traditional runtimes, they are not yet capable of effectively addressing anything more than a fairly minimal level of CPU contention.
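
To make the kind of interference described above concrete, the sketch below shows one way a competing workload can be made to time-share a core with a runtime's worker thread. It is a minimal illustration using standard Linux APIs, not the harness used by the authors; the core index, duty cycle, and duty-cycle period are assumed parameters chosen for the sketch.

/*
 * Hypothetical sketch (not from the paper): a "competing workload" that
 * time-shares one CPU core with a co-located runtime worker, producing
 * the asymmetric per-core performance discussed in the abstract.
 * Usage: ./contend <core> <busy_fraction>   e.g.  ./contend 3 0.5
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int core = (argc > 1) ? atoi(argv[1]) : 0;      /* core to contend on    */
    double duty = (argc > 2) ? atof(argv[2]) : 0.5; /* fraction of time busy */

    /* Pin this process to the target core so it competes directly with
     * whichever runtime worker the OS has placed there. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* Alternate busy-spinning and sleeping: the busy phase steals cycles
     * from the co-located worker, so that core appears slower than its
     * uncontended peers. */
    const long period_us = 10000;                /* 10 ms duty-cycle period */
    long busy_us = (long)(duty * period_us);
    for (;;) {
        struct timespec start, now;
        clock_gettime(CLOCK_MONOTONIC, &start);
        do {
            clock_gettime(CLOCK_MONOTONIC, &now);
        } while ((now.tv_sec - start.tv_sec) * 1000000L +
                 (now.tv_nsec - start.tv_nsec) / 1000L < busy_us);
        usleep(period_us - busy_us);
    }
    return 0;
}

Pinning such a process opposite a Charm++ or HPX-5 worker reduces that core's effective capacity, which the runtime's load balancer or work-stealing scheduler must then absorb; measuring how well the runtimes absorb it is precisely what the experiments above set out to do.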


Cited By

  • Hardware Performance Variation: A Comparative Study Using Lightweight Kernels. High Performance Computing (2018), 246-265. https://doi.org/10.1007/978-3-319-92040-5_13. Online publication date: 29 May 2018.


Published In

ROSS '17: Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers
June 2017, 62 pages
ISBN: 9781450350860
DOI: 10.1145/3095770

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Operating Systems
  2. Performance Evaluation
  3. Runtime Environments

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ROSS '17

Acceptance Rates

Overall Acceptance Rate 58 of 169 submissions, 34%


