A Transfer-Aware Runtime System for Heterogeneous Asynchronous Parallel Execution

Published: 22 April 2016

Abstract

This paper presents a novel resource management approach for efficiently managing the computation and the data movements between the host and its accelerators in a heterogeneous platform. Our approach is based on OmpSs, with support for multi-core CPUs, GPGPUs and Maxeler Data Flow Engines based on FPGA technology; it exploits data locality, data transfer costs and data dependencies. The proposed approach is supported by an offline learning process coupled with online monitoring, allowing performance to be estimated while learning from past observations during execution. Its performance is compared against the current OmpSs scheduler using five benchmarks: matrix multiplication, bitonic sort, N-body simulation, Cholesky decomposition and AdPredictor. The results show the proposed approach can achieve up to 4.25 times speed-up for Cholesky decomposition. Moreover, an evaluation with AdPredictor indicates that the FPGA version is up to 46 times faster than the CPU version for large task sizes.
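The abstract names three ingredients: transfer costs, data locality, and performance estimates that start from offline profiling and are refined by online monitoring. The sketch below shows one way these could combine into a device choice. It is not the paper's algorithm or the OmpSs API: the names `Device`, `PerfModel`, `transfer_time` and `pick_device`, the earliest-estimated-finish rule, the running-mean update and the single-bandwidth link model are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical device record; none of these fields come from the paper."""
    name: str
    bandwidth_bytes_per_s: float  # assumed host-to-device link speed
    ready_at: float = 0.0         # time at which the device becomes free
    resident: set = field(default_factory=set)  # buffers already on the device


class PerfModel:
    """Running mean of execution time per (task kind, device name)."""

    def __init__(self, offline_profile):
        # offline_profile: {(kind, device_name): seconds}, measured before the run
        self.mean = dict(offline_profile)
        self.count = {key: 1 for key in offline_profile}

    def estimate(self, kind, device_name):
        return self.mean[(kind, device_name)]

    def observe(self, kind, device_name, seconds):
        # Online monitoring: fold a new measurement into the running mean.
        key = (kind, device_name)
        n = self.count.get(key, 0) + 1
        prev = self.mean.get(key, seconds)
        self.mean[key] = prev + (seconds - prev) / n
        self.count[key] = n


def transfer_time(device, inputs):
    """Seconds needed to copy the inputs that are not yet resident on the device."""
    missing = sum(size for name, size in inputs.items()
                  if name not in device.resident)
    return missing / device.bandwidth_bytes_per_s


def pick_device(kind, inputs, devices, model, now=0.0):
    """Return the device with the earliest estimated finish time."""
    def finish(dev):
        start = max(now, dev.ready_at)
        return start + transfer_time(dev, inputs) + model.estimate(kind, dev.name)
    return min(devices, key=finish)


# Usage: the accelerator is faster per task, but a 2 GB copy outweighs that
# until the data is already resident on it.
cpu = Device("cpu", bandwidth_bytes_per_s=float("inf"))  # host data needs no copy
gpu = Device("gpu", bandwidth_bytes_per_s=1e9)
model = PerfModel({("matmul", "cpu"): 1.0, ("matmul", "gpu"): 0.1})
inputs = {"A": 2e9}  # buffer name -> size in bytes

first_choice = pick_device("matmul", inputs, [cpu, gpu], model)
gpu.resident.add("A")
second_choice = pick_device("matmul", inputs, [cpu, gpu], model)
model.observe("matmul", "gpu", 0.3)  # refine the estimate after a real run
```

In this sketch, locality enters only through the `resident` set: a task whose inputs already sit on an accelerator pays no transfer cost there, which is what shifts the second decision toward the accelerator.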

Published In

ACM SIGARCH Computer Architecture News, Volume 43, Issue 4
HEART '15
September 2015
98 pages
ISSN:0163-5964
DOI:10.1145/2927964

Publisher

Association for Computing Machinery

New York, NY, United States