Towards exascale computing with heterogeneous architectures

Authors:

Kenneth O'Brien,

Lorenzo Di Tucci,

Gianluca Durelli,

Michaela Blott

DATE '17: Proceedings of the Conference on Design, Automation & Test in Europe

Pages 398 - 403

Published: 27 March 2017

Abstract

The goal of reaching exascale computing is made especially challenging by the highly heterogeneous nature of modern platforms and the energy they consume. Compute nodes typically combine multiple multi-core CPUs and are increasingly equipped with PCIe-based accelerators, and both contribute to increasingly dynamic power consumption. In this study we evaluate our target application on a variety of heterogeneous platforms, including high-end FPGA, GPU, and Xeon Phi accelerators, with respect to energy efficiency at the node and cluster level. We compare multiple implementations of the application, each built with a different modern parallel programming framework, with respect to execution performance, code complexity, and energy efficiency. Finally, we extrapolate from our findings the implications of scaling this application towards exascale, with projections of the computation achievable within the exascale power budget for our three architectures.

Published In

DATE '17: Proceedings of the Conference on Design, Automation & Test in Europe

March 2017

1814 pages

Publisher

European Design and Automation Association

Leuven, Belgium
