Modular acceleration: tricky cases of functional high-performance computing

Authors: Troels Henriksen, Cosmin E. Oancea

FHPC 2018: Proceedings of the 7th ACM SIGPLAN International Workshop on Functional High-Performance Computing

Pages 10 - 21

https://doi.org/10.1145/3264738.3264740

Published: 17 September 2018

Abstract

This case study examines the data-parallel functional implementation of three algorithms: generation of quasi-random Sobol numbers, breadth-first search, and calibration of Heston market parameters via a least-squares procedure. We show that while all three problems permit elegant functional implementations, good performance depends on subtle issues that must be confronted both in the implementations of the algorithms themselves and in the compiler that ultimately generates the high-performance code. In particular, we demonstrate a modular technique for generating quasi-random Sobol numbers efficiently, study how to implement an irregular graph algorithm efficiently without sacrificing parallelism, and argue for the utility of nested regular data parallelism in the context of nonlinear parameter calibration.

Index Terms

1. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging languages and compilers
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Functional languages
        Parallel programming languages
  2. Software organization and properties
    1. Extra-functional properties
      1. Software performance
    2. Software system structures
      1. Software system models
        Massively parallel systems

Published In

FHPC 2018: Proceedings of the 7th ACM SIGPLAN International Workshop on Functional High-Performance Computing

September 2018

21 pages

ISBN:9781450358132

DOI:10.1145/3264738

General Chairs:
Kei Davis,
Mike Rainey

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Danish Strategic Research Council

Conference

ICFP '18: 23rd ACM SIGPLAN International Conference on Functional Programming

Sponsor: SIGPLAN

September 29, 2018

St. Louis, MO, USA

Acceptance Rates

Overall Acceptance Rate 18 of 25 submissions, 72%

Bibliometrics

Total Citations: 7
Total Downloads: 88
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 12 Nov 2024

Cited By

Verloop M, Koopman T, Scholz S (2023). Modulo in high-performance code: strength reduction for modulo-based array indexing in loops. Proceedings of the 35th Symposium on Implementation and Application of Functional Languages, 1-13. Online publication date: 29-Aug-2023.
https://dl.acm.org/doi/10.1145/3652561.3652573

van Eerd J, Groote J, Hijma P, Martens J, Osama M, Wijs A (2022). Innermost Many-sorted Term Rewriting on GPUs. Science of Computer Programming (102910). Online publication date: Dec-2022.
https://doi.org/10.1016/j.scico.2022.102910

van Eerd J, Groote J, Hijma P, Martens J, Wijs A (2021). Term Rewriting on GPUs. Fundamentals of Software Engineering, 175-189. Online publication date: 17-Oct-2021.
https://doi.org/10.1007/978-3-030-89247-0_12

Munksgaard P, Breddam S, Henriksen T, Gieseke F, Oancea C (2021). Dataset Sensitive Autotuning of Multi-versioned Code Based on Monotonic Properties. Trends in Functional Programming, 3-23. Online publication date: 23-Aug-2021.
https://doi.org/10.1007/978-3-030-83978-9_1

Elsman M, Henriksen T, Serup N, Gibbons J (2019). Data-parallel flattening by expansion. Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming, 14-24. Online publication date: 8-Jun-2019.
https://dl.acm.org/doi/10.1145/3315454.3329955

Henriksen T, Thorøe F, Elsman M, Oancea C, Hollingsworth J, Keidar I (2019). Incremental flattening for nested data parallelism. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, 53-67. Online publication date: 16-Feb-2019.
https://dl.acm.org/doi/10.1145/3293883.3295707

Hovgaard A, Henriksen T, Elsman M (2019). High-Performance Defunctionalisation in Futhark. Trends in Functional Programming, 136-156. Online publication date: 24-Apr-2019.
https://doi.org/10.1007/978-3-030-18506-0_7
