Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3264738.3264740acmconferencesArticle/Chapter ViewAbstractPublication PagesicfpConference Proceedingsconference-collections
research-article

Modular acceleration: tricky cases of functional high-performance computing

Published: 17 September 2018 Publication History

Abstract

This case study examines the data-parallel functional implementation of three algorithms: generation of quasi-random Sobol numbers, breadth-first search, and calibration of Heston market parameters via a least-squares procedure. We show that while all these problems permit elegant functional implementations, good performance depends on subtle issues that must be confronted in both the implementations of the algorithms themselves, as well as the compiler that is responsible for ultimately generating high-performance code. In particular, we demonstrate a modular technique for generating quasi-random Sobol numbers in an efficient manner, study the efficient implementation of an irregular graph algorithm without sacrificing parallelism, and argue for the utility of nested regular data parallelism in the context of nonlinear parameter calibration.

References

[1]
Christian Andreetta, Vivien Bégot, Jost Berthold, Martin Elsman, Fritz Henglein, Troels Henriksen, Maj-Britt Nordfang, and Cosmin E. Oancea. 2016. FinPar: A Parallel Financial Benchmark. ACM Trans. Archit. Code Optim.13, 2, Article 18 (June 2016), 27 pages.
[2]
Lars Bergstrom and John Reppy. 2012. Nested Data-parallelism on the GPU. SIGPLAN Not. 47, 9 (Sept. 2012), 247-258.
[3]
Robert Bernecky and Sven-Bodo Scholz. 2015. Abstract Expressionism for Parallel Performance. In Proceedings of the 2Nd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY 2015). ACM, New York, NY, USA, 54-59.
[4]
Guy E Blelloch. 1990. Vector models for data-parallel computing. Vol. 75. MIT press Cambridge.
[5]
Guy E. Blelloch. 1996. Programming Parallel Algorithms. Communications of the ACM (CACM) 39, 3 (1996), 85-97.
[6]
Guy E Blelloch, Jonathan C Hardwick, Jay Sipelstein, Marco Zagha, and Siddhartha Chatterjee. 1994. Implementation of a Portable Nested Data-Parallel Language. Journal of parallel and distributed computing 21, 1 (1994), 4-14.
[7]
Paul Bratley and Bennett L. Fox. 1988. Algorithm 659 Implementing Sobol's Quasirandom Sequence Generator. ACM Trans. on Math. Software (TOMS) 14(1) (1988), 88-100.
[8]
Manuel MT Chakravarty, Gabriele Keller, Sean Lee, Trevor L McDonell, and Vinod Grover. 2011. Accelerating Haskell array codes with multicore GPUs. In Proc. of the sixth workshop on Declarative aspects of multicore programming. ACM, 3-14.
[9]
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S. H. Lee, and K. Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on. 44-54.
[10]
Christophe Dubach, Perry Cheng, Rodric Rabbah, David F. Bacon, and Stephen J. Fink. 2012. Compiling a High-level Language for GPUs: (via Language Support for Architectures and Compilers). In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). ACM, New York, NY, USA, 1-12.
[11]
Martin Elsman, Troels Henriksen, Danil Annenkov, and Cosmin E. Oancea. 2018. Static Interpretation of Higher-order Modules in Futhark: Functional GPU Programming in the Large. Proc. ACM Program. Lang. 2, ICFP, Article 97 (July 2018), 30 pages.
[12]
Paul Glasserman. 2004. Monte Carlo Methods in Financial Engineering. Springer, New York.
[13]
Clemens Grelck and Sven-Bodo Scholz. 2006. SAC: A Functional Array Language for Efficient Multithreaded Execution. Int. Journal of Parallel Programming 34, 4 (2006), 383-427.
[14]
Clemens Grelck and Fangyong Tang. 2014. Towards Hybrid Array Types in SAC. In 7th Workshop on Prg. Lang., (Soft. Eng. Conf.). 129-145.
[15]
Troels Henriksen. 2017. Design and Implementation of the Futhark Programming Language. Ph.D. Dissertation. University of Copenhagen, Universitetsparken 5, 2100 Kobenhavn.
[16]
Troels Henriksen, Martin Elsman, and Cosmin E. Oancea. 2014. Size Slicing: A Hybrid Approach to Size Inference in Futhark. In Proceedings of the 3rd ACM SIGPLAN Workshop on Functional High-performance Computing (FHPC '14). ACM, New York, NY, USA, 31-42.
[17]
Troels Henriksen, Ken Friis Larsen, and Cosmin E. Oancea. 2016. Design and GPGPU Performance of Futhark's Redomap Construct. In Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY 2016). ACM, New York, NY, USA, 17-24.
[18]
Troels Henriksen and Cosmin E. Oancea. 2014. Bounds Checking: An Instance of Hybrid Analysis. In Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY'14). ACM, New York, NY, USA, Article 88, 7 pages.
[19]
Troels Henriksen, Niels G. W. Serup, Martin Elsman, Fritz Henglein, and Cosmin E. Oancea. 2017. Futhark: Purely Functional GPU-programming with Nested Parallelism and In-place Array Updates. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 556-571.
[20]
Stephen Joe and Frances Y. Kuo. 2003. Remark on Algorithm 659: Implementing Sobol's Quasirandom Sequence Generator. ACM Trans. Math. Softw. 29, 1 (March 2003), 49-57.
[21]
Rasmus Wriedt Larsen and Troels Henriksen. 2017. Strategies for Regular Segmented Reductions on GPU. In Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing (FHPC 2017). ACM, New York, NY, USA, 42-52.
[22]
A. Lee, C. Yau, M.B. Giles, A. Doucet, and C.C. Holmes. 2010. On the Utility of Graphics Cards to Perform Massively Parallel Simulation of Advanced Monte Carlo Methods. J. Comp. Graph. Stat 19, 4 (2010), 769-789.
[23]
S. Mikhailov and U Nögel. 2003. Heston's stochastic volatility model-implementation, calibration and some extensions. Wilmott magazine (January 2003), 74-79.
[24]
Cosmin E. Oancea, Christian Andreetta, Jost Berthold, Alain Frisch, and Fritz Henglein. 2012. Financial Software on GPUs: Between Haskell and Fortran. In Proceedings of the 1st ACM SIGPLAN Workshop on Functional High-performance Computing (FHPC '12). ACM, New York, NY, USA, 61-72.
[25]
Rainer Storn and Kenneth Price. 1997. Differential Evolution - A Simple and Efficient Heuristic for global Optimization over Continuous Spaces. Journal of Global Optimization 11, 4 (1997), 341-359.
[26]
Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A High-performance Graph Processing Library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '16). ACM, New York, NY, USA, Article 11, 12 pages.
[27]
Yongpeng Zhang and Frank Mueller. 2012. CuNesl: Compiling Nested Data-Parallel Languages for SIMT Architectures. In Proceedings of the 2012 41st International Conference on Parallel Processing (ICPP'12). IEEE Computer Society, Washington, DC, USA, 340-349.

Cited By

View all
  • (2023)Modulo in high-performance code: strength reduction for modulo-based array indexing in loopsProceedings of the 35th Symposium on Implementation and Application of Functional Languages10.1145/3652561.3652573(1-13)Online publication date: 29-Aug-2023
  • (2022)Innermost Many-sorted Term Rewriting on GPUsScience of Computer Programming10.1016/j.scico.2022.102910(102910)Online publication date: Dec-2022
  • (2021)Term Rewriting on GPUsFundamentals of Software Engineering10.1007/978-3-030-89247-0_12(175-189)Online publication date: 17-Oct-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
FHPC 2018: Proceedings of the 7th ACM SIGPLAN International Workshop on Functional High-Performance Computing
September 2018
21 pages
ISBN:9781450358132
DOI:10.1145/3264738
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 September 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. GPU
  2. compilers
  3. parallelism

Qualifiers

  • Research-article

Funding Sources

  • Danish Strategic Research Council

Conference

ICFP '18
Sponsor:

Acceptance Rates

Overall Acceptance Rate 18 of 25 submissions, 72%

Upcoming Conference

ICFP '25
ACM SIGPLAN International Conference on Functional Programming
October 12 - 18, 2025
Singapore , Singapore

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Modulo in high-performance code: strength reduction for modulo-based array indexing in loopsProceedings of the 35th Symposium on Implementation and Application of Functional Languages10.1145/3652561.3652573(1-13)Online publication date: 29-Aug-2023
  • (2022)Innermost Many-sorted Term Rewriting on GPUsScience of Computer Programming10.1016/j.scico.2022.102910(102910)Online publication date: Dec-2022
  • (2021)Term Rewriting on GPUsFundamentals of Software Engineering10.1007/978-3-030-89247-0_12(175-189)Online publication date: 17-Oct-2021
  • (2021)Dataset Sensitive Autotuning of Multi-versioned Code Based on Monotonic PropertiesTrends in Functional Programming10.1007/978-3-030-83978-9_1(3-23)Online publication date: 23-Aug-2021
  • (2019)Data-parallel flattening by expansionProceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3315454.3329955(14-24)Online publication date: 8-Jun-2019
  • (2019)Incremental flattening for nested data parallelismProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295707(53-67)Online publication date: 16-Feb-2019
  • (2019)High-Performance Defunctionalisation in FutharkZivilgesellschaft und Wohlfahrtsstaat im Wandel10.1007/978-3-030-18506-0_7(136-156)Online publication date: 24-Apr-2019

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media