Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

FinPar: A Parallel Financial Benchmark

Published: 27 June 2016 Publication History

Abstract

Commodity many-core hardware is now mainstream, but parallel programming models are still lagging behind in efficiently utilizing the application parallelism. There are (at least) two principal reasons for this. First, real-world programs often take the form of a deeply nested composition of parallel operators, but mapping the available parallelism to the hardware requires a set of transformations that are tedious to do by hand and beyond the capability of the common user. Second, the best optimization strategy, such as what to parallelize and what to efficiently sequentialize, is often sensitive to the input dataset and therefore requires multiple code versions that are optimized differently, which also raises maintainability problems.
This article presents three array-based applications from the financial domain that are suitable for gpgpu execution. Common benchmark-design practice has been to provide the same code for the sequential and parallel versions that are optimized for only one class of datasets. In comparison, we document (1) all available parallelism via nested map-reduce functional combinators, in a simple Haskell implementation that closely resembles the original code structure, (2) the invariants and code transformations that govern the main trade-offs of a data-sensitive optimization space, and (3) report target cpu and multiversion gpgpu code together with an evaluation that demonstrates optimization trade-offs and other difficulties. We believe that this work provides useful insight into the language constructs and compiler infrastructure capable of expressing and optimizing such applications, and we report in-progress work in this direction.

References

[1]
Mehdi Amini, Fabien Coelho, Francois Irigoin, and Ronan Keryell. 2011. Static compilation analysis for host-accelerator communication optimization. In Proceedings of the Conference on Languages and Compilers for Parallel Computing (LCPC’11). 237--251.
[2]
Patrick Bahr, Jost Berthold, and Martin Elsman. 2015. Certified symbolic management of financial multi-party contracts. In Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP’15).
[3]
Erik Barendsen and Sjaak Smetsers. 1993. Conventional and uniqueness typing in graph rewrite systems. In Foundations of Software Technology and Theoretical Computer Science. Lecture Notes in Computer Science, Vol. 761. Springer, 41--51.
[4]
Basel Committee on Banking Supervision. 2010. Basel III: A Global Regulatory Framework for More Resilient Banks and Banking Systems. Bank for International Settlements, Basel, Switzerland.
[5]
M. M. Baskaran, J. Ramanujam, and P. Sadayappan. 2010. Automatic C-to-CUDA code generation for affine programs. In Proceedings of the International Conference on Compiler Construction (CC’10). 244--263.
[6]
Nathan Bell and Jared Hoberock. 2011. Thrust: A productivity-oriented library for CUDA. In GPU Computing Gems Jade Edition, W.-M. W. Hwu (Ed.). Morgan Kaufmann, San Francisco, CA.
[7]
Lars Bergstrom and John Reppy. 2012. Nested data-parallelism on the GPU. In Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming (ICFP’12). 247--258.
[8]
R. S. Bird. 1987. An introduction to the theory of lists. In Proceedings of the NATO Advanced Study on Logic of Programming and Calculi of Discrete Design. 5--42.
[9]
F. Black and M. Scholes. 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81, 3, 637--654.
[10]
Guy Blelloch. 1996. Programming parallel algorithms. Communications of the ACM 39, 3, 85--97.
[11]
Guy E. Blelloch. 1989. Scans as primitive parallel operations. IEEE Transactions on Computers 38, 11, 1526--1538.
[12]
Guy E. Blelloch. 1990. Prefix Sums and Their Applications. Carnegie Mellon University, Pittsburgh, PA.
[13]
Guy E. Blelloch, Jonathan C. Hardwick, Jay Sipelstein, Marco Zagha, and Siddhartha Chatterjee. 1994. Implementation of a portable nested data-parallel language. Journal of Parallel and Distributed Computing 21, 1, 4--14.
[14]
Cajo J. Braak. 2006. A Markov chain Monte Carlo version of the genetic algorithm differential evolution: Easy Bayesian computing for real parameter spaces. Statistics and Computing 16, 3, 239--249.
[15]
Paul Bratley and Bennett L. Fox. 1988. Algorithm 659 implementing Sobol’s quasirandom sequence generator. ACM Transactions on Mathematical Software 14, 1, 88--100.
[16]
Richard P. Brent. 1973. Algorithms for Minimization without Derivatives. Prentice Hall.
[17]
Damiano Brigo and Fabio Mercurio. 2006. Interest Rate Models—Theory and Practice: With Smile, Inflation and Credit (2nd ed.). Springer.
[18]
Manuel M. T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. 2011. Accelerating Haskell array codes with multicore GPUs. In Proceedings of the 6th Workshop on Aspects of Multicore Programming (DAMP’11). 3--14.
[19]
Y. Chicha, M. Lloyd, C. Oancea, and S. M. Watt. 2004. Parametric polymorphism for computer algebra software components. In Proceedings of the International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. 119--130.
[20]
Koen Claessen, Mary Sheeran, and Bo Joel Svensson. 2012. Expressive array constructs in an embedded GPU kernel programming language. In Proceedings of the 7th Workshop on Declarative Aspects and Applications of Multicore Programming (DAMP’12). 21--30.
[21]
J. Crank and P. Nicolson. 1947. A practical method for numerical evaluation of solutions of partial differential equations of the heat-conduction type. Mathematical Proceedings of the Cambridge Philosophical Society 43, 1, 50--67.
[22]
Francis Dang, Hao Yu, and Lawrence Rauchwerger. 2002. The R-LRPD test: Speculative parallelization of partially parallel loops. In Proceedings of the International Parallel and Distributed Processing Symposium (PDPS’02). 20--29.
[23]
Christophe Dubach, Perry Cheng, Rodric Rabbah, David F. Bacon, and Stephen J. Fink. 2012. Compiling a high-level language for GPUs. In Proceedings of the International Conference on Programming Language Design and Implementation (PLDI’12). 1--12.
[24]
Daniel Egloff. 2011. Pricing financial derivatives with high performance finite difference solvers on GPUs. In GPU Computing Gems Jade Edition, W.-M. W. Hwu (Ed.). Morgan Kaufmann, San Francisco, CA, 309--322.
[25]
V. Elango, F. Rastello, L.-N. Pouchet, J. Ramanujam, and P. Sadayappan. 2015. On characterizing the data access complexity of programs. In Proceedings of the 42nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’15). ACM, New York, NY, 567--580.
[26]
Martin Elsman and Martin Dybdal. 2014. Compiling a subset of APL into a typed intermediate language. In Proceedings of the 1st International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY’14). ACM, New York, NY.
[27]
Martin Elsman and Anders Schack-Nielsen. 2014. Typelets—a rule-based evaluation model for dynamic, statically typed user interfaces. In Proceedings of the International Symposium on Practical Aspects of Declarative Languages (PADL’14).
[28]
Paul Feautrier. 1991. Dataflow analysis of array and scalar references. International Journal of Parallel Programming 20, 1, 23--54.
[29]
Michael Flænø Werk, Joakim Ahnfelt-Rønne, and Ken Friis Larsen. 2012. An embedded DSL for stochastic processes: Research article. In Proceedings of the 1st ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC’12). ACM, New York, NY, 93--102.
[30]
M. B. Giles, G. R. Mudalige, Z. Sharif, G. Markall, and P. H. J. Kelly. 2011. Performance analysis and optimisation of the OP2 framework on many-core architectures. ACM SIGMETRICS Performance Evaluation Review 38, 4, 9--15.
[31]
Paul Glasserman. 2004. Monte Carlo Methods in Financial Engineering. Springer, New York, NY.
[32]
Clemens Grelck and Sven-Bodo Scholz. 2006. SAC: A functional array language for efficient multithreaded execution. International Journal of Parallel Programming 34, 4, 383--427.
[33]
Jing Guo, Jeyarajan Thiyagalingam, and Sven-Bodo Scholz. 2011. Breaking the GPU programming barrier with the auto-parallelising SAC compiler. In Proceedings of the 6th Workshop on Declarative Aspects of Multicore Programming (DAMP’11). ACM, New York, NY, 15--24.
[34]
G. Hains and L. M. R. Mullin. 1993. Parallel functional programming with arrays. Computer Journal 36, 3, 238--245.
[35]
Mary W. Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, and Monica S. Lam. 2005. Interprocedural parallelization analysis in SUIF. ACM Transactions on Programming Languages and Systems. 27, 4, 662--731.
[36]
Troels Henriksen. 2014. Exploiting Functional Invariants to Optimise Parallelism: A Dataflow Approach. Master’s Thesis. DIKU, Copenhagen, Denmark.
[37]
Troels Henriksen, Martin Elsman, and Cosmin Eugen Oancea. 2014. Size slicing—a hybrid approach to size inference in Futhark. In Proceedings of the 3rd ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC’14). ACM, New York, NY, 31--42.
[38]
Troels Henriksen and Cosmin Eugen Oancea. 2013. A T2 graph-reduction approach to fusion. In Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC’13). ACM, New York, NY, 47--58.
[39]
Troels Henriksen and Cosmin Eugen Oancea. 2014. Bounds checking: An instance of hybrid analysis. In Proceedings of the ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY’14). ACM, New York, NY, 88.
[40]
Roger W. Hockney. 1965. A fast direct solution of Poisson’s equation using Fourier analysis. Journal of the ACM 12, 1, 95--113.
[41]
J. Hull. 2009. Options, Futures and Other Derivatives. Prentice Hall.
[42]
Kenneth E. Iverson. 1962. A Programming Language. John Wiley & Sons.
[43]
Ajay Joshi, Aashish Phansalkar, Lieven Eeckhout, and Lizy Kurian John. 2006. Measuring benchmark similarity using inherent program characteristics. IEEE Transactios on Computers 6, 769--782.
[44]
M. S. Joshi. 2010. Graphical Asian options. Wilmott Journal 2, 2, 97--107.
[45]
Hee-Seok Kim, Shengzhao Wu, Li-Wen Chang, and Wen-Mei W. Hwu. 2011. A scalable tridiagonal solver for GPUs. In Proceedings of the International Conference on Parallel Processing (ICPP’11). IEEE, Los Alamitos, CA, 444--453.
[46]
A. Lee, C. Yau, M. B. Giles, A. Doucet, and C. C. Holmes. 2010. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Journal of Computational and Graphical Statistics 19, 4, 769--789.
[47]
Seyong Lee, Seung-Jai Min, and Rudolf Eigenmann. 2009. OpenMP to GPGPU: A compiler framework for automatic translation and optimization. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’09). 101--110.
[48]
Yuan Lin and David Padua. 2000. Analysis of irregular single-indexed arrays and its applications in compiler optimizations. In Proceedings of the International Conference on Compiler Construction. 202--218.
[49]
Frederik M. Madsen and Andrzej Filinski. 2013. Towards a streaming model for nested data parallelism. In Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing.
[50]
Geoffrey Mainland and Greg Morrisett. 2010. Nikola: Embedding compiled GPU functions in Haskell. In Proceedings of the 3rd ACM International Symposium on Haskell. 67--78.
[51]
Robin Milner, Mads Tofte, Robert Harper, and David MacQueen. 1997. The Definition of Standard ML (Revised). MIT Press, Cambridge, MA.
[52]
Claus Munk. 2007. Introduction to the Numerical Solution of Partial Differential Equations in Finance. Retrieved May 10, 2016, from http://mit.econ.au.dk/vip_htm/cmunk/noter/pdenote.pdf.
[53]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2014. Deterministic Galois: On-demand, portable and parameterless. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’14).
[54]
Fredrik Nord and Erwin Laure. 2011. Monte Carlo option pricing with graphics processing units. In Proceedings of the International Conference on Parallel Computing (ParCo’11).
[55]
Cosmin Oancea, Christian Andreetta, Jost Berthold, Alain Frisch, and Fritz Henglein. 2012. Financial software on GPUs: Between Haskell and Fortran. In Proceedings of the Workshop on Functional High-Performance Computing (FHPC’12). ACM, New York, NY, 61--72.
[56]
Cosmin E. Oancea and Lawrence Rauchwerger. 2015. Scalable conditional induction variable (CIV) analysis. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’15).
[57]
Cosmin E. Oancea, Jason W. A. Selby, Mark Giesbrecht, and Stephen M. Watt. 2005. Distributed models of thread level speculation. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’05), Vol. 5. 920--927.
[58]
C. E. Oancea and S. M. Watt. 2005. Domains and expressions: An interface between two approaches to computer algebra. In Proceedings of the 2005 International Symposium on Symbolic and Algebraic Computation (ISSAC’05). ACM, New York, NY, 261--269.
[59]
L.-N. Pouchet, U. Bondhugula, C. Bastoul, A. Cohen, J. Ramanujam, P. Sadayappan, and N. Vasilache. 2011. Loop transformations: Convexity, pruning and optimization. In Proceedings of the 38th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’11). ACM, New York, NY, 549--562.
[60]
James Reinders. 2007. Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism. O’Reilly Media.
[61]
Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, and Wen-Mei W. Hwu. 2008. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’08). 73--82.
[62]
Standard Performance Evaluation Corporation. 2014. SPEC ACCEL. Retrieved May 10, 2016, from https://www.spec.org/accel/.
[63]
N. M. Steen, G. D. Byrne, and E. M. Gelbard. 1969. Gaussian quadratures for the integrals ∫0 exp( − x2)f(x)dx and ∫b0 exp( − x2)f(x)dx. Mathematics of Computation 23, 661--671.
[64]
Rainer Storn and Kenneth Price. 1997. Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11, 4, 341--359.
[65]
Sain-Zee Ueng, Melvin Lathara, Sara S. Baghsorkhi, and Wen-Mei W. Hwu. 2008. CUDA-lite: Reducing GPU programming complexity. In Proceedings of the 21st International Workshop on Languages and Compilers for Parallel Computing (LCPC’08). 1--15.
[66]
Jin Wang and Sudhakar Yalamanchili. 2014. Characterization and analysis of dynamic parallelism in unstructured GPU applications. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC’14). 51--60.
[67]
David Watkins. 1991. Fundamentals of Matrix Computations. Wiley, New York, NY.
[68]
M. J. Wichura. 1988. Algorithm AS 241: The percentage points of the normal distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics) 37, 3, 477--484.
[69]
Yi Yang, Ping Xiang, Jingfei Kong, and Huiyang Zhou. 2010. A GPGPU compiler for memory optimization and parallelism management. In Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI’10). 86--97.

Cited By

View all
  • (2024)AUTOMAP: Inferring Rank-Polymorphic Function Applications with Integer Linear ProgrammingProceedings of the ACM on Programming Languages10.1145/36897748:OOPSLA2(1787-1813)Online publication date: 8-Oct-2024
  • (2022)Memory optimizations in an array languageProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis10.5555/3571885.3571926(1-15)Online publication date: 13-Nov-2022
  • (2022)Distributed parallel computing with Futhark: a functional language to generate distributed parallel codeProceedings of the 8th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3520306.3534501(12-24)Online publication date: 13-Jun-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization
ACM Transactions on Architecture and Code Optimization  Volume 13, Issue 2
June 2016
200 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2952301
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 June 2016
Accepted: 01 February 2016
Revised: 01 February 2016
Received: 01 August 2015
Published in TACO Volume 13, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Data-parallel functional language
  2. fission
  3. fusion
  4. strength reduction

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Danish Council for Strategic Research

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)171
  • Downloads (Last 6 weeks)13
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)AUTOMAP: Inferring Rank-Polymorphic Function Applications with Integer Linear ProgrammingProceedings of the ACM on Programming Languages10.1145/36897748:OOPSLA2(1787-1813)Online publication date: 8-Oct-2024
  • (2022)Memory optimizations in an array languageProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis10.5555/3571885.3571926(1-15)Online publication date: 13-Nov-2022
  • (2022)Distributed parallel computing with Futhark: a functional language to generate distributed parallel codeProceedings of the 8th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3520306.3534501(12-24)Online publication date: 13-Jun-2022
  • (2022)Memory Optimizations in an Array LanguageSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00036(1-15)Online publication date: Nov-2022
  • (2021)Towards size-dependent types for array programmingProceedings of the 7th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3460944.3464310(1-14)Online publication date: 17-Jun-2021
  • (2021)Acceleration of lattice models for pricing portfolios of fixed-income derivativesProceedings of the 7th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3460944.3464309(27-38)Online publication date: 17-Jun-2021
  • (2021)Bounds Checking on GPUInternational Journal of Parallel Programming10.1007/s10766-021-00703-449:6(761-775)Online publication date: 1-Dec-2021
  • (2021)Dataset Sensitive Autotuning of Multi-versioned Code Based on Monotonic PropertiesTrends in Functional Programming10.1007/978-3-030-83978-9_1(3-23)Online publication date: 23-Aug-2021
  • (2019)A functional approach to accelerating Monte Carlo based american option pricingProceedings of the 31st Symposium on Implementation and Application of Functional Languages10.1145/3412932.3412937(1-12)Online publication date: 25-Sep-2019
  • (2019)Data-parallel flattening by expansionProceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming10.1145/3315454.3329955(14-24)Online publication date: 8-Jun-2019
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media