DOI: 10.5555/3433701.3433767
research-article

Tuning floating-point precision using dynamic program information and temporal locality

Published: 09 November 2020

Abstract

We present a methodology for precision tuning of full applications. Precision tuning techniques must select a search space composed of either variables or instructions and provide a scalable search strategy. In full-application settings, one cannot assume compiler support, for practical reasons; thus, an additional important challenge is enabling code refactoring. We argue for an instruction-based search space and we show: 1) how to exploit dynamic program information based on call stacks; and 2) how to exploit the iterative nature of scientific codes, combined with temporal locality. We applied the methodology to tune the implementation of scientific codes written in a combination of Python, CUDA, C++ and Fortran, tuning calls to math exp library functions. The iterative search refinement always reduces the search complexity and the number of steps to solution. Dynamic program information increases search efficacy. Using this approach, we obtain application runtime performance improvements of up to 27%.
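
To make the idea of an instruction-based search space concrete, the following is a minimal Python sketch; it is not the authors' tool. It assumes a hypothetical iterative kernel with two named exp call sites ("decay" and "growth"), emulates single precision by rounding through binary32 with the standard struct module, and substitutes a simple greedy per-site search for the paper's search strategy. All function names, site names, and tolerances are illustrative assumptions.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def exp_at(site: str, x: float, lowered: set) -> float:
    """Evaluate exp at a named call site.

    If the site is in `lowered`, emulate a single-precision call by
    rounding both the argument and the result to binary32.
    """
    if site in lowered:
        return to_f32(math.exp(to_f32(x)))
    return math.exp(x)


def run(lowered: set, steps: int = 50) -> float:
    """Hypothetical iterative kernel with two exp call sites (an assumption)."""
    total = 0.0
    for i in range(1, steps + 1):
        # Site "decay": tiny contribution to the final result.
        total += 1e-6 * exp_at("decay", -0.1 * i, lowered)
        # Site "growth": dominant contribution to the final result.
        total += exp_at("growth", 0.2 * i, lowered)
    return total


def tune(sites, tolerance: float) -> set:
    """Greedy search: lower one call site at a time and keep the change
    only if the relative error against the all-double run stays within
    `tolerance`. This is a stand-in, not the search used in the paper.
    """
    reference = run(set())
    lowered = set()
    for site in sites:
        candidate = lowered | {site}
        rel_err = abs(run(candidate) - reference) / abs(reference)
        if rel_err <= tolerance:
            lowered = candidate
    return lowered


if __name__ == "__main__":
    for tol in (1e-3, 1e-12):
        print(tol, sorted(tune(["decay", "growth"], tol)))
```

Calling tune(["decay", "growth"], tolerance) returns the subset of call sites the search judged safe to lower. In this toy kernel the choice of tolerance decides whether the dominant "growth" site is included; each call to run stands for one short execution of the iterative code, which is the cost a real search tries to minimize.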

Cited By

  • Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications. ACM Computing Surveys. DOI: 10.1145/3711683

    Published In

    SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2020
    1454 pages
    ISBN: 9781728199986

    In-Cooperation

    • IEEE CS

    Publisher

    IEEE Press

    Qualifiers

    • Research-article

    Conference

    SC '20

    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Article Metrics

    • Downloads (Last 12 months): 14
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 28 Jan 2025
