research-article

Precimonious: tuning assistant for floating-point precision

Authors:

Cindy Rubio-González,

David HoughAuthors Info & Claims

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Article No.: 27, Pages 1 - 12

https://doi.org/10.1145/2503210.2503296

Published: 17 November 2013 Publication History

Get Access

Abstract

Given the variety of numerical errors that can occur, floating-point programs are difficult to write, test and debug. One common practice employed by developers without an advanced background in numerical analysis is using the highest available precision. While more robust, this can degrade program performance significantly. In this paper we present Precimonious, a dynamic program analysis tool to assist developers in tuning the precision of floating-point programs. Precimonious performs a search on the types of the floating-point program variables trying to lower their precision subject to accuracy constraints and performance goals. Our tool recommends a type instantiation that uses lower precision while producing an accurate enough answer without causing exceptions. We evaluate Precimonious on several widely used functions from the GNU Scientific Library, two NAS Parallel Benchmarks, and three other numerical programs. For most of the programs analyzed, Precimonious reduces precision, which results in performance improvements as high as 41%.

References

[1]

D. An, R. Blue, M. Lam, S. Piper, and G. Stoker. Fpinst: Floating point error analysis using dyninst, 2008. URL http://www.freearrow.com/downloads/files/fpinst.pdf.

Google Scholar

[2]

M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, and S. Tomov. Accelerating scientific computations with mixed precision algorithms. Computer Physics Communications, 180(12):2526--2533, 2009.

Crossref

Google Scholar

[3]

D. H. Bailey. Resolving numerical anomalies in scientific computation, 2008. URL http://www.davidhbailey.com/dhbpapers/numerical-bugs.pdf.

Google Scholar

[4]

D. H. Bailey, T. Harris, W. Saphir, R. van der Wijngaart, A. Woo, and M. Yarrow. The NAS Parallel Benchmarks 2.0, 1995.

Google Scholar

[5]

E. T. Barr, T. Vo, V. Le, and Z. Su. Automatic detection of floating-point exceptions. In R. Giacobazzi and R. Cousot, editors, POPL, pages 549--560. ACM, 2013.

Digital Library

Google Scholar

[6]

F. Benz, A. Hildebrandt, and S. Hack. A dynamic program analysis to find floating-point accuracy problems. In J. Vitek, H. Lin, and F. Tip, editors, PLDI, pages 453--462. ACM, 2012.

Digital Library

Google Scholar

[7]

J. Bilmes, K. Asanović, C. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In Proceedings of the International Conference on Supercomputing, Vienna, Austria, July 1997. ACM SIGARC. see http://www.icsi.berkeley.edu/~bilmes/phipac.

Digital Library

Google Scholar

[8]

A. W. Brown, P. H. J. Kelly, and W. Luk. Profiling floating point value ranges for reconfigurable implementation. In Proceedings of the 1st HiPEAC Workshop on Reconfigurable Computing, pages 6--16, 2007.

Google Scholar

[9]

A. Buttari, J. Dongarra, J. Kurzak, P. Luszczek, and S. Tomov. Using mixed precision for sparse matrix computations to enhance the performance while achieving 64--bit accuracy. ACM Trans. Math. Softw., 34(4):17:1--17:22, July 2008. URL http://doi.acm.org/10.1145/1377596.1377597.

Digital Library

Google Scholar

[10]

G. P. Contributors. GSL - GNU scientific library - GNU project - free software foundation (FSF). http://www.gnu.org/software/gsl/, 2010. URL http://www.gnu.org/software/gsl/.

Google Scholar

[11]

E. Darulova and V. Kuncak. Trustworthy numerical computation in scala. In C. V. Lopes and K. Fisher, editors, OOPSLA, pages 325--344. ACM, 2011.

Digital Library

Google Scholar

[12]

F. de Dinechin, C. Lauter, and G. Melquiond. Certifying the floating-point implementation of an elementary function using gappa. IEEE Trans. Comput., 60(2):242--253, Feb. 2011. URL http://dx.doi.org/10.1109/TC.2010.128.

Digital Library

Google Scholar

[13]

D. Delmas, E. Goubault, S. Putot, J. Souyris, K. Tekkal, and F. Védrine. Towards an industrial use of fluctuat on safety-critical avionics software. In Proceedings of the 14th International Workshop on Formal Methods for Industrial Critical Systems, FMICS '09, pages 53--69, Berlin, Heidelberg, 2009. Springer-Verlag. URL http://dx.doi.org/10.1007/978-3-642-04570-7_6.

Digital Library

Google Scholar

[14]

M. Frigo. A Fast Fourier Transform compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, Georgia, May 1999.

Digital Library

Google Scholar

[15]

D. Goldberg. What every computer scientist should know about floating point arithmetic. ACM Computing Surveys, 23(1):5--48, 1991.

Digital Library

Google Scholar

[16]

Y. Hida, X. S. Li, and D. H. Bailey. Algorithms for quad-double precision floating point arithmetic. In Proceedings of the 15th IEEE Symposium on Computer Arithmetic, ARITH '01, pages 155--, Washington, DC, USA, 2001. IEEE Computer Society. URL http://dl.acm.org/citation.cfm?id=872021.872445.

Digital Library

Google Scholar

[17]

N. J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, second edition, 2002.

Digital Library

Google Scholar

[18]

W. Kahan. A Mathematical Curiosity, 2013.

Google Scholar

[19]

M. O. Lam, J. K. Hollingsworth, and G. W. Stewart. Dynamic floating-point cancellation detection. In 1st International Workshop on High-performance Infrastructure for Scalable Tools, 2011.

Google Scholar

[20]

M. O. Lam, J. K. Hollingsworth, B. R. de Supinski, and M. P. LeGendre. Automatically adapting programs for mixed-precision floating-point computation. In A. D. Malony, M. Nemirovsky, and S. P. Midkiff, editors, ICS, pages 369--378. ACM, 2013.

Digital Library

Google Scholar

[21]

C. Lattner and V. Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO'04), Palo Alto, California, Mar 2004.

Digital Library

Google Scholar

[22]

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Softw., 28(2):152--205, June 2002. URL http://doi.acm.org/10.1145/567806.567808.

Digital Library

Google Scholar

[23]

M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2):232--275, 2005.

Crossref

Google Scholar

[24]

T. Ravitch. LLVM Whole-Program Wrapper @ONLINE, Mar. 2011. URL https://github.com/travitch/whole-program-llvm.

Google Scholar

[25]

K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing engine for C. In 5th joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE'05), pages 263--272. ACM, 2005.

Digital Library

Google Scholar

[26]

I. C. Society. IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754--2008, Aug. 2008. URL http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4610935.

Google Scholar

[27]

R. Vuduc, J. Demmel, and K. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proc. of SciDAC 2005, J. of Physics: Conference Series. Institute of Physics Publishing, June 2005.

Crossref

Google Scholar

[28]

C. Whaley. Automatically Tuned Linear Algebra Software (ATLAS). math-atlas.sourceforge.net, 2012.

Google Scholar

[29]

A. Zeller and R. Hildebrandt. Simplifying and isolating failure-inducing input. IEEE Trans. Software Eng., 28(2):183--200, 2002.

Digital Library

Google Scholar

Cited By

View all

Saeedi SPiri ADeveautour BO’Connor IBosio ASavino ADi Carlo S(2024)A Survey on Design Space Exploration Approaches for Approximate Computing SystemsElectronics10.3390/electronics1322444213:22(4442)Online publication date: 13-Nov-2024
https://doi.org/10.3390/electronics13224442
Chen XTian JBeaver IFreeman CYan YWang JTao D(2024)FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point DataProceedings of the VLDB Endowment10.14778/3648160.364818017:6(1418-1431)Online publication date: 1-Feb-2024
https://dl.acm.org/doi/10.14778/3648160.3648180
Xu JCui MLi FZhang ZYang HZhou BZhao JChristakis MPradel M(2024)Arfa: An Agile Regime-Based Floating-Point Optimization Approach for Rounding ErrorsProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680378(1516-1528)Online publication date: 11-Sep-2024
https://dl.acm.org/doi/10.1145/3650212.3680378
Show More Cited By

Index Terms

Recommendations

Floating-point precision tuning using blame analysis
ICSE '16: Proceedings of the 38th International Conference on Software Engineering

While tremendously useful, automated techniques for tuning the precision of floating-point programs face important scalability challenges. We present Blame Analysis, a novel dynamic approach that speeds up precision tuning. Blame Analysis performs ...
Decimal Floating-Point Multiplication

Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents the design of two decimal floating-point multipliers: one ...
Simulating Low Precision Floating-Point Arithmetic

The half-precision (fp16) floating-point format, defined in the 2008 revision of the IEEE standard for floating-point arithmetic, and a more recently proposed half-precision format bfloat16, are increasingly available in GPUs and other accelerators. While the ...

Reviews

Reviewer: Sanzheng Qiao

Accuracy and error tolerance are important requirements for most software. A common practice for achieving the desirable accuracy and avoiding potential numerical problems, such as overflowunderflow, is to use high precision for all of the floating-point variables in a program. High precision comes at a cost: increased execution time. This is especially true when the extra precision such as double double is unavailable in hardware and implemented in software; the extra cost can be substantial. Very often, only a portion of the variables and computation need to be in high precision for the desirable accuracy. However, manually identifying the variables and computation that can use lower precision while satisfying the given error tolerance is impractical, if not impossible. The tool Precimonious presented in this paper automates the tuning of the floating-point precision of the variables of a program. Specifically, given a high-precision program that satisfies the error tolerance, this tool identifies the variables that can be declared in lower precision while achieving the same accuracy without degrading the performance (execution time). The basic idea behind the tool is as follows. It searches for the program variables whose precision (types) can be lowered, for example, from double double to double; suggests a type configuration mapping variables to types; transforms the original program by applying the type configuration; tests the transformed program to determine if the type configuration is valid in terms of the two criteria, accuracy (error tolerance) and performance (execution time); and repeats until a valid type configuration is found. The authors tested the tool on programs that use GNU scientific library and NAS parallel benchmarks. They reported speedup ranging from 0 to 41.7 percent, with more cases of significant speedup within a larger error threshold (10-4) than a smaller threshold (10-10). Tuning floating-point precision to improve performance is of great interest. This work is an ongoing research project rather than a user-friendly software package. Source code (a C program in the tests) is first translated into LLVM bitcode. Mapping the type configuration, which is in LLVM bitcode, to the source code is preformed manually. The type configuration suggested by the tool is valid only for the testing input data, and is not guaranteed for all possible inputs. The user should provide a set of representative input data. The tool only changes variable declaration and switches function calls. Tuning the precision of intermediate results is a challenging future work. Despite the limitations of the tool, it is of great interest for researchers and developers. The Precimonious source code and all of the data and results presented in the paper are available under BSD license at https:github.comcorvette-berkeleyprecimonious. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

November 2013

1123 pages

ISBN:9781450323789

DOI:10.1145/2503210

General Chair:
William Gropp
University of Illinois at Urbana-Champaign, Urbana, Illinois
,
Program Chair:
Satoshi Matsuoka
Tokyo Institute of Technology, Tokyo, Japan

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 November 2013

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Conference

SC13

Sponsor:

SIGHPC
SIGARCH
IEEE-CS

SC13: International Conference for High Performance Computing, Networking, Storage and Analysis

November 17 - 21, 2013

Colorado, Denver

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

258
Total Citations
View Citations
612
Total Downloads

Downloads (Last 12 months)95
Downloads (Last 6 weeks)18

Reflects downloads up to 12 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Saeedi SPiri ADeveautour BO’Connor IBosio ASavino ADi Carlo S(2024)A Survey on Design Space Exploration Approaches for Approximate Computing SystemsElectronics10.3390/electronics1322444213:22(4442)Online publication date: 13-Nov-2024
https://doi.org/10.3390/electronics13224442
Chen XTian JBeaver IFreeman CYan YWang JTao D(2024)FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point DataProceedings of the VLDB Endowment10.14778/3648160.364818017:6(1418-1431)Online publication date: 1-Feb-2024
https://dl.acm.org/doi/10.14778/3648160.3648180
Xu JCui MLi FZhang ZYang HZhou BZhao JChristakis MPradel M(2024)Arfa: An Agile Regime-Based Floating-Point Optimization Approach for Rounding ErrorsProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680378(1516-1528)Online publication date: 11-Sep-2024
https://dl.acm.org/doi/10.1145/3650212.3680378
Appel AKellison ATimany ATraytel DPientka BBlazy S(2024)VCFloat2: Floating-Point Error Analysis in CoqProceedings of the 13th ACM SIGPLAN International Conference on Certified Programs and Proofs10.1145/3636501.3636953(14-29)Online publication date: 9-Jan-2024
https://dl.acm.org/doi/10.1145/3636501.3636953
Briggs ILad YPanchekha P(2024)Implementation and Synthesis of Math Library FunctionsProceedings of the ACM on Programming Languages10.1145/36328748:POPL(942-969)Online publication date: 5-Jan-2024
https://dl.acm.org/doi/10.1145/3632874
Xu JSong GZhou BLi FHao JZhao JLee IChabbi MSteuwer M(2024)A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine ProgramsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638484(55-67)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638484
Wang YRubio-González CRoychoudhury APaiva AAbreu RStorey M(2024)Predicting Performance and Accuracy of Mixed-Precision Programs for Precision TuningProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3623338(1-13)Online publication date: 20-May-2024
https://dl.acm.org/doi/10.1145/3597503.3623338
Freytag GKünas CRech PNavaux P(2024)Interleaved Execution of Approximated CUDA Kernels in Iterative Applications2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP62718.2024.00017(60-67)Online publication date: 20-Mar-2024
https://doi.org/10.1109/PDP62718.2024.00017
Raţiu CAssunção WHerac EHaas RLauwerys CEgyed A(2024)Using reactive links to propagate changes across engineering modelsSoftware and Systems Modeling10.1007/s10270-024-01186-wOnline publication date: 10-Jun-2024
https://doi.org/10.1007/s10270-024-01186-w
Titolo LMoscato MFeliu MMasci PMuñoz C(2024)Rigorous Floating-Point Round-Off Error Analysis in PRECiSA 4.0Formal Methods10.1007/978-3-031-71177-0_2(20-38)Online publication date: 9-Sep-2024
https://dl.acm.org/doi/10.1007/978-3-031-71177-0_2
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Index Terms

Recommendations

Floating-point precision tuning using blame analysis

Decimal Floating-Point Multiplication

Simulating Low Precision Floating-Point Arithmetic

Reviews

Access critical reviews of Computing literature here