Handbook of Floating-Point Arithmetic
December 2009
Abstract

Floating-point arithmetic is by far the most widely used way of implementing real-number arithmetic on modern computers. Although the basic principles of floating-point arithmetic can be explained in a short amount of time, making such an arithmetic reliable and portable, yet fast, is a very difficult task. From the 1960s to the early 1980s, many different arithmetics were developed, but their implementation varied widely from one machine to another, making it difficult for nonexperts to design, learn, and use the required algorithms. As a result, floating-point arithmetic is far from being exploited to its full potential. This handbook aims to provide a complete overview of modern floating-point arithmetic, including a detailed treatment of the newly revised (IEEE 754-2008) standard for floating-point arithmetic. Presented throughout are algorithms for implementing floating-point arithmetic as well as algorithms that use floating-point arithmetic. So that the techniques presented can be put directly into practice in actual coding or design, they are illustrated, whenever possible, by a corresponding program. 
Key topics and features include:
  • Presentation of the history and basic concepts of floating-point arithmetic and various aspects of the past and current standards
  • Development of smart and nontrivial algorithms, and algorithmic possibilities induced by the availability of a fused multiply-add (fma) instruction, e.g., correctly rounded software division and square roots
  • Implementation of floating-point arithmetic, either in software (on an integer processor) or in hardware, and a discussion of issues related to compilers and languages
  • Coverage of several recent advances related to elementary functions: correct rounding of these functions and computation of very accurate approximations under constraints
  • Extensions of floating-point arithmetic such as certification, verification, and big precision

Handbook of Floating-Point Arithmetic is designed for programmers of numerical applications, compiler designers, programmers of floating-point algorithms, designers of arithmetic operators, and more generally, students and researchers in numerical analysis who wish to better understand a tool used in their daily work and research.

Cited By

  1. ACM
    Weber A, Macoveiciuc E and Reissig G ABS: A formally correct software tool for space-efficient symbolic synthesis 25th ACM International Conference on Hybrid Systems: Computation and Control, (1-10)
  2. ACM
    Muller J and Rideau L (2022). Formalization of Double-Word Arithmetic, and Comments on “Tight and Rigorous Error Bounds for Basic Building Blocks of Double-Word Arithmetic”, ACM Transactions on Mathematical Software, 48:1, (1-24), Online publication date: 31-Mar-2022.
  3. ACM
    Das A, Krishnamoorthy S, Briggs I, Gopalakrishnan G and Tipireddy R (2020). FPDetect, ACM Transactions on Architecture and Code Optimization, 17:3, (1-27), Online publication date: 30-Sep-2020.
  4. ACM
    Lange M and Rump S (2020). Faithfully Rounded Floating-point Computations, ACM Transactions on Mathematical Software, 46:3, (1-20), Online publication date: 30-Sep-2020.
  5. Zhao R, Steinfeld R and Sakzad A (2020). FACCT: FAst, Compact, and Constant-Time Discrete Gaussian Sampler over Integers, IEEE Transactions on Computers, 69:1, (126-137), Online publication date: 1-Jan-2020.
  6. Mascagni M, Iakymchuk R, Graillat S, Defour D and Quintana-Ortí E (2020). Hierarchical approach for deriving a reproducible unblocked LU factorization, International Journal of High Performance Computing Applications, 33:5, (791-803), Online publication date: 1-Sep-2019.
  7. ACM
    Bentley M, Briggs I, Gopalakrishnan G, Ahn D, Laguna I, Lee G and Jones H Multi-Level Analysis of Compiler-Induced Variability and Performance Tradeoffs Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, (61-72)
  8. ACM
    Shterenlikht A (2019). On Quality of Implementation of Fortran 2008 Complex Intrinsic Functions on Branch Cuts, ACM Transactions on Mathematical Software, 45:1, (1-9), Online publication date: 31-Mar-2019.
  9. Antoñana M, Makazaga J and Murua A (2018). New Integration Methods for Perturbed ODEs Based on Symplectic Implicit Runge-Kutta Schemes with Application to Solar System Simulations, Journal of Scientific Computing, 76:1, (630-650), Online publication date: 1-Jul-2018.
  10. ACM
    Joldes M, Muller J and Popescu V (2017). Tight and Rigorous Error Bounds for Basic Building Blocks of Double-Word Arithmetic, ACM Transactions on Mathematical Software, 44:2, (1-27), Online publication date: 30-Jun-2018.
  11. Antoñana M, Makazaga J and Murua A (2018). Efficient implementation of symplectic implicit Runge-Kutta schemes with simplified Newton iterations, Numerical Algorithms, 78:1, (63-86), Online publication date: 1-May-2018.
  12. Antoñana M, Makazaga J and Murua A (2017). Reducing and monitoring round-off error propagation for symplectic implicit Runge-Kutta schemes, Numerical Algorithms, 76:4, (861-880), Online publication date: 1-Dec-2017.
  13. ACM
    Bourke T, Carcenac F, Colaço J, Pagano B, Pasteur C and Pouzet M (2017). A Synchronous Look at the Simulink Standard Library, ACM Transactions on Embedded Computing Systems, 16:5s, (1-24), Online publication date: 10-Oct-2017.
  14. Damouche N, Martel M and Chapoutot A (2017). Improving the numerical accuracy of programs by automatic transformation, International Journal on Software Tools for Technology Transfer (STTT), 19:4, (427-448), Online publication date: 1-Aug-2017.
  15. Baikov N (2017). Algorithm and Implementation Details for Complementary Error Function, IEEE Transactions on Computers, 66:7, (1106-1118), Online publication date: 1-Jul-2017.
  16. ACM
    Damouche N, Martel M and Chapoutot A Numerical Accuracy Improvement by Interprocedural Program Transformation Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems, (1-10)
  17. ACM
    Fortin P, Gouicem M and Graillat S (2016). GPU-Accelerated Generation of Correctly Rounded Elementary Functions, ACM Transactions on Mathematical Software, 43:3, (1-26), Online publication date: 16-Jan-2017.
  18. ACM
    Rump S (2016). IEEE754 Precision-k base-β Arithmetic Inherited by Precision-m Base-β Arithmetic for k < m, ACM Transactions on Mathematical Software, 43:3, (1-15), Online publication date: 16-Jan-2017.
  19. Pimentel J, Bohnenstiehl B and Baas B (2017). Hybrid Hardware/Software Floating-Point Implementations for Optimized Area and Throughput Tradeoffs, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25:1, (100-113), Online publication date: 1-Jan-2017.
  20. Mahzoon A and Alizadeh B (2017). OptiFEX, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25:1, (198-209), Online publication date: 1-Jan-2017.
  21. Martin-Dorel É and Melquiond G (2016). Proving Tight Bounds on Univariate Expressions with Elementary Functions in Coq, Journal of Automated Reasoning, 57:3, (187-217), Online publication date: 1-Oct-2016.
  22. Wilson H and Keich U (2016). Accurate pairwise convolutions of non-negative vectors via FFT, Computational Statistics & Data Analysis, 101:C, (300-315), Online publication date: 1-Sep-2016.
  23. ACM
    Filip S (2016). A Robust and Scalable Implementation of the Parks-McClellan Algorithm for Designing FIR Filters, ACM Transactions on Mathematical Software, 43:1, (1-24), Online publication date: 29-Aug-2016.
  24. Joldeş M, Marty O, Muller J and Popescu V (2016). Arithmetic Algorithms for Extended Precision Using Floating-Point Expansions, IEEE Transactions on Computers, 65:4, (1197-1210), Online publication date: 1-Apr-2016.
  25. ACM
    Fu Z, Bai Z and Su Z (2015). Automated backward error analysis for numerical code, ACM SIGPLAN Notices, 50:10, (639-654), Online publication date: 18-Dec-2015.
  26. ACM
    McCleeary R, Brain M and Stump A (2015). A lazy approach to adaptive exact real arithmetic using floating-point operations, ACM Communications in Computer Algebra, 49:3, (83-86), Online publication date: 24-Nov-2015.
  27. Jeannerod C Exploiting Structure in Floating-Point Arithmetic Revised Selected Papers of the 6th International Conference on Mathematical Aspects of Computer and Information Sciences - Volume 9582, (25-34)
  28. ACM
    Graillat S, Lauter C, Tang P, Yamanaka N and Oishi S (2015). Efficient Calculations of Faithfully Rounded l2-Norms of n-Vectors, ACM Transactions on Mathematical Software, 41:4, (1-20), Online publication date: 26-Oct-2015.
  29. ACM
    Fu Z, Bai Z and Su Z Automated backward error analysis for numerical code Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, (639-654)
  30. ACM
    Trefethen L (2015). Computing numerically with functions instead of numbers, Communications of the ACM, 58:10, (91-97), Online publication date: 28-Sep-2015.
  31. Isupov K and Knyazkov V A Modular-Positional Computation Technique for Multiple-Precision Floating-Point Arithmetic Proceedings of the 13th International Conference on Parallel Computing Technologies - Volume 9251, (47-61)
  32. Damouche N, Martel M and Chapoutot A Impact of Accuracy Optimization on the Convergence of Numerical Iterative Methods Revised Selected Papers of the 25th International Symposium on Logic-Based Program Synthesis and Transformation - Volume 9527, (143-160)
  33. ACM
    Frechtling M and Leong P (2015). MCALIB, ACM Transactions on Programming Languages and Systems, 37:2, (1-25), Online publication date: 16-Apr-2015.
  34. ACM
    Muller J (2015). On the Error of Computing ab+cd using Cornea, Harrison and Tang's Method, ACM Transactions on Mathematical Software, 41:2, (1-8), Online publication date: 4-Feb-2015.
  35. ACM
    Joldes M, Popescu V and Tucker W (2014). Searching for Sinks for the Hénon Map using a Multiple-Precision GPU Arithmetic Library, ACM SIGARCH Computer Architecture News, 42:4, (63-68), Online publication date: 3-Dec-2014.
  36. ACM
    Neron P Elimination of Square Roots and Divisions by Partial Inlining Proceedings of the 16th International Symposium on Principles and Practice of Declarative Programming, (81-92)
  37. ACM
    Lupon M, Gibert E, Magklis G, Samudrala S, Martínez R, Stavrou K and Ditzel D (2014). Speculative hardware/software co-designed floating-point multiply-add fusion, ACM SIGARCH Computer Architecture News, 42:1, (623-638), Online publication date: 5-Apr-2014.
  38. ACM
    Lupon M, Gibert E, Magklis G, Samudrala S, Martínez R, Stavrou K and Ditzel D (2014). Speculative hardware/software co-designed floating-point multiply-add fusion, ACM SIGPLAN Notices, 49:4, (623-638), Online publication date: 5-Apr-2014.
  39. ACM
    Lupon M, Gibert E, Magklis G, Samudrala S, Martínez R, Stavrou K and Ditzel D Speculative hardware/software co-designed floating-point multiply-add fusion Proceedings of the 19th international conference on Architectural support for programming languages and operating systems, (623-638)
  40. ACM
    de Dinechin F, Lauter C, Muller J and Torres S (2013). On Ziv's rounding test, ACM Transactions on Mathematical Software, 39:4, (1-19), Online publication date: 1-Jul-2013.
  41. ACM
    Ould-Bachir T and David J (2013). Self-Alignment Schemes for the Implementation of Addition-Related Floating-Point Operators, ACM Transactions on Reconfigurable Technology and Systems, 6:1, (1-21), Online publication date: 1-May-2013.
  42. ACM
    Langhammer M and Pasca B Faithful single-precision floating-point tangent for FPGAs Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays, (39-42)
  43. ACM
    Yan S (2012). Review of modern computer arithmetic, by Richard Brent and Paul Zimmermann, ACM SIGACT News, 43:4, (49-51), Online publication date: 19-Dec-2012.
  44. ACM
    Mironov I On significance of the least significant bits for differential privacy Proceedings of the 2012 ACM conference on Computer and communications security, (650-661)
  45. Ioualalen A and Martel M A new abstract domain for the representation of mathematically equivalent expressions Proceedings of the 19th international conference on Static Analysis, (75-93)
  46. ACM
    Benz F, Hildebrandt A and Hack S (2012). A dynamic program analysis to find floating-point accuracy problems, ACM SIGPLAN Notices, 47:6, (453-462), Online publication date: 6-Aug-2012.
  47. ACM
    Benz F, Hildebrandt A and Hack S A dynamic program analysis to find floating-point accuracy problems Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, (453-462)
  48. de Dinechin F and Didier L Table-Based division by small integer constants Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications, (53-63)
  49. Manoukian M and Constantinides G Accurate floating point arithmetic through hardware error-free transformations Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications, (94-101)
  50. Goubault E and Putot S Static analysis of finite precision computations Proceedings of the 12th international conference on Verification, model checking, and abstract interpretation, (232-247)
  51. Chapoutot A Interval slopes as a numerical abstract domain for floating-point variables Proceedings of the 17th international conference on Static analysis, (184-200)
  52. ACM
    Banescu S, de Dinechin F, Pasca B and Tudoran R (2011). Multipliers for floating-point double precision and beyond on FPGAs, ACM SIGARCH Computer Architecture News, 38:4, (73-79), Online publication date: 14-Sep-2010.
  53. Chevillard S, Joldeş M and Lauter C Sollya Proceedings of the Third international congress conference on Mathematical software, (28-31)
  54. ACM
    Lefèvre V, Théveny P, de Dinechin F, Jeannerod C, Mouilleron C, Pfannholzer D and Revol N (2010). LEMA, ACM Communications in Computer Algebra, 44:1/2, (41-52), Online publication date: 29-Jul-2010.
  55. ACM
    Jeannerod C, Mouilleron C, Muller J, Revy G, Bertin C, Jourdan-Lu J, Knochel H and Monat C Techniques and tools for implementing IEEE 754 floating-point arithmetic on VLIW integer processors Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, (1-9)
  56. Ioualalen A and Martel M Neural Network Precision Tuning Quantitative Evaluation of Systems, (129-143)
Contributors
  • University of Lyon
  • University of Lyon
  • National Institute of Applied Sciences of Lyon
  • Lyon Higher Normal School
  • University of Lyon
  • Paris-Saclay Normal School
  • Lyon Higher Normal School
  • The University of Sydney
  • Lyon Higher Normal School
