
Algorithm 908: Online Exact Summation of Floating-Point Streams

Published: 01 September 2010

Abstract

We present a novel, online algorithm for exact summation of a stream of floating-point numbers. By “online” we mean that the algorithm needs to see only one input at a time, and can take an arbitrary length input stream of such inputs while requiring only constant memory. By “exact” we mean that the sum of the internal array of our algorithm is exactly equal to the sum of all the inputs, and the returned result is the correctly-rounded sum. The proof of correctness is valid for all inputs (including nonnormalized numbers but modulo intermediate overflow), and is independent of the number of summands or the condition number of the sum. The algorithm asymptotically needs only 5 FLOPs per summand, and due to instruction-level parallelism runs only about 2--3 times slower than the obvious, fast-but-dumb “ordinary recursive summation” loop when the number of summands is greater than 10,000. Thus, to our knowledge, it is the fastest, most accurate, and most memory efficient among known algorithms. Indeed, it is difficult to see how a faster algorithm or one requiring significantly fewer FLOPs could exist without hardware improvements. An application for a large number of summands is provided.
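The building block that makes exact summation of this kind possible is the error-free transformation usually attributed to Knuth [14]: a handful of ordinary floating-point operations recover the exact rounding error of an addition. A minimal sketch in Python (the function name `two_sum` is ours, not taken from the paper's supplementary code):

```python
def two_sum(a, b):
    """Knuth's error-free transformation: returns (s, t) with
    s = fl(a + b) and a + b == s + t exactly, in IEEE-754
    round-to-nearest arithmetic, barring overflow."""
    s = a + b
    bp = s - a                      # the part of b absorbed into s
    t = (a - (s - bp)) + (b - bp)   # the rounding error, exactly
    return s, t

# The error of 1.0 + 2**-60, which naive addition silently drops,
# is recovered exactly:
s, t = two_sum(1.0, 2**-60)
# s == 1.0 and t == 2**-60
```

Roughly speaking, exact summation algorithms such as the one presented here arrange their accumulators so that this error term is provably zero for every addition, which is how so few FLOPs per summand can suffice.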

Supplementary Material

Zip (908.zip)
Software for Online Exact Summation of Floating-Point Streams

References

[1] Anderson, I. J. 1999. A distillation algorithm for floating-point summation. SIAM J. Sci. Comput. 20, 5, 1797--1806.
[2] Barnes, J. and Hut, P. 1986. A hierarchical O(N log N) force-calculation algorithm. Nature 324, 446--449.
[3] Barnes, J. E. and Hut, P. 1989. Error analysis of a tree code. Astrophys. J. Suppl. Series 70, 389--417.
[4] Dekker, T. J. 1971. A floating-point technique for extending the available precision. Numer. Math. 18, 224--242.
[5] Demmel, J. and Hida, Y. 2003. Accurate and efficient floating point summation. SIAM J. Sci. Comput. 25, 1214--1248.
[6] Dubinski, J. 1996. A parallel tree code. New Astron. 1, 133--147.
[7] Gough, B. J. and Stallman, R. M. 2004. An Introduction to GCC. Network Theory Ltd., Bristol, UK.
[8] Hayes, W. B. 1995. Efficient shadowing of high-dimensional chaotic systems with the large astrophysical N-body problem as an example. M.S. thesis, Department of Computer Science, University of Toronto.
[9] Hayes, W. B. 2005. A survey of shadowing methods for numerical solutions of ordinary differential equations. Appl. Numer. Math. 53, 2-4, 299--321.
[10] Higham, N. J. 1993. The accuracy of floating point summation. SIAM J. Sci. Comput. 14, 4, 783--799.
[11] Higham, N. J. 2002. Accuracy and Stability of Numerical Algorithms, 2nd Ed. Society for Industrial and Applied Mathematics, Philadelphia, PA.
[12] Hindmarsh, A. C. 1980. LSODE and LSODI, two new initial value ordinary differential equation solvers. ACM-SIGNUM Newslett. 15, 4, 10--11.
[13] Jernigan, J. G. and Porter, D. H. 1989. A tree code with logarithmic reduction of force terms, hierarchical regularization of all variables, and explicit accuracy controls. Astrophys. J. Suppl. Series 71, 871--893.
[14] Knuth, D. E. 1998. The Art of Computer Programming, 3rd Ed. Vol. 2: Seminumerical Algorithms. Addison-Wesley, Reading, MA.
[15] Kulisch, U. and Bohlender, G. 1976. Formalization and implementation of floating-point matrix operations. Computing 16, 239--261.
[16] Kulisch, U. W. and Miranker, W. L. 1981. Computer Arithmetic in Theory and Practice. Academic Press, Orlando, FL.
[17] Ogita, T., Rump, S. M., and Oishi, S. 2005. Accurate sum and dot product. SIAM J. Sci. Comput. 26, 1955--1988.
[18] Rump, S. M. 2009. Ultimately fast accurate summation. SIAM J. Sci. Comput. 31, 5, 3446--3502.
[19] Rump, S. M., Ogita, T., and Oishi, S. 2008a. Accurate floating-point summation, Part I: Faithful rounding. SIAM J. Sci. Comput. 31, 189--224.
[20] Rump, S. M., Ogita, T., and Oishi, S. 2008b. Accurate floating-point summation, Part II: Sign, K-fold faithful and rounding to nearest. SIAM J. Sci. Comput. 31, 1269--1302.
[21] Zhu, Y.-K. 2009. Exact floating-point summation and its application in shadowing. Ph.D. thesis, Computer Science Department, University of California, Irvine, CA.
[22] Zhu, Y.-K. and Hayes, W. B. 2006. Fast, guaranteed-accurate sums of many floating-point numbers. In Proceedings of the 7th Real Numbers and Computers Conference, G. Hanrot and P. Zimmermann, Eds., INRIA, France, 11--22.
[23] Zhu, Y.-K. and Hayes, W. B. 2009. Correct rounding and a hybrid approach to exact floating-point summation. SIAM J. Sci. Comput. 31, 2981--3001.
[24] Zhu, Y.-K., Yong, J.-H., and Zheng, G.-Q. 2005. A new distillation algorithm for floating-point summation. SIAM J. Sci. Comput. 26, 2066--2078.



    Reviews

    Kai Diethelm

    At the innermost core of many algorithms in scientific computing is the task of summing up many floating-point numbers, so algorithms that handle this problem fast and accurately are important. A key issue in this context is the propagation of rounding errors. Zhu and Hayes' algorithm, OnlineExactSum, presents an impressive solution to this problem. As its name suggests, the algorithm provably computes the correctly rounded value of the exact sum of the input data. Moreover, the input data stream can be arbitrarily long, while the algorithm needs to see only one input at a time. The only restriction imposed is that the intermediate results must not overflow. Under this assumption, the authors prove the correctness of their method and demonstrate that it runs very fast: it requires only five floating-point operations per summand. Thanks to instruction-level parallelism, its runtime exceeds that of the potentially highly inaccurate ordinary recursive summation (which requires just one floating-point operation per summand) only by a factor of two to three.

    The method is based on HybridSum, the authors' previous algorithm [23], which in turn builds on an idea of Demmel and Hida [5]: use extra-precision accumulators to sum floating-point numbers with identical exponents, and provide a fast and accurate method (iFastSum) to compute a correctly rounded sum of these accumulators. The new idea is to double the number of accumulators, thus avoiding the need to split the input data. The algorithm is not only fast; its memory requirements are also very modest and, in particular, independent of the length of the input stream. A nice side effect of the fact that the method always produces the correct result is that it always produces the same result, no matter the order in which the summands are processed.

    This is important, for example, in parallel computations such as matrix-vector multiplications or scalar products, where each processor handles a part of the vectors or matrices under consideration, and different runs of the program with identical input data may, due to different processor workloads, deliver intermediate results out of order. Less sophisticated algorithms then propagate rounding errors in different ways and hence lose the reproducibility of the results. Readers will find that this algorithm has many applications in the field of scientific computing. Online Computing Reviews Service
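The accumulator scheme the review describes can be sketched as follows. This is an illustration of the exponent-binning idea only, not the authors' published code: the names (`two_sum`, `binned_exact_sum`), the explicit error list, and the use of `math.fsum` as a stand-in for iFastSum are all our simplifications. The real OnlineExactSum chooses its accumulators so that every addition is exact and no error list is ever needed, which is what gives it constant memory.

```python
import math

def two_sum(a, b):
    """Knuth's error-free transformation: a + b == s + t exactly."""
    s = a + b
    bp = s - a
    t = (a - (s - bp)) + (b - bp)
    return s, t

def binned_exact_sum(stream):
    """Exponent-binned summation sketch. Each input is added to the
    accumulator matching its binary exponent; the rounding error of
    every addition is captured exactly, so bins + errors always hold
    the exact sum. (The published algorithm avoids the error list by
    organizing the bins so that the error is provably zero.)"""
    acc = [0.0] * 4096                 # one bin per possible exponent
    errs = []                          # exactly captured rounding errors
    for x in stream:
        i = math.frexp(x)[1] + 2048    # bin index from the exponent
        acc[i], e = two_sum(acc[i], x)
        if e != 0.0:
            errs.append(e)             # nothing is ever lost
    # Distill the bins and errors into a correctly rounded double;
    # math.fsum stands in for the paper's iFastSum.
    return math.fsum(acc + errs)
```

For instance, `binned_exact_sum([1e16, 1.0, -1e16])` returns exactly 1.0, whereas ordinary left-to-right summation of the same list returns 0.0 because the intermediate 1e16 + 1.0 rounds back to 1e16.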



    Published In

    ACM Transactions on Mathematical Software, Volume 37, Issue 3
    September 2010
    296 pages
    ISSN: 0098-3500
    EISSN: 1557-7295
    DOI: 10.1145/1824801

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 September 2010
    Accepted: 01 April 2010
    Revised: 01 December 2009
    Received: 01 April 2009
    Published in TOMS Volume 37, Issue 3


    Author Tags

    1. Floating-point summation
    2. rounding error

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Citations

    Cited By

    • (2024) Multiscale jump testing and estimation under complex temporal dynamics. Bernoulli 30:3. DOI: 10.3150/23-BEJ1677. Online publication date: 1-Aug-2024
    • (2024) Extension of accurate numerical algorithms for matrix multiplication based on error-free transformation. Japan Journal of Industrial and Applied Mathematics. DOI: 10.1007/s13160-024-00677-z. Online publication date: 29-Oct-2024
    • (2022) Toward Accurate and Fast Summation. ACM Transactions on Mathematical Software 48:3, 1-39. DOI: 10.1145/3544488. Online publication date: 10-Sep-2022
    • (2021) Verified Numerical Computations for Searching for Vectors with the Maximum Sum. Journal of Advanced Simulation in Science and Engineering 8:1, 53-72. DOI: 10.15748/jasse.8.53. Online publication date: 2021
    • (2020) Order Matters: A Case Study on Reducing Floating Point Error in Sums Via Ordering and Grouping. 2020 IEEE/ACM 4th International Workshop on Software Correctness for HPC Applications (Correctness), 10-19. DOI: 10.1109/Correctness51934.2020.00007. Online publication date: Nov-2020
    • (2020) A note on Dekker's FastTwoSum algorithm. Numerische Mathematik 145:2, 383-403. DOI: 10.1007/s00211-020-01114-2. Online publication date: 24-Apr-2020
    • (2019) Hierarchical approach for deriving a reproducible unblocked LU factorization. International Journal of High Performance Computing Applications 33:5, 791-803. DOI: 10.1177/1094342019832969. Online publication date: 1-Sep-2019
    • (2018) Fast and Stable Multivariate Kernel Density Estimation by Fast Sum Updating. Journal of Computational and Graphical Statistics, 1-27. DOI: 10.1080/10618600.2018.1549052. Online publication date: 28-Nov-2018
    • (2017) Parallel Online Exact Summation of Floating-point Numbers by Applying MapReduce of Java8. International Journal of Software Innovation 5:2, 17-32. DOI: 10.4018/IJSI.2017040102. Online publication date: 1-Apr-2017
    • (2017) Reproducible, Accurately Rounded and Efficient BLAS. Euro-Par 2016: Parallel Processing Workshops, 609-620. DOI: 10.1007/978-3-319-58943-5_49. Online publication date: 28-May-2017
    • Show More Cited By
