
Algorithm 908: Online Exact Summation of Floating-Point Streams

Published: 01 September 2010

Abstract

We present a novel, online algorithm for exact summation of a stream of floating-point numbers. By “online” we mean that the algorithm needs to see only one input at a time, and can take an arbitrary length input stream of such inputs while requiring only constant memory. By “exact” we mean that the sum of the internal array of our algorithm is exactly equal to the sum of all the inputs, and the returned result is the correctly-rounded sum. The proof of correctness is valid for all inputs (including nonnormalized numbers but modulo intermediate overflow), and is independent of the number of summands or the condition number of the sum. The algorithm asymptotically needs only 5 FLOPs per summand, and due to instruction-level parallelism runs only about 2--3 times slower than the obvious, fast-but-dumb “ordinary recursive summation” loop when the number of summands is greater than 10,000. Thus, to our knowledge, it is the fastest, most accurate, and most memory efficient among known algorithms. Indeed, it is difficult to see how a faster algorithm or one requiring significantly fewer FLOPs could exist without hardware improvements. An application for a large number of summands is provided.
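The building block that makes exact summation of this kind possible is the error-free transformation usually attributed to Knuth [14]: a handful of ordinary floating-point operations recover the exact rounding error of an addition. A minimal sketch in Python (the function name `two_sum` is ours, not taken from the paper's supplementary code):

```python
def two_sum(a, b):
    """Knuth's error-free transformation: returns (s, t) with
    s = fl(a + b) and a + b == s + t exactly, in IEEE-754
    round-to-nearest arithmetic, barring overflow."""
    s = a + b
    bp = s - a                      # the part of b absorbed into s
    t = (a - (s - bp)) + (b - bp)   # the rounding error, exactly
    return s, t

# The error of 1.0 + 2**-60, which naive addition silently drops,
# is recovered exactly:
s, t = two_sum(1.0, 2**-60)
# s == 1.0 and t == 2**-60
```

Roughly speaking, exact summation algorithms such as the one presented here arrange their accumulators so that this error term is provably zero for every addition, which is how so few FLOPs per summand can suffice.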

Supplementary Material

Zip (908.zip)
Software for Online Exact Summation of Floating-Point Streams

References

[1] Anderson, I. J. 1999. A distillation algorithm for floating-point summation. SIAM J. Sci. Comput. 20, 5, 1797--1806.
[2] Barnes, J. and Hut, P. 1986. A hierarchical O(N log N) force-calculation algorithm. Nature 324, 446--449.
[3] Barnes, J. E. and Hut, P. 1989. Error analysis of a tree code. Astrophys. J. Suppl. Series 70, 389--417.
[4] Dekker, T. J. 1971. A floating-point technique for extending the available precision. Numer. Math. 18, 224--242.
[5] Demmel, J. and Hida, Y. 2003. Accurate and efficient floating point summation. SIAM J. Sci. Comput. 25, 1214--1248.
[6] Dubinski, J. 1996. A parallel tree code. New Astron. 1, 133--147.
[7] Gough, B. J. and Stallman, R. M. 2004. An Introduction to GCC. Network Theory Ltd., Bristol, UK.
[8] Hayes, W. B. 1995. Efficient shadowing of high-dimensional chaotic systems with the large astrophysical N-body problem as an example. M.S. thesis, Department of Computer Science, University of Toronto.
[9] Hayes, W. B. 2005. A survey of shadowing methods for numerical solutions of ordinary differential equations. Appl. Numer. Math. 53, 2-4, 299--321.
[10] Higham, N. J. 1993. The accuracy of floating point summation. SIAM J. Sci. Comput. 14, 4, 783--799.
[11] Higham, N. J. 2002. Accuracy and Stability of Numerical Algorithms, 2nd Ed. Society for Industrial and Applied Mathematics, Philadelphia, PA.
[12] Hindmarsh, A. C. 1980. LSODE and LSODI, two new initial value ordinary differential equation solvers. ACM-SIGNUM Newslett. 15, 4, 10--11.
[13] Jernigan, J. G. and Porter, D. H. 1989. A tree code with logarithmic reduction of force terms, hierarchical regularization of all variables, and explicit accuracy controls. Astrophys. J. Suppl. Series 71, 871--893.
[14] Knuth, D. E. 1998. The Art of Computer Programming, 3rd Ed. Vol. 2: Seminumerical Algorithms. Addison-Wesley, Reading, MA.
[15] Kulisch, U. and Bohlender, G. 1976. Formalization and implementation of floating-point matrix operations. Computing 16, 239--261.
[16] Kulisch, U. W. and Miranker, W. L. 1981. Computer Arithmetic in Theory and Practice. Academic Press, Orlando, FL.
[17] Ogita, T., Rump, S. M., and Oishi, S. 2005. Accurate sum and dot product. SIAM J. Sci. Comput. 26, 1955--1988.
[18] Rump, S. M. 2009. Ultimately fast accurate summation. SIAM J. Sci. Comput. 31, 5, 3446--3502.
[19] Rump, S. M., Ogita, T., and Oishi, S. 2008a. Accurate floating-point summation, Part I: Faithful rounding. SIAM J. Sci. Comput. 31, 189--224.
[20] Rump, S. M., Ogita, T., and Oishi, S. 2008b. Accurate floating-point summation, Part II: Sign, K-fold faithful and rounding to nearest. SIAM J. Sci. Comput. 31, 1269--1302.
[21] Zhu, Y.-K. 2009. Exact floating-point summation and its application in shadowing. Ph.D. thesis, Computer Science Department, University of California, Irvine, CA.
[22] Zhu, Y.-K. and Hayes, W. B. 2006. Fast, guaranteed-accurate sums of many floating-point numbers. In Proceedings of the 7th Real Numbers and Computers Conference, G. Hanrot and P. Zimmermann, Eds., INRIA, France, 11--22.
[23] Zhu, Y.-K. and Hayes, W. B. 2009. Correct rounding and a hybrid approach to exact floating-point summation. SIAM J. Sci. Comput. 31, 2981--3001.
[24] Zhu, Y.-K., Yong, J.-H., and Zheng, G.-Q. 2005. A new distillation algorithm for floating-point summation. SIAM J. Sci. Comput. 26, 2066--2078.



    Reviews

    Kai Diethelm

    At the innermost core of many algorithms in scientific computing is the task of summing up many floating-point numbers, so algorithms that handle this problem fast and accurately are important. A key issue in this context is the propagation of rounding errors. Zhu and Hayes' algorithm, OnlineExactSum, presents an impressive solution to this problem. As its name suggests, the algorithm provably computes the correctly rounded value of the exact sum of the input data. Moreover, the input data stream can be arbitrarily long, while the algorithm needs to see only one input at a time. The only restriction imposed is that the intermediate results must not overflow. Under this assumption, the authors prove the correctness of their method and demonstrate that it runs very fast: it requires only five floating-point operations per summand. Thanks to instruction-level parallelism, its runtime exceeds that of the potentially highly inaccurate ordinary recursive summation (which requires just one floating-point operation per summand) only by a factor of two to three.

    The method is based on HybridSum, the authors' previous algorithm [23], which in turn builds on an idea of Demmel and Hida [5]: use extra-precision accumulators to sum floating-point numbers with identical exponents, and provide a fast and accurate method (iFastSum) to compute a correctly rounded sum of these accumulators. The new idea is to double the number of accumulators, thus avoiding the need to split the input data. The algorithm is not only fast; its memory requirements are also very modest and, in particular, independent of the length of the input stream. A nice side effect of the fact that the method always produces the correct result is that it always produces the same result, no matter the order in which the summands are processed.

    This is important, for example, in parallel computations such as matrix-vector multiplications or scalar products, where each processor handles a part of the vectors or matrices under consideration, and different runs of the program with identical input data may, due to different processor workloads, deliver intermediate results out of order. Less sophisticated algorithms then propagate rounding errors in different ways and hence lose the reproducibility of the results. Readers will find that this algorithm has many applications in the field of scientific computing. Online Computing Reviews Service
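The accumulator scheme the review describes can be sketched as follows. This is an illustration of the exponent-binning idea only, not the authors' published code: the names (`two_sum`, `binned_exact_sum`), the explicit error list, and the use of `math.fsum` as a stand-in for iFastSum are all our simplifications. The real OnlineExactSum chooses its accumulators so that every addition is exact and no error list is ever needed, which is what gives it constant memory.

```python
import math

def two_sum(a, b):
    """Knuth's error-free transformation: a + b == s + t exactly."""
    s = a + b
    bp = s - a
    t = (a - (s - bp)) + (b - bp)
    return s, t

def binned_exact_sum(stream):
    """Exponent-binned summation sketch. Each input is added to the
    accumulator matching its binary exponent; the rounding error of
    every addition is captured exactly, so bins + errors always hold
    the exact sum. (The published algorithm avoids the error list by
    organizing the bins so that the error is provably zero.)"""
    acc = [0.0] * 4096                 # one bin per possible exponent
    errs = []                          # exactly captured rounding errors
    for x in stream:
        i = math.frexp(x)[1] + 2048    # bin index from the exponent
        acc[i], e = two_sum(acc[i], x)
        if e != 0.0:
            errs.append(e)             # nothing is ever lost
    # Distill the bins and errors into a correctly rounded double;
    # math.fsum stands in for the paper's iFastSum.
    return math.fsum(acc + errs)
```

For instance, `binned_exact_sum([1e16, 1.0, -1e16])` returns exactly 1.0, whereas ordinary left-to-right summation of the same list returns 0.0 because the intermediate 1e16 + 1.0 rounds back to 1e16.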



    Published In

    ACM Transactions on Mathematical Software, Volume 37, Issue 3
    September 2010
    296 pages
    ISSN: 0098-3500
    EISSN: 1557-7295
    DOI: 10.1145/1824801

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 September 2010
    Accepted: 01 April 2010
    Revised: 01 December 2009
    Received: 01 April 2009
    Published in TOMS Volume 37, Issue 3


    Author Tags

    1. Floating-point summation
    2. rounding error

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Citations

    Cited By

    • (2024) Multiscale jump testing and estimation under complex temporal dynamics. Bernoulli 30:3. DOI: 10.3150/23-BEJ1677. Online publication date: 1-Aug-2024
    • (2024) Extension of accurate numerical algorithms for matrix multiplication based on error-free transformation. Japan Journal of Industrial and Applied Mathematics. DOI: 10.1007/s13160-024-00677-z. Online publication date: 29-Oct-2024
    • (2022) Toward Accurate and Fast Summation. ACM Transactions on Mathematical Software 48:3, 1-39. DOI: 10.1145/3544488. Online publication date: 10-Sep-2022
    • (2021) Verified Numerical Computations for Searching for Vectors with the Maximum Sum. Journal of Advanced Simulation in Science and Engineering 8:1, 53-72. DOI: 10.15748/jasse.8.53. Online publication date: 2021
    • (2020) Order Matters: A Case Study on Reducing Floating Point Error in Sums Via Ordering and Grouping. 2020 IEEE/ACM 4th International Workshop on Software Correctness for HPC Applications (Correctness), 10-19. DOI: 10.1109/Correctness51934.2020.00007. Online publication date: Nov-2020
    • (2020) A note on Dekker's FastTwoSum algorithm. Numerische Mathematik 145:2, 383-403. DOI: 10.1007/s00211-020-01114-2. Online publication date: 24-Apr-2020
    • (2019) Hierarchical approach for deriving a reproducible unblocked LU factorization. International Journal of High Performance Computing Applications 33:5, 791-803. DOI: 10.1177/1094342019832969. Online publication date: 1-Sep-2019
    • (2018) Fast and Stable Multivariate Kernel Density Estimation by Fast Sum Updating. Journal of Computational and Graphical Statistics, 1-27. DOI: 10.1080/10618600.2018.1549052. Online publication date: 28-Nov-2018
    • (2017) Parallel Online Exact Summation of Floating-point Numbers by Applying MapReduce of Java8. International Journal of Software Innovation 5:2, 17-32. DOI: 10.4018/IJSI.2017040102. Online publication date: 1-Apr-2017
    • (2017) Reproducible, Accurately Rounded and Efficient BLAS. Euro-Par 2016: Parallel Processing Workshops, 609-620. DOI: 10.1007/978-3-319-58943-5_49. Online publication date: 28-May-2017
    • Show More Cited By
