research-article
Open access

VLIW coprocessor for IEEE-754 quadruple-precision elementary functions

Published: 16 September 2013

Abstract

This article presents QP_VELP, a unified VLIW coprocessor for quadruple-precision (Quad) arithmetic and elementary functions, built on a common group of atomic operation units. The explicitly parallel VLIW instruction scheme and Estrin's scheme for polynomial evaluation are used to improve performance. A two-level VLIW instruction RAM scheme is introduced to achieve high scalability and customizability, even for more complex key program kernels. Finally, the Quad arithmetic accelerator (QAA), containing an array of QP_VELP units, is implemented as an ASIC. Compared with a hyper-threaded software implementation on an Intel Xeon E5620, QAA with 8 QP_VELP units achieves an 18X improvement.
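
The abstract names Estrin's scheme as the polynomial evaluation method. The following is a minimal Python sketch of that scheme next to Horner's rule, for illustration only. The function names `horner` and `estrin` are hypothetical, ordinary Python numbers stand in for quadruple-precision values, and nothing here reflects the actual QP_VELP instruction encoding or scheduling.

```python
from fractions import Fraction


def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule.

    Every step depends on the previous one, so the multiply-adds
    form a single serial chain.
    """
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's scheme.

    Coefficients are combined pairwise as (lo + hi * power). All pairs
    within one level are independent of each other, which is what lets
    parallel hardware issue them side by side.
    """
    level = list(coeffs)  # copy so the caller's list is not modified
    if not level:
        return 0
    power = x  # x, then x^2, x^4, ... one squaring per level
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-length levels with a zero term
        level = [level[i] + level[i + 1] * power
                 for i in range(0, len(level), 2)]
        power = power * power
    return level[0]


if __name__ == "__main__":
    # Exact rational arithmetic so both schemes can be compared for equality.
    coeffs = [1, 2, 3, 4, 5]
    x = Fraction(1, 3)
    print(horner(coeffs, x) == estrin(coeffs, x))
```

For a polynomial with n coefficients, Horner's rule needs n - 1 dependent multiply-add steps, while Estrin's scheme needs about log2(n) levels of mutually independent operations. In floating-point arithmetic the two orderings can round differently, so the exact comparison above uses `Fraction`.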

Supplementary Material

a12-lei-apndx.pdf (lei.zip)
Supplemental movie, appendix, image, and software files for "VLIW coprocessor for IEEE-754 quadruple-precision elementary functions"

Cited By

  • (2018) A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26:10 (1953-1966). DOI: 10.1109/TVLSI.2018.2846688. Online publication date: Oct-2018
  • (2017) Mathematical model and implementation of rational processing. Journal of Computational and Applied Mathematics 309:C (575-586). DOI: 10.1016/j.cam.2016.05.001. Online publication date: 1-Jan-2017


    Information & Contributors

    Information

    Published In

    ACM Transactions on Architecture and Code Optimization  Volume 10, Issue 3
    September 2013
    310 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/2509420
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 September 2013
    Accepted: 01 April 2013
    Revised: 01 March 2013
    Received: 01 January 2013
    Published in TACO Volume 10, Issue 3


    Author Tags

    1. QP_VELP
    2. Quadruple precision floating-point
    3. elementary function
    4. very long instruction word

    Qualifiers

    • Research-article
    • Research
    • Refereed


