research-article

Open access

VLIW coprocessor for IEEE-754 quadruple-precision elementary functions

Authors:

Hongjian Li

ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 3

Article No.: 12, Pages 1 - 22

https://doi.org/10.1145/2512430

Published: 16 September 2013

Abstract

This article presents QP_VELP, a unified VLIW coprocessor for quadruple-precision (Quad) arithmetic and elementary functions, built on a common group of atomic operation units. The explicitly parallel VLIW instruction scheme and Estrin's scheme for polynomial evaluation are used to improve performance. A two-level VLIW instruction RAM scheme is introduced to achieve high scalability and customizability, even for more complex key program kernels. Finally, the Quad arithmetic accelerator (QAA), which contains an array of QP_VELP units, is implemented as an ASIC. Compared with a hyper-threaded software implementation on an Intel Xeon E5620, QAA with 8 QP_VELP units achieves an improvement by a factor of 18.
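Estrin's scheme, named in the abstract, can be illustrated with a minimal software sketch. This is not the coprocessor's microcode or instruction schedule, which this page does not show. The function names `horner` and `estrin` are hypothetical. The sketch only shows why the scheme suits explicitly parallel hardware: within each pass, every pair `lo + hi * power` is independent of the others, so separate units could compute them at the same time.

```python
from fractions import Fraction


def horner(coeffs, x):
    """Reference evaluation of sum(coeffs[i] * x**i), one dependent step at a time."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's scheme.

    Each pass combines neighbouring terms as lo + hi * power, where power is
    x, x**2, x**4, ... The pairs inside one pass do not depend on each other.
    """
    level = list(coeffs)
    if not level:
        return 0
    power = x
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad so every term has a partner
        level = [level[i] + level[i + 1] * power
                 for i in range(0, len(level), 2)]
        power = power * power  # x -> x**2 -> x**4 ...
    return level[0]


if __name__ == "__main__":
    coeffs = [1, 2, 3, 4, 5]  # 1 + 2x + 3x^2 + 4x^3 + 5x^4
    x = Fraction(1, 3)        # exact rational input avoids rounding noise
    print(estrin(coeffs, x) == horner(coeffs, x))
```

For a polynomial with n coefficients, the loop runs about log2(n) passes, compared with the n - 1 dependent multiply-add steps of Horner's rule. In floating-point arithmetic the two orderings can round differently, which is why the usage example compares them with exact `Fraction` inputs.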

Supplementary Material

a12-lei-apndx.pdf (lei.zip)

Supplemental movie, appendix, image, and software files for "VLIW coprocessor for IEEE-754 quadruple-precision elementary functions"


Cited By

Chen, X., Lei, Y., Lu, Z., and Chen, S. 2018. A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26, 10, 1953--1966. Online publication date: Oct-2018.
https://doi.org/10.1109/TVLSI.2018.2846688

Mora, H., Mora-Pascual, J., García-Chamizo, J., and Signes-Pont, M. 2017. Mathematical model and implementation of rational processing. Journal of Computational and Applied Mathematics 309, C, 575--586. Online publication date: 1-Jan-2017.
https://dl.acm.org/doi/10.1016/j.cam.2016.05.001

Index Terms

VLIW coprocessor for IEEE-754 quadruple-precision elementary functions
1. Hardware
  1. Integrated circuits
    1. Logic circuits
      1. Arithmetic and datapath circuits

Information & Contributors

Information

Published In

ACM Transactions on Architecture and Code Optimization Volume 10, Issue 3

September 2013

310 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2509420

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 September 2013

Accepted: 01 April 2013

Revised: 01 March 2013

Received: 01 January 2013

Published in TACO Volume 10, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Natural Science Foundation of China

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 535
Downloads (Last 12 months): 61
Downloads (Last 6 weeks): 11

Reflects downloads up to 24 Sep 2024
