Optimal dynamic data layouts for 2D FFT on 3D memory integrated FPGA

The Journal of Supercomputing

Abstract

FPGAs have been widely used to accelerate various applications. For many data-intensive applications, memory bandwidth limits performance. 3D memories with through-silicon-via connections offer a potential solution to these latency and bandwidth limitations. In this paper, we revisit the classic 2D FFT problem to evaluate the performance of a 3D memory integrated FPGA. To fully utilize the fine-grained parallelism in 3D memory, data layouts that take into account the structure and organization of the memory are required. We propose dynamic data layouts for optimizing the performance of the 3D architecture. In 2D FFT, data are accessed in row-major order in the first phase and in column-major order in the second phase. The column-major order results in high memory latency and low bandwidth because of the high row activation overhead of the memory. Using the proposed dynamic data layouts, we improve memory access performance in the second phase without degrading the performance of the first phase. With parallelism employed in the third dimension of the memory, data parallelism can be increased to further improve performance. We adopt a model-based approach for 3D memory and perform experiments on the FPGA to validate our analysis and evaluate the performance. Compared with the baseline architecture, our approach achieves up to \(40\times\) peak memory bandwidth utilization for column-wise FFT, resulting in approximately \(97\,\%\) improvement in throughput for the complete 2D FFT application.
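
The row activation overhead described in the abstract can be illustrated with a toy model. The Python sketch below is not the paper's method: it assumes a single memory bank with one open row, a hypothetical DRAM row holding 64 matrix elements, and a static tiled layout used only as a stand-in for a layout-aware mapping. All function names and parameters are illustrative assumptions.

```python
from typing import Callable, Iterable, Iterator, Tuple

Coord = Tuple[int, int]


def count_row_activations(addresses: Iterable[int], row_size: int) -> int:
    """Count row activations in a single-bank, open-row toy model.

    A new activation is charged whenever an access falls in a DRAM row
    other than the one currently open.
    """
    activations = 0
    open_row = None
    for addr in addresses:
        row = addr // row_size
        if row != open_row:
            activations += 1
            open_row = row
    return activations


def row_wise(n: int) -> Iterator[Coord]:
    """Access order of the first phase: one matrix row after another."""
    for r in range(n):
        for c in range(n):
            yield r, c


def column_wise(n: int) -> Iterator[Coord]:
    """Access order of the second phase: one matrix column after another."""
    for c in range(n):
        for r in range(n):
            yield r, c


def row_major_addr(r: int, c: int, n: int) -> int:
    """Conventional row-major layout."""
    return r * n + c


def tiled_addr(r: int, c: int, n: int, tile: int) -> int:
    """Illustrative tiled layout (an assumption, not the paper's layout):
    each tile x tile block is stored contiguously, blocks in row-major order."""
    blocks_per_row = n // tile
    block = (r // tile) * blocks_per_row + (c // tile)
    return block * tile * tile + (r % tile) * tile + (c % tile)


def activations(order: Callable[[int], Iterator[Coord]],
                addr: Callable[[int, int], int],
                n: int, row_size: int) -> int:
    """Activations incurred by traversing an n x n matrix in the given order."""
    return count_row_activations((addr(r, c) for r, c in order(n)), row_size)


if __name__ == "__main__":
    N, TILE, ROW_SIZE = 64, 8, 64  # hypothetical sizes; one tile fills one DRAM row
    rm = lambda r, c: row_major_addr(r, c, N)
    tl = lambda r, c: tiled_addr(r, c, N, TILE)
    print("row-major layout, row-wise:   ", activations(row_wise, rm, N, ROW_SIZE))
    print("row-major layout, column-wise:", activations(column_wise, rm, N, ROW_SIZE))
    print("tiled layout, row-wise:       ", activations(row_wise, tl, N, ROW_SIZE))
    print("tiled layout, column-wise:    ", activations(column_wise, tl, N, ROW_SIZE))
```

Under these assumptions, the row-major layout needs 64 activations for the row-wise pass but 4096 for the column-wise pass, since every column access opens a different DRAM row. The static tiled layout needs 512 in both passes, so it helps the second phase at the cost of the first. The abstract reports that the proposed dynamic data layouts avoid that first-phase penalty; this sketch does not attempt to reproduce them.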

Author information

Corresponding author

Correspondence to Ren Chen.

Additional information

This material was supported by the NSF under Grant Number ACI-1339756.

About this article

Cite this article

Chen, R., Singapura, S.G. & Prasanna, V.K. Optimal dynamic data layouts for 2D FFT on 3D memory integrated FPGA. J Supercomput 73, 652–663 (2017). https://doi.org/10.1007/s11227-016-1772-1

  • DOI: https://doi.org/10.1007/s11227-016-1772-1
