Optimal Dynamic Data Layouts for 2D FFT on 3D Memory Integrated FPGA

  • Conference paper
Parallel Computing Technologies (PaCT 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9251)

Abstract

FPGAs have been widely used to accelerate a variety of applications. For many data-intensive applications, memory bandwidth can limit performance. 3D memories with through-silicon-via (TSV) connections offer potential solutions to these latency and bandwidth issues. In this paper, we revisit the classic 2D FFT problem to evaluate the performance of a 3D memory integrated FPGA. To fully exploit the fine-grained parallelism in 3D memory, optimal data layouts that effectively utilize the peak bandwidth of the device are needed. We therefore propose dynamic data layouts specifically designed to optimize performance on the 3D architecture. In 2D FFT, data is accessed in row-major order in the first phase and in column-major order in the second phase. Column-major access results in high memory latency and low bandwidth because of the high row activation overhead of the memory. We therefore develop dynamic data layouts to improve memory access performance in the second phase. Exploiting parallelism in the third dimension of the memory increases data parallelism and further improves performance. We adopt a model-based approach for the 3D memory and perform experiments on the FPGA to validate our analysis and evaluate performance. Our experimental results demonstrate up to a 40x improvement in peak memory bandwidth utilization for column-wise FFT, resulting in approximately 97% improvement in throughput for the complete 2D FFT application compared to the baseline architecture.
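The row activation problem described in the abstract can be illustrated with a small sketch. It is not the layout proposed in the paper. It assumes a simplified memory model with a single bank and one open row at a time, where touching a different memory row costs one activation. The function names, the tile-based layout, and the parameters (`n`, `b`, `row_words`) are all hypothetical and chosen only for illustration.

```python
# Toy model (an assumption, not the paper's model): one bank, one open row.
# An access to a memory row other than the currently open one costs
# one row activation.

def row_major_addr(r, c, n):
    """Linear address of element (r, c) in a plain row-major n x n layout."""
    return r * n + c

def tiled_addr(r, c, n, b):
    """Linear address of element (r, c) when the matrix is stored as
    b x b tiles, each tile contiguous in memory (hypothetical layout)."""
    tiles_per_row = n // b
    tile_index = (r // b) * tiles_per_row + (c // b)
    return tile_index * b * b + (r % b) * b + (c % b)

def count_activations(addresses, row_words):
    """Count row activations for an address trace under the toy model."""
    open_row = None
    activations = 0
    for addr in addresses:
        mem_row = addr // row_words
        if mem_row != open_row:
            activations += 1
            open_row = mem_row
    return activations

def column_major_trace(n, addr_fn):
    """Addresses touched when the matrix is read column by column,
    as in the second (column-wise) phase of a 2D FFT."""
    return [addr_fn(r, c) for c in range(n) for r in range(n)]

if __name__ == "__main__":
    n, b = 64, 8            # 64 x 64 matrix, 8 x 8 tiles
    row_words = b * b       # assume one tile fills exactly one memory row

    plain = column_major_trace(n, lambda r, c: row_major_addr(r, c, n))
    tiled = column_major_trace(n, lambda r, c: tiled_addr(r, c, n, b))

    print("row-major layout:", count_activations(plain, row_words))
    print("tiled layout:    ", count_activations(tiled, row_words))
```

Under these assumptions, the column-wise traversal triggers one activation per element with the row-major layout (4096 for the 64 x 64 case) and one activation per tile crossing with the tiled layout (512), which shows why the layout of the intermediate data matters for the second phase. Real 3D memories have multiple vaults, layers, and banks, so actual numbers depend on the device.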

This material is based in part upon work supported by the National Science Foundation under Grant Number ACI-1339756.

Author information

Corresponding author

Correspondence to Ren Chen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, R., Singapura, S.G., Prasanna, V.K. (2015). Optimal Dynamic Data Layouts for 2D FFT on 3D Memory Integrated FPGA. In: Malyshkin, V. (ed.) Parallel Computing Technologies. PaCT 2015. Lecture Notes in Computer Science, vol 9251. Springer, Cham. https://doi.org/10.1007/978-3-319-21909-7_34

  • DOI: https://doi.org/10.1007/978-3-319-21909-7_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21908-0

  • Online ISBN: 978-3-319-21909-7

  • eBook Packages: Computer Science, Computer Science (R0)
