Abstract
FPGAs are widely used to accelerate a variety of applications. For many data-intensive applications, memory bandwidth limits performance. 3D memories with through-silicon-via (TSV) connections offer a potential solution to these latency and bandwidth issues. In this paper, we revisit the classic 2D FFT problem to evaluate the performance of a 3D memory integrated FPGA. Fully exploiting the fine-grained parallelism of 3D memory requires data layouts that make effective use of the device's peak bandwidth. We therefore propose dynamic data layouts that optimize performance on the 3D architecture. In 2D FFT, data is accessed in row-major order in the first phase and in column-major order in the second phase. Column-major access incurs high memory latency and low bandwidth because of the high row activation overhead of the memory. We develop dynamic data layouts to improve memory access performance in this second phase. Exploiting parallelism in the third dimension of the memory increases data parallelism and further improves performance. We adopt a model-based approach for the 3D memory and perform experiments on the FPGA to validate our analysis and evaluate performance. Our experimental results demonstrate up to 40x peak memory bandwidth utilization for column-wise FFT, resulting in approximately 97% improvement in throughput for the complete 2D FFT application compared to the baseline architecture.
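The row activation overhead described above can be illustrated with a toy model. The sketch below is not the layout or the memory model used in the paper: it assumes a single bank with an open-page policy, a row buffer holding `row_size` elements, and a hypothetical tiled layout in which each `h x w` tile of the matrix occupies exactly one memory row. All function names and parameters are illustrative assumptions.

```python
def row_major_addr(i, j, n):
    """Linear address of element (i, j) of an n x n matrix stored row by row."""
    return i * n + j


def tiled_addr(i, j, n, h, w):
    """Linear address under a hypothetical tiled layout.

    The matrix is split into h x w tiles; each tile is stored contiguously,
    so a tile fills one memory row when h * w equals the row size.
    Assumes n is a multiple of both h and w.
    """
    tiles_per_matrix_row = n // w
    tile = (i // h) * tiles_per_matrix_row + (j // w)
    return tile * (h * w) + (i % h) * w + (j % w)


def count_activations(addresses, row_size):
    """Count row activations for an address stream (single bank, open page)."""
    open_row = None
    activations = 0
    for addr in addresses:
        row = addr // row_size
        if row != open_row:  # row-buffer miss: a new row must be activated
            activations += 1
            open_row = row
    return activations


def column_phase_activations(n, row_size, addr_fn):
    """Activations when the matrix is read one column at a time."""
    stream = (addr_fn(i, j) for j in range(n) for i in range(n))
    return count_activations(stream, row_size)


if __name__ == "__main__":
    n, row_size = 64, 16
    baseline = column_phase_activations(
        n, row_size, lambda i, j: row_major_addr(i, j, n))
    tiled = column_phase_activations(
        n, row_size, lambda i, j: tiled_addr(i, j, n, 4, 4))
    print("row-major layout:", baseline)
    print("4 x 4 tiled layout:", tiled)
```

In this toy model, every column-wise access to the row-major layout opens a new memory row once the matrix width reaches the row size, whereas the tiled layout opens a new row only once per `h` consecutive elements. Taller tiles reduce activations in the column phase but increase them for row-wise access, which is the kind of trade-off a layout choice has to balance.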
This material is based in part upon work supported by the National Science Foundation under Grant Number ACI-1339756.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, R., Singapura, S.G., Prasanna, V.K. (2015). Optimal Dynamic Data Layouts for 2D FFT on 3D Memory Integrated FPGA. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2015. Lecture Notes in Computer Science, vol 9251. Springer, Cham. https://doi.org/10.1007/978-3-319-21909-7_34
DOI: https://doi.org/10.1007/978-3-319-21909-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21908-0
Online ISBN: 978-3-319-21909-7
eBook Packages: Computer Science, Computer Science (R0)