Abstract
Highly tuned FPGA implementations can achieve significant performance and power-efficiency gains over general-purpose hardware. However, limited development productivity has prevented mainstream adoption of FPGAs in many areas, such as high-performance computing. High-level standard development libraries are increasingly adopted to improve productivity. We propose an approach for performance-critical applications that includes standard library modules, benchmarking facilities and application benchmarks to support a variety of use cases. We implement the proposed approach as an open-source library for a commercially available FPGA system and highlight applications and productivity gains.
Acknowledgement
The support of UK EPSRC (EP/I012036/1, EP/L00058X/1, EP/L016796/1 and EP/N031768/1), the European Union Horizon 2020 Research and Innovation Programme under grant agreement number 671653, the Maxeler University Programme, Altera, Intel and Xilinx is gratefully acknowledged.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Grigoras, P. et al. (2017). dfesnippets: An Open-Source Library for Dataflow Acceleration on FPGAs. In: Wong, S., Beck, A., Bertels, K., Carro, L. (eds) Applied Reconfigurable Computing. ARC 2017. Lecture Notes in Computer Science(), vol 10216. Springer, Cham. https://doi.org/10.1007/978-3-319-56258-2_26
DOI: https://doi.org/10.1007/978-3-319-56258-2_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56257-5
Online ISBN: 978-3-319-56258-2
eBook Packages: Computer Science, Computer Science (R0)