DOI: 10.1145/3297663.3310305
research-article

Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures

Published: 04 April 2019

Abstract

Heterogeneous CPU-FPGA systems are evolving towards tighter integration between CPUs and FPGAs for improved performance and energy efficiency. At the same time, programmability is improving with High-Level Synthesis tools (e.g., OpenCL Software Development Kits), which let programmers express their designs in high-level programming languages and avoid time-consuming, error-prone register-transfer level (RTL) programming. In the traditional loosely-coupled accelerator mode, FPGAs work as offload accelerators: an entire kernel runs on the FPGA while the CPU thread waits for the result. Tighter integration of CPUs and FPGAs, however, enables fine-grained collaborative execution, i.e., both devices working concurrently on the same workload. Such collaborative execution makes better use of the overall system resources by employing both CPU threads and FPGA concurrency, thereby achieving higher performance. In this paper, we explore the potential of collaborative execution between CPUs and FPGAs using OpenCL High-Level Synthesis. First, we compare collaborative techniques (namely, data partitioning and task partitioning) and evaluate the tradeoffs between them. We observe that choosing the most suitable partitioning strategy can improve performance by up to 2x. Second, we study the impact of a common optimization technique, kernel duplication, in a collaborative CPU-FPGA context. We show that, in general, kernel duplication improves performance until the memory bandwidth saturates. Third, we provide new insights that application developers can use to choose between partitioning strategies when designing CPU-FPGA collaborative applications.
We find that different partitioning strategies pose different tradeoffs (e.g., task partitioning enables more kernel duplication, while data partitioning has lower communication overhead and better load balance), but both generally outperform execution on conventional CPU-FPGA systems where no collaborative execution strategy is used. Therefore, we advocate even tighter integration in future heterogeneous CPU-FPGA systems (e.g., OpenCL 2.0 features, such as fine-grained shared virtual memory).
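
The two effects named in the abstract, data partitioning across devices and kernel duplication that stops helping once memory bandwidth saturates, can be illustrated with a small toy model. The sketch below is not the paper's model: the function names, the linear-scaling assumption, the fixed bandwidth cap, and all numbers are hypothetical and chosen only for illustration. It also ignores communication overhead.

```python
# Illustrative toy model only; all names and numbers are hypothetical
# and are not taken from the paper.

def fpga_throughput(copies, per_copy_rate, bandwidth_cap):
    """Items/s for `copies` duplicated kernels.

    Assumes linear scaling with the number of copies until a fixed
    memory-bandwidth ceiling is reached.
    """
    return min(copies * per_copy_rate, bandwidth_cap)


def data_partition(n_items, cpu_rate, fpga_rate):
    """Split one workload so both devices finish at the same time.

    Returns (fraction of items given to the CPU, total execution time).
    Assumes constant per-device rates and no communication overhead.
    """
    cpu_share = cpu_rate / (cpu_rate + fpga_rate)
    cpu_time = cpu_share * n_items / cpu_rate
    fpga_time = (1.0 - cpu_share) * n_items / fpga_rate
    return cpu_share, max(cpu_time, fpga_time)


if __name__ == "__main__":
    n_items = 1_000_000
    cpu_rate = 2.0e5  # hypothetical items/s for the CPU threads
    for copies in (1, 2, 4, 8):
        rate = fpga_throughput(copies, per_copy_rate=3.0e5, bandwidth_cap=1.0e6)
        share, collab_time = data_partition(n_items, cpu_rate, rate)
        offload_time = n_items / rate  # FPGA-only offload, CPU idle
        print(f"copies={copies} cpu_share={share:.2f} "
              f"collab={collab_time:.2f}s offload={offload_time:.2f}s")
```

With these made-up numbers, four and eight kernel copies yield the same FPGA rate because both hit the assumed bandwidth cap, and the collaborative time is always below the offload-only time because the CPU contributes a nonzero share. Real systems would also need to account for transfer and synchronization costs, which this sketch leaves out.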

Published In

ICPE '19: Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering
April 2019
348 pages
ISBN: 9781450362399
DOI: 10.1145/3297663
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CPU-FPGA architectures
  2. heterogeneous systems
  3. OpenCL
  4. performance analysis

Qualifiers

  • Research-article

Funding Sources

  • Huawei
  • Google
  • Applications Driving Architectures (ADA) Research Center by SRC and DARPA
  • Hewlett Packard Labs
  • Intel
  • VMware
  • Alibaba

Conference

ICPE '19

Acceptance Rates

ICPE '19 Paper Acceptance Rate 13 of 71 submissions, 18%;
Overall Acceptance Rate 252 of 851 submissions, 30%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 95
  • Downloads (last 6 weeks): 11
Reflects downloads up to 20 Nov 2024

Cited By

  • (2024) HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 1012-1028. DOI: 10.1109/HPCA57654.2024.00081. Online publication date: 2-Mar-2024
  • (2023) A Comprehensive Memory Management Framework for CPU-FPGA Heterogenous SoCs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42:4, pages 1058-1071. DOI: 10.1109/TCAD.2022.3179323. Online publication date: Apr-2023
  • (2023) Sailfish: Exploring Heterogeneous Query Acceleration on Discrete CPU-FPGA Architecture. 2023 IEEE 39th International Conference on Data Engineering Workshops (ICDEW), pages 198-204. DOI: 10.1109/ICDEW58674.2023.00036. Online publication date: Apr-2023
  • (2023) Casper: Accelerating Stencil Computations Using Near-Cache Processing. IEEE Access, 11, pages 22136-22154. DOI: 10.1109/ACCESS.2023.3252002. Online publication date: 2023
  • (2022) FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing. Electronics, 11:22, article 3756. DOI: 10.3390/electronics11223756. Online publication date: 16-Nov-2022
  • (2022) TDLAS Tomography System for Online Imaging and Dynamic Process Playback of Temperature and Gas Mole Fraction. IEEE Transactions on Instrumentation and Measurement, 71, pages 1-10. DOI: 10.1109/TIM.2022.3208668. Online publication date: 2022
  • (2022) Analysis of Power Delivery Network (PDN) in Bridge-Chips for 2.5-D Heterogeneous Integration. IEEE Transactions on Components, Packaging and Manufacturing Technology, 12:11, pages 1824-1831. DOI: 10.1109/TCPMT.2022.3223687. Online publication date: Nov-2022
  • (2022) LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning. 2022 IEEE 40th International Conference on Computer Design (ICCD), pages 499-508. DOI: 10.1109/ICCD56317.2022.00080. Online publication date: Oct-2022
  • (2022) OptCL: A Middleware to Optimise Performance for High Performance Domain-Specific Languages on Heterogeneous Platforms. Algorithms and Architectures for Parallel Processing, pages 772-791. DOI: 10.1007/978-3-030-95391-1_48. Online publication date: 23-Feb-2022
  • (2021) Xar-trek. Proceedings of the 22nd International Middleware Conference, pages 104-118. DOI: 10.1145/3464298.3493388. Online publication date: 6-Dec-2021
