
Dynamically Scheduled High-level Synthesis

Authors: Lana Josipović, Radhika Ghosal, Paolo Ienne

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Pages 127 - 136

https://doi.org/10.1145/3174243.3174264

Published: 15 February 2018

Abstract

High-level synthesis (HLS) tools almost universally generate statically scheduled datapaths. Static scheduling implies that circuits out of HLS tools have a hard time exploiting parallelism in code with potential memory dependencies, with control-dependent dependencies in inner loops, or where performance is limited by long latency control decisions. The situation is essentially the same as in computer architecture between Very-Long Instruction Word (VLIW) processors and dynamically scheduled superscalar processors; the former display the best performance per cost in highly regular embedded applications, but general purpose, irregular, and control-dominated computing tasks require the runtime flexibility of dynamic scheduling. In this work, we show that high-level synthesis of dynamically scheduled circuits is perfectly feasible by describing the implementation of a prototype synthesizer which generates a particular form of latency-insensitive synchronous circuits. Compared to a commercial HLS tool, the result is a different trade-off between performance and circuit complexity, much as superscalar processors represent a different trade-off compared to VLIW processors: in demanding applications, the performance is very significantly improved at an affordable cost. We here demonstrate only the first steps towards more performant high-level synthesis tools adapted to emerging FPGA applications and the demands of computing in broader application domains.
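The two code patterns named in the abstract can be illustrated with a minimal sketch. The kernels below are hypothetical examples written for this purpose and are not taken from the paper; the function names `weighted_histogram` and `sum_nonnegative` and their signatures are assumptions. The first loop has a potential memory dependency: whether consecutive iterations touch the same element of `hist` depends on the run-time contents of `idx`, so a static schedule typically has to assume that they do. The second loop has a control-dependent loop-carried dependency: the accumulation into `s` happens only when the branch is taken, yet a static schedule typically reserves time for it in every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel 1: potential memory dependency.
 * hist[] is read and written through a data-dependent index, so iteration
 * i+1 may or may not access the element updated in iteration i. This is
 * only known at run time. */
void weighted_histogram(const int *idx, const float *w, size_t n,
                        float *hist, size_t bins)
{
    for (size_t i = 0; i < n; i++) {
        /* Guard against out-of-range indices in this illustration. */
        assert(idx[i] >= 0 && (size_t)idx[i] < bins);
        hist[idx[i]] += w[i];
    }
}

/* Hypothetical kernel 2: control-dependent loop-carried dependency.
 * The floating-point addition into s is on the critical path only in
 * iterations where the condition holds. */
float sum_nonnegative(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (a[i] >= 0.0f) {
            s += a[i];
        }
    }
    return s;
}
```

Under these assumptions, a dynamically scheduled circuit could start a new iteration as soon as its operands are actually available, for example when two successive `idx` values differ or when the branch is not taken, instead of always waiting for the worst case.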

Information

Published in: FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2018, 310 pages

ISBN: 9781450356145

DOI: 10.1145/3174243

General Chair: Jason H. Anderson (University of Toronto, Canada)

Program Chair: Kia Bazargan (University of Minnesota, USA)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsor: SIGDA: ACM Special Interest Group on Design Automation

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifier: Research-article

Conference: FPGA '18: The 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 25 - 27, 2018, Monterey, California, USA

Acceptance Rates

FPGA '18 Paper Acceptance Rate: 10 of 116 submissions, 9%

Overall Acceptance Rate: 125 of 627 submissions, 20%
