DOI: 10.1145/3174243.3174264
research-article

Dynamically Scheduled High-level Synthesis

Published: 15 February 2018

Abstract

High-level synthesis (HLS) tools almost universally generate statically scheduled datapaths. As a consequence, circuits produced by HLS tools struggle to exploit parallelism in code with potential memory dependencies, with control-dependent dependencies in inner loops, or where performance is limited by long-latency control decisions. The situation mirrors the contrast in computer architecture between Very Long Instruction Word (VLIW) processors and dynamically scheduled superscalar processors: the former offer the best performance per cost in highly regular embedded applications, but general-purpose, irregular, and control-dominated computing tasks require the runtime flexibility of dynamic scheduling. In this work, we show that high-level synthesis of dynamically scheduled circuits is feasible by describing the implementation of a prototype synthesizer that generates a particular form of latency-insensitive synchronous circuits. Compared to a commercial HLS tool, the result is a different trade-off between performance and circuit complexity, much as superscalar processors represent a different trade-off compared to VLIW processors: in demanding applications, performance improves significantly at an affordable cost. This work demonstrates only the first steps towards more performant high-level synthesis tools adapted to emerging FPGA applications and the demands of computing in broader application domains.
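To make the point about potential memory dependencies concrete, the following is a minimal toy model, not the synthesizer described in the paper. It assumes a hypothetical loop of the form `hist[idx[i]] += w[i]`, a fixed read-modify-write latency, and at most one new iteration started per cycle. The function names and the cycle-counting rules are illustrative assumptions. A static schedule cannot prove that consecutive iterations touch different addresses, so in this model it serializes every iteration; a dynamic schedule stalls only when an address is actually still in flight.

```python
def static_cycles(num_iterations, latency):
    """Conservative static schedule (assumed model): the compiler cannot rule
    out a dependency, so each iteration waits for the previous one to finish."""
    return num_iterations * latency


def dynamic_cycles(addresses, latency):
    """Toy dynamic schedule (assumed model): start one iteration per cycle and
    stall only if the same address is still being updated."""
    ready_at = {}    # address -> cycle at which its last update completes
    prev_start = -1  # start cycle of the previous iteration
    finish = 0
    for addr in addresses:
        # Start no earlier than one cycle after the previous iteration, and
        # no earlier than the completion of a pending update to this address.
        start = max(prev_start + 1, ready_at.get(addr, 0))
        finish = start + latency
        ready_at[addr] = finish
        prev_start = start
    return finish
```

Under these assumptions and a latency of 4 cycles, four iterations over distinct addresses `[0, 1, 2, 3]` take 7 cycles dynamically versus 16 statically, while four iterations that all hit address 0 take 16 cycles either way. The gap between the two cases is the parallelism that a purely static schedule leaves unused in this model.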

Published In

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2018
310 pages
ISBN: 9781450356145
DOI: 10.1145/3174243

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. compiler
  2. dynamically scheduled circuits
  3. high-level synthesis
  4. pipelining

Qualifiers

  • Research-article

Conference

FPGA '18

Acceptance Rates

FPGA '18 Paper Acceptance Rate 10 of 116 submissions, 9%;
Overall Acceptance Rate 125 of 627 submissions, 20%
