Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

You Only Traverse Twice: A YOTT Placement, Routing, and Timing Approach for CGRAs

Published: 17 September 2021 Publication History

Abstract

Coarse-grained reconfigurable architecture (CGRA) mapping involves three main steps: placement, routing, and timing. The mapping is an NP-complete problem, and a common strategy is to decouple this process into its independent steps. This work focuses on the placement step, and its aim is to propose a technique that is both reasonably fast and leads to high-performance solutions. Furthermore, a near-optimal placement simplifies the following routing and timing steps. Exact solutions cannot find placements in a reasonable execution time as input designs increase in size. Heuristic solutions include meta-heuristics, such as Simulated Annealing (SA) and fast and straightforward greedy heuristics based on graph traversal. However, as these approaches are probabilistic and have a large design space, it is not easy to provide both run-time efficiency and good solution quality. We propose a graph traversal heuristic that provides the best of both: high-quality placements similar to SA and the execution time of graph traversal approaches. Our placement introduces novel ideas based on “you only traverse twice” (YOTT) approach that performs a two-step graph traversal. The first traversal generates annotated data to guide the second step, which greedily performs the placement, node per node, aided by the annotated data and target architecture constraints. We introduce three new concepts to implement this technique: I/O and reconvergence annotation, degree matching, and look-ahead placement. Our analysis of this approach explores the placement execution time/quality trade-offs. We point out insights on how to analyze graph properties during dataflow mapping. Our results show that YOTT is 60.6, 9.7 , and 2.3 faster than a high-quality SA, bounding box SA VPR, and multi-single traversal placements, respectively. Furthermore, YOTT reduces the average wire length and the maximal FIFO size (additional timing requirement on CGRAs) to avoid delay mismatches in fully pipelined architectures.

References

[1]
Abeer Al-hyari, Ahmed Shamli, Timothy Martin, Shawki Areibi, and Gary Grewal. 2020. An adaptive analytic FPGA placement framework based on deep-learning. In ACM/IEEE Workshop on Machine Learning for CAD. 3–8.
[2]
Abeer Alhyari, Ahmed Shamli, Ziad Abuwaimer, Shawki Areibi, and Gary Grewal. 2019. A deep learning framework to predict routability for FPGA circuit placement. In 2019 29th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 334–341.
[3]
Nikhil Bansal, Sumit Gupta, Nikil Dutt, Alexandru Nicolau, and Rajesh Gupta. 2004. Network topology exploration of mesh-based coarse-grain reconfigurable architectures. In Proceedings Design, Automation and Test in Europe Conference and Exhibition, Vol. 1. IEEE, 474–479.
[4]
Michael Canesche, Marcelo Menezes, Westerley Carvalho, Frank Torres, Peter Jamieson, José Augusto Nacif, and Ricardo Ferreira. 2020. TRAVERSAL: A fast and adaptive graph-based placement and routing for CGRAs. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems (2020).
[5]
Westerley Carvalho, Michael Canesche, Lucas Reis, Frank Torres, Lucas Silva, Peter Jamieson, José Nacif, and Ricardo Ferreira. 2020. A design exploration of scalable mesh-based fully pipelined accelerators. In 2020 International Conference on Field-programmable Technology (ICFPT). IEEE, 233–236.
[6]
Deming Chen, Jason Cong, and Peichan Pan. 2006. FPGA Design Automation: A Survey. Now Publishers Inc.
[7]
Liang Chen and Tulika Mitra. 2014. Graph minor approach for application mapping on cgras. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 7, 3 (2014), 1–25.
[8]
S. Alexander Chin and Jason H. Anderson. 2018. An architecture-agnostic integer linear programming approach to CGRA mapping. In Proceedings of the 55th Annual Design Automation Conference. 1–6.
[9]
Jason Cong, Hui Huang, Chiyuan Ma, Bingjun Xiao, and Peipei Zhou. 2014. A fully pipelined and dynamically composable architecture of CGRA. In 2014 IEEE 22nd Annual International Symposium on Field-programmable Custom Computing Machines. IEEE, 9–16.
[10]
Jason Cong, Michail Romesis, and Min Xie. 2003. Optimality and stability study of timing-driven placement algorithms. In Proceedings of the 2003 IEEE/ACM International Conference on Computer-aided Design (ICCAD’03). 472–. http://dx.doi.org/10.1109/ICCAD.2003.108
[11]
James Coole and Greg Stitt. 2010. Intermediate fabrics: Virtual architectures for circuit portability and fast placement and routing. In Int. Conf. On Hardware/software Codesign and System Synthesis. ACM.
[12]
James Coole and Greg Stitt. 2015. Adjustable-cost overlays for runtime compilation. In 2015 IEEE 23rd Annual International Symposium on Field-programmable Custom Computing Machines. IEEE, 21–24.
[13]
Marcos Vinicius Da Silva, Ricardo Ferreira, Alisson Garcia, and Joao MP Cardoso. 2006. Mesh mapping exploration for coarse-grained reconfigurable array architectures. In 2006 IEEE International Conference on Reconfigurable Computing and FPGA’s (ReConFig’06). IEEE, 1–10.
[14]
Caleb Donovick, Makai Mann, Clark Barrett, and Pat Hanrahan. 2019. Agile SMT-Based mapping for CGRAs with restricted routing networks. In Int. Conf. On ReConFigurable Computing and FPGAs (ReConFig). IEEE.
[15]
Ricardo Ferreira, Waldir Denver, Monica Pereira, Jorge Quadros, Luigi Carro, and Stephan Wong. 2014. A run-time modulo scheduling by using a binary translation mechanism. In 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV). IEEE, 75–82.
[16]
Ricardo Ferreira, Vinicius Duarte, Waldir Meireles, Monica Pereira, Luigi Carro, and Stephan Wong. 2013. A just-in-time modulo scheduling for virtual coarse-grained reconfigurable architectures. In 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS). IEEE, 188–195.
[17]
Ricardo Ferreira, Alisson Garcia, Tiago Teixeira, and Joao M. P. Cardoso. 2007. A polynomial placement algorithm for data driven coarse-grained reconfigurable architectures. In IEEE Comp Soc Annual Symposium on VLSI.
[18]
Ricardo Ferreira, Luciana Rocha, A. Santos, J. Nacif, Stephan Wong, and Luigi Carro. 2013. A run-time graph-based polynomial placement and routing algorithm for virtual fpgas. In 2013 23rd International Conference on Field Programmable Logic and Applications. IEEE, 1–8.
[19]
Ricardo Ferreira, Julio Vendramini, and Mauro Nacif. 2011. Dynamic reconfigurable multicast interconnections by using radix-4 multistage networks in fpga. In IEEE International Conference on Industrial Informatics. IEEE, 810–815.
[20]
Stephen Friedman, Allan Carroll, Brian Van Essen, Benjamin Ylvisaker, Carl Ebeling, and Scott Hauck. 2009. SPR: An architecture-adaptive CGRA mapping tool. In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays. 191–200.
[21]
Anna Goldie and Azalia Mirhoseini. 2020. Placement optimization with deep reinforcement learning. In Proceedings of the 2020 International Symposium on Physical Design. 3–7.
[22]
Mahdi Hamzeh, Aviral Shrivastava, and Sarma Vrudhula. 2013. REGIMap: Register-aware application mapping on coarse-grained reconfigurable architectures (CGRAs). In Design Automation Conference. 1–10.
[23]
Manupa Karunaratne, Dhananjaya Wijerathne, Tulika Mitra, and Li-Shiuan Peh. 2019. 4D-CGRA: Introducing branch dimension to spatio-temporal application mapping on CGRAs. In Iccad. 1–8.
[24]
Yen-Tai Lai, Hsin-Ya Lai, and Chia-Nan Yeh. 2005. Placement for the reconfigurable datapath architecture. In IEEE Symp on Circuits and Systems.
[25]
LESC-UFV. 2021. Dataflow graph benchmarks. https://github.com/lesc-ufv/benchmarks-cases-2021.
[26]
Zhaoying Li, Dhananjaya Wijerathne, Xianzhang Chen, Anuj Pathania, and Tulika Mitra. 2021. Chordmap: Automated mapping of streaming applications onto CGRA. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems (2021), 1–1. https://doi.org/10.1109/TCAD.2021.3058313
[27]
Yibo Lin, Zixuan Jiang, Jiaqi Gu, Wuxi Li, Shounak Dhar, Haoxing Ren, Brucek Khailany, and David Z. Pan. 2020. Dreamplace: Deep learning toolkit-enabled gpu acceleration for modern vlsi placement. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems (2020).
[28]
Dajiang Liu, Shouyi Yin, Guojie Luo, Jiaxing Shang, Leibo Liu, Shaojun Wei, Yong Feng, and Shangbo Zhou. 2019. Data-flow graph mapping optimization for CGRA with deep reinforcement learning. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems (2019).
[29]
Jason Luu, Ian Kuon, Peter Jamieson, Ted Campbell, Andy Ye, Wei Mark Fang, Kenneth Kent, and Jonathan Rose. 2011. VPR 5.0: FPGA CAD and architecture exploration tools with single-driver routing, heterogeneity and process scaling. Acm Trets 4, 4 (2011), 32.
[30]
Dani Maarouff, Ahmed Shamli, Timothy Martin, Gary Grewal, and Shawki Areibi. 2020. A deep-learning framework for predicting congestion during FPGA placement. In 2020 30th International Conference on Field-programmable Logic and Applications (FPL). IEEE, 138–144.
[31]
Xingchen Man, Leibo Liu, Jianfeng Zhu, and Shaojun Wei. 2019. A general pattern-based dynamic compilation framework for coarse-grained reconfigurable architectures. In Proceedings of the 56th Annual Design Automation Conference 2019. 1–6.
[32]
Bingfeng Mei, M. Berekovic, and J. Y. Mignolet. 2007. Adres & dresc: Architecture and compiler for coarse-grain reconfigurable processors. In Fine-and Coarse-grain Reconfigurable Computing. Springer, 255–297.
[33]
Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, et al. 2021. A graph placement methodology for fast chip design. Nature 594 (2021).
[34]
Kevin E. Murray, Oleg Petelin, Sheng Zhong, Jia Min Wang, Mohamed Eldafrawy, Jean-Philippe Legault, Eugene Sha, Aaron G. Graham, Jean Wu, Matthew J. P. Walker, et al. 2020. VTR 8: High-performance CAD and customizable FPGA architecture modelling. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 13, 2 (2020), 1–55.
[35]
Tony Nowatzki, Newsha Ardalani, Karthikeyan Sankaralingam, and Jian Weng. 2018. Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign. In Int. Conf. On Parallel Architectures and Compilation Techniques PACT.
[36]
University of Toronto. 2019. CGRA-ME. http://cgra-me.ece.utoronto.ca/.
[37]
Hyunchul Park, Kevin Fan, Scott A. Mahlke, Taewook Oh, Heeseok Kim, and Hong-seok Kim. 2008. Edge-centric modulo scheduling for coarse-grained reconfigurable architectures. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. 166–176.
[38]
Ghasem Pasandi and Massoud Pedram. 2019. A dynamic programming-based, path balancing technology mapping algorithm targeting area minimization. In IEEE/ACM International Conference on Computer-aided Design (ICCAD).
[39]
Artur Podobas, Kentaro Sano, and Satoshi Matsuoka. 2020. A survey on coarse-grained reconfigurable architectures from a performance perspective. Arxiv Preprint arXiv:2004.04509 (2020).
[40]
Cheng Tan, Chenhao Xie, Ang Li, Kevin J. Barker, and Antonino Tumeo. 2020. OpenCGRA: An open-source unified framework for modeling, testing, and evaluating CGRAs. In 2020 IEEE 38th International Conference on Computer Design (ICCD). IEEE, 381–388.
[41]
Santa Barbara University of California. 2020. Express Benchmarks. https://web.ece.ucsb.edu/EXPRESS/benchmark/.
[42]
Maria Vieira, Michael Canesche, Lucas Bragança, Josué Campos, Mateus Silva, Ricardo Ferreira, and Jose A. Nacif. 2021. RESHAPE: A run-time dataflow hardware-based mapping for CGRA overlays. In 2021 IEEE International Symposium on Circuits and Systems (ISCAS). 1–5.
[43]
Matthew J. P. Walker and Jason H. Anderson. 2019. Generic connectivity-based CGRA mapping via integer linear programming. In Symposium on Field-programmable Custom Computing Machines (FCCM).
[44]
Wuqi Wang, Qingsong Meng, and Zhiwei Zhang. 2017. A survey of fpga placement algorithm research. In 2017 7th IEEE International Conference on Electronics Information and Emergency Communication (ICEIEC). IEEE, 498–502.
[45]
Matthew A. Watkins, Tony Nowatzki, and Anthony Carno. 2016. Software transparent dynamic binary translation for coarse-grain reconfigurable architectures. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 138–150.
[46]
Jian Weng, Sihao Liu, Zhengrong Wang, Vidushi Dadu, and Tony Nowatzki. 2020. A hybrid systolic-dataflow architecture for inductive matrix algorithms. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 703–716.
[47]
Dhananjaya Wijerathne, Zhaoying Li, Anuj Pathania, Tulika Mitra, and Lothar Thiele. 2021. Himap: Fast and scalable high-quality mapping on CGRA via hierarchical abstraction. In Date.

Cited By

View all
  • (2023)GCN-RA: A graph convolutional network-based resource allocator for reconfigurable systemsJournal of Computational Science10.1016/j.jocs.2023.10217874(102178)Online publication date: Dec-2023

Index Terms

  1. You Only Traverse Twice: A YOTT Placement, Routing, and Timing Approach for CGRAs

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Embedded Computing Systems
    ACM Transactions on Embedded Computing Systems  Volume 20, Issue 5s
    Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
    October 2021
    1367 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/3481713
    • Editor:
    • Tulika Mitra
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Journal Family

    Publication History

    Published: 17 September 2021
    Accepted: 01 July 2021
    Revised: 01 June 2021
    Received: 01 April 2021
    Published in TECS Volume 20, Issue 5s

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Coarse-grained reconfigurable architectures
    2. placement
    3. graph traversal

    Qualifiers

    • Research-article
    • Refereed

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)28
    • Downloads (Last 6 weeks)6
    Reflects downloads up to 16 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)GCN-RA: A graph convolutional network-based resource allocator for reconfigurable systemsJournal of Computational Science10.1016/j.jocs.2023.10217874(102178)Online publication date: Dec-2023

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media