DOI: 10.1145/3613424.3614246
research-article
Open access

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane

Published: 08 December 2023

Abstract

Spatial architecture is a high-performance architecture that uses control flow graphs and data flow graphs as its computational model and producer/consumer models as its execution model. However, existing spatial architectures struggle with control flow handling. By categorizing their PE execution models, we find that they lack autonomous, peer-to-peer, and temporally loosely coupled control flow handling capability, which limits their performance on control-intensive programs.
We propose Marionette, a spatial architecture with an explicitly designed control flow plane. The Control Flow Plane enables autonomous, peer-to-peer, and temporally loosely coupled control flow handling. Proactive PE Configuration ensures computation-overlapped and timely configuration, improving the handling of Branch Divergence. Agile PE Assignment enhances the pipeline performance of Imperfect Loops. We develop the full stack of Marionette (ISA, compiler, simulator, RTL) and demonstrate that, across a variety of challenging control-intensive programs, Marionette outperforms the state-of-the-art spatial architectures Softbrain, TIA, REVEL, and RipTide by geomean 2.88×, 3.38×, 1.55×, and 2.66×, respectively.
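
The headline figures in the abstract are geometric means of per-benchmark speedups. As a minimal sketch of how such a summary is typically computed, the Python snippet below takes the geometric mean of a list of speedup ratios. The function name `geomean_speedup` and the input values are hypothetical illustrations, not code or measurements from the paper.

```python
import math


def geomean_speedup(speedups):
    """Return the geometric mean of per-benchmark speedup ratios."""
    if not speedups:
        raise ValueError("need at least one speedup ratio")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedup ratios must be positive")
    # Average in log space, then exponentiate: exp(mean(ln s_i)).
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical per-benchmark speedups (illustrative only, not from the paper).
example = [1.0, 2.0, 4.0]
print(round(geomean_speedup(example), 2))
```

Unlike an arithmetic mean, the geometric mean is not dominated by a single unusually large ratio, which is one reason it is commonly used to summarize speedups across a benchmark suite.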

Cited By

  • (2024) Efficient Orchestrated AI Workflows Execution on Scale-Out Spatial Architecture. IEEE Transactions on Circuits and Systems for Artificial Intelligence 1:2 (229-243). DOI: 10.1109/TCASAI.2024.3476237. Online publication date: Dec-2024

      Information & Contributors

      Information

      Published In

      MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
      October 2023
      1528 pages
      ISBN: 9798400703294
      DOI: 10.1145/3613424
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. coarse-grained reconfigurable array
      2. control flow
      3. control plane
      4. spatial architecture

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • the Science and Technology Innovation 2030 - New Generation of AI Project
      • Beijing Municipal Science and Technology Project
      • NSFC
      • the National Key Research and Development Program
      • Beijing National Research Center For Information Science And Technology
      • the Beijing Advanced Innovation Center for Integrated Circuits
      • Tsinghua University-China Mobile Communications Group Co.,Ltd. Joint Institute

      Conference

      MICRO '23
      Acceptance Rates

      Overall Acceptance Rate 484 of 2,242 submissions, 22%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 735
      • Downloads (Last 6 weeks): 72
      Reflects downloads up to 26 Nov 2024
