Research Article | Open Access
DOI: 10.1145/3470496.3527411

SIMD2: a generalized matrix instruction set for accelerating tensor computation beyond GEMM

Published: 11 June 2022

Abstract

Matrix-multiplication units (MXUs) are now prevalent in every computing platform. The key attribute that makes MXUs so successful is the semiring structure, which allows tiling for both parallelism and data reuse. Nonetheless, matrix multiplication is not the only algorithm with such attributes. We find that many algorithms share the same structure and differ only in the core operation; for example, using add-minimum instead of multiply-add. Algorithms with a semiring-like structure therefore have the potential to be accelerated by a general-purpose matrix-operation architecture, rather than by conventional MXUs.
In this paper, we propose SIMD2, a new programming paradigm that supports generalized matrix operations with a semiring-like structure. SIMD2 instructions accelerate eight more types of matrix operations in addition to matrix multiplication. Because SIMD2 instructions resemble matrix-multiplication instructions, we are able to build the SIMD2 architecture on top of any MXU architecture with minimal modifications. We developed a framework that emulates and validates SIMD2 using NVIDIA GPUs with Tensor Cores. Across 8 applications, SIMD2 provides up to 38.59× speedup, and more than 6.94× on average, over optimized CUDA programs, with only 5% full-chip area overhead.
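
To make the semiring-like structure concrete, the sketch below swaps GEMM's multiply-add inner loop for the add-minimum operation mentioned above, producing a min-plus (tropical) matrix product of the kind used in all-pairs shortest-path algorithms. This is a plain CUDA illustration of the computational pattern only, not the SIMD2 instruction set or its MXU mapping; the kernel name and problem sizes are our own.

    // A minimal sketch (ours, not the SIMD2 ISA): a min-plus "matrix product"
    // over the tropical semiring. The loop structure is identical to GEMM;
    // only the core operation changes from multiply-add to add-minimum.
    #include <cstdio>
    #include <cfloat>
    #include <vector>
    #include <cuda_runtime.h>

    __global__ void minplus_gemm(const float* A, const float* B, float* C, int n) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n && col < n) {
            float best = FLT_MAX;  // the semiring "zero": identity of min
            for (int k = 0; k < n; ++k) {
                // "multiply" is +, "add" is min; GEMM would use * and + here.
                best = fminf(best, A[row * n + k] + B[k * n + col]);
            }
            C[row * n + col] = best;
        }
    }

    int main() {
        const int n = 256;
        std::vector<float> hA(n * n, 1.0f), hB(n * n, 1.0f), hC(n * n);
        float *dA, *dB, *dC;
        size_t bytes = n * n * sizeof(float);
        cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
        cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);
        dim3 block(16, 16), grid((n + 15) / 16, (n + 15) / 16);
        minplus_gemm<<<grid, block>>>(dA, dB, dC, n);
        cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
        printf("C[0][0] = %f\n", hC[0]);  // expect 2.0 for all-ones inputs
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }

Because the tiling and reuse pattern is unchanged, the same blocking strategies that make GEMM fast apply directly; for example, squaring a graph's adjacency matrix ⌈log₂ n⌉ times under this product yields all-pairs shortest paths, the classic instance of a semiring-like matrix algorithm.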

Cited By

  • LLM-Aided Compilation for Tensor Accelerators. 2024 IEEE LLM Aided Design Workshop (LAD), pages 1-14. DOI: 10.1109/LAD62341.2024.10691748. Online publication date: 28-Jun-2024.
  • LLM-Aided Compilation for Tensor Accelerators. 2024 IEEE LLM Aided Design Workshop (LAD), pages 1-16. DOI: 10.1109/LAD62341.2024.10691720. Online publication date: 28-Jun-2024.

Published In

ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture
June 2022
1097 pages
ISBN: 9781450386104
DOI: 10.1145/3470496

This work is licensed under a Creative Commons Attribution 4.0 International License.

In-Cooperation

  • IEEE CS TCCA: IEEE CS Technical Committee on Computer Architecture

Publisher

Association for Computing Machinery
New York, NY, United States

Funding Sources

  • National Science Foundation (NSF)

Conference

ISCA '22

Acceptance Rates

ISCA '22 paper acceptance rate: 67 of 400 submissions, 17%.
Overall acceptance rate: 543 of 3,203 submissions, 17%.
