research-article

TDGraph: a topology-driven accelerator for high-performance streaming graph processing

Authors:

Hui YuAuthors Info & Claims

ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture

Pages 116 - 129

https://doi.org/10.1145/3470496.3527409

Published: 11 June 2022 Publication History

Abstract

Many solutions have been recently proposed to support the processing of streaming graphs. However, for the processing of each graph snapshot of a streaming graph, the new states of the vertices affected by the graph updates are propagated irregularly along the graph topology. Despite the years' research efforts, existing approaches still suffer from the serious problems of redundant computation overhead and irregular memory access, which severely underutilizes a many-core processor. To address these issues, this paper proposes a topology-driven programmable accelerator TDGraph, which is the first accelerator to augment the many-core processors to achieve high performance processing of streaming graphs. Specifically, we propose an efficient topology-driven incremental execution approach into the accelerator design for more regular state propagation and better data locality. TDGraph takes the vertices affected by graph updates as the roots to prefetch other vertices along the graph topology and synchronizes the incremental computations of them on the fly. In this way, most state propagations originated from multiple vertices affected by different graph updates can be conducted together along the graph topology, which help reduce the redundant computations and data access cost. Besides, through the efficient coalescing of the accesses to vertex states, TDGraph further improves the utilization of the cache and memory bandwidth. We have evaluated TDGraph on a simulated 64-core processor. The results show that, the state-of-the-art software system achieves the speedup of 7.1~21.4 times after integrating with TDGraph, while incurring only 0.73% area cost. Compared with four cutting-edge accelerators, i.e., HATS, Minnow, PHI, and DepGraph, TDGraph gains the speedups of 4.6~12.7, 3.2~8.6, 3.8~9.7, and 2.3~6.1 times, respectively.

References

[1]

2022. DDR4 SDRAM System Power Calculator. https://media-www.micron.com/-/media/client/global/documents/products/power-calculator/ddr4_power_calc.xlsm?rev=a8a5e30d8a7e41c4adcaad2df73934b4.

[2]

2022. macsim. https://github.com/gthparch/macsim.

[3]

2022. SNAP. http://snap.stanford.edu/data/index.html.

[4]

Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In Proceedings of the 42nd Annual International Symposium on Computer Architecture. 105--117.

Digital Library

[5]

Sam Ainsworth and Timothy M. Jones. 2016. Graph Prefetching Using Data Structure Knowledge. In Proceedings of the 2016 International Conference on Supercomputing. 39:1--39:11 pages.

[6]

Sam Ainsworth and Timothy M. Jones. 2018. An Event-Triggered Programmable Prefetcher for Irregular Workloads. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. 578--592.

[7]

Sam Ainsworth and Timothy M. Jones. 2019. Software Prefetching for Indirect Memory Accesses: A Microarchitectural Perspective. ACM Transactions on Computer Systems 36, 3 (2019), 8:1--8:34.

Digital Library

[8]

Mikhail Asiatici and Paolo Ienne. 2021. Large-Scale Graph Processing on FPGAs with Caches for Thousands of Simultaneous Misses. In Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture. 609--622.

Digital Library

[9]

Vignesh Balaji, Neal Crago, Aamer Jaleel, and Brandon Lucia. 2021. P-OPT: Practical Optimal Cache Replacement for Graph Analytics. In Proceedings of the 27th IEEE International Symposium on High-Performance Computer Architecture. 668--681.

[10]

Abanti Basak, Shuangchen Li, Xing Hu, Sang Min Oh, Xinfeng Xie, Li Zhao, Xiaowei Jiang, and Yuan Xie. 2019. Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads. In Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture. 373--386.

[11]

Abanti Basak, Zheng Qu, Jilan Lin, Alaa R. Alameldeen, Zeshan Chishti, Yufei Ding, and Yuan Xie. 2021. Improving Streaming Graph Processing Performance using Input Knowledge. In Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture. 1036--1050.

Digital Library

[12]

Robert D. Blumofe and Charles E. Leiserson. 1999. Scheduling Multithreaded Computations by Work Stealing. Journal of the ACM 46, 5 (1999), 720--748.

Digital Library

[13]

Nagadastagiri Challapalle, Sahithi Rampalli, Linghao Song, Nandhini Chandramoorthy, Karthik Swaminathan, John Sampson, Yiran Chen, and Vijaykrishnan Narayanan. 2020. GaaS-X: Graph Analytics Accelerator Supporting Sparse Data Representation using Crossbar Architectures. In Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture. 433--445.

Digital Library

[14]

Raymond Cheng, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. 2012. Kineograph: taking the pulse of a fast-changing and connected world. In Proceedings of the 7th European Conference on Computer Systems. 85--98.

Digital Library

[15]

David Culler, Jaswinder Pal Singh, and Anoop Gupta. 1999. Parallel computer architecture: a hardware/software approach. Gulf Professional Publishing.

[16]

Guohao Dai, Tianhao Huang, Yuze Chi, Ningyi Xu, Yu Wang, and Huazhong Yang. 2017. ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 217--226.

Digital Library

[17]

Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec. 2018. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time. In Proceedings of the 2018 World Wide Web Conference. 1775--1784.

Digital Library

[18]

Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, and Nina Mishra. 2018. Spot-Light: Detecting Anomalies in Streaming Graphs. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1378--1386.

Digital Library

[19]

Priyank Faldu, Jeff Diamond, and Boris Grot. 2020. Domain-Specialized Cache Management for Graph Analytics. In Proceedings of the 26th IEEE International Symposium on High Performance Computer Architecture. 234--248.

[20]

Wenfei Fan, Chunming Hu, and Chao Tian. 2017. Incremental Graph Computations: Doable and Undoable. In Proceedings of the 2017 ACM International Conference on Management of Data. 155--169.

Digital Library

[21]

Shufeng Gong, Chao Tian, Qiang Yin, Wenyuan Yu, Yanfeng Zhang, Liang Geng, Song Yu, Ge Yu, and Jingren Zhou. 2021. Automating Incremental Graph Processing with Flexible Memoization. Proceedings of the VLDB Endowment 14, 9 (2021), 1613--1625.

Digital Library

[22]

Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 17--30.

Digital Library

[23]

Tae Jun Ham, Lisa Wu, Narayanan Sundaram, Nadathur Satish, and Margaret Martonosi. 2016. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture. 56:1--56:13.

[24]

Wentao Han, Youshan Miao, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Wenguang Chen, and Enhong Chen. 2014. Chronos: a graph engine for temporal graph analysis. In Proceedings of the 9th European Conference on Computer Systems. 1:1--1:14.

Digital Library

[25]

Aamer Jaleel, Kevin B. Theobald, Simon C. Steely Jr., and Joel S. Emer. 2010. High performance cache replacement using re-reference interval prediction. In Proceedings of the 37th International Symposium on Computer Architecture. 60--71.

[26]

Xiaolin Jiang, Chengshuo Xu, Xizhe Yin, Zhijia Zhao, and Rajiv Gupta. 2021. Tripoline: generalized incremental graph processing via graph triangle inequality. In Proceedings of the 16th European Conference on Computer Systems. 17--32.

Digital Library

[27]

Daniel A. Jiménez. 2013. Insertion and promotion for tree-based PseudoLRU last-level caches. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. 284--296.

Digital Library

[28]

Sang Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu, and Arvind. 2018. GraFBoost: Using Accelerated Flash Storage for External Graph Analytics. In Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture. 411--424.

Digital Library

[29]

Kevin M. Lepak and Mikko H. Lipasti. 2002. Temporally silent stores. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems. 30--41.

[30]

Jure Leskovec, Jon M. Kleinberg, and Christos Faloutsos. 2005. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 177--187.

Digital Library

[31]

Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2009. McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture. 469--480.

[32]

Mugilan Mariappan, Joanna Che, and Keval Vora. 2021. DZiG: sparsity-aware incremental processing of streaming graphs. In Proceedings of the 16th European Conference on Computer Systems. 83--98.

Digital Library

[33]

Mugilan Mariappan and Keval Vora. 2019. GraphBolt: Dependency-Driven Synchronous Processing of Streaming Graphs. In Proceedings of the 14th EuroSys Conference 2019. 25:1--25:16.

Digital Library

[34]

Kiran Kumar Matam, Gunjae Koo, Haipeng Zha, Hung-Wei Tseng, and Murali Annavaram. 2019. GraphSSD: graph semantics aware SSD. In Proceedings of the 46th International Symposium on Computer Architecture9. 116--128.

Digital Library

[35]

Andrew McCrabb, Eric Winsor, and Valeria Bertacco. 2019. DREDGE: Dynamic Repartitioning during Dynamic Graph Execution. In Proceedings of the 56th Annual Design Automation Conference. 28.

Digital Library

[36]

Anurag Mukkara, Nathan Beckmann, Maleen Abeydeera, Xiaosong Ma, and Daniel Sánchez. 2018. Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling. In Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture. 1--14.

Digital Library

[37]

Anurag Mukkara, Nathan Beckmann, and Daniel Sánchez. 2019. PHI: Architectural Support for Synchronization- and Bandwidth-Efficient Commutative Scatter Updates. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. 1009--1022.

Digital Library

[38]

Derek Gordon Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. 2013. Naiad: a timely dataflow system. In Proceedings of the ACM SIGOPS 24th Symposium on Operating Systems Principles. 439--455.

Digital Library

[39]

Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, and Hyesoon Kim. 2017. GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks. In Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture. 457--468.

[40]

Quan M. Nguyen and Daniel Sánchez. 2021. Fifer: Practical Acceleration of Irregular Applications on Reconfigurable Architectures. In Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture. 1064--1077.

[41]

Muhammet Mustafa Ozdal, Serif Yesil, Taemin Kim, Andrey Ayupov, John Greth, Steven M.Burns, and Özcan Özturk. 2016. Energy Efficient Architecture for Graph Analytics Accelerators. In Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture. 166--177.

[42]

Xiafei Qiu, Wubin Cen, Zhengping Qian, You Peng, Ying Zhang, Xuemin Lin, and Jingren Zhou. 2018. Real-time Constrained Cycle Detection in Large Dynamic Graphs. Proceedings of the VLDB Endowment 11, 12 (2018), 1876--1888.

Digital Library

[43]

Shafiur Rahman, Nael Abu-Ghazaleh, and Rajiv Gupta. 2020. GraphPulse: An Event-Driven Hardware Accelerator for Asynchronous Graph Processing. In Proceedings of the 53rd IEEE/ACM International Symposium on Microarchitecture. 908--921.

[44]

Shafiur Rahman, Mahbod Afarin, Nael B. Abu-Ghazaleh, and Rajiv Gupta. 2021. JetStream: Graph Analytics on Streaming Data with Event-Driven Hardware Accelerator. In Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture. 1091--1105.

Digital Library

[45]

Kenneth A. Ross. 2007. Efficient Hash Probes on Modern Processors. In Proceedings of the 23rd International Conference on Data Engineering. 1297--1301.

[46]

Daniel Sánchez and Christos Kozyrakis. 2013. ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In Proceedings of the 40th Annual International Symposium on Computer Architecture. 475--486.

Digital Library

[47]

David Sayce. 2020. The Number of tweets per day in 2020. https://www.dsayce.com/social-media/tweets-day/.

[48]

Steven L. Scott. 1996. Synchronization and Communication in the T3E Multiprocessor. In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems. 26--36.

Digital Library

[49]

Albert Segura, Jose-Maria Arnau, and Antonio González. 2019. SCU: a GPU stream compaction unit for graph processing. In Proceedings of the 46th International Symposium on Computer Architecture. 424--435.

Digital Library

[50]

Albert Segura, Jose-Maria Arnau, and Antonio Gonzalez. 2021. Energy-Efficient Stream Compaction Through Filtering and Coalescing Accesses in GPGPU Memory Partitions. IEEE Trans. Comput. (2021), 1--12.

[51]

Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L. Willke, Jeffrey S. Young, Matthew Wolf, and Karsten Schwan. 2016. GraphIn: An Online High Performance Incremental Graph Processing Framework. In Proceedings of the 22nd International Conference on Parallel and Distributed Computing. 319--333.

Digital Library

[52]

Feng Sheng, Qiang Cao, Haoran Cai, Jie Yao, and Changsheng Xie. 2018. GraPU: Accelerate Streaming Graph Analysis through Preprocessing Buffered Updates. In Proceedings of the 2018 ACM Symposium on Cloud Computing. 301--312.

Digital Library

[53]

Xiaogang Shi, Bin Cui, Yingxia Shao, and Yunhai Tong. 2016. Tornado: A System For Real-Time Iterative Analysis Over Evolving Data. In Proceedings of the 2016 International Conference on Management of Data. 417--430.

Digital Library

[54]

Julian Shun and Guy E. Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 135--146.

[55]

Avinash Sodani, Roger Gramunt, Jesüs Corbal, Ho-Seop Kim, Krishna Vinod, Sundaram Chinthamani, Steven Hutsell, Rajat Agarwal, and Yen-Chen Liu. 2016. Knights Landing: Second-Generation Intel Xeon Phi Product. IEEE Micro 36, 2 (2016), 34--46.

Digital Library

[56]

Linghao Song, Youwei Zhuo, Xuehai Qian, Hai Helen Li, and Yiran Chen. 2018. GraphR: Accelerating Graph Processing Using ReRAM. In Proceedings of the 24th IEEE International Symposium on High Performance Computer Architecture. 531--543.

[57]

Shuang Song, Xu Liu, Qinzhe Wu, Andreas Gerstlauer, Tao Li, and Lizy K. John. 2018. Start Late or Finish Early: A Distributed Graph Processing System with Redundancy Reduction. Proceedings of the VLDB Endowment 12, 2 (2018), 154--168.

Digital Library

[58]

Yanwei Song and Engin Ipek. 2015. More is less: improving the energy efficiency of data movement via opportunistic use of sparse codes. In Proceedings of the 48th International Symposium on Microarchitecture. ACM, 242--254.

Digital Library

[59]

Pourya Vaziri and Keval Vora. 2021. Controlling Memory Footprint of Stateful Streaming Graph Processing. In Proceedings of the 2021 USENIX Annual Technical Conference. 269--283.

[60]

Keval Vora, Rajiv Gupta, and Guoqing Xu. 2016. Synergistic Analysis of Evolving Graphs. ACM Transactions on Architecture and Code Optimization 13, 4 (2016), 32:1--32:27.

Digital Library

[61]

Keval Vora, Rajiv Gupta, and Guoqing Xu. 2017. KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. 237--251.

Digital Library

[62]

Chenning Xie, Rong Chen, Haibing Guan, Binyu Zang, and Haibo Chen. 2015. SYNC or ASYNC: time to fuse for distributed graph-parallel computation. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 194--204.

Digital Library

[63]

Mingyu Yan, Xing Hu, Shuangchen Li, Abanti Basak, Han Li, Xin Ma, Itir Akgun, Yujing Feng, Peng Gu, Lei Deng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, and Yuan Xie. 2019. Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. 615--628.

Digital Library

[64]

Yifan Yang, Joel S. Emer, and Daniel Sanchez. 2021. SpZip: Architectural Support for Effective Data Compression In Irregular Applications. In Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture. 1070--1082.

Digital Library

[65]

Yifan Yang, Zhaoshi Li, Yangdong Deng, Zhiwei Liu, Shouyi Yin, Shaojun Wei, and Leibo Liu. 2020. GraphABCD: Scaling Out Graph Analytics with Asynchronous Block Coordinate Descent. In Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture. 419--432.

Digital Library

[66]

Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, and Srinivas Devadas. 2015. IMP: indirect memory prefetcher. In Proceedings of the 48th International Symposium on Microarchitecture. 178--190.

Digital Library

[67]

Dan Zhang, Xiaoyu Ma, Michael Thomson, and Derek Chiou. 2018. Minnow: Lightweight Offload Engines for Worklist Management and Worklist-Directed Prefetching. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. 593--607.

Digital Library

[68]

Guowei Zhang, Virginia Chiu, and Daniel Sanchez. 2016. Exploiting Semantic Commutativity in Hardware Speculation. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture. Article 34:1--34:12.

[69]

Guowei Zhang, Webb Horn, and Daniel Sanchez. 2015. Exploiting Commutativity to Reduce the Cost of Updates to Shared Data in Cache-Coherent Systems. In Proceedings of the 48th Annual IEEE/ACM International Symposium on Microarchitecture. 13--25.

Digital Library

[70]

Mingxing Zhang, Yongwei Wu, Youwei Zhuo, Xuehai Qian, Chengying Huan, and Kang Chen. 2018. Wonderland: A Novel Abstraction-Based Out-Of-Core Graph Processing System. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. 608--621.

Digital Library

[71]

Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, and Xuehai Qian. 2018. GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition. In Proceedings of the 2018 IEEE International Symposium on High Performance Computer Architecture. 544--557.

[72]

Yu Zhang, Xiaofei Liao, Hai Jin, Lin Gu, and Bing Bing Zhou. 2018. FBSGraph: Accelerating Asynchronous Graph Processing via Forward and Backward Sweeping. IEEE Transactions on Knowledge and Data Engineering 30, 5 (2018), 895--907.

[73]

Yu Zhang, Xiaofei Liao, Hai Jin, Ligang He, Bingsheng He, Haikun Liu, and Lin Gu. 2021. DepGraph: A Dependency-Driven Accelerator for Efficient Iterative Graph Processing. In Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture. 371--384.

[74]

Jin Zhao, Yu Zhang, Xiaofei Liao, Ligang He, Bingsheng He, Hai Jin, and Haikun Liu. 2021. LCCG: a locality-centric hardware accelerator for high throughput of concurrent graph processing. In Proceedings of the 2021 International Conference for High Performance Computing, Networking, Storage and Analysis. 45:1--45:14.

[75]

Ruohuang Zheng and Sreepathi Pai. 2021. Efficient Execution of Graph Algorithms on CPU with SIMD Extensions. In Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization. 262--276.

Digital Library

[76]

Youwei Zhuo, Chao Wang, Mingxing Zhang, Rui Wang, Dimin Niu, Yanzhi Wang, and Xuehai Qian. 2019. GraphQ: Scalable PIM-Based Graph Processing. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. 712--725.

Digital Library

Cited By

Zhang CScheffler PBenz TPerotti MBenini L(2024)Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV2024 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE58400.2024.10546797(1-6)Online publication date: 25-Mar-2024
https://doi.org/10.23919/DATE58400.2024.10546797
Xue FHan CLi XWu JZhang TLiu THao YDu ZGuo QZhang F(2024)Tyche: An Efficient and General Prefetcher for Indirect Memory AccessesACM Transactions on Architecture and Code Optimization10.1145/364185321:2(1-26)Online publication date: 23-Mar-2024
https://dl.acm.org/doi/10.1145/3641853
Yin CJiang JWang QMao ZJing N(2024)DeltaGNN: Accelerating Graph Neural Networks on Dynamic Graphs With Delta UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333515343:4(1163-1176)Online publication date: Apr-2024
https://doi.org/10.1109/TCAD.2023.3335153
Show More Cited By

Index Terms

TDGraph: a topology-driven accelerator for high-performance streaming graph processing
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Data flow architectures
      2. Special purpose systems
    2. Parallel architectures
      1. Multicore architectures

Recommendations

JetStream: Graph Analytics on Streaming Data with Event-Driven Hardware Accelerator
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Graph Processing is at the core of many critical emerging workloads operating on unstructured data, including social network analysis, bioinformatics, and many others. Many applications operate on graphs that are constantly changing, i.e., new nodes ...
From GPGPU to Many-Core: Nvidia Fermi and Intel Many Integrated Core Architecture

Comparing the architectures and performance levels of an Nvidia Fermi accelerator with an Intel MIC Architecture coprocessor demonstrates the benefit of the coprocessor for bringing highly parallel applications into, or even beyond, GPGPU performance ...
Direct MPI Library for Intel Xeon Phi Co-Processors
IPDPSW '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum

DCFA-MPI is an MPI library implementation for Intel Xeon Phi co-processor clusters, where a compute node consists of an Intel Xeon Phi co-processor card connected to the host via PCI Express with InfiniBand. DCFA-MPI enables direct data transfer between ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture

June 2022

1097 pages

ISBN:9781450386104

DOI:10.1145/3470496

General Chairs:
Valentina Salapura
Google
,
Mohamed Zahran
New York University
,
Program Chairs:
Fred Chong
The University of Chicago
,
Lingjia Tang
The University of Michigan

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IEEE CS TCAA: IEEE CS technical committee on architectural acoustics

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2022

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Huawei Technologies Co., Ltd
National Natural Science Foundation of China

Conference

ISCA '22

Sponsor:

SIGARCH

ISCA '22: The 49th Annual International Symposium on Computer Architecture

June 18 - 22, 2022

New York, New York

Acceptance Rates

ISCA '22 Paper Acceptance Rate 67 of 400 submissions, 17%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Upcoming Conference

ISCA '25

Sponsor:
sigarch

The 52nd Annual International Symposium on Computer Architecture

June 21 - 25, 2025

Tokyo , Japan

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

9
Total Citations
View Citations
1,653
Total Downloads

Downloads (Last 12 months)314
Downloads (Last 6 weeks)19

Reflects downloads up to 16 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Zhang CScheffler PBenz TPerotti MBenini L(2024)Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV2024 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE58400.2024.10546797(1-6)Online publication date: 25-Mar-2024
https://doi.org/10.23919/DATE58400.2024.10546797
Xue FHan CLi XWu JZhang TLiu THao YDu ZGuo QZhang F(2024)Tyche: An Efficient and General Prefetcher for Indirect Memory AccessesACM Transactions on Architecture and Code Optimization10.1145/364185321:2(1-26)Online publication date: 23-Mar-2024
https://dl.acm.org/doi/10.1145/3641853
Yin CJiang JWang QMao ZJing N(2024)DeltaGNN: Accelerating Graph Neural Networks on Dynamic Graphs With Delta UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333515343:4(1163-1176)Online publication date: Apr-2024
https://doi.org/10.1109/TCAD.2023.3335153
López-Paradís GHair IKannan SRabbat RMurray PLopes AZahedi RZuo WBalkind J(2024)The Case For Data Centre Hyperloops2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00026(230-244)Online publication date: 29-Jun-2024
https://doi.org/10.1109/ISCA59077.2024.00026
Agostini NHaris JGibson PJayaweera MRubin NTumeo AAbellán JCano JKaeli DGrosser TDubach CSteuwer MXue JOttoni GQuintão Pereira F(2024)AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based AcceleratorsProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444801(143-157)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1109/CGO57630.2024.10444801
Gong STian CYin QWang ZYu SZhang YYu WGeng LFu CYu GZhou J(2024)Ingress: an automated incremental graph processing systemThe VLDB Journal10.1007/s00778-024-00838-z33:3(781-806)Online publication date: 20-Feb-2024
https://doi.org/10.1007/s00778-024-00838-z
Yu HZhang YZhao JLiao YHuang ZHe DGu LJin HLiao XLiu HHe BYue J(2023)RACE: An Efficient Redundancy-aware Accelerator for Dynamic Graph Neural NetworkACM Transactions on Architecture and Code Optimization10.1145/361768520:4(1-26)Online publication date: 14-Dec-2023
https://dl.acm.org/doi/10.1145/3617685
Yu SGong SZhang YYu WYin QTian CTao QYan YYu GZhou J(2023)Layph: Making Change Propagation Constraint in Incremental Graph Processing by Layering Graph2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00212(2766-2779)Online publication date: Apr-2023
https://doi.org/10.1109/ICDE55515.2023.00212
Jiang ZMao FGuo YLiu XLiu HLiao XJin HZhang W(2023)ACGraph: Accelerating Streaming Graph Processing via Dependence Hierarchy2023 60th ACM/IEEE Design Automation Conference (DAC)10.1109/DAC56929.2023.10247904(1-6)Online publication date: 9-Jul-2023
https://doi.org/10.1109/DAC56929.2023.10247904

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents