DOI: 10.5555/3358807.3358843
Article

SIMD-X: programming and processing of graph algorithms on GPUs

Published: 10 July 2019

Abstract

With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerating data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However, graph algorithms such as breadth-first search and k-core often fail to take full advantage of GPUs because of irregularity in memory access and control flow. To address this challenge, we have developed SIMD-X, for programming and processing of single instruction multiple, complex, data on GPUs. Specifically, the new Active-Compute-Combine (ACC) model not only eases programming, but more importantly creates opportunities for system-level optimizations. To this end, SIMD-X utilizes just-in-time task management, which filters out inactive vertices at runtime and intelligently maps various tasks to different numbers of GPU cores in pursuit of workload balancing. In addition, SIMD-X leverages push-pull based kernel fusion that, with the help of a new deadlock-free global barrier, reduces a large number of computation kernels to very few. Using SIMD-X, a user can program a graph algorithm in tens of lines of code while achieving 3×, 6×, 24×, and 3× speedups over Gunrock, Galois, CuSha, and Ligra, respectively.
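
The abstract names the three ACC stages but does not give their interface. The following is a minimal sequential sketch in Python of how a breadth-first search might be phrased in an Active-Compute-Combine style, with inactive vertices filtered out at the start of every iteration. It is not the SIMD-X API: the function `acc_bfs`, the helpers `active`, `compute`, and `combine`, and the adjacency-dictionary graph format are all assumptions made for illustration, and the sketch does not model GPU mapping, kernel fusion, or the global barrier.

```python
from collections import defaultdict

INF = float("inf")


def acc_bfs(adj, source):
    """Sequential BFS in an ACC-like style (hypothetical names, illustration only).

    adj maps every vertex to a list of its out-neighbors; every vertex,
    including those without out-edges, must appear as a key.
    """
    level = {v: INF for v in adj}
    level[source] = 0
    updated = {source}  # vertices whose level changed in the previous iteration

    def active(v):
        # Active: a vertex takes part only if its value just changed.
        return v in updated

    def compute(src, dst):
        # Compute: the candidate level that one edge proposes for its destination.
        return level[src] + 1

    def combine(candidates):
        # Combine: merge all candidates aimed at one vertex; BFS keeps the minimum.
        return min(candidates)

    while updated:
        # Filter out inactive vertices at run time to build the work list.
        worklist = [v for v in adj if active(v)]
        proposals = defaultdict(list)
        for src in worklist:
            for dst in adj[src]:
                proposals[dst].append(compute(src, dst))
        updated = set()
        for dst, candidates in proposals.items():
            best = combine(candidates)
            if best < level[dst]:
                level[dst] = best
                updated.add(dst)
    return level
```

Calling `acc_bfs({0: [1, 2], 1: [3], 2: [3], 3: [], 4: []}, 0)` assigns level 0 to the source, 1 to vertices 1 and 2, 2 to vertex 3, and leaves the unreachable vertex 4 at infinity. Swapping the bodies of `compute` and `combine` is how a different algorithm would be expressed in this sketch, which is one reading of the "tens of lines of code" claim.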

Cited By

  • (2021) Towards Next-Generation Cybersecurity with Graph AI. ACM SIGOPS Operating Systems Review 55:1 (61-67). DOI: 10.1145/3469379.3469386. Online publication date: 6-Jun-2021
  • (2021) Compiling graph applications for GPUs with graphit. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization (248-261). DOI: 10.1109/CGO51591.2021.9370321. Online publication date: 27-Feb-2021

Published In

USENIX ATC '19: Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference
July 2019
1076 pages
ISBN:9781939133038

Sponsors

  • VMware
  • Nutanix
  • NSF
  • Facebook
  • ORACLE

Publisher

USENIX Association

United States

Publication History

Published: 10 July 2019

Qualifiers

  • Article
