Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Public Access

Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee

Published: 01 July 2021 Publication History

Abstract

To hide the complexity of the underlying system, graph processing frameworks ask programmers to specify graph computations in user-defined functions (UDFs) of graph-oriented programming model. Due to the nature of distributed execution, current frameworks cannot precisely enforce the semantics of UDFs, leading to unnecessary computation and communication. It exemplifies a gap between programming model and runtime execution. This article proposes novel graph processing frameworks for distributed system and Processing-in-memory (PIM) architecture that precisely enforces loop-carried dependency; i.e., when a condition is satisfied by a neighbor, all following neighbors can be skipped. Our approach instruments the UDFs to express the loop-carried dependency, then the distributed execution framework enforces the precise semantics by performing dependency propagation dynamically. Enforcing loop-carried dependency requires the sequential processing of the neighbors of each vertex distributed in different nodes. We propose to circulant scheduling in the framework to allow different nodes to process disjoint sets of edges/vertices in parallel while satisfying the sequential requirement. The technique achieves an excellent trade-off between precise semantics and parallelism—the benefits of eliminating unnecessary computation and communication offset the reduced parallelism. We implement a new distributed graph processing framework SympleGraph, and two variants of runtime systems—GraphS and GraphSR—for PIM-based graph processing architecture, which significantly outperform the state-of-the-art.

References

[1]
Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In Proceedings of the ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA’15). IEEE, 105–117.
[2]
Tero Aittokallio and Benno Schwikowski. 2006. Graph-based methods for analysing networks in cell biology. Brief. Bioinform. 7, 3 (2006), 243–255.
[3]
Andrei Alexandrescu and Katrin Kirchhoff. 2007. Data-driven graph construction for semi-supervised graph-based learning in NLP. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL’07). 204–211.
[4]
ARM. 2009. ARM Cortex-A5 Processor. Retrieved from http://www.arm.com/products/processors/cortex-a/cortex-a5.php.
[5]
Abanti Basak, Shuangchen Li, Xing Hu, Sang Min Oh, Xinfeng Xie, Li Zhao, Xiaowei Jiang, and Yuan Xie. 2019. Analysis and optimization of the memory hierarchy for graph processing workloads. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA’19). IEEE, 373–386.
[6]
Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner et al. 2018. Relational inductive biases, deep learning, and graph networks. Retrieved from https://arXiv:1806.01261.
[7]
Scott Beamer, Krste Asanović, and David Patterson. 2012. Direction-optimizing breadth-first search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12). IEEE Computer Society Press, Los Alamitos, CA, Article 12, 10 pages. Retrieved from http://dl.acm.org/citation.cfm?id=2388996.2389013.
[8]
Scott Beamer, Krste Asanović, and David Patterson. 2015. The GAP Benchmark Suite. Retrieved from https://arXiv:cs.DC/1508.03619.
[9]
Scott Beamer, Aydin Buluc, Krste Asanovic, and David Patterson. 2013. Distributed memory breadth-first search revisited: Enabling bottom-up search. In Proceeding sof the IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum. IEEE, 1618–1627.
[10]
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In Proceedings of the 20th International Conference on World Wide Web. ACM, 587–596.
[11]
Paolo Boldi and Sebastiano Vigna. 2004. The webgraph framework I: Compression techniques. In Proceedings of the 13th International Conference on World Wide Web. ACM, 595–602.
[12]
Aydin Buluc, Scott Beamer, Kamesh Madduri, Krste Asanovic, and David Patterson. 2017. Distributed-memory breadth-first search on massive graphs. Retrieved from https://arXiv:1705.04590.
[13]
Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the SIAM International Conference on Data Mining. SIAM, 442–446.
[14]
Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. 2015. PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In Proceedings of the 10th European Conference on Computer Systems (EuroSys’15). ACM, New York, NY, Article 1, 15 pages.
[15]
Thayne Coffman, Seth Greenblatt, and Sherry Marcus. 2004. Graph-based technologies for intelligence analysis. Commun. ACM 47, 3 (Mar. 2004), 45–47.
[16]
Hybrid Memory Cube Consortium. 2015. Hybrid Memory Cube Specification Version 2.1. Technical Report.
[17]
Guohao Dai, Tianhao Huang, Yuze Chi, Jishen Zhao, Guangyu Sun, Yongpan Liu, Yu Wang, Yuan Xie, and Huazhong Yang. 2018. Graphh: A processing-in-memory architecture for large-scale graph processing. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 34, 4 (2018), 640–653.
[18]
Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. 2018. Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’18). ACM, New York, NY, 752–768.
[19]
Anton J. Enright and Christos A. Ouzounis. 2001. BioLayout—An automatic graph layout algorithm for similarity visualization. Bioinformatics 17, 9 (2001), 853–854.
[20]
Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang, Zeyu Zheng, Bohan Zhang, Yang Cao, and Chao Tian. 2017. Parallelizing sequential graph computations. In Proceedings of the ACM International Conference on Management of Data (SIGMOD’17). ACM, New York, NY, 495–510.
[21]
Francois Fouss, Alain Pirotte, Jean-Michel Renders, and Marco Saerens. 2007. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowl. Data Eng. 19, 3 (2007), 355–369.
[22]
Mingyu Gao, Grant Ayers, and Christos Kozyrakis. 2015. Practical near-data processing for in-memory analytics frameworks. In Proceedings of the International Conference on Parallel Architecture and Compilation (PACT’15). IEEE, 113–124.
[23]
Gurbinder Gill, Roshan Dathathri, Loc Hoang, Andrew Lenharth, and Keshav Pingali. 2018. Abelian: A compiler for graph analytics on distributed, heterogeneous platforms. In Proceedings of the European Conference on Parallel Processing. Springer, 249–264.
[24]
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, 17–30. Retrieved from http://dl.acm.org/citation.cfm?id=2387880.2387883.
[25]
Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI’14). USENIX Association, Berkeley, CA, 599–613. Retrieved from http://dl.acm.org/citation.cfm?id=2685048.2685096.
[26]
Amit Goyal, Hal Daumé III, and Raul Guerra. 2012. Fast large-scale approximate graph construction for nlp. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 1069–1080.
[27]
Graph500. 2010. Graph 500 Benchmarks. Retrieved from http://www.graph500.org.
[28]
Aditya Grover and Jure Leskovec. 2016. Node2Vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16). ACM, New York, NY, 855–864.
[29]
Ziyu Guan, Jiajun Bu, Qiaozhu Mei, Chun Chen, and Can Wang. 2009. Personalized tag recommendation using graph-based ranking on multi-type interrelated objects. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 540–547.
[30]
Tae Jun Ham, Lisa Wu, Narayanan Sundaram, Nadathur Satish, and Margaret Martonosi. 2016. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’16). IEEE, 1–13.
[31]
Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. 2012. Green-Marl: A DSL for easy and efficient graph analysis. SIGPLAN Not. 47, 4 (Mar. 2012), 349–362.
[32]
Sungpack Hong, Siegfried Depner, Thomas Manhardt, Jan Van Der Lugt, Merijn Verstraaten, and Hassan Chafi. 2015. PGX.D: A fast distributed graph processing engine. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15). ACM, New York, NY, Article 58, 12 pages.
[33]
Sungpack Hong, Nicole C. Rodia, and Kunle Olukotun. 2013. On fast parallel detection of strongly connected components (SCC) in small-world graphs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’13). ACM, New York, NY, Article 92, 11 pages.
[34]
Imranul Hoque and Indranil Gupta. 2013. LFGraph: Simple and fast distributed graph analytics. In Proceedings of the First ACM SIGOPS Conference on Timely Results in Operating Systems (TRIOS’13). ACM, New York, NY, Article 9, 17 pages.
[35]
M. C. Jeffrey, S. Subramanian, C. Yan, J. Emer, and D. Sanchez. 2016. Unlocking ordered parallelism with the swarm architecture. IEEE Micro 36, 3 (2016), 105–117.
[36]
Andrew B. Kahng, Bin Li, Li-Shiuan Peh, and Kambiz Samadi. 2012. ORION 2.0: A power-area simulator for interconnection networks. IEEE Trans. Very Large Scale Integr. Syst. 20, 1 (Jan. 2012), 191–196.
[37]
Gwangsun Kim, John Kim, Jung Ho Ahn, and Jaeha Kim. 2013. Memory-centric system interconnect design with hybrid memory cubes. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. IEEE Press, 145–156.
[38]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web (WWW’10). ACM, New York, NY, 591–600.
[39]
Nicolas Le Novere, Michael Hucka, Huaiyu Mi, Stuart Moodie, Falk Schreiber, Anatoly Sorokin, Emek Demir, Katja Wegner, Mirit I. Aladjem, Sarala M. Wimalaratne, et al. 2009. The systems biology graphical notation. Nature Biotechnology 27, 8 (2009), 735–741.
[40]
Dong Uk Lee, Kyung Whan Kim, Kwan Weon Kim, Hongjung Kim, Ju Young Kim, Young Jun Park, Jae Hwan Kim, Dae Suk Kim, Heat Bit Park, Jin Wook Shin, et al. 2014. 25.2 A 1.2 V 8Gb 8-channel 128GB/s high-bandwidth memory (HBM) stacked DRAM with effective microbump I/O test methods using 29nm process and TSV. In Proceedings of the IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC’14). IEEE, 432–433.
[41]
Jure Leskovec and Andrej Krevl. 2014. friendster. Retrieved from https://snap.stanford.edu/data/com-Friendster.html.
[42]
Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6, 1 (2009), 29–123.
[43]
Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’09). 469–480.
[44]
Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5, 8 (Apr. 2012), 716–727.
[45]
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’10). ACM, New York, NY, 135–146.
[46]
Mugilan Mariappan and Keval Vora. 2019. GraphBolt: Dependency-driven synchronous processing of streaming graphs. In Proceedings of the 14th EuroSys Conference 2019 (EuroSys’19). ACM, New York, NY, Article 25, 16 pages.
[47]
David W. Matula and Leland L. Beck. 1983. Smallest-last ordering and clustering and graph coloring algorithms. J. ACM 30, 3 (1983), 417–427.
[48]
Julian McAuley and Jure Leskovec. 2012. Learning to discover social circles in ego networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS’12). Curran Associates, 539–547. Retrieved from http://dl.acm.org/citation.cfm?id=2999134.2999195.
[49]
Frank McSherry. 2017. COST in the land of databases. Retrieved from https://github.com/frankmcsherry/blog/blob/master/posts/2017-09-23.md.
[50]
Frank McSherry, Michael Isard, and Derek G Murray. 2015. Scalability! But at what {COST}? In Proceedings of the 15th Workshop on Hot Topics in Operating Systems (HotOS’15).
[51]
Batul J. Mirza, Benjamin J. Keller, and Naren Ramakrishnan. 2003. Studying recommendation algorithms by graph analysis. J. Intell. Info. Syst. 20, 2 (2003), 131–160.
[52]
Anurag Mukkara, Nathan Beckmann, Maleen Abeydeera, Xiaosong Ma, and Daniel Sanchez. 2018. Exploiting locality in graph analytics through hardware-accelerated traversal scheduling. In Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’18). IEEE, 1–14.
[53]
Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, and Hyesoon Kim. 2017. GraphPIM: Enabling instruction-level PIM offloading in graph computing frameworks. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA’17). IEEE, 457–468.
[54]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A lightweight infrastructure for graph analytics. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13). ACM, New York, NY, 456–471.
[55]
The University of Texas at Austin. 2019. Texas Advanced Computing Center (TACC). Retrieved from https://www.tacc.utexas.edu/.
[56]
Muhammet Mustafa Ozdal, Serif Yesil, Taemin Kim, Andrey Ayupov, John Greth, Steven Burns, and Ozcan Ozturk. 2016. Energy efficient architecture for graph analytics accelerators. In Proceedings of the ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA’16). IEEE, 166–177.
[57]
Sreepathi Pai and Keshav Pingali. 2016. A compiler for throughput optimization of graph algorithms on GPUs. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’16). ACM, New York, NY, 1–19.
[58]
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’14). ACM, New York, NY, 701–710.
[59]
The Lemur Project. 2013. The ClueWeb12 Dataset. Retrieved from http://lemurproject.org/clueweb12/.
[60]
Meikang Qiu, Lei Zhang, Zhong Ming, Zhi Chen, Xiao Qin, and Laurence T. Yang. 2013. Security-aware optimization for ubiquitous computing systems with SEAT graph approach. J. Comput. Syst. Sci. 79, 5 (2013), 518–529.
[61]
Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13). Association for Computing Machinery, New York, NY, 472–488.
[62]
Semih Salihoglu and Jennifer Widom. 2013. GPS: A graph processing system. In Proceedings of the 25th International Conference on Scientific and Statistical Database Management (SSDBM’13). ACM, New York, NY, Article 22, 12 pages.
[63]
Daniel Sanchez and Christos Kozyrakis. 2013. ZSim: Fast and accurate microarchitectural simulation of thousand-core systems. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA’13). ACM, New York, NY, 475–486.
[64]
Satu Elisa Schaeffer. 2007. Graph clustering. Comput. Sci. Rev. 1, 1 (2007), 27–64.
[65]
Jiwon Seo, Jongsoo Park, Jaeho Shin, and Monica S. Lam. 2013. Distributed socialite: A datalog-based language for large-scale graph analysis. Proc. VLDB Endow. 6, 14 (Sep. 2013), 1906–1917.
[66]
Manjunath Shevgoor, Jung-Sik Kim, Niladrish Chatterjee, Rajeev Balasubramonian, Al Davis, and Aniruddha N. Udipi. 2013. Quantifying the relationship between the power delivery network and architectural policies in a 3D-stacked memory device. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 198–209.
[67]
Julian Shun. 2019. K-Core. Retrieved from http://jshun.github.io/ligra/docs/tutorial_kcore.html.
[68]
Julian Shun and Guy E. Blelloch. 2013. Ligra: A lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13). ACM, New York, NY, 135–146.
[69]
Julian Shun, Farbod Roosta-Khorasani, Kimon Fountoulakis, and Michael W. Mahoney. 2016. Parallel local graph clustering. Proc. VLDB Endow. 9, 12 (Aug. 2016), 1041–1052.
[70]
AM Stankovic and MS Calovic. 1989. Graph oriented algorithm for the steady-state security enhancement in distribution networks. IEEE Trans. Power Delivery 4, 1 (1989), 539–544.
[71]
Lei Tang and Huan Liu. 2010. Graph mining applications to social network analysis. In Managing and Mining Graph Data. Springer, 487–513.
[72]
Po-An Tsai, Nathan Beckmann, and Daniel Sanchez. 2017. Jenga: Sotware-defined cache hierarchies. In Proceedings of the 44th Annual International Symposium on Computer Architecture. ACM, 652–665.
[73]
Keval Vora. 2019. LUMOS: Dependency-driven disk-based graph processing. In Proceedings of the USENIX Conference on Usenix Annual Technical Conference (USENIX ATC’19). USENIX Association, USA, 429–442.
[74]
Keval Vora, Rajiv Gupta, and Guoqing Xu. 2017. KickStarter: Fast and accurate computations on streaming graphs via trimmed approximations. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’17). Association for Computing Machinery, New York, NY, 237–251.
[75]
Keval Vora, Sai Charan Koduru, and Rajiv Gupta. 2014. ASPIRE: Exploiting asynchronous parallelism in iterative algorithms using a relaxed consistency based DSM. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA’14). ACM, New York, NY, 861–878.
[76]
Tianyi Wang, Yang Chen, Zengbin Zhang, Tianyin Xu, Long Jin, Pan Hui, Beixing Deng, and Xing Li. 2011. Understanding graph sampling algorithms for social network analysis. In Proceedings of the 31st International Conference on Distributed Computing Systems Workshops. IEEE, 123–128.
[77]
English Wikipedia. 2013. enwiki-2013. Retrieved from http://law.di.unimi.it/webdata/enwiki-2013/.
[78]
Ming Wu, Fan Yang, Jilong Xue, Wencong Xiao, Youshan Miao, Lan Wei, Haoxiang Lin, Yafei Dai, and Lidong Zhou. 2015. GraM: Scaling graph computation to the trillions. In Proceedings of the 6th ACM Symposium on Cloud Computing (SoCC’15). ACM, New York, NY, 408–421.
[79]
Wencong Xiao, Jilong Xue, Youshan Miao, Zhen Li, Cheng Chen, Ming Wu, Wei Li, and Lidong Zhou. 2017. Tux2: Distributed graph computation for machine learning. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI’17). USENIX Association, Berkeley, CA, 669–682.
[80]
Yuan Yu, Pradeep Kumar Gunda, and Michael Isard. 2009. Distributed aggregation for data-parallel computing: Interfaces and implementations. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP’09). Association for Computing Machinery, New York, NY, 247–260.
[81]
Torsten Zesch and Iryna Gurevych. 2007. Analysis of the Wikipedia category graph for NLP applications. In Proceedings of the TextGraphs-2 Workshop (NAACL-HLT’07). 1–8.
[82]
Mingxing Zhang, Yongwei Wu, Kang Chen, Xuehai Qian, Xue Li, and Weimin Zheng. 2016. Exploring the hidden dimension in graph processing. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI’16). USENIX Association, Berkeley, CA, 285–300. Retrieved from http://dl.acm.org/citation.cfm?id=3026877.3026900.
[83]
Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, and Xuehai Qian. 2018. GraphP: Reducing communication for PIM-based graph processing with efficient data partition. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA’18). IEEE, 544–557.
[84]
Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2018. GraphIt: A high-performance graph DSL. Proc. ACM Program. Lang. 2, OOPSLA, Article 121 (Oct. 2018), 30 pages.
[85]
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A computation-centric distributed graph processing system. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI’16). USENIX Association, Berkeley, CA, 301–316. http://dl.acm.org/citation.cfm?id=3026877.3026901
[86]
Youwei Zhuo, Chao Wang, Mingxing Zhang, Rui Wang, Dimin Niu, Yanzhi Wang, and Xuehai Qian. 2019. GraphQ: Scalable PIM-based graph processing. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’52). ACM, New York, NY, 712–725.

Cited By

View all
  • (2024)Benchmarking Big Data Systems: Performance and Decision-Making Implications in Emerging TechnologiesTechnologies10.3390/technologies1211021712:11(217)Online publication date: 3-Nov-2024
  • (2024)How to Fit the SCC Algorithm Efficiently into Distributed Graph Iterative Computation2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC)10.1109/COMPSAC61105.2024.00043(254-263)Online publication date: 2-Jul-2024
  • (2022)Accelerating Neural Network Training with Processing-in-Memory GPU2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)10.1109/CCGrid54584.2022.00051(414-421)Online publication date: May-2022

Index Terms

  1. Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Computer Systems
    ACM Transactions on Computer Systems  Volume 37, Issue 1-4
    November 2019
    177 pages
    ISSN:0734-2071
    EISSN:1557-7333
    DOI:10.1145/3446674
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 July 2021
    Accepted: 01 March 2021
    Revised: 01 December 2020
    Received: 01 July 2020
    Published in TOCS Volume 37, Issue 1-4

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Graph analytics
    2. big data
    3. compilers
    4. graph algorithms

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)418
    • Downloads (Last 6 weeks)40
    Reflects downloads up to 20 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Benchmarking Big Data Systems: Performance and Decision-Making Implications in Emerging TechnologiesTechnologies10.3390/technologies1211021712:11(217)Online publication date: 3-Nov-2024
    • (2024)How to Fit the SCC Algorithm Efficiently into Distributed Graph Iterative Computation2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC)10.1109/COMPSAC61105.2024.00043(254-263)Online publication date: 2-Jul-2024
    • (2022)Accelerating Neural Network Training with Processing-in-Memory GPU2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)10.1109/CCGrid54584.2022.00051(414-421)Online publication date: May-2022

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media