Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3447786.3456230acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

DZiG: sparsity-aware incremental processing of streaming graphs

Published: 21 April 2021 Publication History

Abstract

State-of-the-art streaming graph processing systems that provide Bulk Synchronous Parallel (BSP) guarantees remain oblivious to the computation sparsity present in iterative graph algorithms, which severely limits their performance. In this paper we propose DZiG, a high-performance streaming graph processing system that retains efficiency in presence of sparse computations while still guaranteeing BSP semantics. At the heart of DZiG is: (1) a sparsity-aware incremental processing technique that expresses computations in a recursive manner to be able to safely identify and prune updates (hence retaining sparsity); (2) a simple change-driven programming model that naturally exposes sparsity in iterative computations; and, (3) an adaptive processing model that automatically changes the incremental computation strategy to limit its overheads when computations become very sparse. DZiG outperforms state-of-the-art streaming graph processing systems, and pushes the boundary of dependency-driven processing for streaming graphs to over 10 million simultaneous mutations, which is orders of magnitude higher compared to the state-of-the-art systems.

References

[1]
Daniel J Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, et al. The Design of the Borealis Stream Processing Engine. In Conference on Innovative Data Systems Research (CIDR '05), volume 5, pages 277--289, 2005.
[2]
Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, and Shivakumar Venkataraman. Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams. In Proceedings of the ACM International Conference on Management of Data (SIGMOD '13), pages 577--588, 2013.
[3]
Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast Incremental and Personalized PageRank. Proceedings of the VLDB Endowment (PVLDB '10), 4(3):173--184, 2010.
[4]
Hari Balakrishnan, Magdalena Balazinska, Don Carney, Uğur Çetintemel, Mitch Cherniack, Christian Convey, Eddie Galvez, Jon Salz, Michael Stonebraker, Nesime Tatbul, et al. Retrospective on Aurora. The VLDB Journal, 13(4):370--383, 2004.
[5]
Pramod Bhatotia, Umut A Acar, Flavio P Junqueira, and Rodrigo Rodrigues. Slider: Incremental Sliding Window Analytics. In Proceedings of the International Middleware Conference (Middleware '14), pages 61--72, 2014.
[6]
Jose A. Blakeley, Per-Ake Larson, and Frank Wm Tompa. Efficiently Updating Materialized Views. In Proceedings of the ACM International Conference on Management of Data (SIGMOD '86), pages 61--71, 1986.
[7]
Paolo Boldi and Sebastiano Vigna. The WebGraph Framework I: Compression Techniques. In Proceedings of the International Conference on World Wide Web (WWW '04), pages 595--601, 2004.
[8]
Zhuhua Cai, Dionysios Logothetis, and Georgos Siganos. Facilitating Real-Time Graph Mining. In Proceedings of the International Workshop on Cloud Data Management (CloudDB '12), pages 1--8, 2012.
[9]
Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. Apache Flink™: Stream and Batch Processing in a Single Engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 38:28--38, 2015.
[10]
Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna P. Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM '10), pages 10--17, 2010.
[11]
Raymond Cheng, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. Kineograph: Taking the Pulse of a Fast-changing and Connected World. In Proceedings of the European Conference on Computer Systems (EuroSys '12), pages 85--98, 2012.
[12]
CilkPlus: https://www.cilkplus.org/.
[13]
Prasanna Desikan, Nishith Pathak, Jaideep Srivastava, and Vipin Kumar. Incremental Page Rank Computation on Evolving Graphs. In Proceedings of the International Conference on World Wide Web (WWW '05), pages 1094--1095, 2005.
[14]
Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. Low-Latency Graph Streaming Using Compressed Purely-Functional Trees. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '19), page 918--934, 2019.
[15]
David Ediger, Rob Mccoll, Jason Riedy, and David A. Bader. STINGER: High Performance Data Structure for Streaming Graphs. In IEEE Conference on High Performance Extreme Computing (HPEC '12), pages 1--5, 2012.
[16]
Friendster network dataset. http://konect.uni-koblenz.de/networks/friendster. KONECT, 2015.
[17]
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. In USENIX Symposium on Operating Systems Design and Implementation (OSDI '12), pages 17--30, 2012.
[18]
Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. In USENIX Symposium on Operating Systems Design and Implementation (OSDI '14), pages 599--613, 2014.
[19]
Ashish Gupta, Inderpal Singh Mumick, and V. S. Subrahmanian. Maintaining Views Incrementally. In Proceedings of the ACM International Conference on Management of Data (SIGMOD '93), pages 157--166, 1993.
[20]
Pankaj Gupta, Venu Satuluri, Ajeet Grewal, Siva Gurumurthy, Volodymyr Zhabiuk, Quannan Li, and Jimmy Lin. Real-Time Twitter Recommendation: Online Motif Detection in Large Dynamic Graphs. Proceedings of the VLDB Endowment (PVLDB '14), 7(13):1379--1380, 2014.
[21]
Wentao Han, Youshan Miao, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Wenguang Chen, and Enhong Chen. Chronos: A Graph Engine for Temporal Graph Analysis. In Proceedings of the European Conference on Computer Systems (EuroSys '14), pages 1:1--1:14, 2014.
[22]
Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, and Ion Stoica. Time-Evolving Graph Processing at Scale. In Proceedings of the International Workshop on Graph Data Management Experiences and Systems (GRADES '16), page 5, 2016.
[23]
Sepandar Kamvar, Taher Haveliwala, and Gene Golub. Adaptive methods for the computation of PageRank. Linear Algebra and its Applications, 386:51--65, 2003.
[24]
U Kang, Duen Horng, and Christos Faloutsos. Inference of Beliefs on Billion-Scale Graphs. In Large-scale Data Mining: Theory and Applications (LDMTA '10), 2010.
[25]
Chathura Kankanamge, Siddhartha Sahu, Amine Mhedbhi, Jeremy Chen, and Semih Salihoglu. Graphflow: An active graph database. In Proceedings of the ACM International Conference on Management of Data (SIGMOD '17), pages 1695--1698, 2017.
[26]
Pradeep Kumar and H. Howie Huang. G-Store: High-Performance Graph Store for Trillion-Edge Processing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '16, 2016.
[27]
Pradeep Kumar and H. Howie Huang. GRAPHONE: A Data Store for Real-time Analytics on Evolving Graphs. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST '19), pages 249--263, 2019.
[28]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, A Social Network or a News Media? In Proceedings of the International Conference on World Wide Web (WWW '10), pages 591--600, 2010.
[29]
Daan Leijen, Benjamin Zorn, and Leonardo de Moura. Mimalloc: Free List Sharding in Action. In Asian Symposium on Programming Languages and Systems (APLAS '19), pages 244--265, 2019.
[30]
Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler, Xiaosong Ma, Xin Liu, Weimin Zheng, and Jingfang Xu. ShenTu: Processing Multi-Trillion Edge Graphs on Millions of Cores in Seconds. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18), 2018.
[31]
Peter Macko, Virendra J Marathe, Daniel W Margo, and Margo I Seltzer. LLAMA: Efficient graph analytics using Large Multiversioned Arrays. In IEEE International Conference on Data Engineering (ICDE '15), pages 363--374, April 2015.
[32]
Mugilan Mariappan and Keval Vora. GraphBolt: Dependency-Driven Synchronous Processing of Streaming Graphs. In Proceedings of the European Conference on Computer Systems (EuroSys '19), pages 25:1--25:16, 2019.
[33]
Frank McSherry. A Uniform Approach to Accelerated PageRank Computation. In Proceedings of the International Conference on World Wide Web (WWW '05), pages 575--582, 2005.
[34]
Frank McSherry, Derek G. Murray, Rebecca Isaacs, and Michael Isard. Differential Dataflow. In Conference on Innovative Data Systems Research (CIDR '13), 2013.
[35]
Youshan Miao, Wentao Han, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Enhong Chen, and Wenguang Chen. ImmortalGraph: A System for Storage and Analysis of Temporal Graphs. ACM Transactions on Storage (TOS), 11(3):14:1--14:34, 2015.
[36]
Svilen R. Mihaylov, Zachary G. Ives, and Sudipto Guha. REX: Recursive, Delta-Based Data-Centric Computation. Proceedings of the VLDB Endowment (PVLDB '12), 5(11):1280--1291, 2012.
[37]
Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. Naiad: A Timely Dataflow System. In ACM Symposium on Operating Systems Principles (SOSP '13), pages 439--455, 2013.
[38]
Neo4J. www.neo4j.com.
[39]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A Lightweight Infrastructure for Graph Analytics. In ACM Symposium on Operating Systems Principles (SOSP '13), pages 456--471, 2013.
[40]
Kamal Nigam and Rayid Ghani. Analyzing the Effectiveness and Applicability of Co-training. In Proceedings of the International Conference on Information and Knowledge Management (CIKM '00), pages 86--93, 2000.
[41]
Vivek Nigam, Limin Jia, Boon Thau Loo, and Andre Scedrov. Maintaining Distributed Logic Programs Incrementally. In Computer Languages, Systems & Structures, pages 125--136, 2011.
[42]
OrientDB. https://orientdb.com/.
[43]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford University, 1998.
[44]
Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, and Joseph M. Hellerstein. Enabling Real-Time Querying of Live and Historical Stream Data. In International Conference on Scientific and Statistical Database Management (SSDBM '07), pages 28--, 2007.
[45]
Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. Chaos: Scale-out Graph Processing from Secondary Storage. In ACM Symposium on Operating Systems Principles (SOSP '15), pages 410--424, 2015.
[46]
Pratanu Roy, Arijit Khan, and Gustavo Alonso. Augmented Sketch: Faster and More Accurate Stream Processing. In Proceedings of the ACM International Conference on Management of Data (SIGMOD '16), pages 1449--1463, 2016.
[47]
Semih Salihoglu and Jennifer Widom. GPS: A Graph Processing System. In International Conference on Scientific and Statistical Database Management (SSDBM '13), pages 22:1--22:12, 2013.
[48]
Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. GraphIn: An Online High Performance Incremental Graph Processing Framework. In Proceedings of the International European Conference on Parallel and Distributed Computing (Euro-Par '16), pages 319--333, 2016.
[49]
Aneesh Sharma, Jerry Jiang, Praveen Bommannavar, Brian Larson, and Jimmy Lin. GraphJet: Real-time Content Recommendations at Twitter. Proceedings of the VLDB Endowment (PVLDB '16), 9(13):1281--1292, 2016.
[50]
Xiaogang Shi, Bin Cui, Yingxia Shao, and Yunhai Tong. Tornado: A System For Real-Time Iterative Analysis Over Evolving Data. In Proceedings of the ACM International Conference on Management of Data (SIGMOD '16), pages 417--430, 2016.
[51]
Julian Shun and Guy E. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '13), pages 135--146, 2013.
[52]
Toyotaro Suzumura, Shunsuke Nishii, and Masaru Ganse. Towards Large-scale Graph Stream Processing Platform. In Proceedings of the International Conference on World Wide Web (WWW '14), pages 1321--1326, 2014.
[53]
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, et al. Storm @ Twitter. In Proceedings of the ACM International Conference on Management of Data (SIGMOD '14), pages 147--156, 2014.
[54]
Leslie G Valiant. A Bridging Model for Parallel Computation. Communications of the ACM, 33(8):103--111, 1990.
[55]
Keval Vora. Lumos: Dependency-Driven Disk-based Graph Processing. In USENIX Annual Technical Conference (USENIX ATC '19), pages 429--442, 2019.
[56]
Keval Vora, Rajiv Gupta, and Guoqing Xu. Synergistic Analysis of Evolving Graphs. ACM Transactions on Architecture and Code Optimization (TACO '16), 13(4):32, 2016.
[57]
Keval Vora, Rajiv Gupta, and Guoqing Xu. KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17), pages 237--251, 2017.
[58]
Keval Vora, Sai Charan Koduru, and Rajiv Gupta. ASPIRE: Exploiting Asynchronous Parallelism in Iterative Algorithms Using a Relaxed Consistency Based DSM. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA '14), pages 861--878, 2014.
[59]
Keval Vora, Guoqing (Harry) Xu, and Rajiv Gupta. Load the Edges You Need: A Generic I/O Optimization for Disk-based Graph Processing. In USENIX Annual Technical Conference (USENIX ATC '17), pages 507--522, 2016.
[60]
Wikipedia links, english network dataset. http://konect.uni-koblenz.de/networks/wikipedia_link_en. KONECT, 2017.
[61]
Ming Wu, Fan Yang, Jilong Xue, Wencong Xiao, Youshan Miao, Lan Wei, Haoxiang Lin, Yafei Dai, and Lidong Zhou. GraM: Scaling Graph Computation to the Trillions. In Proceedings of the ACM Symposium on Cloud Computing (SoCC '15), pages 408--421, 2015.
[62]
Yahoo! Webscope Program. http://webscope.sandbox.yahoo.com/.
[63]
Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, and Ion Stoica. Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. In USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '12), 2012.
[64]
Erik Zeitler and Tore Risch. Massive Scale-out of Expensive Continuous Queries. In Proceedings of the VLDB Endowment (PVLDB '11), pages 1181--1188, 2011.
[65]
Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In International Conference on Algorithmic Applications in Management (AAIM '08), pages 337--348, 2008.
[66]
Xiaojin Zhu and Zoubin Ghahramani. Learning from Labeled and Unlabeled Data with Label Propagation. In CMU Technical Report CALD-02-107, 2002.
[67]
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. Gemini: A Computation-Centric Distributed Graph Processing System. In USENIX Symposium on Operating Systems Design and Implementation (OSDI '16), pages 301--316, 2016.
[68]
Xiaowei Zhu, Guanyu Feng, Marco Serafini, Xiaosong Ma, Jiping Yu, Lei Xie, Ashraf Aboulnaga, and Wenguang Chen. LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans. Proceedings of the VLDB Endowment (PVLDB '20), 13(7):1020--1034, 2020.

Cited By

View all
  • (2024)Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent QueriesProceedings of the VLDB Endowment10.14778/3681954.368197917:11(3003-3016)Online publication date: 30-Aug-2024
  • (2024)IncBoost: Scaling Incremental Graph Processing for Edge Deletions and Weight UpdatesProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698524(915-932)Online publication date: 20-Nov-2024
  • (2024)PMGraph: Accelerating Concurrent Graph Queries over Streaming GraphsACM Transactions on Architecture and Code Optimization10.1145/368933721:4(1-25)Online publication date: 20-Nov-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems
April 2021
631 pages
ISBN:9781450383349
DOI:10.1145/3447786
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 April 2021

Permissions

Request permissions for this article.

Check for updates

Badges

Qualifiers

  • Research-article

Funding Sources

Conference

EuroSys '21
Sponsor:
EuroSys '21: Sixteenth European Conference on Computer Systems
April 26 - 28, 2021
Online Event, United Kingdom

Acceptance Rates

EuroSys '21 Paper Acceptance Rate 38 of 181 submissions, 21%;
Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam , Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)95
  • Downloads (Last 6 weeks)3
Reflects downloads up to 27 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent QueriesProceedings of the VLDB Endowment10.14778/3681954.368197917:11(3003-3016)Online publication date: 30-Aug-2024
  • (2024)IncBoost: Scaling Incremental Graph Processing for Edge Deletions and Weight UpdatesProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698524(915-932)Online publication date: 20-Nov-2024
  • (2024)PMGraph: Accelerating Concurrent Graph Queries over Streaming GraphsACM Transactions on Architecture and Code Optimization10.1145/368933721:4(1-25)Online publication date: 20-Nov-2024
  • (2024)TARIS: Scalable Incremental Processing of Time-Respecting Algorithms on Streaming GraphsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.347157435:12(2527-2544)Online publication date: Dec-2024
  • (2024)DeltaGNN: Accelerating Graph Neural Networks on Dynamic Graphs With Delta UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333515343:4(1163-1176)Online publication date: Apr-2024
  • (2024)A survey on dynamic graph processing on GPUs: concepts, terminologies and systemsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-023-2656-118:4Online publication date: 1-Aug-2024
  • (2024)GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and AnalysisInternational Journal of Parallel Programming10.1007/s10766-024-00768-x52:3(147-170)Online publication date: 18-May-2024
  • (2024)Ingress: an automated incremental graph processing systemThe VLDB Journal10.1007/s00778-024-00838-z33:3(781-806)Online publication date: 20-Feb-2024
  • (2023)RACE: An Efficient Redundancy-aware Accelerator for Dynamic Graph Neural NetworkACM Transactions on Architecture and Code Optimization10.1145/361768520:4(1-26)Online publication date: 14-Dec-2023
  • (2023)Accelerating Incremental Graph Processing by Layering GraphProceedings of the ACM Turing Award Celebration Conference - China 202310.1145/3603165.3607457(160-161)Online publication date: 25-Sep-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media