Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Towards distributed bitruss decomposition on bipartite graphs

Published: 01 May 2022 Publication History

Abstract

Mining cohesive subgraphs on bipartite graphs is an important task. The k-bitruss is one of many popular cohesive subgraph models, which is the maximal subgraph where each edge is contained in at least k butterflies. The bitruss decomposition problem is to find all k-bitrusses for k ≥ 0. Dealing with large graphs is often beyond the capability of a single machine due to its limited memory and computational power, leading to a need for efficiently processing large graphs in a distributed environment. However, all current solutions are for a single machine and a centralized environment, where processors can access the graph or auxiliary indexes randomly and globally. It is difficult to directly deploy such algorithms on a shared-nothing model. In this paper, we propose distributed algorithms for bitruss decomposition. We first propose SC-HBD as the baseline, which uses H-function to define bitruss numbers and computes them iteratively to a fix point in parallel. We then introduce a subgraph-centric peeling method SC-PBD, which peels edges in batches over different butterfly complete subgraphs. We then introduce local indexes on each fragment, study the butterfly-aware edge partition problem including its hardness, and propose an effective partitioner. Finally we present the bitruss butterfly-complete subgraph concept, and divide and conquer DC-BD method with optimization strategies. Extensive experiments show the proposed methods solve graphs with 30 trillion butterflies in 2.5 hours, while existing parallel methods under shared-memory model fail to scale to such large graphs.

References

[1]
Aman Abidi, Rui Zhou, Lu Chen, and Chengfei Liu. 2021. Pivot-based maximal biclique enumeration. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 3558--3564.
[2]
Amine Abou-Rjeili and George Karypis. 2006. Multilevel algorithms for partitioning power-law graphs. In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium. IEEE, 10--pp.
[3]
Austin R Benson, David F Gleich, and Jure Leskovec. 2016. Higher-order organization of complex networks. Science 353, 6295 (2016), 163--166.
[4]
Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. Copycatch: stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd international conference on World Wide Web. 119--130.
[5]
Charles-Edmond Bichot and Patrick Siarry. 2013. Graph partitioning. John Wiley & Sons.
[6]
Florian Bourse, Marc Lelarge, and Milan Vojnovic. 2014. Balanced graph edge partition. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 1456--1465.
[7]
Aydin Buluç, Henning Meyerhenke, Ilya Safro, Peter Sanders, and Christian Schulz. 2016. Recent advances in graph partitioning. Algorithm engineering (2016), 117--158.
[8]
Rong Chen, Jiaxin Shi, Yanzhe Chen, Binyu Zang, Haibing Guan, and Haibo Chen. 2019. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. ACM Transactions on Parallel Computing (TOPC) 5, 3 (2019), 1--39.
[9]
Dong Dai, Wei Zhang, and Yong Chen. 2017. IOGP: An incremental online graph partitioning algorithm for distributed graph databases. In Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing. 219--230.
[10]
Michael L Fredman and Robert Endre Tarjan. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM (JACM) 34, 3 (1987), 596--615.
[11]
Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. 2014. Querying k-truss community in large and dynamic graphs. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data. 1311--1322.
[12]
George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on scientific Computing 20, 1 (1998), 359--392.
[13]
George Karypis and Vipin Kumar. 1999. Parallel multilevel series k-way partitioning scheme for irregular graphs. Siam Review 41, 2 (1999), 278--300.
[14]
Mijung Kim and K Selçuk Candan. 2012. SBV-Cut: Vertex-cut based graph partitioning using structural balance vertices. Data & Knowledge Engineering 72 (2012), 285--303.
[15]
Kartik Lakhotia, Rajgopal Kannan, and Viktor Prasanna. 2021. Parallel Peeling of Bipartite Networks for Hierarchical Dense Subgraph Discovery. arXiv preprint arXiv:2110.12511 (2021).
[16]
Michael LeBeane, Shuang Song, Reena Panda, Jee Ho Ryoo, and Lizy K John. 2015. Data partitioning strategies for graph workloads on heterogeneous clusters. In SC'15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--12.
[17]
Dongsheng Li, Yiming Zhang, Jinyan Wang, and Kian-Lee Tan. 2019. TopoX: Topology refactorization for efficient graph partitioning and processing. Proceedings of the VLDB Endowment 12, 8 (2019), 891--905.
[18]
Boge Liu, Long Yuan, Xuemin Lin, Lu Qin, Wenjie Zhang, and Jingren Zhou. 2019. Efficient (α, β)-core computation: An index-based approach. In The World Wide Web Conference. 1130--1141.
[19]
Linyuan Lü, Tao Zhou, Qian-Ming Zhang, and H Eugene Stanley. 2016. The H-index of a network node and its relation to degree and coreness. Nature communications 7, 1 (2016), 1--7.
[20]
Michael Mitzenmacher, Jakub Pachocki, Richard Peng, Charalampos Tsourakakis, and Shen Chen Xu. 2015. Scalable large near-clique detection in large-scale networks via sampling. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 815--824.
[21]
Alberto Montresor, Francesco De Pellegrini, and Daniele Miorandi. 2012. Distributed k-core decomposition. IEEE Transactions on parallel and distributed systems 24, 2 (2012), 288--300.
[22]
Fabio Petroni, Leonardo Querzoni, Khuzaima Daudjee, Shahin Kamali, and Giorgio Iacoboni. 2015. Hdrf: Stream-based partitioning for power-law graphs. In Proceedings of the 24th ACM international on conference on information and knowledge management. 243--252.
[23]
Ahmet Erdem Sariyüce and Ali Pinar. 2018. Peeling Bipartite Networks for Dense Subgraph Discovery. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, February 5-9, 2018, Yi Chang, Chengxiang Zhai, Yan Liu, and Yoelle Maarek (Eds.). ACM, 504--512.
[24]
Ahmet Erdem Sariyüce, C. Seshadhri, and Ali Pinar. 2018. Local Algorithms for Hierarchical Dense Subgraph Discovery. Proc. VLDB Endow. 12, 1 (2018), 43--56.
[25]
Yingxia Shao, Lei Chen, and Bin Cui. 2014. Efficient cohesive subgraphs detection in parallel. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, Curtis E. Dyreson, Feifei Li, and M. Tamer Özsu (Eds.). ACM, 613--624.
[26]
Jessica Shi and Julian Shun. 2020. Parallel Algorithms for Butterfly Computations. In 1st Symposium on Algorithmic Principles of Computer Systems, APOCS@SODA 2020, Salt Lake City, UT, USA, January 8, 2020. SIAM, 16--30.
[27]
Yue Shi, Martha Larson, and Alan Hanjalic. 2014. Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Computing Surveys (CSUR) 47, 1 (2014), 1--45.
[28]
Kelvin Sim, Jinyan Li, Vivekanand Gopalkrishnan, and Guimei Liu. 2009. Mining maximal quasi-bicliques: Novel algorithm and applications in the stock market and protein networks. Statistical Analysis and Data Mining: The ASA Data Science Journal 2, 4 (2009), 255--273.
[29]
Isabelle Stanton and Gabriel Kliot. 2012. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 1222--1230.
[30]
Dan Stanzione, Bill Barth, Niall Gaffney, Kelly Gaither, Chris Hempel, Tommy Minyard, Susan Mehringer, Eric Wernert, H Tufo, D Panda, et al. 2017. Stampede 2: The evolution of an xsede supercomputer. In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact. 1--8.
[31]
Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. 2014. Fennel: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM international conference on Web search and data mining. 333--342.
[32]
Charalampos E Tsourakakis, Jakub Pachocki, and Michael Mitzenmacher. 2017. Scalable motif-aware graph clustering. In Proceedings of the 26th International Conference on World Wide Web. 1451--1460.
[33]
Shiv Verma, Luke M Leslie, Yosub Shin, and Indranil Gupta. 2017. An Experimental Comparison of Partitioning Strategies in Distributed Graph Processing. Proceedings of the VLDB Endowment 10, 5 (2017).
[34]
Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. 2020. Efficient Bitruss Decomposition for Large-scale Bipartite Graphs. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020. IEEE, 661--672.
[35]
Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. 2021. Towards efficient solutions of bitruss decomposition for large-scale bipartite graphs. The VLDB Journal (2021), 1--24.
[36]
Yue Wang, Ruiqi Xu, Xun Jian, Alexander Zhou, and Lei Chen. 2022. Towards Distributed Bitruss Decomposition on Bipartite Graphs-Full Version. https://keithyue.github.io/files/vldb22-DBitruss.pdf [Online; accessed 5-May-2022]
[37]
Wikipedia contributors. 2021. Facebook --- Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Facebook&oldid=1010764584 [Online; accessed 5-May-2022].
[38]
Wikipedia contributors. 2021. Singles' Day --- Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Singles%27_Day&oldid=1007941275 [Online; accessed 5-May-2022].
[39]
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A computation-centric distributed graph processing system. In 12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16). 301--316.
[40]
Zhaonian Zou. 2016. Bitruss Decomposition of Bipartite Graphs. In Database Systems for Advanced Applications - 21st International Conference, DASFAA 2016, Dallas, TX, USA, April 16-19, 2016, Proceedings, Part II (Lecture Notes in Computer Science), Shamkant B. Navathe, Weili Wu, Shashi Shekhar, Xiaoyong Du, X. Sean Wang, and Hui Xiong (Eds.), Vol. 9643. Springer, 218--233.

Cited By

View all
  • (2024)Efficient Index for Temporal Core Queries over Bipartite GraphsProceedings of the VLDB Endowment10.14778/3681954.368196517:11(2813-2825)Online publication date: 1-Jul-2024
  • (2023)Efficient Core Maintenance in Large Bipartite GraphsProceedings of the ACM on Management of Data10.1145/36173291:3(1-26)Online publication date: 13-Nov-2023
  • (2023)Butterfly counting and bitruss decomposition on uncertain bipartite graphsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-023-00782-432:5(1013-1036)Online publication date: 1-Sep-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 15, Issue 9
May 2022
239 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 May 2022
Published in PVLDB Volume 15, Issue 9

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)39
  • Downloads (Last 6 weeks)10
Reflects downloads up to 04 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient Index for Temporal Core Queries over Bipartite GraphsProceedings of the VLDB Endowment10.14778/3681954.368196517:11(2813-2825)Online publication date: 1-Jul-2024
  • (2023)Efficient Core Maintenance in Large Bipartite GraphsProceedings of the ACM on Management of Data10.1145/36173291:3(1-26)Online publication date: 13-Nov-2023
  • (2023)Butterfly counting and bitruss decomposition on uncertain bipartite graphsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-023-00782-432:5(1013-1036)Online publication date: 1-Sep-2023

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media