Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Parallel Peeling of Bipartite Networks for Hierarchical Dense Subgraph Discovery

Published: 20 June 2023 Publication History

Abstract

Wing and Tip decomposition are motif-based analytics for bipartite graphs that construct a hierarchy of butterfly (2,2-biclique) dense edge and vertex induced subgraphs, respectively. They have applications in several domains, including e-commerce, recommendation systems, document analysis, and others.
Existing decomposition algorithms use a bottom-up approach that constructs the hierarchy in an increasing order of the subgraph density. They iteratively select the edges or vertices with minimum butterfly count peel, i.e., remove them along with their butterflies. The amount of butterflies in real-world bipartite graphs makes bottom-up peeling computationally demanding. Furthermore, the strict order of peeling entities results in a large number of sequentially dependent iterations. Consequently, parallel algorithms based on bottom up peeling incur heavy synchronization and poor scalability.
In this article, we propose a novel Parallel Bipartite Network peelinG (PBNG) framework that adopts a two-phased peeling approach to relax the order of peeling, and in turn, dramatically reduce synchronization. The first phase divides the decomposition hierarchy into few partitions and requires little synchronization. The second phase concurrently processes all partitions to generate individual levels of the hierarchy and requires no global synchronization. The two-phased peeling further enables batching optimizations that dramatically improve the computational efficiency of PBNG.
We empirically evaluate PBNG using several real-world bipartite graphs and demonstrate radical improvements over the existing approaches. On a shared-memory 36 core server, PBNG achieves up to 19.7× self-relative parallel speedup. Compared to the state-of-the-art parallel framework ParButterfly, PBNG reduces synchronization by up to 15,260× and execution time by up to 295×. Furthermore, it achieves up to 38.5× speedup over state-of-the-art algorithms specifically tuned for wing decomposition. Our source code is made available at https://github.com/kartiklakhotia/RECEIPT.

References

[1]
Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, and Nick Duffield. 2015. Efficient graphlet counting for large networks. In Proceedings of the IEEE International Conference on Data Mining. IEEE, 1–10.
[2]
Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, Nick G. Duffield, and Theodore L. Willke. 2017. Graphlet decomposition: Framework, algorithms, and applications. Knowl. Inf. Syst. 50, 3 (March2017), 689–722.
[3]
Sinan G. Aksoy, Tamara G. Kolda, and Ali Pinar. 2017. Measuring and modeling bipartite graphs with community structure. J. Complex Netw. 5, 4 (2017), 581–603.
[4]
Richard Anderson and Ernst W. Mayr. 1984. A P-complete Problem and Approximations to it. Technical Report.
[5]
Albert Angel, Nick Koudas, Nikos Sarkas, Divesh Srivastava, Michael Svendsen, and Srikanta Tirthapura. 2014. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. VLDB J. 23, 2 (2014), 175–199.
[6]
Francesco Bonchi, Arijit Khan, and Lorenzo Severini. 2019. Distance-generalized core decomposition. In Proceedings of the International Conference on Management of Data. 1006–1023.
[7]
Jie Chen and Yousef Saad. 2010. Dense subgraph extraction with application to community detection. IEEE Trans. Knowl. Data Eng. 24, 7 (2010), 1216–1230.
[8]
Norishige Chiba and Takao Nishizeki. 1985. Arboricity and subgraph listing algorithms. SIAM J. Comput. 14, 1 (1985), 210–223.
[9]
Jonathan Cohen. 2008. Trusses: Cohesive Subgraphs for Social Network Analysis. National Security Agency Technical Report 16, 3.1.
[10]
Ketan Date, Keven Feng, Rakesh Nagi, Jinjun Xiong, Nam Sung Kim, and Wen-Mei Hwu. 2017. Collaborative (cpu+ gpu) algorithms for triangle counting and truss decomposition on the minsky architecture: Static graph challenge: Subgraph isomorphism. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’17). IEEE, 1–7.
[11]
Inderjit S. Dhillon. 2001. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 269–274.
[12]
Laxman Dhulipala, Guy Blelloch, and Julian Shun. 2017. Julienne: A framework for parallel graph algorithms using work-efficient bucketing. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures. 293–304.
[13]
Alessandro Epasto, Silvio Lattanzi, Vahab Mirrokni, Ismail Oner Sebe, Ahmed Taei, and Sunita Verma. 2015. Ego-net community mining applied to friend suggestion. Proc. VLDB Endow. 9, 4 (2015), 324–335.
[14]
Yixiang Fang, Yixing Yang, Wenjie Zhang, Xuemin Lin, and Xin Cao. 2020. Effective and efficient community search over large heterogeneous information networks. Proc. VLDB Endow. 13, 6 (2020), 854–867.
[15]
Yixiang Fang, Kaiqiang Yu, Reynold Cheng, Laks V. S. Lakshmanan, and Xuemin Lin. 2019. Efficient algorithms for densest subgraph discovery. Proc. VLDB Endow. 12, 11 (2019), 1719–1732.
[16]
Geli Fei, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, and Riddhiman Ghosh. 2013. Exploiting burstiness in reviews for review spammer detection. In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media.
[17]
James Fox, Oded Green, Kasimir Gabert, Xiaojing An, and David A. Bader. 2018. Fast and adaptive list intersections on the gpu. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’18). IEEE, 1–7.
[18]
Eugene Fratkin, Brian T. Naughton, Douglas L. Brutlag, and Serafim Batzoglou. 2006. MotifCut: Regulatory motifs finding with maximum density subgraphs. Bioinformatics (Oxford, Engl.) 22, 14 (July2006), e150–7.
[19]
David Gibson, Ravi Kumar, and Andrew Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proceedings of the 31st International Conference on Very Large Data Bases. 721–732.
[20]
Ronald L. Graham. 1969. Bounds on multiprocessing timing anomalies. SIAM J. Appl. Math. 17, 2 (1969), 416–429.
[21]
Oded Green, James Fox, Euna Kim, Federico Busato, Nicola Bombieri, Kartik Lakhotia, Shijie Zhou, Shreyas Singapura, Hanqing Zeng, Rajgopal Kannan, Viktor Prasanna, and David Bader. 2017. Quickly finding a truss in a haystack. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’17). IEEE, 1–7.
[22]
Oded Green, James Fox, Alex Watkins, Alok Tripathy, Kasimir Gabert, Euna Kim, Xiaojing An, Kumar Aatish, and David A. Bader. 2018. Logarithmic radix binning and vectorized triangle counting. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’18). IEEE, 1–7.
[23]
Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web. 507–517.
[24]
Yizhang He, Kai Wang, Wenjie Zhang, Xuemin Lin, and Ying Zhang. 2021. Exploring cohesive subgraphs with vertex engagement and tie strength in bipartite graphs. Inf. Sci. 572 (2021), 277–296.
[25]
Yang Hu, Hang Liu, and H. Howie Huang. 2018. Tricore: Parallel triangle counting on gpus. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’18). IEEE, 171–182.
[26]
Zan Huang, Daniel D. Zeng, and Hsinchun Chen. 2007. Analyzing consumer-product graphs: Empirical findings and applications in recommender systems. Manage. Sci. 53, 7 (2007), 1146–1164.
[27]
Wissam Khaouid, Marina Barsky, Venkatesh Srinivasan, and Alex Thomo. 2015. K-core decomposition of large networks on a single PC. Proc. VLDB Endow. 9, 1 (2015), 13–23.
[28]
Jérôme Kunegis. 2013. Konect: The koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web. 1343–1350.
[29]
Kartik Lakhotia, Rajgopal Kannan, Sourav Pati, and Viktor Prasanna. 2020. GPOP: A scalable cache-and memory-efficient framework for graph processing over parts. ACM Trans. Parallel Comput. 7, 1 (2020), 1–24.
[30]
Kartik Lakhotia, Rajgopal Kannan, and Viktor Prasanna. 2021. Parallel peeling of bipartite networks for hierarchical dense subgraph discovery. arXiv:2110.12511. Retrieved from https://arxiv.org/abs/2110.12511.
[31]
Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna, and Cesar A. F. De Rose. 2020. Receipt: Refine coarse-grained independent tasks for parallel tip decomposition of bipartite graphs. Proc. VLDB Endow. 14, 3 (2020), 404–417.
[32]
Victor E. Lee, Ning Ruan, Ruoming Jin, and Charu Aggarwal. 2010. A survey of algorithms for dense subgraph discovery. In Managing and Mining Graph Data. Springer, 303–336.
[33]
Elizabeth A. Leicht, Petter Holme, and Mark E. J. Newman. 2006. Vertex similarity in networks. Phys. Rev. E 73, 2 (2006), 026120.
[34]
Wentao Li, Miao Qiao, Lu Qin, Ying Zhang, Lijun Chang, and Xuemin Lin. 2019. Scaling distance labeling on small-world networks. In Proceedings of the International Conference on Management of Data (SIGMOD’19). Association for Computing Machinery, New York, NY, 1060–1077. DOI:
[35]
Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu, and Hady Wirawan Lauw. 2010. Detecting product review spammers using rating behaviors. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management. 939–948.
[36]
Chenhao Ma, Reynold Cheng, Laks V. S. Lakshmanan, Tobias Grubenmann, Yixiang Fang, and Xiaodong Li. 2019. LINC: A motif counting algorithm for uncertain graphs. Proc. VLDB Endow. 13, 2 (2019), 155–168.
[37]
Fragkiskos D. Malliaros, Christos Giatsidis, Apostolos N. Papadopoulos, and Michalis Vazirgiannis. 2019. The core decomposition of networks: Theory, algorithms and applications. VLDB J. (2019), 1–32.
[38]
Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07).
[39]
Arjun Mukherjee, Bing Liu, and Natalie Glance. 2012. Spotting fake reviewer groups in consumer reviews. In Proceedings of the 21st International Conference on World Wide Web. 191–200.
[40]
Eisha Nathan, Anita Zakrzewska, Jason Riedy, and David A. Bader. 2017. Local community detection in dynamic graphs using personalized centrality. Algorithms 10, 3 (2017), 102.
[41]
Saket Navlakha, Rajeev Rastogi, and Nisheeth Shrivastava. 2008. Graph summarization with bounded error. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 419–432.
[42]
Mark E. J. Newman. 2001. The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. U.S.A. 98, 2 (2001), 404–409.
[43]
Ha-Myung Park, Sung-Hyon Myaeng, and U. Kang. 2016. PTE: Enumerating trillion triangles on distributed systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'16). ACM Press. DOI:
[44]
E. Jason Riedy, Henning Meyerhenke, David Ediger, and David A. Bader. 2011. Parallel community detection for massive graphs. In International Conference on Parallel Processing and Applied Mathematics. Springer, 286–296.
[45]
Ryan Rossi and Nesreen Ahmed. 2015. The network data repository with interactive graph analytics and visualization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 29.
[46]
Polina Rozenshtein, Aris Anagnostopoulos, Aristides Gionis, and Nikolaj Tatti. 2014. Event detection in activity networks. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1176–1185.
[47]
Siddharth Samsi, Vijay Gadepally, Michael Hurley, Michael Jones, Edward Kao, Sanjeev Mohindra, Paul Monticciolo, Albert Reuther, Steven Smith, William Song, et al. 2017. Static graph challenge: Subgraph isomorphism. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’17). IEEE, 1–6.
[48]
Siddharth Samsi, Vijay Gadepally, Michael Hurley, Michael Jones, Edward Kao, Sanjeev Mohindra, Paul Monticciolo, Albert Reuther, Steven Smith, William Song, et al. 2017. Static graph challenge: Subgraph isomorphism. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’17). IEEE, 1–6.
[49]
Seyed-Vahid Sanei-Mehri, Ahmet Erdem Sariyuce, and Srikanta Tirthapura. 2018. Butterfly counting in bipartite networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2150–2159.
[50]
Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet Erdem Sariyüce, and Srikanta Tirthapura. 2019. FLEET: Butterfly estimation from a bipartite graph stream. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1201–1210.
[51]
Ahmet Erdem Sarıyüce and Ali Pinar. 2016. Fast hierarchy construction for dense subgraphs. Proc. VLDB Endow. 10, 3 (2016).
[52]
Ahmet Erdem Sarıyüce and Ali Pinar. 2018. Peeling bipartite networks for dense subgraph discovery. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining. 504–512.
[53]
Ahmet Erdem Sariyüce, C. Seshadhri, and Ali Pinar. 2018. Local algorithms for hierarchical dense subgraph discovery. Proc. VLDB Endow. 12, 1 (2018), 43–56.
[54]
Ahmet Erdem Sariyuce, C. Seshadhri, Ali Pinar, and Umit V. Catalyurek. 2015. Finding the hierarchy of dense subgraphs using nucleus decompositions. In Proceedings of the 24th International Conference on World Wide Web. 927–937.
[55]
Jessica Shi and Julian Shun. 2019. Parallel algorithms for butterfly computations. arXiv:1907.08607. Retrieved from https://arxiv.org/abs/1907.08607.
[56]
Julian Shun and Guy E. Blelloch. 2013. Ligra: A lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 135–146.
[57]
Julian Shun and Kanat Tangwongsan. 2015. Multicore triangle computations without tuning. In Proceedings of the IEEE 31st International Conference on Data Engineering. IEEE, 149–160.
[58]
Shaden Smith, Xing Liu, Nesreen K. Ahmed, Ancy Sarah Tom, Fabrizio Petrini, and George Karypis. 2017. Truss decomposition on shared-memory parallel systems. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’17). IEEE, 1–6.
[59]
Christian L. Staudt and Henning Meyerhenke. 2015. Engineering parallel algorithms for community detection in massive networks. IEEE Trans. Parallel Distrib. Syst. 27, 1 (2015), 171–184.
[60]
Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos. 2005. Neighborhood formation and anomaly detection in bipartite graphs. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM’05). IEEE, 8–pp.
[61]
Charalampos E. Tsourakakis. 2014. A novel approach to finding near-cliques: The triangle-densest subgraph problem. arXiv:1405.1477. Retrieved from https://arxiv.org/abs/1405.1477.
[62]
Charalampos E. Tsourakakis, Jakub Pachocki, and Michael Mitzenmacher. 2017. Scalable motif-aware graph clustering. In Proceedings of the 26th International Conference on World Wide Web. 1451–1460.
[63]
Chad Voegele, Yi-Shan Lu, Sreepathi Pai, and Keshav Pingali. 2017. Parallel triangle counting and k-truss identification using graph-centric methods. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC’17). IEEE, 1–7.
[64]
Jia Wang and James Cheng. 2012. Truss decomposition in massive networks. Proc. VLDB Endow. 5, 9 (2012).
[65]
Jia Wang, Ada Wai-Chee Fu, and James Cheng. 2014. Rectangle counting in large bipartite graphs. In Proceedings of the IEEE International Congress on Big Data. IEEE, 17–24.
[66]
Kai Wang, Xin Cao, Xuemin Lin, Wenjie Zhang, and Lu Qin. 2018. Efficient computing of radius-bounded k-cores. In Proceedings of the IEEE 34th International Conference on Data Engineering (ICDE’18). IEEE, 233–244.
[67]
Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. 2019. Vertex priority based butterfly counting for large-scale bipartite networks. Proc. VLDB Endow. 12, 10 (2019), 1139–1152.
[68]
Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. 2020. Efficient bitruss decomposition for large-scale bipartite graphs. In Proceedings of the IEEE 36th International Conference on Data Engineering (ICDE’20). IEEE, 661–672.
[69]
Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. 2021. Towards efficient solutions of bitruss decomposition for large-scale bipartite graphs. VLDB J. (2021), 1–24.
[70]
Nan Wang, Jingbo Zhang, Kian-Lee Tan, and Anthony K. H. Tung. 2010. On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endow. 4, 2 (2010), 58–68.
[71]
Dong Wen, Lu Qin, Ying Zhang, Xuemin Lin, and Jeffrey Xu Yu. 2018. I/o efficient core graph decomposition: Application to degeneracy ordering. IEEE Trans. Knowl. Data Eng. 31, 1 (2018), 75–90.
[72]
Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2018. Graphit: A high-performance graph dsl. Proceedings of the ACM on Programming Languages 2, (2018), 1–30.
[73]
Zhaonian Zou. 2016. Bitruss decomposition of bipartite graphs. In International Conference on Database Systems for Advanced Applications. Springer, 218–233.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Parallel Computing
ACM Transactions on Parallel Computing  Volume 10, Issue 2
June 2023
285 pages
ISSN:2329-4949
EISSN:2329-4957
DOI:10.1145/3604626
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2023
Online AM: 10 February 2023
Accepted: 01 February 2023
Revised: 27 January 2023
Received: 21 July 2021
Published in TOPC Volume 10, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Graph algorithms
  2. parallel graph analytics
  3. dense graph mining
  4. bipartite graphs

Qualifiers

  • Research-article

Funding Sources

  • Defense Advanced Research Projects Agency (DARPA)
  • National Science Foundation (NSF)

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 173
    Total Downloads
  • Downloads (Last 12 months)98
  • Downloads (Last 6 weeks)1
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Full Text

View this article in Full Text.

Full Text

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media