research-article

Efficient Maximal Biclique Enumeration on GPUs

Authors:

Gang ChenAuthors Info & Claims

SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 16, Pages 1 - 13

https://doi.org/10.1145/3581784.3607059

Published: 11 November 2023 Publication History

Abstract

Maximal biclique enumeration (MBE) in bipartite graphs is an important problem in data mining with many real-world applications. All existing solutions for MBE are designed for CPUs. Parallel MBE algorithms for GPUs are needed for MBE acceleration leveraging its many computing cores. However, enumerating maximal bicliques using GPUs has three main challenges including large memory requirement, thread divergence, and load imbalance. In this paper, we propose GMBE, the first highly-efficient GPU solution for the MBE problem. To overcome the challenges, we design a node-reuse approach to reduce GPU memory usage, a pro-active pruning method using the vertex's local neighborhood size to alleviate thread divergence, and a load-aware task scheduling framework to achieve load balance among threads within GPU warps and blocks. Our experimental results show that GMBE on an NVIDIA A100 GPU can achieve 70.6× speedup over the state-of-the-art parallel MBE algorithm ParMBE on a 96-core CPU machine.

Supplemental Material

MP4 File - SC23 video presentation for "Efficient Maximal Biclique Enumeration on GPUs"

SC23 video presentation for the main program paper "Efficient Maximal Biclique Enumeration on GPUs" by Zhe Pan, Shuibing He, Xu Li, Xuechen Zhang, Rui Wang and Gang Chen

Download
243.32 MB

References

[1]

2022. CUDA C++ programming guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/.

[2]

2023. CUDA from Wikipedia. https://en.wikipedia.org/wiki/CUDA.

[3]

2023. GeForce RTX 20 Series. https://www.nvidia.com/en-gb/geforce/20-series/.

[4]

2023. NVIDIA A100 Tensor Core GPU. https://www.nvidia.com/en-gb/data-center/a100/.

[5]

2023. NVIDIA Nsight Compute. https://developer.nvidia.com/nsight-compute/.

[6]

2023. NVIDIA V100 Tensor Core GPU. https://www.nvidia.com/en-gb/data-center/v100/.

[7]

2023. Single instruction, multiple threads (SIMT) from Wikipedia. https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads.

[8]

Aman Abidi, Rui Zhou, Lu Chen, and Chengfei Liu. 2020. Pivot-based Maximal Biclique Enumeration. In IJCAI. 3558--3564.

[9]

Gabriela Alexe, Sorin Alexe, Yves Crama, Stephan Foldes, Peter L Hammer, and Bruno Simeone. 2004. Consensus algorithms for the generation of all maximal bicliques. Discrete Applied Mathematics 145, 1 (2004), 11--21.

Digital Library

[10]

Mohammad Allahbakhsh, Aleksandar Ignjatovic, Boualem Benatallah, Seyed-Mehdi-Reza Beheshti, Elisa Bertino, and Norman Foo. 2013. Collusion detection in online rating systems. In Web Technologies and Applications: 15th Asia-Pacific Web Conference, APWeb 2013, Sydney, Australia, April 4--6, 2013. Proceedings 15. Springer, 196--207.

[11]

Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, and Wen-mei Hwu. 2022. Parallel k-clique counting on gpus. In Proceedings of the 36th ACM International Conference on Supercomputing. 1--14.

Digital Library

[12]

Tariq Alusaifeer, Sheela Ramanna, Christopher J Henry, and James Peters. 2013. GPU implementation of MCE approach to finding near neighbourhoods. In International Conference on Rough Sets and Knowledge Technology. Springer, 251--262.

Digital Library

[13]

Martin Burtscher, Rupesh Nasre, and Keshav Pingali. 2012. A quantitative study of irregular programs on GPUs. In 2012 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 141--151.

Digital Library

[14]

Lu Chen, Chengfei Liu, Rui Zhou, Jiajie Xu, and Jianxin Li. 2022. Efficient maximal biclique enumeration for large sparse bipartite graphs. Proceedings of the VLDB Endowment 15, 8 (2022), 1559--1571.

Digital Library

[15]

Xuhao Chen et al. 2022. Efficient and Scalable Graph Pattern Mining on GPUs. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). 857--877.

[16]

Xuhao Chen, Li-Wen Chang, Christopher I Rodrigues, Jie Lv, Zhiying Wang, and Wen-Mei Hwu. 2014. Adaptive cache management for energy-efficient GPU computing. In 2014 47th Annual IEEE/ACM international symposium on microarchitecture. IEEE, 343--355.

Digital Library

[17]

Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Loc Hoang, and Keshav Pingali. 2021. Sandslash: a two-level framework for efficient graph pattern mining. In Proceedings of the ACM International Conference on Supercomputing. 378--391.

Digital Library

[18]

Apurba Das and Srikanta Tirthapura. 2019. Shared-memory parallel maximal biclique enumeration. In 2019 IEEE 26th International Conference on High Performance Computing (HiPC). IEEE, 34--43.

[19]

David Eppstein. 1994. Arboricity and bipartite subgraph listing algorithms. Information processing letters 51, 4 (1994), 207--211.

[20]

Wentian Guo, Yuchen Li, and Kian-Lee Tan. 2020. Exploiting reuse for GPU subgraph enumeration. IEEE Transactions on Knowledge and Data Engineering (TKDE) 34, 9 (2020), 4231--4244.

[21]

Kshitij Gupta, Jeff A Stuart, and John D Owens. 2012. A study of persistent threads style GPU programming for GPGPU workloads. In 2012 Innovative Parallel Computing (InPar). 1--14.

[22]

Danny Hermelin and George Manoussakis. 2021. Efficient enumeration of maximal induced bicliques. Discrete Applied Mathematics 303 (2021), 253--261.

Digital Library

[23]

George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on scientific Computing 20, 1 (1998), 359--392.

Digital Library

[24]

Kyle Kloster, Blair D Sullivan, and Andrew van der Poel. 2019. Mining maximal induced bicliques using odd cycle transversals. In Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 324--332.

[25]

Jérôme Kunegis. 2013. Konect: the koblenz network collection. In Proceedings of the 22nd international conference on world wide web. 1343--1350.

Digital Library

[26]

Jure Leskovec and Andrej Krevl. 2014. SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.

[27]

Brenton Lessley, Talita Perciano, Manish Mathai, Hank Childs, and E Wes Bethel. 2017. Maximal clique enumeration with data-parallel primitives. In 2017 IEEE 7th Symposium on Large Data Analysis and Visualization (LDAV). IEEE, 16--25.

[28]

Yuan Lin and Vinod Grover. 2018. Using cuda warp-level primitives. https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/.

[29]

Guimei Liu, Kelvin Sim, and Jinyan Li. 2006. Efficient mining of large maximal bicliques. In Data Warehousing and Knowledge Discovery: 8th International Conference, DaWaK 2006, Krakow, Poland, September 4--8, 2006. Proceedings 8. Springer, 437--448.

Digital Library

[30]

Bingqing Lyu, Lu Qin, Xuemin Lin, Ying Zhang, Zhengping Qian, and Jingren Zhou. 2020. Maximum biclique search at billion scale. Proceedings of the VLDB Endowment (2020).

Digital Library

[31]

Ziyi Ma, Yuling Liu, Yikun Hu, Jianye Yang, Chubo Liu, and Huadong Dai. 2022. Efficient maintenance for maximal bicliques in bipartite graph streams. World Wide Web 25, 2 (2022), 857--877.

Digital Library

[32]

Arko Provo Mukherjee and Srikanta Tirthapura. 2016. Enumerating maximal bicliques from a large graph using mapreduce. IEEE Transactions on Services Computing (TSC) 10, 5 (2016), 771--784.

[33]

Ron Rymon. 1992. Search through systematic set enumeration. In Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning (KR). 539--550.

[34]

Amos Tanay, Roded Sharan, and Ron Shamir. 2002. Discovering statistically significant biclusters in gene expression data. In Proceedings of the Tenth International Conference on Intelligent Systems for Molecular Biology. 136--144.

[35]

Kai Wang, Wenjie Zhang, Xuemin Lin, Lu Qin, and Alexander Zhou. 2022. Efficient personalized maximum biclique search. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 498--511.

[36]

Yi-Wen Wei, Wei-Mei Chen, and Hsin-Hung Tsai. 2021. Accelerating the Bron-Kerbosch algorithm for maximal clique enumeration using GPUs. IEEE Transactions on Parallel and Distributed Systems (TPDS) 32, 9 (2021), 2352--2366.

Digital Library

[37]

Martin Winter, Mathias Parger, Daniel Mlakar, and Markus Steinberger. 2021. Are dynamic memory managers on gpus slow? a survey and benchmarks. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 219--233.

Digital Library

[38]

Jianye Yang, Yun Peng, Dian Ouyang, Wenjie Zhang, Xuemin Lin, and Xiang Zhao. 2023. (p, q)-biclique counting and enumeration for large sparse bipartite graphs. The VLDB Journal (2023), 1--25.

[39]

Yun Zhang, Charles A Phillips, Gary L Rogers, Erich J Baker, Elissa J Chesler, and Michael A Langston. 2014. On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types. BMC bioinformatics 15 (2014), 1--18.

Cited By

Zhang JWu FJiang HCheng GChen GWang Q(2024)SyncMalloc: A Synchronized Host-Device Co-Management System for GPU Dynamic Memory Allocation across All ScalesProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673069(179-188)Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1145/3673038.3673069
Pan ZLi XHe SZhang XWang RGao YChen GSun X(2024)AMBEA: Aggressive Maximal Biclique Enumeration in Large Bipartite Graph ComputingIEEE Transactions on Computers10.1109/TC.2024.344186473:12(2664-2677)Online publication date: Dec-2024
https://doi.org/10.1109/TC.2024.3441864

Index Terms

Efficient Maximal Biclique Enumeration on GPUs

Recommendations

Efficient Convex Optimization on GPUs for Embedded Model Predictive Control
GPGPU-10: Proceedings of the General Purpose GPUs

GPU applications have traditionally run on PCs or in larger scale systems. With the introduction of the Tegra line of mobile processors, NVIDIA expanded the types of systems that can exploit the massive parallelism offered by GPU computing ...
Performance Tuning of Matrix Multiplication in OpenCL on Different GPUs and CPUs
SCC '12: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis

OpenCL (Open Computing Language) is a framework for general-purpose parallel programming. Programs written in OpenCL are functionally portable across multiple processors including CPUs, GPUs, and also FPGAs. Using an auto-tuning technique makes ...
An OpenCL micro-benchmark suite for GPUs and CPUs

Open computing language (OpenCL) is a new industry standard for task-parallel and data-parallel heterogeneous computing on a variety of modern CPUs, GPUs, DSPs, and other microprocessor designs. OpenCL is vendor independent and hence not specialized for ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2023

1428 pages

ISBN:9798400701092

DOI:10.1145/3581784

Chair:
Dorian Arnold,
Program Chair:
Rosa M Badia,
Program Co-chair:
Kathryn Mohror

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Conference

SC '23

Sponsor:

SIGHPC

SC '23: International Conference for High Performance Computing, Networking, Storage and Analysis

November 12 - 17, 2023

CO, Denver, USA

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
509
Total Downloads

Downloads (Last 12 months)295
Downloads (Last 6 weeks)8

Reflects downloads up to 17 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Zhang JWu FJiang HCheng GChen GWang Q(2024)SyncMalloc: A Synchronized Host-Device Co-Management System for GPU Dynamic Memory Allocation across All ScalesProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673069(179-188)Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1145/3673038.3673069
Pan ZLi XHe SZhang XWang RGao YChen GSun X(2024)AMBEA: Aggressive Maximal Biclique Enumeration in Large Bipartite Graph ComputingIEEE Transactions on Computers10.1109/TC.2024.344186473:12(2664-2677)Online publication date: Dec-2024
https://doi.org/10.1109/TC.2024.3441864

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten