Abstract
When dealing with large bipartite graphs, butterfly counting is a crucial and time-consuming operation. Graphics processing units (GPUs) are widely used parallel heterogeneous devices that can significantly boost performance for data science programs. However, currently no work enables efficient butterfly counting on GPU. To fill this gap, we propose a GPU-based butterfly counting method, called G-BFC. G-BFC solves three significant technical problems. First, butterfly counting involves massive serial operations, which leads to severe synchronization overheads and performance degradation. We unlock the serial region and utilize the shared memory on GPU to efficiently handle it. Second, butterfly counting on GPU faces the workload imbalance problem. To maximize efficiency, we develop a novel adaptive strategy to balance the workload among threads. Third, the large number of two-hop paths, also known as wedges, in bipartite graphs make parallel butterfly counting difficult to traverse. We develop an innovative preprocessing strategy that can significantly cut down on the required number of wedges. We conduct comprehensive experiments on both server-grade and edge-grade GPU platforms, and experiments show that G-BFC brings significant performance benefits. G-BFC achieves 4.84\(\times \) performance speedup over the state-of-the-art solution on eleven real-world datasets.
Similar content being viewed by others
References
Wang, K., Lin, X., Qin, L., Zhang, W., Zhang, Y.: Vertex priority based butterfly counting for large-scale bipartite networks. In: PVLDB (2019)
Aksoy, S.G., Kolda, T.G., Pinar, A.: Measuring and modeling bipartite graphs with community structure. J. Complex Netw. 5(4), 581–603 (2017)
Huang, Z.: Link prediction based on graph topology: The predictive value of generalized clustering coefficient. Available at SSRN 1634014 (2010)
Cohen, J.: Trusses: Cohesive subgraphs for social network analysis. National security agency technical report, 16(3.1):1–29 (2008)
Faris, H., Ala’M, A.-Z., Heidari, A.A., Aljarah, I., Mafarja, M., Hassonah, M.A., Fujita, H.: An intelligent system for spam detection and identification of the most relevant features based on evolutionary random weight networks. Inf. Fus. 48, 67–83 (2019)
Lo, S.-H., Lee, C.-R., Chung, Y.-C., Chung, I.-H.: A parallel rectangle intersection algorithm on GPU+ CPU. In: 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 43–52. IEEE (2011)
Zhang, F., Zhai, J., Shen, X., Mutlu, O., Du, X.: POCLib: A high-performance framework for enabling near orthogonal processing on compression. IEEE Trans. Parallel Distrib. Syst. 33(2), 459–475 (2022)
Pandey, S., Li, X. S., Buluc, A., Xu, J., Liu, H.: H-index: Hash-indexing for parallel triangle counting on GPUs. In: 2019 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–7. IEEE (2019)
Yu, H., Guo, X., Luo, X., Bian, W., Zhang, T.: Construct trip graphs by using taxi trajectory data. Data Sci. Eng. 8(1), 1–22 (2023)
Hu, Y., Liu, H., Huang, H.H.: Tricore: Parallel triangle counting on GPUs. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 171–182. IEEE (2018)
Jetson AGX Xavier Series. https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier/ (2022)
NVIDIA Jetson AGX Xavier Delivers 32 TeraOps for New Era of AI in Robotics. https://developer.nvidia.com/blog/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/ (2022)
Taobao. https://www.taobao.com/ (2022)
Sanei-Mehri, S.-V., Sariyuce, A. E., Tirthapura, S.: Butterfly counting in bipartite networks. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2150–2159 (2018)
Shi, J., Shun, J.: Parallel algorithms for butterfly computations. In: Symposium on Algorithmic Principles of Computer Systems (2020)
Green, O., Yalamanchili, P., Munguía, L.-M.: Fast triangle counting on the GPU. In: Proceedings of the 4th Workshop on Irregular Applications: Architectures and Algorithms, pp. 1–8 (2014)
Ueno, K., Suzumura, T.: Parallel distributed breadth first search on GPU. In: 20th Annual International Conference on High Performance Computing, pp. 314–323. IEEE (2013)
Wu, T., Wang, B., Shan, Y., Yan, F., Wang, Y., Xu, N.: Efficient pagerank and spmv computation on amd gpus. In: 2010 39th International Conference on Parallel Processing, pp. 81–89. IEEE (2010)
Xu, Q., Zhang, F., Yao, Z., Lu, L., Du, X., Deng, D., He, B.: Efficient load-balanced butterfly counting on gpu. In: PVLDB (2022)
Zhang, P., Wang, J., Li, X., Li, M., Di, Z., Fan, Y.: Clustering coefficient and community structure of bipartite networks. Physica A 387(27), 6869–6875 (2008)
Robins, G., Alexander, M.: Small worlds among interlocking directors: network structure and distance in bipartite graphs. Comput. Math. Organ. Theory 10, 69–94 (2004)
Lind, P.G., Gonzalez, M.C., Herrmann, H.J.: Cycles and clustering in bipartite networks. Phys. Rev. E 72(5), 056127 (2005)
Lyu, B., Qin, L., Lin, X., Zhang, Y., Qian, Z., Zhou, J.: Maximum biclique search at billion scale. Proc. VLDB Endow. (2020)
Caldarelli, G., Pastor-Satorras, R., Vespignani, A.: Structure of cycles and local ordering in complex networks. Eur. Phys. J. B 38, 183–186 (2004)
Newman, M.E.: The structure and function of complex networks. SIAM Rev. 45(2), 167–256 (2003)
Sarıyüce, A.E., Pinar, A.: Peeling bipartite networks for dense subgraph discovery. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 504–512 (2018)
Chen, J., Saad, Y.: Dense subgraph extraction with application to community detection. IEEE Trans. Knowl. Data Eng. 24(7), 1216–1230 (2010)
Kim, M., Leskovec, J.: Multiplicative attribute graph model of real-world networks. Int. Math. 8(1–2), 113–160 (2012)
Gao, Y., Wang, X., He, X., Feng, H., Zhang, Y.: Rumor detection with self-supervised learning on texts and social graph. Front. Comput. Sci. 17(4), 174611 (2023)
Beutel, A., Xu, W., Guruswami, V., Palow, C., Faloutsos, C.: Copycatch: stopping group attacks by spotting lockstep behavior in social networks. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 119–130 (2013)
Palmer, D.: Broken ties: interlocking directorates and intercorporate coordination. Admin. Sci. Q. 40–55 (1983)
Li, R., Wang, P., Jia, P., Zhang, X., Zhao, J., Tao, J., Yuan, Y., Guan, X.: Approximately counting butterflies in large bipartite graph streams. IEEE Trans. Knowl. Data Eng. 34(12), 5621–5635 (2021)
Wang, J., Fu, A.W.-C., Cheng, J.: Rectangle counting in large bipartite graphs. In: 2014 IEEE International Congress on Big Data, pp. 17–24. IEEE (2014)
Zhang, F., Zhai, J., He, B., Zhang, S., Chen, W.: Understanding co-running behaviors on integrated cpu/gpu architectures. IEEE Trans. Parallel Distrib. Syst. 28(3), 905–918 (2016)
Zhang, F., Pan, Z., Zhou, Y., Zhai, J., Shen, X., Mutlu, O., Du, X.: G-tadoc: Enabling efficient gpu-based text analytics without decompression. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 1679–1690. IEEE (2021)
Zhang, F., Chen, Z., Zhang, C., Zhou, A.C., Zhai, J., Du, X.: An efficient parallel secure machine learning framework on gpus. IEEE Trans. Parallel Distrib. Syst. 32(9), 2262–2276 (2021)
Pan, Z., Zhang, F., Zhou, Y., Zhai, J., Shen, X., Mutlu, O., Du, X.: Exploring data analytics without decompression on embedded GPU systems. IEEE Trans. Parallel Distrib. Syst. 33(7), 1553–1568 (2021)
Shao, Y., Chen, L., Cui, B.: Efficient cohesive subgraphs detection in parallel. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 613–624 (2014)
Fan, W., Xu, J., Wu, Y., Yu, W., Jiang, J., Zheng, Z., Zhang, B., Cao, Y., Tian, C.: Parallelizing sequential graph computations. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 495–510 (2017)
Shang, Z., Li, F., Yu, J.X., Zhang, Z., Cheng, H.: Graph analytics through fine-grained parallelism. In: Proceedings of the 2016 International Conference on Management of Data, pp. 463–478 (2016)
Gallet, B., Gowanlock, M.: Heterogeneous CPU-GPU epsilon grid joins: static and dynamic work partitioning strategies. Data Sci. Eng. 6(1), 39–62 (2021)
Bhatia, S.: Approximate triangle count and clustering coefficient. In: Proceedings of the 2018 International Conference on Management of Data, pp. 1809–1811 (2018)
Huang, J., Huang, X., Zhu, Y., Xu, J.: Parallel algorithms for parameter-free structural diversity search on graphs. World Wide Web 24, 397–417 (2021)
Ghaffari, M., Lattanzi, S., Mitrović, S.: Improved parallel algorithms for density-based network clustering. In: Proceedings of the 36th International Conference on Machine Learning (ICML 2019), vol. 97, pp. 2201–2210. PMLR (2019)
Arifuzzaman, S., Khan, M., Marathe, M.: Fast parallel algorithms for counting and listing triangles in big graphs. ACM Trans. Knowl. Discov. Data (TKDD) 14(1), 1–34 (2019)
Han, S., Zou, L., Yu, J.X.: Speeding up set intersections in graph algorithms using simd instructions. In: Proceedings of the 2018 International Conference on Management of Data, pp. 1587–1602 (2018)
Hu, L., Zou, L., Liu, Y.: Accelerating triangle counting on GPU. In: Proceedings of the 2021 International Conference on Management of Data, pp. 736–748 (2021)
Bisson, M., Fatica, M.: High performance exact triangle counting on GPUs. IEEE Trans. Parallel Distrib. Syst. 28(12), 3501–3510 (2017)
Jain, S., Seshadhri, C.: A fast and provable method for estimating clique counts using turán’s theorem. In: Proceedings of the 26th International Conference on World Wide Web, pp. 441–449 (2017)
Pinar, A., Seshadhri, C., Vishal, V.: Escape: Efficiently counting all 5-vertex subgraphs. In: Proceedings of the 26th International Conference on World Wide Web, pp. 1431–1440 (2017)
Rahman, M., Bhuiyan, M., Hasan, M.A.: Graft: An approximate graphlet counting algorithm for large graph analysis. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1467–1471 (2012)
Zhu, R., Zou, Z., Li, J.: Fast rectangle counting on massive networks. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 847–856. IEEE (2018)
Chen, X., Dathathri, R., Gill, G., Pingali, K.: Pangolin: An efficient and flexible graph mining system on CPU and GPU. Proc. VLDB Endow. 13(8), 1190–1205 (2020)
Shi, T., Zhai, M., Xu, Y., Zhai, J.: Graphpi: High performance graph pattern matching through effective redundancy elimination. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–14. IEEE (2020)
Sheshbolouki, A., Özsu, M.T.: sgrapp: Butterfly approximation in streaming graphs. ACM Trans. Knowl. Discov. Data (TKDD) 16(4), 1–43 (2022)
Shiloach, Y., Vishkin, U.: An o (n2log n) parallel max-flow algorithm. J. Algorithms 3(2), 128–146 (1982)
Wang, Y., Davidson, A., Pan, Y., Wu, Y., Riffel, A., Owens, J.D.: Gunrock: A high-performance graph processing library on the GPU. In: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 1–12 (2016)
Jangda, A., Polisetty, S., Guha, A., Serafini, M.: Nextdoor: GPU-based graph sampling for graph machine learning. arXiv preprint arXiv:2009.06693 (2020)
Greathouse, J.L., Daga, M.: Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. In: SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 769–780. IEEE (2014)
Fox, J., Green, O., Gabert, K., An, X., Bader, D.A.: Fast and adaptive list intersections on the GPU. In: 2018 IEEE High Performance extreme Computing Conference (HPEC), pp. 1–7. IEEE (2018)
Cheng, J., Grossman, M., McKercher, T.: Professional CUDA c Programming. Wiley, New York (2014)
Gomathy, C., Geetha, V.: A real time analysis of service based using mobile phone controlled vehicle using DTMF for accident prevention. Int. J. Comput. Appl. 975, 8887 (2016)
Chen, S., Liu, Y., Gao, X., Han, Z.: Mobilefacenets: Efficient CNNs for accurate real-time face verification on mobile devices. In: Biometric Recognition: 13th Chinese Conference, CCBR 2018, Urumqi, China, August 11–12, 2018, Proceedings 13, pp. 428–438. Springer, Berlin (2018)
Seidaliyeva, U., Akhmetov, D., Ilipbayeva, L., Matson, E.T.: Real-time and accurate drone detection in a video with a static background. Sensors 20(14), 3856 (2020)
Rohan, A., Rabah, M., Kim, S.-H.: Convolutional neural network-based real-time object detection and tracking for parrot AR drone 2. IEEE Access 7, 69575–69584 (2019)
Gia, T. N., Jiang, M., Sarker, V. K., Rahmani, A. M., Westerlund, T., Liljeberg, P., Tenhunen, H.: Low-cost fog-assisted health-care IoT system with energy-efficient sensor nodes. In: 2017 13th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 1765–1770. IEEE (2017)
Pan, S., Li, P., Yi, C., Zeng, D., Liang, Y.-C., Hu, G.: Edge intelligence empowered urban traffic monitoring: a network tomography perspective. IEEE Trans. Intell. Transp. Syst. 22(4), 2198–2211 (2020)
Xing, T., Yang, Q., Jiang, Z., Fu, X., Wang, J., Wu, C.Q., Chen, X.: Wifine: Real-time gesture recognition using wi-fi with edge intelligence. ACM Trans. Sens. Netw. 19(1), 1–24 (2022)
Jetson Xavier NX Series. https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-xavier-nx/ (2022)
Azad, A., Buluç, A., Gilbert, J.: Parallel triangle counting and enumeration using matrix algebra. In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pp. 804–811. IEEE (2015)
Wolf, M.M., Deveci, M., Berry, J.W., Hammond, S.D., Rajamanickam, S.: Fast linear algebra-based triangle counting with kokkoskernels. In: 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–7. IEEE (2017)
Tracking the Trackers. https://ssc.io/trackingthetrackers/ (2021)
The KONECT Project. http://konect.cc, 2021
Murphy, R.C., Wheeler, K.B., Barrett, B.W., Ang, J.A.: Introducing the graph 500. Cray Users Group (CUG) 19(45–74), 22 (2010)
Gibson, D., Kumar, R., Tomkins, A.: Discovering large dense subgraphs in massive graphs. In: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 721–732 (2005)
Su, X., Khoshgoftaar, T.M.: A survey of collaborative filtering techniques. Adv. Artif. Intell. (2009)
Liu, B., Yuan, L., Lin, X., Qin, L., Zhang, W., Zhou, J.: Efficient (\(\alpha \), \(\beta \))-core computation in bipartite graphs. VLDB J. 29(5), 1075–1099 (2020)
Wang, K., Lin, X., Qin, L., Zhang, W., Zhang, Y.: Towards efficient solutions of bitruss decomposition for large-scale bipartite graphs. VLDB J. (2021)
Opsahl, T.: Triadic closure in two-mode networks: redefining the global and local clustering coefficients. Soc. Netw. 35(2), 159–167 (2013)
Pashanasangi, N., Seshadhri, C.: Efficiently counting vertex orbits of all 5-vertex subgraphs, by evoke. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 447–455 (2020)
Zhang, Y., Yu, J. X.: Hub labeling for shortest path counting. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 1813–1828 (2020)
Lokshtanov, D., Björklund, A., Saurabh, S., Zehavi, M.: Approximate counting of k-paths: simpler, deterministic, and in polynomial space. ACM Trans. Algorithms (TALG) 17(3), 1–44 (2021)
Jha, M., Seshadhri, C., Pinar, A.: Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In: Proceedings of the 24th International Conference on World Wide Web, pp. 495–505 (2015)
Blandford, D.K., Blelloch, G.E., Kash, I.A.: Compact representations of separable graphs (2003)
Chierichetti, F., Kumar, R., Lattanzi, S., Mitzenmacher, M., Panconesi, A., Raghavan, P.: On compressing social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 219–228 (2009)
Kang, U., Faloutsos, C.: Beyond’caveman communities’: Hubs and spokes for graph compression and mining. In: 2011 IEEE 11th International Conference on Data Mining, pp. 300–309. IEEE (2011)
Green, O., Bader, D. A.: custinger: Supporting dynamic graph algorithms for GPUs. In: 2016 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6. IEEE (2016)
Pan, Y., Wang, Y., Wu, Y., Yang, C., Owens, J.D.: Multi-GPU graph analytics. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 479–490. IEEE (2017)
Fu, Z., Personick, M., Thompson, B.: Mapgraph: A high level API for fast development of high performance graph analytics on GPUs. In: Proceedings of Workshop on GRAph Data Management Experiences and Systems, pp. 1–6 (2014)
Hong, S., Oguntebi, T., Olukotun, K.: Efficient parallel graph exploration on multi-core CPU and GPU. In: 2011 International Conference on Parallel Architectures and Compilation Techniques, pp. 78–88. IEEE (2011)
Merrill, D., Garland, M., Grimshaw, A.: Scalable GPU graph traversal. ACM Sigplan Not. 47(8), 117–128 (2012)
Fagginger Auer, B.O., Bisseling, R.H.: A gpu algorithm for greedy graph matching. Facing the Multicore-Challenge II: Aspects of New Paradigms and Technologies in Parallel Computing, pp. 108–119 (2012)
Buluç, A., Gilbert, J.R., Budak, C.: Solving path problems on the GPU. Parallel Comput. 36(5–6), 241–253 (2010)
Zhao, W., Tan, S., Li, P.: Song: Approximate nearest neighbor search on gpu. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1033–1044. IEEE (2020)
Naumov, M., Castonguay, P., Cohen, J.: Parallel graph coloring with applications to the incomplete-LU factorization on the GPU. Nvidia White Paper (2015)
Deveci, M., Boman, E.G., Devine, K.D., Rajamanickam, S.: Parallel graph coloring for manycore architectures. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 892–901. IEEE (2016)
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., Martineau, P.: A parallel graph edit distance algorithm. Expert Syst. Appl. 94, 41–57 (2018)
Polak, A.: Counting triangles in large graphs on GPU. In: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 740–746. IEEE (2016)
Acknowledgements
This work is supported by the National Key R &D Program of China under Grant (No. 2023YFB4503603), National Natural Science Foundation of China (U1911203, 62172419, and 62322213), and Beijing Nova Program (20230484397 and 20220484137). Bingsheng’s research is in part supported by the Ministry of Education AcRF Tier 2 grant (No. MOE-T2EP20121-0016) in Singapore.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Feng Zhang is the corresponding author of this paper.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xia, Y., Zhang, F., Xu, Q. et al. GPU-based butterfly counting. The VLDB Journal 33, 1543–1567 (2024). https://doi.org/10.1007/s00778-024-00861-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-024-00861-0