Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3448016.3452800acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Sliding Window-based Approximate Triangle Counting over Streaming Graphs with Duplicate Edges

Published: 18 June 2021 Publication History

Abstract

Streaming graph analysis is gaining importance in various fields due to the natural dynamicity in many real graph applications. However, approximately counting triangles in real-world streaming graphs with edge duplication and expiration remains an unsolved problem. In this paper, we propose SWTC algorithm to address approximate sliding-window triangle counting problem in streaming graphs with edge duplication. In SWTC, we propose a fixed-length slicing strategy that addresses both unbiased sampling and cardinality estimation issues with a bounded memory usage. We theoretically prove the superiority of our method in sample graph size and estimation accuracy under given memory upper bound. Extensive experiments also confirm that our approach has higher accuracy compared with the baseline method under the same memory usage.

Supplementary Material

MP4 File (3448016.3452800.mp4)
Streaming graph analysis is gaining importance in various fields due to the natural dynamicity in many real graph applications. However, approximately counting triangles in real-world streaming graphs with edge duplication and expiration remains an unsolved problem. In this paper, we propose SWTC algorithm to address approximate sliding-window triangle counting problem in streaming graphs with edge duplication. In SWTC, we propose a fixed-length slicing strategy that addresses both unbiased sampling and cardinality estimation issues with a bounded memory usage. We theoretically prove the superiority of our method in the sample graph size and estimation accuracy under given memory upper bound. Extensive experiments over large real streaming graphs confirm our approach can obtain larger sample graphs and more accurate estimation value on counting triangle numbers compared with the baseline method.

References

[1]
Jonathan W Berry, Hendrickson Bruce, Randall A Laviolette, and Cynthia A Phillips. Tolerating the community detection resolution limit with edge weighting. Physical Review E Statistical Nonlinear & Soft Matter Physics, 83(5 Pt 2):056119, 2011.
[2]
Eckmann Jean-Pierre and Moses Elisha. Curvature of co-links uncovers hidden thematic layers in the world wide web. Proc Natl Acad Sci U S A, 99(9):5825--5829, 2002.
[3]
Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. Efficient algorithms for large-scale local triangle counting. ACM Transactions on Knowledge Discovery from Data (TKDD), 4(3):13, 2010.
[4]
Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824--827, 2002.
[5]
U Kang, Brendan Meeder, Evangelos E Papalexakis, and Christos Faloutsos. Heigen: Spectral analysis for billion-scale graphs. IEEE Transactions on knowledge and data engineering, 26(2):350--362, 2012.
[6]
Zhi Yang, Christo Wilson, Xiao Wang, Tingting Gao, Ben Y Zhao, and Yafei Dai. Uncovering social network sybils in the wild. ACM Transactions on Knowledge Discovery from Data (TKDD), 8(1):1--29, 2014.
[7]
Zhenjun Li, Yunting Lu, Wei-Peng Zhang, Rong-Hua Li, Jun Guo, Xin Huang, and Rui Mao. Discovering hierarchical subgraphs of k-core-truss. Data Science and Engineering, 3(2):136--149, 2018.
[8]
A. Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun Lung Wu. Counting and sampling triangles from a graph stream. Proceedings of the Vldb Endowment, 6(14):1870--1881, 2013.
[9]
Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, and Ramana Kompella. Graph sample and hold: A framework for big-graph analytics. In Acm Sigkdd International Conference on Knowledge Discovery & Data Mining, 2014.
[10]
Pinghui Wang, Yiyan Qi, Sun Yu, Xiangliang Zhang, and Xiaohong Guan. Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage. Proceedings of the Vldb Endowment, 11(2):162--175, 2017.
[11]
P Oscar Boykin and Vwani P Roychowdhury. Leveraging social networks to fight spam. Computer, 38(4):61--68, 2005.
[12]
Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. Siam Journal on Computing, 31(6):1794--1813, 2002.
[13]
Youhuan Li, Lei Zou, M Tamer Özsu, and Dongyan Zhao. Time constrained continuous subgraph search over streaming graphs. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1082--1093. IEEE, 2019.
[14]
Michael S Crouch, Andrew McGregor, and Daniel Stubbs. Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pages 337--348. Springer, 2013.
[15]
Xiafei Qiu, Wubin Cen, Zhengping Qian, You Peng, Ying Zhang, Xuemin Lin, and Jingren Zhou. Real-time constrained cycle detection in large dynamic graphs. Proceedings of the VLDB Endowment, 11(12):1876--1888, 2018.
[16]
Minsoo Jung, Yongsub Lim, Sunmin Lee, and U Kang. Furl: Fixed-memory and uncertainty reducing local triangle counting for multigraph streams. Data Mining and Knowledge Discovery, 33(5):1225--1253, 2019.
[17]
Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, and Eli Upfal. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. ACM Transactions on Knowledge Discovery from Data (TKDD), 11(4):1--50, 2017.
[18]
Kijung Shin, Sejoon Oh, Jisu Kim, Bryan Hooi, and Christos Faloutsos. Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Transactions on Knowledge Discovery from Data (TKDD), 14(2):1--39, 2020.
[19]
Rainer Gemulla and Wolfgang Lehner. Sampling time-based sliding windows in bounded space. In Acm Sigmod International Conference on Management of Data, 2008.
[20]
Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Discrete Mathematics and Theoretical Computer Science, pages 137--156. Discrete Mathematics and Theoretical Computer Science, 2007.
[21]
Daniel Ting. Streamed approximate counting of distinct elements: Beating optimal batch methods. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 442--451, 2014.
[22]
Dongjin Lee, Kijung Shin, and Christos Faloutsos. Temporal locality-aware sampling for accurate triangle counting in real graph streams. The VLDB Journal, pages 1--25, 2020.
[23]
Source code of swtc and the baseline method. https://github.com/StreamingTriangleCounting/TriangleCounting.git.
[24]
Brian Babcock, Mayur Datar, and Rajeev Motwani. Sampling from a moving window over streaming data. In Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, pages 633--634. Society for Industrial and Applied Mathematics, 2002.
[25]
George M Slota and Kamesh Madduri. Complex network analysis using parallel approximate motif counting. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pages 405--414. IEEE, 2014.
[26]
Marco Bressan, Flavio Chierichetti, Ravi Kumar, Stefano Leucci, and Alessandro Panconesi. Motif counting beyond five nodes. ACM Transactions on Knowledge Discovery from Data (TKDD), 12(4):1--25, 2018.
[27]
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM transactions on Knowledge Discovery from Data (TKDD), 1(1):2--es, 2007.
[28]
Bobhash function. Published by Bob Jenkins at http://burtleburtle.net/bob/hash/doobs.html.
[29]
Murmurhash function. Published by Austin Appleby at https://github.com/aappleby/smhasher.
[30]
Robert Sedgewick. Algorithms in c. Pearson Education, 2001.
[31]
Aphash and collection of other hash functions. Published by Arash Partow at http://www.partow.net/programming/hashfunctions/#RSHashFunction.
[32]
N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.
[33]
Shaikh Arifuzzaman, Maleq Khan, and Madhav Marathe. Patric: A parallel algorithm for counting triangles in massive networks. In Acm International Conference on Information & Knowledge Management, 2013.
[34]
Xiaocheng Hu, Yufei Tao, and Chin Wan Chung. Massive graph triangulation. In Acm Sigmod International Conference on Management of Data, 2013.
[35]
Jinha Kim, Wook Shin Han, Sangyeon Lee, Kyungyeol Park, and Hwanjo Yu. Opt:a new framework for overlapped and parallel triangulation in large-scale graphs. 2014.
[36]
Ha-Myung Park, Sung-Hyon Myaeng, and U Kang. Pte: Enumerating trillion triangles on distributed systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1115--1124, 2016.
[37]
Ziv Bar-Yossef, Ravi Kumar, and D Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, pages 623--632. Society for Industrial and Applied Mathematics, 2002.
[38]
Buriol, S Luciana, Frahling, Gereon, Leonardi, Stefano, Marchetti-Spaccamela, Alberto, Sohler, and Christian. Counting triangles in data streams. In Acm Sigmod-sigact-sigart Symposium on Principles of Database Systems, 2006.
[39]
Hossein Jowhari and Mohammad Ghodsi. New streaming algorithms for counting triangles in graphs. In International Computing and Combinatorics Conference, pages 710--716. Springer, 2005.
[40]
Yongsub Lim and U Kang. Mascot: Memory-efficient and accurate sampling for counting local triangles in graph streams. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 685--694. ACM, 2015.
[41]
Madhav Jha, C. Seshadhri, and Ali Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. 2013.
[42]
Charalampos E Tsourakakis, U Kang, Gary L Miller, and Christos Faloutsos. Doulion: counting triangles in massive graphs with a coin. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 837--846, 2009.
[43]
Jeffrey S Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 11(1):37--57, 1985.
[44]
Rainer Gemulla, Wolfgang Lehner, and Peter J Haas. Maintaining bounded-size sample synopses of evolving datasets. The VLDB Journal, 17(2):173--201, 2008.
[45]
Vladimir Braverman, Rafail Ostrovsky, and Carlo Zaniolo. Optimal sampling from sliding windows. In Twenty-eighth Acm Sigmod-sigact-sigart Symposium on Principles of Database Systems, 2009.
[46]
Graham Cormode, Shanmugavelayutham Muthukrishnan, Ke Yi, and Qin Zhang. Optimal sampling from distributed streams. In Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 77--86, 2010.

Cited By

View all
  • (2024)Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent QueriesProceedings of the VLDB Endowment10.14778/3681954.368197917:11(3003-3016)Online publication date: 30-Aug-2024
  • (2024)Truss-Based Community Search over Streaming Directed GraphsProceedings of the VLDB Endowment10.14778/3659437.365944017:8(1816-1829)Online publication date: 31-May-2024
  • (2024)LM-SRPQ: Efficiently Answering Regular Path Query in Streaming GraphsProceedings of the VLDB Endowment10.14778/3641204.364121417:5(1047-1059)Online publication date: 2-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. approximate algorithms
  2. streaming graphs
  3. triangle counting

Qualifiers

  • Research-article

Funding Sources

  • NSFC
  • Beijing Academy of Artificial Intelligence (BAAI)
  • Key Research and Development Program of Hubei Province
  • Scientific and technological innovation 2030 - new generation of artificial intelligence major project

Conference

SIGMOD/PODS '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)90
  • Downloads (Last 6 weeks)11
Reflects downloads up to 18 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent QueriesProceedings of the VLDB Endowment10.14778/3681954.368197917:11(3003-3016)Online publication date: 30-Aug-2024
  • (2024)Truss-Based Community Search over Streaming Directed GraphsProceedings of the VLDB Endowment10.14778/3659437.365944017:8(1816-1829)Online publication date: 31-May-2024
  • (2024)LM-SRPQ: Efficiently Answering Regular Path Query in Streaming GraphsProceedings of the VLDB Endowment10.14778/3641204.364121417:5(1047-1059)Online publication date: 2-May-2024
  • (2024)MWP: Multi-Window Parallel Evaluation of Regular Path Queries on Streaming GraphsProceedings of the ACM on Management of Data10.1145/36392602:1(1-26)Online publication date: 26-Mar-2024
  • (2024)FABLE: Approximate Butterfly Counting in Bipartite Graph Stream with Duplicate EdgesProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679812(2158-2167)Online publication date: 21-Oct-2024
  • (2024)Generalized Measure-Biased Sampling and Priority SamplingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.334067336:11(6251-6265)Online publication date: Nov-2024
  • (2024)Newton Sketches: Estimating Node Intimacy in Dynamic Graphs Using Newton's Law of Cooling2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00225(2904-2916)Online publication date: 13-May-2024
  • (2023)Auxo: A Scalable and Efficient Graph Stream Summarization StructureProceedings of the VLDB Endowment10.14778/3583140.358315416:6(1386-1398)Online publication date: 1-Feb-2023
  • (2023)Fast Approximate Pattern Counting For Static Graphs2023 3rd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT)10.1109/ICFEICT59519.2023.00102(584-588)Online publication date: May-2023
  • (2023)A distributed streaming framework for edge–cloud triangle counting in graph streamsKnowledge-Based Systems10.1016/j.knosys.2023.110878278:COnline publication date: 25-Oct-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media