Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Public Access

Fast, Accurate and Provable Triangle Counting in Fully Dynamic Graph Streams

Published: 09 February 2020 Publication History

Abstract

Given a stream of edge additions and deletions, how can we estimate the count of triangles in it? If we can store only a subset of the edges, how can we obtain unbiased estimates with small variances?
Counting triangles (i.e., cliques of size three) in a graph is a classical problem with applications in a wide range of research areas, including social network analysis, data mining, and databases. Recently, streaming algorithms for triangle counting have been extensively studied since they can naturally be used for large dynamic graphs. However, existing algorithms cannot handle edge deletions or suffer from low accuracy.
Can we handle edge deletions while achieving high accuracy? We propose ThinkD, which accurately estimates the counts of global triangles (i.e., all triangles) and local triangles associated with each node in a fully dynamic graph stream with additions and deletions of edges. Compared to its best competitors, ThinkD is (a) Accurate: up to 4.3× more accurate within the same memory budget, (b) Fast: up to 2.2× faster for the same accuracy requirements, and (c) Theoretically sound: always maintaining estimates with zero bias (i.e., the difference between the true triangle count and the expected value of its estimate) and small variance. As an application, we use ThinkD to detect suddenly emerging dense subgraphs, and we show its advantages over state-of-the-art methods.

References

[1]
Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, and Ramana Kompella. 2014. Graph sample and hold: A framework for big-graph analytics. In KDD’14.
[2]
Nesreen K. Ahmed, Nick Duffield, Theodore L. Willke, and Ryan A. Rossi. 2017. On sampling from massive graph streams. PVLDB 10, 11 (2017), 1430--1441.
[3]
Reid Andersen and Kumar Chellapilla. 2009. Finding dense subgraphs with size bounds. In WAW’09.
[4]
Brian Babcock, Mayur Datar, and Rajeev Motwani. 2002. Sampling from a moving window over streaming data. In SODA’02.
[5]
Bahman Bahmani, Ravi Kumar, and Sergei Vassilvitskii. 2012. Densest subgraph in streaming and mapreduce. PVLDB 5, 5 (2012), 454--465.
[6]
Ziv Bar-Yossef, Ravi Kumar, and D. Sivakumar. 2002. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA’02.
[7]
Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509--512.
[8]
Vladimir Batagelj and Matjaž Zaveršnik. 2007. Short cycle connectivity. Discrete Mathematics 307, 3 (2007), 310--318.
[9]
Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. 2010. Efficient algorithms for large-scale local triangle counting. TKDD 4, 3 (2010), 13.
[10]
Jonathan W. Berry, Bruce Hendrickson, Randall A. LaViolette, and Cynthia A. Phillips. 2011. Tolerating the community detection resolution limit with edge weighting. Physical Review E 83, 5 (2011), 056119.
[11]
Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. Copycatch: Stopping group attacks by spotting lockstep behavior in social networks. In WWW’13.
[12]
Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, and Charalampos Tsourakakis. 2015. Space-and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In STOC’15.
[13]
Paul G. Brown and Peter J. Haas. 2006. Techniques for warehousing of sample data. In ICDE’06.
[14]
Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In APPROX’00.
[15]
Shumo Chu and James Cheng. 2012. Triangle listing in massive networks. TKDD 6, 4 (2012), 17.
[16]
Michael S. Crouch, Andrew McGregor, and Daniel Stubbs. 2013. Dynamic graphs in the sliding-window model. In ESA’13. Springer, 337--348.
[17]
Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. 2002. Maintaining stream statistics over sliding windows. SICOMP 31, 6 (2002), 1794--1813.
[18]
Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, and Eli Upfal. 2017. TRIÈST: Counting local and global triangles in fully dynamic streams with fixed memory size. TKDD 11, 4 (2017), 43.
[19]
Alessandro Epasto, Silvio Lattanzi, Vahab Mirrokni, Ismail Oner Sebe, Ahmed Taei, and Sunita Verma. 2015b. Ego-net community mining applied to friend suggestion. PVLDB 9, 4 (2015), 324--335.
[20]
Alessandro Epasto, Silvio Lattanzi, and Mauro Sozio. 2015a. Efficient densest subgraph computation in evolving graphs. In WWW’15.
[21]
Rainer Gemulla, Wolfgang Lehner, and Peter J. Haas. 2008. Maintaining bounded-size sample synopses of evolving datasets. The VLDB Journal 17, 2 (2008), 173--201.
[22]
David Gibson, Ravi Kumar, and Andrew Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In VLDB’05.
[23]
Andrew V. Goldberg. 1984. Finding a Maximum Density Subgraph. Technical Report.
[24]
Bronwyn H. Hall, Adam B. Jaffe, and Manuel Trajtenberg. 2001. The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools. Technical Report. National Bureau of Economic Research.
[25]
Guyue Han and Harish Sethu. 2017. Edge sample and discard: A new algorithm for counting triangles in large dynamic graphs. In ASONAM’17.
[26]
Paul W. Holland and Samuel Leinhardt. 1971. Transitivity in structural models of small groups. Comp. Group Stud 2, 2 (1971), 107--124.
[27]
Bryan Hooi, Kijung Shin, Hyun Ah Song, Alex Beutel, Neil Shah, and Christos Faloutsos. 2017. Graph-based fraud detection in the face of camouflage. TKDD 11, 4 (2017), 44.
[28]
Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. 2014. I/O-efficient algorithms on triangle listing and counting. TODS 39, 4 (2014), 27.
[29]
Madhav Jha, Comandur Seshadhri, and Ali Pinar. 2013. A space efficient streaming algorithm for triangle counting using the birthday paradox. In KDD’13.
[30]
Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos. 2015. A general suspiciousness metric for dense blocks in multimodal data. In ICDM’15.
[31]
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014. Catchsync: Catching synchronized behavior in large directed graphs. In KDD’14.
[32]
Minsoo Jung, Sunmin Lee, Yongsub Lim, and U. Kang. 2016. FURL: Fixed-memory and uncertainty reducing local triangle counting for graph streams. Data Mining and Knowledge Discovery 33, 5 (2019), 1225--1253.
[33]
Andreas Kemper. 2009. Valuation of Network Effects in Software Markets: A Complex Networks Approach. Springer Science 8 Business Media.
[34]
Samir Khuller and Barna Saha. 2009. On finding dense subgraphs. In ICALP’09. 597--608.
[35]
Bryan Klimt and Yiming Yang. 2004. Introducing the enron corpus. In CEAS’04.
[36]
Mihail N. Kolountzakis, Gary L. Miller, Richard Peng, and Charalampos E. Tsourakakis. 2010. Efficient triangle counting in large graphs via degree-based vertex partitioning. In WAW’10.
[37]
Konstantin Kutzkov and Rasmus Pagh. 2014. Triangle counting in dynamic graph streams. In SWAT’14.
[38]
Victor E. Lee, Ning Ruan, Ruoming Jin, and Charu Aggarwal. 2010. A survey of algorithms for dense subgraph discovery. In Managing and Mining Graph Data. 303--336.
[39]
Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29--123.
[40]
Michael Ley. 2002. The DBLP computer science bibliography: Evolution, research issues, perspectives. In SPIRE’02. Springer, 1--10.
[41]
Yongsub Lim, Minsoo Jung, and U. Kang. 2018. Memory-efficient and accurate sampling for counting local triangles in graph streams: from simple to multigraphs. TKDD 12, 1 (2018), 4.
[42]
R. Duncan Luce and Albert D. Perry. 1949. A method of matrix analysis of group structure. Psychometrika 14, 2 (1949), 95--116.
[43]
Koji Maruhashi, Fan Guo, and Christos Faloutsos. 2011. Multiaspectforensics: Pattern mining on large-scale heterogeneous networks with tensor analysis. In ASONAM’11.
[44]
Paolo Massa and Paolo Avesani. 2005. Controversial users demand local trust metrics: An experimental study on epinions. com community. In AAAI’05.
[45]
Andrew McGregor. 2014. Graph stream algorithms: A survey. ACM SIGMOD Record 43, 1 (2014), 9--20.
[46]
Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T Vu. 2015. Densest subgraph in dynamic graph streams. In MFCS’14.
[47]
Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In IMC’07.
[48]
Muhammad Anis Uddin Nasir, Aristides Gionis, Gianmarco De Francisci Morales, and Sarunas Girdzijauskas. 2017. Fully dynamic algorithm for top-k densest subgraphs. In CIKM’17.
[49]
Mark E. J. Newman. 2003. The structure and function of complex networks. SIAM Review 45, 2 (2003), 167--256.
[50]
Rasmus Pagh and Charalampos E. Tsourakakis. 2012. Colorful triangle counting and a mapreduce implementation. Inform. Process. Lett. 112, 7 (2012), 277--281.
[51]
Aduri Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. 2013. Counting and sampling triangles from a graph stream. PVLDB 6, 14 (2013), 1870--1881.
[52]
B. A. Prakash, M. Seshadri, A. Sridharan, S. Machiraju, and C. Faloutsos. 2010. Eigenspokes: Surprising patterns and community structure in large graphs. In PAKDD’10.
[53]
Polina Rozenshtein, Nikolaj Tatti, and Aristides Gionis. 2017. Finding dynamic dense subgraphs. TKDD 11, 3 (2017), 27.
[54]
Kijung Shin. 2017. WRS: Waiting room sampling for accurate triangle counting in real graph streams. In ICDM’17.
[55]
Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. 2018a. Patterns and anomalies in k-cores of real-world graphs with applications. Knowl. Inf. Syst. 54, 3 (2018), 677--710.
[56]
Kijung Shin, Mohammad Hammoud, Euiwoong Lee, Jinoh Oh, and Christos Faloutsos. 2018b. Tri-Fly: Distributed estimation of global and local triangle counts in graph streams. In PAKDD’18.
[57]
Kijung Shin, Bryan Hooi, and Christos Faloutsos. 2018c. Fast, accurate, and flexible algorithms for dense subtensor mining. TKDD 12, 3 (2018), 30.
[58]
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017a. D-Cube: Dense-block detection in terabyte-scale tensors. In WSDM’17.
[59]
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017b. DenseAlert: Incremental dense-subtensor detection in tensor streams. In KDD’17. ACM, 1057--1066.
[60]
Kijung Shin, Jisu Kim, Bryan Hooi, and Christos Faloutsos. 2018d. Think before you discard: Accurate triangle counting in graph streams with deletions. In ECML/PKDD’18.
[61]
Charles Spearman. 1904. The proof and measurement of association between two things. AJP 15, 1 (1904), 72--101.
[62]
Jun Sun, Jérôme Kunegis, and Steffen Staab. 2016. Predicting user roles in social networks using transfer learning with feature transformation. In ICDMW’16. IEEE, 128--135.
[63]
Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In WWW’11.
[64]
Kanat Tangwongsan, Aduri Pavan, and Srikanta Tirthapura. 2013. Parallel triangle counting in massive streaming graphs. In CIKM’13.
[65]
Charalampos E Tsourakakis. 2008. Fast counting of triangles in large real networks without counting: Algorithms and laws. In ICDM’08.
[66]
Charalampos E. Tsourakakis, Petros Drineas, Eirinaios Michelakis, Ioannis Koutis, and Christos Faloutsos. 2011. Spectral counting of triangles via element-wise sparsification and triangle-based link recommendation. SNAM 1, 2 (2011), 75--81.
[67]
Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, and Christos Faloutsos. 2009. Doulion: Counting triangles in massive graphs with a coin. In KDD’09.
[68]
Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P. Gummadi. 2009. On the evolution of user interaction in facebook. In WOSN’09.
[69]
Jeffrey S. Vitter. 1985. Random sampling with a reservoir. TOMS 11, 1 (1985), 37--57.
[70]
Pinghui Wang, Yiyan Qi, Yu Sun, Xiangliang Zhang, Jing Tao, and Xiaohong Guan. 2017. Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage. PVLDB 11, 2 (2017), 162--175.
[71]
Stanley Wasserman and Katherine Faust. 1994. Social Network Analysis: Methods and Applications. Vol. 8. Cambridge University Press.
[72]
Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 6684 (1998), 440--442.
[73]
Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. KAIS 42, 1 (2015), 181--213.

Cited By

View all
  • (2024)Counting Butterflies in Fully Dynamic Bipartite Graph Streams2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00226(2917-2930)Online publication date: 13-May-2024
  • (2024)Conditional heavy hitter monitoring and application of heterogeneous graph streams based on sketchesInformation Processing & Management10.1016/j.ipm.2024.10376261:4(103762)Online publication date: Jul-2024
  • (2024)A survey on dynamic graph processing on GPUs: concepts, terminologies and systemsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-023-2656-118:4Online publication date: 1-Aug-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Knowledge Discovery from Data
ACM Transactions on Knowledge Discovery from Data  Volume 14, Issue 2
April 2020
322 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3382774
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 February 2020
Accepted: 01 October 2019
Revised: 01 August 2019
Received: 01 July 2018
Published in TKDD Volume 14, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Triangle counting
  2. edge deletions
  3. local triangles

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Artificial Intelligence Graduate School Program (KAIST)
  • National Science Foundation
  • Korea government MSIT
  • Institute of Information 8 Communications Technology Planning 8 Evaluation (IITP)
  • Army Research Laboratory
  • National Research Foundation of Korea (NRF)

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)236
  • Downloads (Last 6 weeks)19
Reflects downloads up to 10 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Counting Butterflies in Fully Dynamic Bipartite Graph Streams2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00226(2917-2930)Online publication date: 13-May-2024
  • (2024)Conditional heavy hitter monitoring and application of heterogeneous graph streams based on sketchesInformation Processing & Management10.1016/j.ipm.2024.10376261:4(103762)Online publication date: Jul-2024
  • (2024)A survey on dynamic graph processing on GPUs: concepts, terminologies and systemsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-023-2656-118:4Online publication date: 1-Aug-2024
  • (2023)Balanced and Unbalanced Triangle Count in Signed NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.327265735:12(12491-12496)Online publication date: 1-Dec-2023
  • (2023)Reinforcement Learning Enhanced Weighted Sampling for Accurate Subgraph Counting on Fully Dynamic Graph Streams2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00088(1084-1097)Online publication date: Apr-2023
  • (2023)A distributed streaming framework for edge–cloud triangle counting in graph streamsKnowledge-Based Systems10.1016/j.knosys.2023.110878278:COnline publication date: 25-Oct-2023
  • (2023)An adaptive graph sampling framework for graph analyticsSocial Network Analysis and Mining10.1007/s13278-023-01157-x14:1Online publication date: 6-Dec-2023
  • (2023)Interplay between topology and edge weights in real-world graphs: concepts, patterns, and an algorithmData Mining and Knowledge Discovery10.1007/s10618-023-00940-w37:6(2139-2191)Online publication date: 26-Jul-2023
  • (2023)Hypergraph motifs and their extensions beyond binaryThe VLDB Journal10.1007/s00778-023-00827-833:3(625-665)Online publication date: 26-Dec-2023
  • (2023)Sliding window-based approximate triangle counting with bounded memory usageThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-023-00783-332:5(1087-1110)Online publication date: 9-Mar-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media