Abstract
We study the subgraph counting problem in data streams. We provide the first non-trivial estimator for approximately counting the number of occurrences of an arbitrary subgraph H of constant size in a (large) graph G. Our estimator works in the turnstile model, i.e., can handle both edge-insertions and edge-deletions, and is applicable in a distributed setting. Prior to this work, only for a few non-regular graphs estimators were known in case of edge-insertions, leaving the problem of counting general subgraphs in the turnstile model wide open. We further demonstrate the applicability of our estimator by analyzing its concentration for several graphs H and the case where G is a power law graph.
This material is based upon work supported by the National Science Foundation under Award No. 1103688.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Alon, N., Spencer, J.: The Probabilistic Method, 3rd edn. Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons (2008)
Alon, N., Yuster, R., Zwick, U.: Finding and counting given length cycles. Algorithmica 17(3), 209–223 (1997)
Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: Proc. 13th Symp. on Discrete Algorithms (SODA), pp. 623–632 (2002)
Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: Proc. 14th Intl. Conf. Knowledge Discovery and Data Mining (KDD), pp. 16–24 (2008)
Bordino, I., Donato, D., Gionis, A., Leonardi, S.: Mining large networks with subgraph counting. In: Proc. 8th Intl. Conf. on Data Mining (ICDM), pp. 737–742 (2008)
Buriol, L.S., Frahling, G., Leonardi, S., Marchetti-Spaccamela, A., Sohler, C.: Counting triangles in data streams. In: Proc. 25th Symp. Principles of Database Systems (PODS), pp. 253–262 (2006)
Buriol, L.S., Frahling, G., Leonardi, S., Sohler, C.: Estimating Clustering Indexes in Data Streams. In: Arge, L., Hoffmann, M., Welzl, E. (eds.) ESA 2007. LNCS, vol. 4698, pp. 618–632. Springer, Heidelberg (2007)
Ganguly, S.: Estimating Frequency Moments of Data Streams Using Random Linear Combinations. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) RANDOM 2004 and APPROX 2004. LNCS, vol. 3122, pp. 369–380. Springer, Heidelberg (2004)
Gonen, M., Ron, D., Shavitt, Y.: Counting stars and other small subgraphs in sublinear-time. SIAM J. Disc. Math. 25(3), 1365–1411 (2011)
Jowhari, H., Ghodsi, M.: New Streaming Algorithms for Counting Triangles in Graphs. In: Wang, L. (ed.) COCOON 2005. LNCS, vol. 3595, pp. 710–716. Springer, Heidelberg (2005)
Manjunath, M., Mehlhorn, K., Panagiotou, K., Sun, H.: Approximate Counting of Cycles in Streams. In: Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 677–688. Springer, Heidelberg (2011)
McGregor, A.: Open Problems in Data Streams and Related Topics, IITK Workshop on Algorithms For Data Sreams (2006), http://www.cse.iitk.ac.in/users/sganguly/data-stream-probs.pdf
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: Simple building blocks of complex networks. Science 298(5594), 824–827 (2002)
Muthukrishnan, S.: Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science 1(2) (2005)
Newman, M.E.J.: The structure and function of complex networks. SIAM Review 45, 167–256 (2003)
Pagh, R., Tsourakakis, C.E.: Colorful triangle counting and a mapreduce implementation. Inf. Process. Lett. 112(7), 277–281 (2012)
Wong, E., Baur, B., Quader, S., Huang, C.: Biological network motif detection: principles and practice. Briefings in Bioinformatics, 1–14 (June 2011)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kane, D.M., Mehlhorn, K., Sauerwald, T., Sun, H. (2012). Counting Arbitrary Subgraphs in Data Streams. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds) Automata, Languages, and Programming. ICALP 2012. Lecture Notes in Computer Science, vol 7392. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31585-5_53
Download citation
DOI: https://doi.org/10.1007/978-3-642-31585-5_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31584-8
Online ISBN: 978-3-642-31585-5
eBook Packages: Computer ScienceComputer Science (R0)