Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/545381.545466acmconferencesArticle/Chapter ViewAbstractPublication PagessodaConference Proceedingsconference-collections
Article

Maintaining stream statistics over sliding windows: (extended abstract)

Published: 06 January 2002 Publication History

Abstract

We consider the problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far. We refer to this model as the sliding window model. We consider the following basic problem: Given a stream of bits, maintain a count of the number of 1's in the last N elements seen from the stream. We show that using O(1/e log2N) bits of memory, we can estimate the number of 1's to within a factor of 1 + ε. We also give a matching lower bound of Ω(1/e log2 N) memory bits for any deterministic or randomized algorithms. We extend our scheme to maintain the sum of the last N positive integers. We provide matching upper and lower bounds for this more general problem as well. We apply our techniques to obtain efficient algorithms for the Lp norms (for p ε [1, 2]) of vectors under the sliding window model. Using the algorithm for the basic counting problem, one can adapt many other techniques to work for the sliding window model, with a multiplicative overhead of O(1/εlog N) in memory and a 1 + ε factor loss in accuracy. These include maintaining approximate histograms, hash tables, and statistics or aggregates such as sum and averages.

References

[1]
N. Alon, Y. Matias, M. Szegedy. The space complexity of approximating the frequency moments. In Proc. Twenty-Eighth Annual ACM Symposium on Theory of Computing, 1996.
[2]
C. Cortes, K. Fisher, D. Pregibon, A. Rogers. Hancock: a language for extracting signatures from data streams. In Proc. 2000 ACM SIGKDD, pp. 9-17, 2000.
[3]
S. Chaudhuri, R. Motwani, V. R. Narasayya. On random sampling over joins. In Proc. 1999 ACM SIGMOD, pp. 263-274, 1999.
[4]
M. Fang, H. Garcia-Molina, R. Motwani, N. Shivakumar, J. D. Ullman. Computing iceberg queries efficiently. In Proc. 24th International Conference on Very Large Data Bases (VLDB), 1998.
[5]
J. Feigenbaum, S. Kannan, M. Strauss, M. Viswanathan. An Approximate L1-Difference Algorithm for Massive Data Streams. In Proc. 40th Symposium on Foundations of Computer Science, 1999.
[6]
P. Flajolet, G. Martin. Probabilistic Counting. In Proc. 24th Symposium on Foundations of Computer Science, 1983.
[7]
C. Fraleigh, S. Moon, C. Diot, B. Lyles, F. Tobagi. Architecture of a passive monitoring system for backbone IP networks. Technical Report TR00-ATL-101-801, Sprint Labs, 2000.
[8]
S. Guha, N. Koudas, K. Shim. Data-Streams and Histograms. To appear in Proc. Thirty-Third Annual ACM Symposium on Theory of Computing, 2001.
[9]
S. Guha, N. Mishra, R. Motwani, L. O'Callaghan. Clustering data streams. In Proc. 2000 Annual IEEE Symp. on Foundations of Computer Science, pages 359-366, 2000.
[10]
M. R. Henzinger, P. Raghavan, S. Rajagopalan. Computing on data streams. Technical Report TR 1998-011, Compaq Systems Research Center, Palo Alto, California, May 1998.
[11]
P. Indyk. Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation. In Proc. 41st Symposium on Foundations of Computer Science, 2000.
[12]
Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik, Suel. Optimal Histograms with Quality Guarantees. In Proc. 24th International Conference on Very Large Data Bases (VLDB), 1998.
[13]
R. Motwani, P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[14]
Netflow Services and Applications. Whitepaper, Cisco Systems, 2000. Available at http://www.cisco.com/warp/public/cc/pd/iosw/ioft/neflct/tech/napps_wp.htm.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SODA '02: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
January 2002
1018 pages
ISBN:089871513X

Sponsors

Publisher

Society for Industrial and Applied Mathematics

United States

Publication History

Published: 06 January 2002

Check for updates

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 411 of 1,322 submissions, 31%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)23
  • Downloads (Last 6 weeks)1
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2019)Machine Comprehension of Spoken ContentIEEE/ACM Transactions on Audio, Speech and Language Processing10.1109/TASLP.2019.291349927:9(1469-1480)Online publication date: 1-Sep-2019
  • (2019)Multivariate network traffic analysis using clustered patternsComputing10.1007/s00607-018-0619-4101:4(339-361)Online publication date: 1-Apr-2019
  • (2018)OLMThe Journal of Supercomputing10.1007/s11227-017-2181-974:2(637-664)Online publication date: 1-Feb-2018
  • (2016)Efficient Detection of Flow Anomalies with Limited Monitoring ResourcesProceedings of the 12th Conference on International Conference on Network and Service Management10.5555/3375069.3375076(55-63)Online publication date: 31-Oct-2016
  • (2016)Incremental computation of common windowed holistic aggregatesProceedings of the VLDB Endowment10.14778/2994509.29945379:12(1221-1232)Online publication date: 1-Aug-2016
  • (2016)GStreamMinerProceedings of the 25th ACM International on Conference on Information and Knowledge Management10.1145/2983323.2983341(2489-2492)Online publication date: 24-Oct-2016
  • (2016)TSCluWinProceedings, Part II, of the 21st International Conference on Database Systems for Advanced Applications - Volume 964310.1007/978-3-319-32049-6_9(133-148)Online publication date: 16-Apr-2016
  • (2015)Estimating mutual information on data streamsProceedings of the 27th International Conference on Scientific and Statistical Database Management10.1145/2791347.2791348(1-12)Online publication date: 29-Jun-2015
  • (2014)Min-d-occurProceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence10.5555/3020751.3020790(370-379)Online publication date: 23-Jul-2014
  • (2014)Indexing for summary queriesACM Transactions on Database Systems10.1145/250870239:1(1-39)Online publication date: 6-Jan-2014
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media