Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/545381.545465acmconferencesArticle/Chapter ViewAbstractPublication PagessodaConference Proceedingsconference-collections
Article

Sampling from a moving window over streaming data

Published: 06 January 2002 Publication History

Abstract

We introduce the problem of sampling from a moving window of recent items from a data stream and develop two algorithms for this problem. The first algorithm, "chain-sample", extends reservoir sampling to deal with the expiration of data elements from the sample. The expected memory usage of our algorithm is O(k) when maintaining a sample of size k over a window of the n most recent elements from the data stream, and with high probability the algorithm requires no more than O(k log n) memory.When the number of elements in the window is variable, as is the case when the size of the window is defined as a time duration rather than as a fixed number of data elements, the sampling problem becomes harder. Our second algorithm, "priority-sample", works even when the number of elements in the window can vary dynamically over time. With high probability, the "priority-sample" algorithm uses no more than O(k log n) memory.

References

[1]
C. R. Aragon and R. G. Seidel, Randomized Search Trees, Proc. of the 30th IEEE Symp. on Foundations of Computer Science, 1989, pp. 540-545.
[2]
S. Babu and J. Widom, Continuous Queries over Data Streams, ACM SIGMOD Record, 30.3 (2001), pp. 109-120.
[3]
K. Mulmuley, Computational Geometry: An Introduction through Randomized Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1994.
[4]
J. S. Vitter, Random Sampling with a Reservoir, ACM Trans. on Mathematical Software, 11(1985), pp. 31-35.
[5]
P. B. Gibbons and Y. Matias and V. Poosala, Fast Incremental Maintenance of Approximate Histograms, Proc. of the 23rd VLDB Conference, 1997, pp. 466-475.
[6]
Y. Matias and J. S. Vitter and M. Wang, Dynamic Maintenance of Wavelet-Based Histograms, Proc. of the 26th VLDB Conference, 2000, pp. 101-110.

Cited By

View all
  • (2021)Sliding Window-based Approximate Triangle Counting over Streaming Graphs with Duplicate EdgesProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452800(645-657)Online publication date: 9-Jun-2021
  • (2020)Large Very Dense Subgraphs in a Stream of EdgesProceedings of the 2020 ACM-IMS on Foundations of Data Science Conference10.1145/3412815.3416884(107-117)Online publication date: 19-Oct-2020
  • (2019)Better Sliding Window Algorithms to Maximize Subadditive and Diversity ObjectivesProceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3294052.3319701(254-268)Online publication date: 25-Jun-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SODA '02: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
January 2002
1018 pages
ISBN:089871513X

Sponsors

Publisher

Society for Industrial and Applied Mathematics

United States

Publication History

Published: 06 January 2002

Check for updates

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 411 of 1,322 submissions, 31%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)28
  • Downloads (Last 6 weeks)1
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Sliding Window-based Approximate Triangle Counting over Streaming Graphs with Duplicate EdgesProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452800(645-657)Online publication date: 9-Jun-2021
  • (2020)Large Very Dense Subgraphs in a Stream of EdgesProceedings of the 2020 ACM-IMS on Foundations of Data Science Conference10.1145/3412815.3416884(107-117)Online publication date: 19-Oct-2020
  • (2019)Better Sliding Window Algorithms to Maximize Subadditive and Diversity ObjectivesProceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3294052.3319701(254-268)Online publication date: 25-Jun-2019
  • (2018)Distinct Sampling on Streaming Data with Near-DuplicatesProceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3196959.3196978(369-382)Online publication date: 27-May-2018
  • (2018)Optimal algorithms for selecting top-k combinations of attributesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-017-0485-227:1(27-52)Online publication date: 1-Feb-2018
  • (2018)Sampling in Space Restricted SettingsAlgorithmica10.1007/s00453-017-0335-z80:5(1439-1458)Online publication date: 1-May-2018
  • (2017)Proportional Representation in Vote StreamsProceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems10.5555/3091125.3091134(15-23)Online publication date: 8-May-2017
  • (2017)Approximate Query ProcessingProceedings of the 2017 ACM International Conference on Management of Data10.1145/3035918.3056097(511-519)Online publication date: 9-May-2017
  • (2016)Matrix Sketching Over Sliding WindowsProceedings of the 2016 International Conference on Management of Data10.1145/2882903.2915228(1465-1480)Online publication date: 26-Jun-2016
  • (2016)From Business Intelligence to semantic data stream managementFuture Generation Computer Systems10.1016/j.future.2015.11.01563:C(100-107)Online publication date: 1-Oct-2016
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media