Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Random sampling with a reservoir

Published: 01 March 1985 Publication History

Abstract

We introduce fast algorithms for selecting a random sample of n records without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in O(n(1 + log(N/n))) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.

References

[1]
BENTLEY, J.L. Personal communication, Apr. 1983; see {11}.
[2]
ERNVALL, J. AND NEVALAINEN, O. An algorithm for unbiased random sampling. Comput. J. 25, 1 (Jan. 1982), 45-47.
[3]
FAN, C. T., MULLER, M. E., AND REZUCHA, I. Development of sampling plans by using sequential (item by item) selection techniques and digital computers. Am. Stat. Assoc. J. 57 (June 1962), 387-402.
[4]
FELLER, W. An Introduction to Probability Theory and Its Applications, vol. I, 3rd ed. Wiley, 1968.
[5]
FELLER, W. An Introduction to Probability Theory and Its Applications, vol. II, 2nd ed. Wiley, 1971.
[6]
JONES, T.G. A note on sampling a tape file. Commun. ACM 5, 6 (June 1962), 343.
[7]
KAWARASAKI, J., AND SIBUYA, M. Random numbers for simple random sampling without replacement. Keio Math. Sere. Rep. No. 7 (1982), 1-9.
[8]
KNUTH, D.E. The Art of Computer Programming. vol. 2: Seminumerical Algorithms, 2nd ed. Addison-Wesley, Reading, Mass., 1981.
[9]
SEDGEWICK, R. Algorithms. Addison-Wesley, Reading, Mass., 1981.
[10]
VITTER, J.S. Faster methods for random sampling. Commun. ACM 27, 7 (July 1984). 703-718.
[11]
VITTER, J.S. Optimum algorithms for two random sampling problems. In Proceedings of the 24th Annual IEEE Symposium on Foundations of Computer Science (Tucson, Az., Nov. 7-9), IEEE, New York, 1983 pp. 65-75.

Cited By

View all
  • (2024)Dynamic frequent subgraph mining algorithms over evolving graphs: a surveyPeerJ Computer Science10.7717/peerj-cs.236110(e2361)Online publication date: 8-Oct-2024
  • (2024)Parallel and Distributed Frugal Tracking of a QuantileFuture Internet10.3390/fi1609033516:9(335)Online publication date: 13-Sep-2024
  • (2024)NLOCL: Noise-Labeled Online Continual LearningElectronics10.3390/electronics1313256013:13(2560)Online publication date: 29-Jun-2024
  • Show More Cited By

Recommendations

Reviews

Prokop Vondracek

There is presented a fast algorithm for selecting a random sample of n records without replacement for a pool of N records, where the value of N is unknown beforehand. Algorithm Z, which does the sampling, is designed. It uses constant space and does the sampling in an expected time of O( n(1+log( N/ n))), which is optimum up to a constant factor. Algorithm Z outperforms current methods.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Mathematical Software
ACM Transactions on Mathematical Software  Volume 11, Issue 1
March 1985
78 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/3147
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 1985
Published in TOMS Volume 11, Issue 1

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1,838
  • Downloads (Last 6 weeks)232
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Dynamic frequent subgraph mining algorithms over evolving graphs: a surveyPeerJ Computer Science10.7717/peerj-cs.236110(e2361)Online publication date: 8-Oct-2024
  • (2024)Parallel and Distributed Frugal Tracking of a QuantileFuture Internet10.3390/fi1609033516:9(335)Online publication date: 13-Sep-2024
  • (2024)NLOCL: Noise-Labeled Online Continual LearningElectronics10.3390/electronics1313256013:13(2560)Online publication date: 29-Jun-2024
  • (2024)quickPWCR: Quickly construct and rating large binary pairwiesd comparisonsCRAN: Contributed Packages10.32614/CRAN.package.quickPWCROnline publication date: 25-Apr-2024
  • (2024)FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk FrameworkProceedings of the VLDB Endowment10.14778/3659437.365943817:8(1788-1801)Online publication date: 1-Apr-2024
  • (2024)XGNN: Boosting Multi-GPU GNN Training via Global GNN Memory StoreProceedings of the VLDB Endowment10.14778/3641204.364121917:5(1105-1118)Online publication date: 1-Jan-2024
  • (2024)Subsampling-based modified Bayesian information criterion for large-scale stochastic block modelsElectronic Journal of Statistics10.1214/24-EJS230918:2Online publication date: 1-Jan-2024
  • (2024)Learning manifolds from non-stationary streamsJournal of Big Data10.1186/s40537-023-00872-811:1Online publication date: 23-Mar-2024
  • (2024)Continual learning for fault diagnosis considering variable working conditionsProceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability10.1177/1748006X241252469Online publication date: 21-Jun-2024
  • (2024)Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and QualityProceedings of the ACM on Management of Data10.1145/36771342:4(1-31)Online publication date: 30-Sep-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media