Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1064212.1064221acmconferencesArticle/Chapter ViewAbstractPublication PagesmetricsConference Proceedingsconference-collections
Article

A data streaming algorithm for estimating subpopulation flow size distribution

Published: 06 June 2005 Publication History

Abstract

Statistical information about the flow sizes in the traffic passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and improve network performance through traffic engineering. Previous work on estimating the flow size distribution for the complete population of flows has produced techniques that either make inferences from sampled network traffic, or use data streaming approaches. In this work, we identify and solve a more challenging problem of estimating the size distribution and other statistical information about arbitrary subpopulations of flows. Inferring subpopulation flow statistics is more challenging than the complete population counterpart, since subpopulations of interest are often specified a posteriori (i.e., after the data collection is done), making it impossible for the data collection module to "plan in advance".Our solution consists of a novel mechanism that combines data streaming with traditional packet sampling to provide highly accurate estimates of subpopulation flow statistics. The algorithm employs two data collection modules operating in parallel --- a NetFlow-like packet sampler and a streaming data structure made up of an array of counters. Combining the data collected by these two modules, our estimation algorithm uses a statistical estimation procedure that correlates and decodes the outputs (observations) from both data collection modules to obtain flow statistics for any arbitrary subpopulation. Evaluations of this algorithm on real-world Internet traffic traces demonstrate its high measurement accuracy.

References

[1]
N. Duffeld, C. Lund, and M. Thorup, "Estimating flow distribution from sampled flow statistic," in Proc. ACM SIGCOMM Aug. 2003.
[2]
N. Hohn and D. Veitch, "Inverting ampled traffic," in Proc. ACM SIGCOMM Internet Measurement Conference Oct. 2003.
[3]
A. Kumar, M. Sung, J. Xu, and J. Wang, "Data treaming algorithm for efficient and accurate estimation of flow size distribution," in Proc. ACM SIGMETRICS June 2004.
[4]
C. Estan and G. Varghese, "Bitmap algorithm for counting active flows on high speed links," in Proc. ACM SIGCOMM Internet Measurement Conference Oct. 2003.
[5]
P. Indyk, "Stable distributions, pseudorandom generators, embeddings and data tream computation,"in Proc. FOCS 2000, pp. 189--197.
[6]
"Cisco net flow," http://www.cisco.com/warp/public/732/Tech/netflow.
[7]
S. Muthukrishnan," Data streams: Algorithm and applications," available at http://athos.rutgers.edu/muthu/.
[8]
N. Duffeld, C. Lund, and M. Thorup, "Charging from sampled network usage,"in Proc. ACM SIGCOMM Internet Measurement Workshop Nov. 2001.
[9]
N. Duffeld, C. Lund, and M. Thorup, "Propertie and prediction of flow statistics from sampled packet streams," in Proc. ACM SIGCOMM Internet Measurement Workshop Nov.2002.
[10]
C. Estan, K. Keyes, D. Moore, and G. Varghese, "Building a better net flow," in Proc. ACM SIGCOMM Aug. 2004.
[11]
S. Ramabhadran and G. Varghese, "Effcient implementation of a statistic counter architecture," in Proc. ACM SIGMETRICS 2003.
[12]
A. Dempster, N. Laird, and D. Rubin, "Maxim likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B vol.39, no.1, pp. 1--38, 1977.
[13]
"National Laboratory for Applied Network Research," Traces available at http://moat.nlanr.net/Traces/.
[14]
N. Duffeld, "Private communication," Oct. 2004.
[15]
B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, "Sketch-based change detection:methods, evaluation, and applications,"in Proc. ACM SIGCOMM Internet Measurement Conference Oct. 2003.
[16]
C. Estan and G. Varghese, "New direction in traffic measurement and accounting," in Proc. ACM SIGCOMM Aug.2002.
[17]
B. Bloom, "Space/time trade-offs in hash coding with allowable errors," CACM vol. 13, no. 7, pp. 422--426, 1970.
[18]
L. Fan, P. Cao, J. Almeida, and A. Broder, "Summary cache: a scalable wide-area Web cache sharing protocol," IEEE/ACM Transactions on Networking vol. 8, no. 3, pp. 281--293, 2000.
[19]
R. Motwani and P. Raghavan, Randomized Algorithms chapter 3.6,Cambridge University Press, 1995.

Cited By

View all
  • (2022)Regressing Image Sub-Population Distributions with Deep LearningSensors10.3390/s2223921822:23(9218)Online publication date: 27-Nov-2022
  • (2020)Accessible Streaming Algorithms for the Chi-Square TestProceedings of the 32nd International Conference on Scientific and Statistical Database Management10.1145/3400903.3400905(1-12)Online publication date: 7-Jul-2020
  • (2019)Continuously Distinct Sampling over Centralized and Distributed High Speed Data StreamsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2018.286545230:2(300-314)Online publication date: 1-Feb-2019
  • Show More Cited By

Recommendations

Reviews

Angele M. Hamel

Monitoring flow in a network, particularly the flow size distribution, provides a means for better understanding network traffic. Instead of considering the entire flow, this paper estimates the flow size distribution for subpopulations of flows. These subpopulations may be defined, for example, by Internet protocol addresses or by protocol. The authors employ two data structures previously used in this area. One is a variant of sampled NetFlow used in Cisco routers [1]; the other is a streaming data structure that consists of an array of counters. A hash function hashes flow labels to produce an array index, and the counter in that index is augmented. This builds on the previous work of some of the authors [2]. The paper also designs a novel estimation algorithm that combines these data structures and uses a statistical approach to compute the subpopulation flow size distribution. The authors also test their algorithm against Internet traffic traces. The paper is a natural progression from previous work, and is a meaningful contribution to the field. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMETRICS '05: Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
June 2005
428 pages
ISBN:1595930221
DOI:10.1145/1064212
  • cover image ACM SIGMETRICS Performance Evaluation Review
    ACM SIGMETRICS Performance Evaluation Review  Volume 33, Issue 1
    Performance evaluation review
    June 2005
    417 pages
    ISSN:0163-5999
    DOI:10.1145/1071690
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2005

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. EM algorithm
  2. data streaming
  3. flow statistics
  4. statistical inference
  5. traffic analysis

Qualifiers

  • Article

Conference

SIGMETRICS05

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)1
Reflects downloads up to 25 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Regressing Image Sub-Population Distributions with Deep LearningSensors10.3390/s2223921822:23(9218)Online publication date: 27-Nov-2022
  • (2020)Accessible Streaming Algorithms for the Chi-Square TestProceedings of the 32nd International Conference on Scientific and Statistical Database Management10.1145/3400903.3400905(1-12)Online publication date: 7-Jul-2020
  • (2019)Continuously Distinct Sampling over Centralized and Distributed High Speed Data StreamsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2018.286545230:2(300-314)Online publication date: 1-Feb-2019
  • (2019)Detecting a Variety of Long-Term Stealthy User Behaviors on High Speed LinksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2018.287331931:10(1912-1925)Online publication date: 1-Oct-2019
  • (2019)Utilizing Dynamic Properties of Sharing Bits and Registers to Estimate User Cardinalities Over Time2019 IEEE 35th International Conference on Data Engineering (ICDE)10.1109/ICDE.2019.00101(1094-1105)Online publication date: Apr-2019
  • (2014)A New Sketch Method for Measuring Host Connection Degree DistributionIEEE Transactions on Information Forensics and Security10.1109/TIFS.2014.23125449:6(948-960)Online publication date: 1-Jun-2014
  • (2013)Modeling residual-geometric flow samplingIEEE/ACM Transactions on Networking10.1109/TNET.2012.223143521:4(1090-1103)Online publication date: 1-Aug-2013
  • (2012)Router support for fine-grained latency measurementsIEEE/ACM Transactions on Networking10.1109/TNET.2012.218890520:3(811-824)Online publication date: 1-Jun-2012
  • (2011)A Data Streaming Method for Monitoring Host Connection Degrees of High-Speed LinksIEEE Transactions on Information Forensics and Security10.1109/TIFS.2011.21230946:3(1086-1098)Online publication date: 1-Sep-2011
  • (2011)Modeling residual-geometric flow sampling2011 Proceedings IEEE INFOCOM10.1109/INFCOM.2011.5934980(1808-1816)Online publication date: Apr-2011
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media