Abstract
With the rapid growth of link speed, obtaining detailed traffic statistics becomes much more difficult. In order to reduce the resource consumption of measurement systems, more and more passive traffic measurement employs sampling at the packet level. Packet sampling has become an attractive and scalable method to measure flow data on high-speed links. However, knowing the length distributions of traffic flows passing through a network link is very useful for network operation and management. In this paper, we consider the problem of estimating flow length distributions based on sampled flow data. This paper introduces two algorithms for estimating flow length distributions, fitting estimation and factoring estimation. The fitting estimation uses piecewise Pareto distribution to fit the original traffic for a small sampling period. The factoring estimation is used for a large sampling period. It first factorizes the large sampling period into a product of some smaller integer factors, then iteratively invokes fitting estimation or other algorithms such as \({\textit{EM}}\) in the arranged order of the factors. Evaluations of the proposed algorithms on large Internet traces obtained from several sources demonstrate that they have high measurement accuracy with low computation overhead. The algorithms allow us to recover the complete flow length distributions.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Li T, Chen S, Ling Y (2012) Per-flow traffic measurement through randomized counter sharing. IEEE/ACM Trans Netw 20(5):1622–1634
Hu C, Liu B, Wang S, Tian J, Cheng Y, Chen Y (2012) ANLS: adaptive non-linear sampling method for accurate flow size measurement. IEEE Trans commun 60(3):789–798
Marold A, Lieven P, Scheuermann B (2012) Probabilistic parallel measurement of network traffic at multiple locations. IEEE Netw 26(1):6–12
Yoon M, Li T, Chen S, Peir J (2011) Fit a compact spread estimator in small high-speed memory. IEEE/ACM Trans Netw 19(5):1253–1264
Duffield N, Lund C, Thorup M (2005) Estimating flow distributions from sampled flow statistics. IEEE/ACM Trans Netw 13(5):933–945
Wojcik R, Jajszczyk A (2012) Flow oriented approaches to QoS assurance. ACM Comput Surv 44(1):5
Hohn N, Veitch D (2006) Inverting sampled traffic. IEEE/ACM Trans Netw 14(1):68–80
Kumar A, Xu J, Li L, Wang J (2004) Space code bloom filter for efficient traffic flow measurement. In: IEEE INFOCOM 2004, pp 1762–1773
Kumar A, Sung M, Xu J, Wang J (2004) Data streaming algorithms for efficient and accurate estimation of flow size distribution. In: ACM SIGMETRICS 2004, pp 177–188
Packet Sampling (psamp) (2005) http://www.ietf.org/html.charters/psamp-charter.html. Accessed 02 Feb 2005
Duffield N, Grossglauser M (2001) Trajectory sampling for direct traffic observation. IEEE/ACM Trans Netw 9(3):280–292
Duffield N, Grossglauser M (2004) Trajectory sampling with unreliable reporting. In: IEEE Infocom 2004, pp 1570–1581
Sampled Cisco http://www.cisco.com/en/US/products/sw/iosswrel/ps1829/products_feature_guide09186a0080081201.html. Accessed 8 Dec 2002
NeTraMet Version 4.4. http://www2.auckland.ac.nz/net/Accounting/ntm.Release.note.html. Accessed Dec 2002
Duffield N, Lund C, Thorup M (2002) Properties and prediction of flow statistics from sampled packet streams. In: ACM SIGCOMM internet measurement, workshop 2002, November, pp 159–171
Yang L, Michailidis G (2007) Sampled based estimation of network traffic flow characteristics. In: IEEE INFOCOM 2007, pp 1775–1783
Jeff Wu CF (1983) On the convergence properties of the EM algorithm. Ann Stat. 11(1):95–103
Loiseau P, Gonçalves P, Girard S, Forbes F, Vicat-Blanc P (2009) Maximum likelihood estimation of the flow size distribution tail index from sampled packet data. In: ACM sigmetrics conference, Seattle, WA, USA, June 2009, pp 263–274
Ribeiro B, Towsley D, Ye T, Bolot J (2006) Fisher information on sampled packets: an application to flow size estimation. In: ACM/SIGCOMM internet measurement conference, Rio de Janeiro, Oct 2006, pp 15–26
Tune P, Veitch D (2008) Towards optimal sampling for flow size estimation. In: ACM IMC 2008, October
Li K, Shen H (2005) Coordinated enroute multimedia object caching in transcoding proxies for tree networks. ACM Trans Multimed Comput Commun Appl 5(3):289–314
Li K, Shen H, Chin F, Zheng S (2005) Optimal methods for coordinated enroute web caching for tree networks. ACM Trans Internet Technol 5(3):480–507
Hardy GH, Wright EM (1979) An introduction to the theory of numbers, 5th edn. Oxford University Press, New York
WIDE (2012) http://tracer.csl.sony.co.jp/mawi/samplepoint-F/
JSLAB (2012) http://ntds.njnet.edu.cn/data/index.php. Accessed Oct 2012
NLANR (2012) ftp://wits.cs.waikato.ac.nz/pma/long/ipls/3/
Author information
Authors and Affiliations
Corresponding author
Additional information
This work is supported in part by National Natural Science Foundation of China under Grant No.61370198 and 61370199, the Scientific Research Fund of Liaoning Provincial Education Department under Grant No.L2013195, and the Fundamental Research Funds for the Central Universities under Grant No.3132013040 and 3132013335.
Rights and permissions
About this article
Cite this article
Liu, W., Qu, W. & Liu, Z. Scalable algorithms for estimating flow length distributions from sampled data. Computing 96, 527–543 (2014). https://doi.org/10.1007/s00607-014-0386-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00607-014-0386-9