Abstract
Finding flows with a large spread (i.e., the number of distinct elements) in multiple periods has many practical applications, such as detecting network scanners, click fraud, and DDoS attacks, etc.. Most previous works focus on finding persistent flows or estimating flow spreads. To the best of our knowledge, there is no previous work to find the flows with spread exceeding the threshold frequently. In this paper, we study a new problem called finding k-persistent t-spread flows in high-speed networks, which detects the flows whose spreads exceed a preset threshold t during at least \(\lceil k*T\rceil \) out of T measurement periods, where \(k \!\in \!(0, 1]\), and the parameters can be arbitrarily defined. To this end, we propose a novel sketch, called KTSketch, to find k-persistent t-spread flows in real time. The key idea of KTSketch is first to find out those flows whose spread exceeds the threshold t in the current period as potential flows and then separately estimate their spreads to find k-persistent t-spread flows. We compare the performance of KTSketch with three baseline solutions (Bloom Filter+Count-Min, VHLL, and FreeBS-SSD). The experimental results show that KTSketch can achieve around 49.4%, 59.0%, 27.1% higher F1 score, 3.89, 2.13, 13.8 times higher throughput, and 43.1%, 81.2%, 72.2% lower ARE, respectively.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Liu, Z., Manousis, A., Vorsanger, G., Sekar, V., Braverman, V.: One sketch to rule them all: rethinking network flow monitoring with UnivMon. In: ACM SIGCOMM (2016)
Huang, H., et al.: Spread estimation with non-duplicate sampling in high-speed networks. IEEE/ACM Trans. Networking 29(5), 2073–2086 (2021)
Yang, T., Gong, J., Zhang, H., Zou, L., Shi, L., Li, X.: HeavyGuardian: separate and guard hot items in data streams. In: ACM SIGKDD (2018)
Yang, T., et al.: Elastic Sketch: adaptive and fast network-wide measurements In: ACM SIGCOMM (2018)
Ting, D.: Data sketches for disaggregated subset sum and frequent item estimation. In: ACM SIGMOD (2018)
Yoon, M., Li, T., Chen, S., Peir, J.-K.: Fit a spread estimator in small memory. In: IEEE INFOCOM (2009)
Xiao, Q., Chen, S., Chen, M., Ling, Y.: Hyper-compact virtual estimators for big network data based on register sharing. In: ACM SIGMETRICS (2015)
Jia, P., et al.: Accurately estimating user cardinalities and detecting super spreaders over time. IEEE Trans. Knowl. Data Eng. 34(1), 92–106 (2020)
Dai, H., Shahzad, M., Liu, A.X., Li, M., Zhong, Y., Chen, G.: Identifying and estimating persistent items in data streams. IEEE/ACM Trans. Networking 26(6), 2429–2442 (2018)
Dai, H., Li, M., Liu, A.X., Zheng, J., Chen, G.: Finding persistent items in distributed datasets. IEEE/ACM Trans. Networking 28(1), 1–14 (2019)
Zhang, Y., et al.: On-off sketch: a fast and accurate sketch on persistence. Proc. VLDB Endowment 14(2), 128–140 (2020)
Chen, S., Chen, M., Xiao, Q.: Traffic Measurement for Big Network Data. WN, Springer, Cham (2017). https://doi.org/10.1007/978-3-319-47340-6
Haddadi, H.: Fighting online click-fraud using bluff ads. ACM SIGCOMM Comput. Commun. Rev. 40(2), 21–25 (2010)
García, S., Grill, M., Stiborek, J., Zunino, A.: An empirical comparison of botnet detection methods. Comput. Secur. 45, 100–123 (2014). https://doi.org/10.1016/j.cose.2014.05.011
Whang, K.-Y., Vander-Zanden, B.T., Taylor, H.M.: A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Syst. 15(2), 208–229 (1990)
Flajolet, P., Fusy, É., Gandouet, O., Meunier, F.: HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. Discrete Math. Theor. Comput. Sci. (2007). https://doi.org/10.46298/dmtcs.3545
Fan, Z., et al.: Pisketch: finding persistent and infrequent flows. In: Proceedings of the ACM SIGCOMM Workshop on Formal Foundations and Security of Programmable Network Infrastructures (2022)
Xiao, Q., Qiao, Y., Zhen, M., Chen, S.: Estimating the persistent spreads in high-speed networks. In: IEEE ICNP (2014)
Huang, H., et al.: An efficient K-persistent spread estimator for traffic measurement in high-speed networks. IEEE/ACM Trans. Networking 28(4), 1463–1476 (2020)
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47(260), 663–685 (1952)
Murmurhash. https://github.com/aappleby/smhasher/
The CAIDA anonymized internet traces. https://www.caida.org/catalog/datasets/
The network dataset internet traces. http://snap.stanford.edu/data/
The data center dataset. https://pages.cs.wisc.edu/
Acknowledgement
This research was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62332013, 62102275, 62072322, U20A20182, in part by the NSF of Jiangsu in China under Grant BK20210704, and in part by China Postdoctoral Science Foundation under Grant 2023M732537.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhou, S. et al. (2024). KTSketch: Finding k-Persistent t-Spread Flows in High-Speed Networks. In: Zhang, W., Tung, A., Zheng, Z., Yang, Z., Wang, X., Guo, H. (eds) Web and Big Data. APWeb-WAIM 2024. Lecture Notes in Computer Science, vol 14964. Springer, Singapore. https://doi.org/10.1007/978-981-97-7241-4_21
Download citation
DOI: https://doi.org/10.1007/978-981-97-7241-4_21
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-7240-7
Online ISBN: 978-981-97-7241-4
eBook Packages: Computer ScienceComputer Science (R0)