Nothing Special   »   [go: up one dir, main page]

Skip to main content

KTSketch: Finding k-Persistent t-Spread Flows in High-Speed Networks

  • Conference paper
  • First Online:
Web and Big Data (APWeb-WAIM 2024)

Abstract

Finding flows with a large spread (i.e., the number of distinct elements) in multiple periods has many practical applications, such as detecting network scanners, click fraud, and DDoS attacks, etc.. Most previous works focus on finding persistent flows or estimating flow spreads. To the best of our knowledge, there is no previous work to find the flows with spread exceeding the threshold frequently. In this paper, we study a new problem called finding k-persistent t-spread flows in high-speed networks, which detects the flows whose spreads exceed a preset threshold t during at least \(\lceil k*T\rceil \) out of T measurement periods, where \(k \!\in \!(0, 1]\), and the parameters can be arbitrarily defined. To this end, we propose a novel sketch, called KTSketch, to find k-persistent t-spread flows in real time. The key idea of KTSketch is first to find out those flows whose spread exceeds the threshold t in the current period as potential flows and then separately estimate their spreads to find k-persistent t-spread flows. We compare the performance of KTSketch with three baseline solutions (Bloom Filter+Count-Min, VHLL, and FreeBS-SSD). The experimental results show that KTSketch can achieve around 49.4%, 59.0%, 27.1% higher F1 score, 3.89, 2.13, 13.8 times higher throughput, and 43.1%, 81.2%, 72.2% lower ARE, respectively.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 64.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 79.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Liu, Z., Manousis, A., Vorsanger, G., Sekar, V., Braverman, V.: One sketch to rule them all: rethinking network flow monitoring with UnivMon. In: ACM SIGCOMM (2016)

    Google Scholar 

  2. Huang, H., et al.: Spread estimation with non-duplicate sampling in high-speed networks. IEEE/ACM Trans. Networking 29(5), 2073–2086 (2021)

    Article  Google Scholar 

  3. Yang, T., Gong, J., Zhang, H., Zou, L., Shi, L., Li, X.: HeavyGuardian: separate and guard hot items in data streams. In: ACM SIGKDD (2018)

    Google Scholar 

  4. Yang, T., et al.: Elastic Sketch: adaptive and fast network-wide measurements In: ACM SIGCOMM (2018)

    Google Scholar 

  5. Ting, D.: Data sketches for disaggregated subset sum and frequent item estimation. In: ACM SIGMOD (2018)

    Google Scholar 

  6. Yoon, M., Li, T., Chen, S., Peir, J.-K.: Fit a spread estimator in small memory. In: IEEE INFOCOM (2009)

    Google Scholar 

  7. Xiao, Q., Chen, S., Chen, M., Ling, Y.: Hyper-compact virtual estimators for big network data based on register sharing. In: ACM SIGMETRICS (2015)

    Google Scholar 

  8. Jia, P., et al.: Accurately estimating user cardinalities and detecting super spreaders over time. IEEE Trans. Knowl. Data Eng. 34(1), 92–106 (2020)

    Article  Google Scholar 

  9. Dai, H., Shahzad, M., Liu, A.X., Li, M., Zhong, Y., Chen, G.: Identifying and estimating persistent items in data streams. IEEE/ACM Trans. Networking 26(6), 2429–2442 (2018)

    Article  Google Scholar 

  10. Dai, H., Li, M., Liu, A.X., Zheng, J., Chen, G.: Finding persistent items in distributed datasets. IEEE/ACM Trans. Networking 28(1), 1–14 (2019)

    Article  Google Scholar 

  11. Zhang, Y., et al.: On-off sketch: a fast and accurate sketch on persistence. Proc. VLDB Endowment 14(2), 128–140 (2020)

    Article  Google Scholar 

  12. Chen, S., Chen, M., Xiao, Q.: Traffic Measurement for Big Network Data. WN, Springer, Cham (2017). https://doi.org/10.1007/978-3-319-47340-6

    Book  Google Scholar 

  13. Haddadi, H.: Fighting online click-fraud using bluff ads. ACM SIGCOMM Comput. Commun. Rev. 40(2), 21–25 (2010)

    Article  Google Scholar 

  14. García, S., Grill, M., Stiborek, J., Zunino, A.: An empirical comparison of botnet detection methods. Comput. Secur. 45, 100–123 (2014). https://doi.org/10.1016/j.cose.2014.05.011

    Article  Google Scholar 

  15. Whang, K.-Y., Vander-Zanden, B.T., Taylor, H.M.: A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Syst. 15(2), 208–229 (1990)

    Article  Google Scholar 

  16. Flajolet, P., Fusy, É., Gandouet, O., Meunier, F.: HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. Discrete Math. Theor. Comput. Sci. (2007). https://doi.org/10.46298/dmtcs.3545

  17. Fan, Z., et al.: Pisketch: finding persistent and infrequent flows. In: Proceedings of the ACM SIGCOMM Workshop on Formal Foundations and Security of Programmable Network Infrastructures (2022)

    Google Scholar 

  18. Xiao, Q., Qiao, Y., Zhen, M., Chen, S.: Estimating the persistent spreads in high-speed networks. In: IEEE ICNP (2014)

    Google Scholar 

  19. Huang, H., et al.: An efficient K-persistent spread estimator for traffic measurement in high-speed networks. IEEE/ACM Trans. Networking 28(4), 1463–1476 (2020)

    Article  Google Scholar 

  20. Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47(260), 663–685 (1952)

    Article  MathSciNet  Google Scholar 

  21. Murmurhash. https://github.com/aappleby/smhasher/

  22. The CAIDA anonymized internet traces. https://www.caida.org/catalog/datasets/

  23. The network dataset internet traces. http://snap.stanford.edu/data/

  24. The data center dataset. https://pages.cs.wisc.edu/

Download references

Acknowledgement

This research was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62332013, 62102275, 62072322, U20A20182, in part by the NSF of Jiangsu in China under Grant BK20210704, and in part by China Postdoctoral Science Foundation under Grant 2023M732537.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Guoju Gao or Yu-e Sun .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zhou, S. et al. (2024). KTSketch: Finding k-Persistent t-Spread Flows in High-Speed Networks. In: Zhang, W., Tung, A., Zheng, Z., Yang, Z., Wang, X., Guo, H. (eds) Web and Big Data. APWeb-WAIM 2024. Lecture Notes in Computer Science, vol 14964. Springer, Singapore. https://doi.org/10.1007/978-981-97-7241-4_21

Download citation

  • DOI: https://doi.org/10.1007/978-981-97-7241-4_21

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-7240-7

  • Online ISBN: 978-981-97-7241-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics