DOI: 10.1145/3630050.3630176

DataZoo: Streamlining Traffic Classification Experiments

Published: 05 December 2023

Abstract

Machine learning communities, such as those around computer vision and natural language processing, have developed numerous supporting tools and benchmark datasets to accelerate development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supporting software is rather limited in scope. This paper addresses that gap by introducing DataZoo, a toolset designed to streamline dataset management in network traffic classification. DataZoo provides a standardized API for accessing three extensive datasets: CESNET-QUIC22, CESNET-TLS22, and CESNET-TLS-YEAR22. It also includes methods for feature scaling and for realistic dataset partitioning that takes temporal and service-related factors into account. The DataZoo toolset simplifies the creation of realistic evaluation scenarios, making it easier to cross-compare classification methods and reproduce results.
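To make the idea of temporally aware partitioning and feature scaling concrete, here is a minimal sketch in plain Python. It does not use the DataZoo API, which this page does not show; the flow records, field layout, cutoff date, and function names (`temporal_split`, `fit_standard_scaler`, `scale`) are all hypothetical. It illustrates one common approach: train on flows captured before a cutoff date, test on later flows, and estimate scaling statistics from the training period only.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical flow records: (capture date, bytes transferred, service label).
# These values are made up for illustration and are not taken from any dataset.
flows = [
    (date(2022, 10, 31), 1200.0, "service-a"),
    (date(2022, 11, 1), 800.0, "service-b"),
    (date(2022, 11, 2), 1000.0, "service-a"),
    (date(2022, 11, 8), 4000.0, "service-b"),
    (date(2022, 11, 9), 500.0, "service-a"),
]


def temporal_split(records, cutoff):
    """Train on flows captured before `cutoff`, test on flows from `cutoff` onward."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


def fit_standard_scaler(values):
    """Estimate mean and standard deviation from training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 if the variance is zero
    return mu, sigma


def scale(values, mu, sigma):
    """Apply the statistics fitted on the training period to any set of values."""
    return [(v - mu) / sigma for v in values]


train, test = temporal_split(flows, cutoff=date(2022, 11, 7))
mu, sigma = fit_standard_scaler([r[1] for r in train])
train_scaled = scale([r[1] for r in train], mu, sigma)
test_scaled = scale([r[1] for r in test], mu, sigma)
```

Fitting the scaler on the training period alone keeps information from the test period out of preprocessing, which is the kind of evaluation bias a time-based split is meant to avoid.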


Cited By

  • (2024) Towards Reusable Models in Traffic Classification. 2024 8th Network Traffic Measurement and Analysis Conference (TMA), pp. 1-4. DOI: 10.23919/TMA62044.2024.10559009. Online publication date: 21-May-2024
  • (2024) Analysis of Statistical Distribution Changes of Input Features in Network Traffic Classification Domain. NOMS 2024-2024 IEEE Network Operations and Management Symposium, pp. 1-4. DOI: 10.1109/NOMS59830.2024.10575630. Online publication date: 6-May-2024


Published In

SAFE '23: Proceedings of the 2023 on Explainable and Safety Bounded, Fidelitous, Machine Learning for Networking
December 2023
37 pages
ISBN:9798400704499
DOI:10.1145/3630050
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. QUIC
  2. TLS
  3. application identification
  4. encrypted traffic
  5. machine learning
  6. open datasets
  7. open-world evaluation
  8. toolset
  9. traffic classification

Qualifiers

  • Research-article

Funding Sources

  • Grant Agency of the Czech Technical University in Prague
  • Ministry of the Interior of the Czech Republic

Conference

CoNEXT 2023
