research-article

DataZoo: Streamlining Traffic Classification Experiments

Authors:

Karel Hynek

SAFE '23: Proceedings of the 2023 on Explainable and Safety Bounded, Fidelitous, Machine Learning for Networking

Pages 3 - 7

https://doi.org/10.1145/3630050.3630176

Published: 05 December 2023

Abstract

Machine learning communities, such as those around computer vision and natural language processing, have developed numerous supporting tools and benchmark datasets to accelerate development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supporting software is limited in scope. This paper addresses that gap by introducing DataZoo, a toolset designed to streamline dataset management in network traffic classification. DataZoo provides a standardized API for accessing three extensive datasets: CESNET-QUIC22, CESNET-TLS22, and CESNET-TLS-YEAR22. It also includes methods for feature scaling and for realistic dataset partitioning that takes temporal and service-related factors into account. The DataZoo toolset simplifies the creation of realistic evaluation scenarios, making it easier to cross-compare classification methods and reproduce results.
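The two ideas named in the abstract, time-aware partitioning and feature scaling, can be illustrated with a minimal sketch. It does not use the DataZoo API, which this page does not show; the `Flow` record, the helper functions, the dates, and the feature values are all hypothetical and chosen only for illustration. The sketch assumes that a realistic split trains on earlier traffic and tests on later traffic, and that scaling statistics are fitted on the training part only so that no information from the test period leaks into training.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean, pstdev


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; not a DataZoo type."""
    day: date           # capture date of the flow
    service: str        # ground-truth service label
    bytes_total: float  # a single numeric feature, for illustration


def temporal_split(flows, train_end):
    """Put flows captured on or before train_end into train, later ones into test."""
    train = [f for f in flows if f.day <= train_end]
    test = [f for f in flows if f.day > train_end]
    return train, test


def fit_standard_scaler(values):
    """Return (mean, std) of the given values; std falls back to 1.0 if it is zero."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0
    return mu, sigma


def scale(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


# Synthetic example data (made up for this sketch).
flows = [
    Flow(date(2022, 11, 1), "video", 100.0),
    Flow(date(2022, 11, 2), "web", 300.0),
    Flow(date(2022, 11, 8), "video", 500.0),
    Flow(date(2022, 11, 9), "web", 200.0),
]

# Train on the first week, test on what comes after it.
train, test = temporal_split(flows, train_end=date(2022, 11, 7))

# Fit the scaler on training data only, then apply it to both parts.
mu, sigma = fit_standard_scaler([f.bytes_total for f in train])
train_scaled = scale([f.bytes_total for f in train], mu, sigma)
test_scaled = scale([f.bytes_total for f in test], mu, sigma)
```

A random shuffle of the same flows would mix later traffic into the training set, which is the kind of unrealistic evaluation that time-aware partitioning is meant to avoid. A service-related split could be sketched the same way by filtering on `service` instead of `day`.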


Published In

SAFE '23: Proceedings of the 2023 on Explainable and Safety Bounded, Fidelitous, Machine Learning for Networking

December 2023

37 pages

ISBN:9798400704499

DOI:10.1145/3630050

Program Chairs:
Kamal Singh (University St-Etienne, France)
Abbas Bradai (University of Poitiers, France)
Pham Tran Anh Quang (Huawei Technologies, France)
Antonio Pescape (University of Napoli Federico II, Italy)
Claudio Fiandrino (IMDEA Networks Institute, Madrid, Spain)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGCOMM: ACM Special Interest Group on Data Communication

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Grant Agency of the Czech Technical University in Prague
Ministry of the Interior of the Czech Republic

Conference

CoNEXT 2023

Sponsor:

SIGCOMM

CoNEXT 2023: The 19th International Conference on emerging Networking EXperiments and Technologies

December 8, 2023

Paris, France
