Abstract
The proportion of unknown traffic in networks continues to increase, which poses great challenges to the management and security of cyberspace. Unknown traffic refers to network traffic generated by protocols that are unknown to a preconstructed traffic identification system. One way to address this challenge is to group the mixed unknown traffic into multiple clusters such that, ideally, each cluster contains just one traffic class. In this paper, we propose a novel scheme for clustering unknown traffic, named dual-path autoencoder-based clustering, to discover protocol-based traffic classes. The dual-path autoencoder model combines a convolutional autoencoder and a deep autoencoder to extract and aggregate payload features and statistical features. The fused feature is then clustered by the correlation-adjusted clustering module, which divides the unknown traffic flows into multiple high-purity clusters. To evaluate our scheme, we conduct experiments on two public network traffic datasets and one campus network dataset. With seven common application layer protocols used to simulate unknown traffic, the evaluation results show that our scheme achieves above 98% on each dataset when the preset number of clusters is 60. This demonstrates the effectiveness of the proposed scheme for clustering unknown network protocols.
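The abstract speaks of high-purity clusters without defining the measure on this page. As a minimal sketch, assuming the common definition of overall cluster purity (the share of flows that belong to the majority class of their cluster), the computation could look as follows. The function name `cluster_purity` and the toy labels are hypothetical and serve only as illustration; this is not confirmed to be the authors' exact metric.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Overall purity: fraction of samples in the majority class of their cluster.

    Assumed (standard) definition, not taken from the paper itself.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one sample is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Sum the size of the dominant class of every cluster.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Toy example (hypothetical data): 8 flows, 3 clusters, 3 protocols.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
protocols = ["http", "http", "dns", "dns", "dns", "ssh", "ssh", "http"]
purity = cluster_purity(clusters, protocols)
print(f"purity = {purity:.2f}")
```

Under this definition, purity can only stay the same or rise as clusters are split further, and it reaches 1.0 trivially when every flow is its own cluster. A purity figure is therefore only meaningful together with the number of clusters, which the abstract fixes at 60.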
Data availability
The ISCX2012 and ISCX_Botnet datasets that support the findings of this study are available at https://www.unb.ca/cic/datasets/. The selfDataset that supports the findings of this study is available on request from the corresponding author. It is not publicly available because it contains information that could compromise research participant privacy.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Fu, Y., Li, X., Li, X. et al. Clustering unknown network traffic with dual-path autoencoder. Neural Comput & Applic 35, 8955–8966 (2023). https://doi.org/10.1007/s00521-022-08138-9
DOI: https://doi.org/10.1007/s00521-022-08138-9