Clustering unknown network traffic with dual-path autoencoder

Yating Fu¹,
Xuan Li²,
Xiaofan Li¹,
Shuyuan Zhao³ &
Fengyu Wang ORCID: orcid.org/0000-0003-2296-8410¹

Abstract

The proportion of unknown traffic in networks continues to increase, which poses great challenges to the management and security of cyberspace. Unknown traffic refers to network traffic generated by protocols that are not known to a preconstructed traffic identification system. One way to address this challenge is to group the mixed unknown traffic into multiple clusters, where, ideally, each cluster contains just one traffic class. In this paper, we propose a novel scheme for clustering unknown traffic, named dual-path autoencoder-based clustering, to discover protocol-based traffic classes. The dual-path autoencoder model combines a convolutional autoencoder and a deep autoencoder, which extract and aggregate payload features and statistical features. The fused feature is then clustered by the correlation-adjusted clustering module, and the unknown traffic flows are divided into multiple high-purity clusters. To evaluate our scheme, experiments are conducted on two public network traffic datasets and one campus network dataset. Using seven common application layer protocols to simulate unknown traffic, the evaluation results show that our scheme can achieve above 98% on each dataset when the preset number of clusters is 60. This demonstrates the effectiveness of the proposed scheme for clustering unknown network protocols.
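
The goal stated in the abstract, clusters that each contain just one traffic class, is commonly quantified as cluster purity. The following is a minimal sketch, assuming purity is defined as the fraction of flows that carry the majority label of their cluster. The function name `cluster_purity` and the toy labels are illustrative assumptions and are not taken from the paper; the abstract does not spell out how its reported figure is computed.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Fraction of flows whose label equals the majority label of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must be non-empty")

    # Count how often each ground-truth label occurs inside each cluster.
    members = defaultdict(Counter)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid][label] += 1

    # Each cluster contributes the size of its largest single-label group.
    majority_total = sum(c.most_common(1)[0][1] for c in members.values())
    return majority_total / len(cluster_ids)


# Toy data (hypothetical): six flows assigned to three clusters.
clusters = [0, 0, 0, 1, 1, 2]
labels = ["http", "http", "dns", "ssh", "ssh", "dns"]
print(cluster_purity(clusters, labels))
```

Under this definition, purity tends to rise as the preset number of clusters grows, since smaller clusters are easier to keep homogeneous. This is one reason the number of clusters (60 in the abstract, for seven simulated protocols) is reported alongside the result.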

Data availability

The ISCX2012 and ISCX_Botnet datasets that support the findings of this study are available at https://www.unb.ca/cic/datasets/. The selfDataset that supports the findings of this study is available on request from the corresponding author. It is not publicly available because it contains information that could compromise research participant privacy.

Notes

https://www.ntop.org/products/packet-capture/pf_ring/.

Author information

Authors and Affiliations

School of Software, Shandong University, Jinan, China
Yating Fu, Xiaofan Li & Fengyu Wang
National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, China
Xuan Li
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Shuyuan Zhao

Corresponding author

Correspondence to Fengyu Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fu, Y., Li, X., Li, X. et al. Clustering unknown network traffic with dual-path autoencoder. Neural Comput & Applic 35, 8955–8966 (2023). https://doi.org/10.1007/s00521-022-08138-9

Received: 15 May 2022
Accepted: 29 November 2022
Published: 07 January 2023
Issue Date: April 2023
DOI: https://doi.org/10.1007/s00521-022-08138-9
