Abstract
The increasing complexity of cyberattacks has prompted researchers to propose automated methods for classifying them. Current research favors supervised detection methods, which are limited in two ways: they must be continually retrained on vast labelled datasets, and they cannot generalize to unseen events. We propose a novel unsupervised detection approach that performs deep packet inspection on HTTP-specific features, unlike prior work, which relies on generic numerical network-based features. Our method comprises three phases: pre-processing, dimension reduction, and clustering. By analyzing the content of each HTTP packet, we perfectly isolate each web attack in the CIC-IDS2017 dataset in its own cluster. We further run our method on real-world data collected from a honeypot platform to demonstrate its classification abilities. In future work, the proposed method could be applied to other protocols and extended with additional correlation techniques to classify complex attacks.
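To make the three phases concrete, the following is a minimal sketch of such a pipeline, not the authors' implementation. The abstract does not name the features or algorithms used, so everything here is an assumption chosen for illustration: the four HTTP features, one-hot encoding, PCA by power iteration as the dimension reduction step, and k-means with farthest-point initialization as the clustering step. All function names and the sample requests are hypothetical.

```python
import math

def preprocess(raw):
    """Phase 1 (assumed features): extract categorical fields from a raw HTTP request."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "path": target.split("?", 1)[0],
        # Keep only the client family, e.g. "curl" from "curl/7.85".
        "agent": headers.get("user-agent", "none").split("/", 1)[0],
        "has_body": str(bool(body)),
    }

def one_hot(records):
    """Turn categorical records into 0/1 vectors, one column per (field, value)."""
    columns = sorted({(k, v) for r in records for k, v in r.items()})
    return [[1.0 if r.get(k) == v else 0.0 for k, v in columns] for r in records]

def pca(rows, n_components=2, iters=200):
    """Phase 2 (assumed technique): project onto top principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    x = [[v - m for v, m in zip(row, means)] for row in rows]
    components = []
    for c in range(n_components):
        v = [1.0 / (i + 1 + c) for i in range(d)]  # deterministic start vector
        for _ in range(iters):
            # Power iteration step: w = X^T (X v).
            proj = [sum(a * b for a, b in zip(row, v)) for row in x]
            w = [sum(p * row[j] for p, row in zip(proj, x)) for j in range(d)]
            # Deflation: remove directions already found.
            for u in components:
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w)) or 1.0
            v = [a / norm for a in w]
        components.append(v)
    return [[sum(a * b for a, b in zip(row, u)) for u in components] for row in x]

def kmeans(points, k, iters=20):
    """Phase 3 (assumed technique): k-means with farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return labels

def cluster_requests(raw_requests, k):
    """Run the three phases in order and return one cluster label per request."""
    records = [preprocess(r) for r in raw_requests]
    return kmeans(pca(one_hot(records)), k)

# Hypothetical sample traffic: three scanner-like GETs and three login POSTs.
SAMPLE = [
    "GET /admin HTTP/1.1\r\nHost: h\r\nUser-Agent: curl/7.85\r\n\r\n",
    "GET /admin HTTP/1.1\r\nHost: h\r\nUser-Agent: curl/7.80\r\n\r\n",
    "GET /admin HTTP/1.1\r\nHost: h\r\nUser-Agent: zgrab/0.x\r\n\r\n",
    "POST /login HTTP/1.1\r\nHost: h\r\nUser-Agent: python-requests/2.28\r\n\r\nuser=a&pass=b",
    "POST /login HTTP/1.1\r\nHost: h\r\nUser-Agent: python-requests/2.28\r\n\r\nuser=a&pass=c",
    "POST /login HTTP/1.1\r\nHost: h\r\n\r\nuser=a&pass=d",
]
labels = cluster_requests(SAMPLE, k=2)
```

On this toy input the GET requests and the POST requests fall into separate clusters, because they differ in method, path and body presence at once. Real traffic would need a richer feature set and a principled way to choose the number of clusters, neither of which this sketch attempts.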
This research was supported by Thales Research and Technology (TRT) Canada.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Aurora, V., Neal, C., Proulx, A., Boulahia Cuppens, N., Cuppens, F. (2024). Unsupervised Clustering of Honeypot Attacks by Deep HTTP Packet Inspection. In: Mosbah, M., Sèdes, F., Tawbi, N., Ahmed, T., Boulahia-Cuppens, N., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2023. Lecture Notes in Computer Science, vol 14551. Springer, Cham. https://doi.org/10.1007/978-3-031-57537-2_4
DOI: https://doi.org/10.1007/978-3-031-57537-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57536-5
Online ISBN: 978-3-031-57537-2
eBook Packages: Computer Science (R0)