Unsupervised Clustering of Honeypot Attacks by Deep HTTP Packet Inspection

Victor Aurora^13, Christopher Neal^13,14, Alexandre Proulx^13,15, Nora Boulahia Cuppens^13 & Frédéric Cuppens^13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14551)

Included in the following conference series:

International Symposium on Foundations and Practice of Security

Abstract

The growing complexity of cyberattacks has prompted researchers to propose automated methods for classifying them. Current research favors supervised detection methods, but these must be continually retrained on large labelled datasets and cannot generalize to unseen events. We propose a novel unsupervised detection approach that performs deep packet inspection on HTTP-specific features, unlike prior work that relies on generic numerical network-level features. Our method has three phases: pre-processing, dimension reduction, and clustering. By analyzing the content of each HTTP packet, we perfectly isolate each web attack in the CIC-IDS2017 dataset into its own cluster. We further run our method on real-world data collected from a honeypot platform to demonstrate its classification abilities. In future work, the proposed method could be applied to other protocols and extended with additional correlation techniques to classify complex attacks.
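
To make the three phases concrete, the following is a minimal, standard-library-only sketch of how such a pipeline could be wired together. It is not the authors' implementation: the abstract names only the phases, so every specific here is an assumption. The feature set (method, path, query flag, User-Agent, body flag), PCA by power iteration as the dimension-reduction step, k-means as the clustering step, and all function names and sample requests are hypothetical stand-ins chosen for illustration.

```python
import math

def parse_request(raw):
    """Phase 1 (pre-processing): map one raw HTTP request to categorical features.
    The chosen features are illustrative assumptions, not the paper's feature set."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    path, _, query = target.partition("?")
    return {
        "method": method,
        "path": path,
        "has_query": str(bool(query)),
        "user_agent": headers.get("user-agent", "<none>"),
        "has_body": str(bool(body)),
    }

def one_hot(records):
    """Encode categorical records as 0/1 vectors, one column per (feature, value)."""
    columns = sorted({item for record in records for item in record.items()})
    return [[1.0 if record[k] == v else 0.0 for k, v in columns] for record in records]

def reduce_dimensions(rows, n_components=2, iterations=100):
    """Phase 2 (dimension reduction): PCA via power iteration with deflation.
    PCA is a stand-in; the abstract does not name the reduction technique."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in rows]
    scores = [[] for _ in rows]
    for c in range(n_components):
        v = [1.0 / (j + 1 + c) for j in range(d)]  # fixed start keeps runs repeatable
        for _ in range(iterations):
            proj = [sum(a * b for a, b in zip(x, v)) for x in X]
            w = [sum(p * x[j] for p, x in zip(proj, X)) for j in range(d)]  # X^T (X v)
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [a / norm for a in w]
        proj = [sum(a * b for a, b in zip(x, v)) for x in X]
        for point, p in zip(scores, proj):
            point.append(p)
        # Deflate: remove the component just found before searching for the next.
        X = [[x[j] - p * v[j] for j in range(d)] for x, p in zip(X, proj)]
    return scores

def kmeans(points, k, iterations=20):
    """Phase 3 (clustering): k-means with deterministic farthest-point seeding.
    k-means is a stand-in; the abstract does not name the clustering algorithm."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_requests(raw_requests, k):
    """Run the three phases end to end and return one cluster label per request."""
    records = [parse_request(raw) for raw in raw_requests]
    return kmeans(reduce_dimensions(one_hot(records)), k)

# Hypothetical traffic: three probes from one tool, three uploads from another.
SAMPLE_REQUESTS = [
    "GET /index.php?id=1 HTTP/1.1\r\nHost: a\r\nUser-Agent: scanner-x\r\n\r\n",
    "GET /index.php?id=2 HTTP/1.1\r\nHost: b\r\nUser-Agent: scanner-x\r\n\r\n",
    "GET /index.php HTTP/1.1\r\nHost: a\r\nUser-Agent: scanner-x\r\n\r\n",
    "POST /upload HTTP/1.1\r\nHost: a\r\nUser-Agent: bot-y\r\n\r\npayload=1",
    "POST /upload HTTP/1.1\r\nHost: b\r\nUser-Agent: bot-y\r\n\r\npayload=2",
    "POST /upload?x=1 HTTP/1.1\r\nHost: a\r\nUser-Agent: bot-y\r\n\r\npayload=3",
]

if __name__ == "__main__":
    print(cluster_requests(SAMPLE_REQUESTS, k=2))
```

In this toy input the two request families differ in method, path, User-Agent, and body presence, so they fall into different clusters even though requests within a family vary in whether they carry a query string. Real honeypot traffic would need a richer feature set and a principled choice of the number of clusters.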

This research was supported by Thales Research and Technology (TRT) Canada.

Author information

Authors and Affiliations

Polytechnique Montreal, Montreal, Canada
Victor Aurora, Christopher Neal, Alexandre Proulx, Nora Boulahia Cuppens & Frédéric Cuppens
IRT SystemX, Palaiseau, France
Christopher Neal
Thales Research and Technology, Quebec City, Canada
Alexandre Proulx

Corresponding author

Correspondence to Christopher Neal.

Editor information

Editors and Affiliations

University of Bordeaux, Bordeaux, France
Mohamed Mosbah
Toulouse III - Paul Sabatier University, Toulouse, France
Florence Sèdes
Université Laval, Québec, QC, Canada
Nadia Tawbi
University of Bordeaux, Bordeaux, France
Toufik Ahmed
Polytechnique Montréal, Montreal, QC, Canada
Nora Boulahia-Cuppens
Telecom SudParis, Palaiseau, France
Joaquin Garcia-Alfaro

About this paper

Cite this paper

Aurora, V., Neal, C., Proulx, A., Boulahia Cuppens, N., Cuppens, F. (2024). Unsupervised Clustering of Honeypot Attacks by Deep HTTP Packet Inspection. In: Mosbah, M., Sèdes, F., Tawbi, N., Ahmed, T., Boulahia-Cuppens, N., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2023. Lecture Notes in Computer Science, vol 14551. Springer, Cham. https://doi.org/10.1007/978-3-031-57537-2_4

DOI: https://doi.org/10.1007/978-3-031-57537-2_4
Published: 25 April 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57536-5
Online ISBN: 978-3-031-57537-2
eBook Packages: Computer Science, Computer Science (R0)
