Feature Extraction and Malware Detection on Large HTTPS Data Using MapReduce

Přemysl Čech¹⁶,
Jan Kohout¹⁷,
Jakub Lokoč¹⁶,
Tomáš Komárek¹⁷,
Jakub Maroušek¹⁶ &
Tomáš Pevný¹⁷

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9939)

Included in the following conference series:

International Conference on Similarity Search and Applications

1404 Accesses
2 Citations

Abstract

Secure HTTP network traffic is an immense and challenging data source for machine learning tasks. These tasks usually try to learn and identify infected network nodes, given only the limited traffic features available for secure HTTP data. In this paper, we investigate the performance of grid histograms that can be used to aggregate the traffic features of network nodes, considering just 5-min batches for snapshots. We compare the representation using linear and k-NN classifiers. We also demonstrate that all presented feature extraction and classification tasks can be implemented in a scalable way using the MapReduce approach.
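The aggregation idea in the abstract can be illustrated with a minimal sketch. The grid construction used in the paper is not given in this excerpt, so everything below is an assumption made for illustration: two-dimensional features already normalised to [0, 1], a fixed number of equal-width bins per dimension, and the hypothetical names `cell_index`, `map_phase`, `reduce_phase` and `histogram_for`. The map step emits one count per (node, grid cell) pair for the flows of a single batch, and the reduce step sums those counts.

```python
from collections import defaultdict

BINS = 4               # assumed number of bins per dimension
LOW, HIGH = 0.0, 1.0   # assumed feature range (features normalised beforehand)


def cell_index(x):
    """Map a feature vector to the index of the grid cell that contains it."""
    idx = []
    for v in x:
        b = int((v - LOW) / (HIGH - LOW) * BINS)
        idx.append(min(max(b, 0), BINS - 1))  # clamp boundary values into the grid
    return tuple(idx)


def map_phase(records):
    """Emit ((node, cell), 1) for every (node, feature vector) record."""
    for node, x in records:
        yield (node, cell_index(x)), 1


def reduce_phase(pairs):
    """Sum the emitted counts per (node, cell) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def histogram_for(node, counts):
    """Assemble the dense BINS x BINS histogram of one node."""
    hist = [[0] * BINS for _ in range(BINS)]
    for (n, (i, j)), c in counts.items():
        if n == node:
            hist[i][j] = c
    return hist


# Toy batch standing in for one 5-min snapshot; node ids and values are made up.
batch = [
    ("10.0.0.1", (0.10, 0.90)),
    ("10.0.0.1", (0.12, 0.95)),
    ("10.0.0.1", (0.60, 0.30)),
    ("10.0.0.2", (1.00, 0.00)),
]
counts = reduce_phase(map_phase(batch))
hist = histogram_for("10.0.0.1", counts)
```

Because the key is the (node, cell) pair, the counts for different nodes and cells can be summed independently, which is what makes this kind of aggregation a natural fit for a MapReduce job.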

Notes

1. The statistical descriptor is a d-dimensional vector x capturing statistical properties of the communication. For more details, see Sect. 2.
2. We would like to thank Lu et al. [11] for sharing their code with us.
3. The query ball of cell $c_i^S$ is defined by the pivot $p_i$ and a radius equal to $\max d(p_i, o_j)$ over all $o_j \in c_i^S$, determined in the preprocessing phase.
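
The radius in note 3 can be computed in a few lines. This is only a sketch: the Euclidean distance and the function names `query_ball_radius` and `in_query_ball` are assumptions made for illustration, and any metric d could be passed in instead.

```python
import math


def euclidean(a, b):
    """Assumed distance function d; any metric can be substituted."""
    return math.dist(a, b)


def query_ball_radius(pivot, cell_objects, dist=euclidean):
    """Radius of the cell's query ball: the largest distance from the pivot
    to any object assigned to the cell (0.0 for an empty cell)."""
    return max((dist(pivot, o) for o in cell_objects), default=0.0)


def in_query_ball(pivot, radius, obj, dist=euclidean):
    """True if obj lies inside the ball centred at the pivot."""
    return dist(pivot, obj) <= radius


# Toy cell with a pivot at the origin; the values are made up.
pivot = (0.0, 0.0)
cell = [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]
radius = query_ball_radius(pivot, cell)
```

Computing the radius once in the preprocessing phase means that later checks against the ball need only a single distance evaluation per object.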

References

1. Cisco Annual Security Report 2016 (2016). http://www.cisco.com/c/en/us/products/security/annual_security_report.html
2. Böhm, C., Kriegel, H.P.: A cost model and index architecture for the similarity join. In: Proceedings of the 17th International Conference on Data Engineering, pp. 411–420 (2001)
3. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)
4. Crotti, M., Dusi, M., Gringoli, F., Salgarelli, L.: Traffic classification through simple statistical fingerprinting. SIGCOMM Comput. Commun. Rev. 37, 5–16 (2007)
5. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
6. Dusi, M., Crotti, M., Gringoli, F., Salgarelli, L.: Tunnel hunter: detecting application-layer tunnels with statistical fingerprinting. Comput. Netw. 53, 81–97 (2009)
7. Kohout, J., Pevný, T.: Automatic discovery of web servers hosting similar applications. In: 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM) (2015)
8. Kohout, J., Pevný, T.: Unsupervised detection of malware in persistent web traffic. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)
9. Lee, Y., Lee, Y.: Toward scalable internet traffic measurement and analysis with Hadoop. SIGCOMM Comput. Commun. Rev. 43(1), 5–13 (2012)
10. Lokoč, J., Kohout, J., Čech, P., Skopal, T., Pevný, T.: k-NN classification of malware in HTTPS traffic using the metric space approach. In: Chau, M., Wang, G.A. (eds.) PAISI 2016. LNCS, vol. 9650, pp. 131–145. Springer, Heidelberg (2016). doi:10.1007/978-3-319-31863-9_10
11. Lu, W., Shen, Y., Chen, S., Ooi, B.C.: Efficient processing of k nearest neighbor joins using MapReduce. Proc. VLDB Endow. 5(10), 1016–1027 (2012)
12. Novak, D., Batko, M., Zezula, P.: Metric index: an efficient and scalable solution for precise and approximate similarity search. Inf. Syst. 36(4), 721–733 (2011)
13. Pevný, T., Ker, A.D.: Towards dependable steganalysis. In: IS&T/SPIE Electronic Imaging (2015)
14. Roesch, M.: Snort - lightweight intrusion detection for networks. In: Proceedings of the 13th USENIX Conference on System Administration, LISA 1999, pp. 229–238. USENIX Association, Berkeley (1999)
15. Wright, C., Monrose, F., Masson, G.M.: On inferring application protocol behaviors in encrypted network traffic. J. Mach. Learn. Res. 7, 2745–2769 (2006)
16. Xia, C., Lu, H., Ooi, B.C., Hu, J.: Gorder: an efficient method for KNN join processing. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, vol. 30, pp. 756–767. VLDB Endowment (2004)
17. Yu, C., Cui, B., Wang, S., Su, J.: Efficient index-based KNN join processing for high-dimensional data. Inf. Softw. Technol. 49(4), 332–344 (2007)
18. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach. Springer, New York (2005)

Acknowledgments

This project was supported by the GAČR 15-08916S and GAUK 201515 grants.

Author information

Authors and Affiliations

SIRET Research Group, Faculty of Mathematics and Physics, Department of Software Engineering, Charles University in Prague, Prague, Czech Republic
Přemysl Čech, Jakub Lokoč & Jakub Maroušek
FEE, Cognitive Research Center in Prague, Czech Technical University in Prague, Cisco Systems, Inc., Prague, Czech Republic
Jan Kohout, Tomáš Komárek & Tomáš Pevný

Corresponding author

Correspondence to Přemysl Čech.

Editor information

Editors and Affiliations

CNRS–IRISA, Rennes, France
Laurent Amsaleg
National Institute of Informatics, Tokyo, Japan
Michael E. Houle
Ludwig-Maximilians-Universität München, München, Germany
Erich Schubert

About this paper

Cite this paper

Čech, P., Kohout, J., Lokoč, J., Komárek, T., Maroušek, J., Pevný, T. (2016). Feature Extraction and Malware Detection on Large HTTPS Data Using MapReduce. In: Amsaleg, L., Houle, M., Schubert, E. (eds) Similarity Search and Applications. SISAP 2016. Lecture Notes in Computer Science, vol 9939. Springer, Cham. https://doi.org/10.1007/978-3-319-46759-7_24

DOI: https://doi.org/10.1007/978-3-319-46759-7_24
Published: 27 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46758-0
Online ISBN: 978-3-319-46759-7
eBook Packages: Computer Science, Computer Science (R0)
