Abstract
In this paper, we present an approach to detecting malware in HTTPS traffic using k-NN classification. We focus on the metric space approach to approximate k-NN search over a dataset of sparse, high-dimensional descriptors of network traffic. We show that classification based on approximate k-NN search with a metric index reduces the false positive rate by an order of magnitude compared with the state-of-the-art method, while keeping classification fast enough.
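To make the classification step concrete, the following is a minimal sketch of exact k-NN classification over sparse descriptors with the Euclidean distance. It is not the metric index studied in the paper: it uses a plain linear scan, and the dict-based sparse representation, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean_sparse(a, b):
    """Euclidean distance between two sparse vectors stored as {index: value} dicts."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def knn_classify(query, train, k=3):
    """Exact k-NN by linear scan: majority label among the k closest descriptors."""
    nearest = sorted(train, key=lambda item: euclidean_sparse(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy descriptors; real descriptors are far higher-dimensional.
train = [
    ({0: 1.0, 5: 2.0}, "benign"),
    ({0: 1.1, 5: 1.9}, "benign"),
    ({3: 4.0, 7: 1.0}, "malicious"),
    ({3: 3.8, 7: 1.2}, "malicious"),
    ({3: 4.1}, "malicious"),
]
label = knn_classify({3: 4.0, 7: 0.9}, train, k=3)
```

A linear scan computes one distance per training descriptor for every query, which is the cost that an approximate search over a metric index is meant to reduce.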
Notes
1. In our case, the descriptors are high-dimensional sparse vectors representing network traffic, and the distance function is the Euclidean distance.
2. The exact cannot be published due to non-disclosure agreements.
3. Specifically, a hash was considered malicious if the corresponding process was detected by at least 20 of the antivirus engines used by the virustotal.com service.
4. virustotal.com.
5. \(r_{\mathrm {up}}\) is the number of bytes sent from the client to the server, \(r_{\mathrm {down}}\) is the number of bytes received by the client from the server, \(r_{\mathrm {td}}\) is the duration of the connection (in milliseconds), and \(r_{\mathrm {ti}}\) is the time in seconds elapsed between the start of the current request and the start of the previous request of the same client.
6. The experiments were run on 64-bit Windows Server 2008 R2 Standard with an Intel Xeon X5660 CPU (2.8 GHz, 12 cores with hyper-threading support). The ECM classifier was trained on a VMware virtual machine with 8 CPU cores at 2.2 GHz and 132 GB of RAM, using the Matlab library MinFunc.
7. For a given query, the approximation error is computed as a normed overlap distance between the query result returned by the approximate k-NN search and the correct result returned by the exact k-NN search.
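The approximation error described in the last note can be sketched as follows. The note does not spell out the normalization, so this example assumes the error is one minus the fraction of the exact result that also appears in the approximate result; the function name and the sample identifiers are hypothetical.

```python
def approximation_error(approx_ids, exact_ids):
    """Assumed form: 1 - |approx ∩ exact| / |exact|, a value in [0, 1]."""
    exact = set(exact_ids)
    if not exact:
        return 0.0  # nothing to miss when the exact result is empty
    overlap = len(exact & set(approx_ids))
    return 1.0 - overlap / len(exact)


# Hypothetical 4-NN results: the approximate search misses one true neighbor.
err = approximation_error([1, 2, 3, 9], [1, 2, 3, 4])
```

Under this assumption, an error of 0 means the approximate search returned exactly the true neighbors, and an error of 1 means it returned none of them.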
Acknowledgments
This research has been supported by the Czech Science Foundation (GAČR) project 15-08916S.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Lokoč, J., Kohout, J., Čech, P., Skopal, T., Pevný, T. (2016). k-NN Classification of Malware in HTTPS Traffic Using the Metric Space Approach. In: Chau, M., Wang, G., Chen, H. (eds) Intelligence and Security Informatics. PAISI 2016. Lecture Notes in Computer Science, vol 9650. Springer, Cham. https://doi.org/10.1007/978-3-319-31863-9_10
DOI: https://doi.org/10.1007/978-3-319-31863-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31862-2
Online ISBN: 978-3-319-31863-9
eBook Packages: Computer Science, Computer Science (R0)