Abstract
With the arrival of big data era, the Internet traffic is growing exponentially. A wide variety of applications arise on the Internet and traffic classification is introduced to help people manage the massive applications on the Internet for security monitoring and quality of service purposes. A large number of Machine Learning (ML) algorithms are introduced to deal with traffic classification. A significant challenge to the classification performance comes from imbalanced distribution of data in traffic classification system. In this paper, we proposed an Optimised Distance-based Nearest Neighbor (ODNN), which has the capability of improving the classification performance of imbalanced traffic data. We analyzed the proposed ODNN approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments were implemented on the real-world traffic dataset. The results show that the performance of “small classes” can be improved significantly even only with small number of training data and the performance of “large classes” remains stable.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Auld, T., Moore, A., Gull, S.: Bayesian neural networks for internet traffic classification. IEEE Transactions on Neural Networks 18(1), 223–239 (2007)
Barandela, R., Sánchez, J.S., García, V., Rangel, E.: Strategies for learning in class imbalance problems. Pattern Recognition 36(3), 849–851 (2003)
Bermolen, P., Mellia, M., Meo, M., Rossi, D., Valenti, S.: Abacus: Accurate behavioral classification of p2p-tv traffic. Computer Networks 55(6), 1394–1411 (2011)
Bernaille, L., Teixeira, R., Akodkenou, I., Soule, A., Salamatian, K.: Traffic classification on the fly. SIGCOMM Comput. Commun. Rev. 36(2), 23–26 (2006)
Callado, A., Kelner, J., Sadok, D., Alberto Kamienski, C., Fernandes, S.: Better network traffic identification through the independent combination of techniques. Journal of Network and Computer Applications 33(4), 433–446 (2010)
Carela-Español, V., Barlet-Ros, P., Cabellos-Aparicio, A., Solé-Pareta, J.: Analysis of the impact of sampling on netflow traffic classification. Computer Networks 55(5), 1083–1099 (2011)
Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. ACM Sigkdd Explorations Newsletter 6(1), 1–6 (2004)
Finamore, A., Mellia, M., Meo, M.: Mining unclassified traffic using automatic clustering techniques. In: Domingo-Pascual, J., Shavitt, Y., Uhlig, S. (eds.) TMA 2011. LNCS, vol. 6613, pp. 150–163. Springer, Heidelberg (2011)
Glatz, E., Dimitropoulos, X.: Classifying internet one-way traffic. In: Proceedings of the 2012 ACM Conference on Internet Measurement Conference, pp. 37–50. ACM (2012)
He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21(9), 1263–1284 (2009)
Hullár, B., Laki, S., Gyorgy, A.: Early identification of peer-to-peer traffic. In: 2011 IEEE International Conference on Communications (ICC), pp. 1–6. IEEE (2011)
Jin, Y., Duffield, N., Erman, J., Haffner, P., Sen, S., Zhang, Z.L.: A modular machine learning system for flow-level traffic classification in large networks. ACM Transactions on Knowledge Discovery from Data (TKDD) 6(1), 4 (2012)
Joshi, M.V., Kumar, V., Agarwal, R.C.: Evaluating boosting algorithms to classify rare classes: Comparison and improvements. In: Proceedings of the IEEE International Conference on Data Mining, ICDM 2001, pp. 257–264. IEEE (2001)
Karagiannis, T., Papagiannaki, K., Faloutsos, M.: Blinc: multilevel traffic classification in the dark. ACM SIGCOMM Computer Communication Review 35, 229–240 (2005)
Moore, A.W., Zuev, D.: Internet traffic classification using bayesian analysis techniques. SIGMETRICS Perform. Eval. Rev. 33(1), 50–60 (2005)
Nguyen, T.T., Armitage, G., Branch, P., Zander, S.: Timely and continuous machine-learning-based classification for interactive ip traffic. IEEE/ACM Transactions on Networking (TON) 20(6), 1880–1894 (2012)
Nguyen, T.T., Armitage, G.: A survey of techniques for internet traffic classification using machine learning. IEEE Communications Surveys Tutorials 10(4), 56–76 (2008)
Papagiannaki, K., Taft, N., Bhattacharyya, S., Thiran, P., Salamatian, K., Diot, C.: A pragmatic definition of elephants in internet backbone traffic. In: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurment, pp. 175–176. ACM (2002)
Wang, Y., Xiang, Y., Yu, S.Z.: An automatic application signature construction system for unknown traffic. Concurrency and Computation: Practice and Experience 22(13), 1927–1944 (2010)
Wang, Y., Xiang, Y., Zhang, J., Yu, S.: Internet traffic clustering with constraints. In: 2012 8th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 619–624. IEEE (2012)
Wu, G., Chang, E.Y.: Class-boundary alignment for imbalanced dataset learning. In: ICML 2003 Workshop on Learning from Imbalanced Data Sets II, Washington, DC, pp. 49–56 (2003)
Zander, S., Nguyen, T., Armitage, G.: Automated traffic classification and application identification using machine learning. In: The IEEE Conference on Local Computer Networks 30th Anniversary, pp. 250–257 (November 2005)
Zhang, J., Xiang, Y., Wang, Y., Zhou, W., Xiang, Y., Guan, Y.: Network traffic classification using correlation information. IEEE Transactions on Parallel and Distributed Systems 24(1), 104–117 (2013)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wu, D., Chen, X., Chen, C., Zhang, J., Xiang, Y., Zhou, W. (2014). On Addressing the Imbalance Problem: A Correlated KNN Approach for Network Traffic Classification. In: Au, M.H., Carminati, B., Kuo, CC.J. (eds) Network and System Security. NSS 2015. Lecture Notes in Computer Science, vol 8792. Springer, Cham. https://doi.org/10.1007/978-3-319-11698-3_11
Download citation
DOI: https://doi.org/10.1007/978-3-319-11698-3_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11697-6
Online ISBN: 978-3-319-11698-3
eBook Packages: Computer ScienceComputer Science (R0)