
Empowering In-Network Classification in Programmable Switches by Binary Decision Tree and Knowledge Distillation

Published: 21 June 2023

Abstract

Given the high packet-processing efficiency of programmable switches (e.g., P4 switches operating at Tbps), several works have proposed offloading the decision tree (DT) to P4 switches for in-network classification. Although the DT suits the match-action paradigm of P4 switches, the range match rules it relies on may not be supported across devices implementing different P4 standards. In addition, emerging models, including neural networks (NNs) and ensemble models, have shown superior performance on networking tasks, but their sophisticated operations pose new challenges for deployment in switches. In this paper, we propose Mousikav2 to address these drawbacks. First, we design a new tree model, the binary decision tree (BDT). Unlike the DT, the BDT consists of classification rules expressed in bits, which fit the standard ternary match supported by different hardware/software switches. Second, we introduce a teacher-student knowledge distillation architecture in Mousikav2, which enables the general transfer of knowledge from sophisticated models to the BDT. Through this transfer, sophisticated models are deployed in switches indirectly, sidestepping switch constraints. Finally, a lightweight P4 program performs classification tasks in switches using the BDT obtained after knowledge distillation. Experiments on three networking tasks and three commodity switches show that Mousikav2 not only improves classification accuracy by 3.27%, but also reduces switch stage and memory usage by $2.00\times$ and 28.67%, respectively. Code is available at https://github.com/xgr19/Mousika.
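
To make the BDT idea concrete, the following minimal Python sketch (illustrative only; names such as `BitNode` and `ternary_rules` are hypothetical and not from the paper's released code) shows how a tree that splits on individual feature bits maps to ternary match rules: each root-to-leaf path fixes the bits it tested and leaves every other bit as a wildcard.

```python
# Illustrative sketch only: extracting ternary match rules from a
# binary decision tree that splits on individual feature bits.
# BitNode and ternary_rules are hypothetical names, not the paper's code.

class BitNode:
    def __init__(self, bit=None, zero=None, one=None, label=None):
        self.bit = bit      # index of the feature bit tested here
        self.zero = zero    # subtree followed when the bit is 0
        self.one = one      # subtree followed when the bit is 1
        self.label = label  # class label when this node is a leaf

def ternary_rules(node, n_bits, prefix=None):
    """Walk every root-to-leaf path; untested bits stay '*' wildcards."""
    if prefix is None:
        prefix = ['*'] * n_bits
    if node.label is not None:                      # leaf: emit one rule
        return [(''.join(prefix), node.label)]
    rules = []
    for bit_value, child in (('0', node.zero), ('1', node.one)):
        branch = prefix.copy()
        branch[node.bit] = bit_value
        rules += ternary_rules(child, n_bits, branch)
    return rules

# A toy 4-bit tree: test bit 0; on the 1-branch, also test bit 2.
tree = BitNode(bit=0,
               zero=BitNode(label='benign'),
               one=BitNode(bit=2,
                           zero=BitNode(label='benign'),
                           one=BitNode(label='attack')))
for value_mask, label in ternary_rules(tree, n_bits=4):
    print(value_mask, '->', label)  # '0***' benign, '1*0*' benign, '1*1*' attack
```

One rule per leaf, so a tree with k leaves compiles to k ternary entries that a single match-action table can hold, with no range matching required.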
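The teacher-student transfer can be pictured with a generic distillation recipe: train a strong teacher, relabel the training data with the teacher's outputs, and fit the simple student on those labels so it imitates the teacher. The scikit-learn sketch below is an assumed stand-in for illustration; the paper's actual teachers, distillation loss, and BDT trainer are not reproduced here.

```python
# Generic teacher-student distillation sketch (scikit-learn), shown on
# toy data. Assumption-laden: the paper's real pipeline may differ.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 16))   # toy flows: 16 feature bits each
y = (X[:, 0] & X[:, 2]).astype(int)       # toy ground-truth labels

# 1) Train a sophisticated teacher on the raw labels.
teacher = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# 2) Relabel the training set with the teacher's predictions so the
#    simple student learns from the teacher rather than the raw labels.
y_teacher = teacher.predict_proba(X).argmax(axis=1)

# 3) Fit the deployable student (here a plain depth-limited tree).
student = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X, y_teacher)
fidelity = (student.predict(X) == y_teacher).mean()
print(f'student-teacher agreement: {fidelity:.1%}')
```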
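Once rules are in bit form, the switch's standard ternary match behaves like a TCAM lookup: each rule is a (value, mask) pair, and the first matching rule by priority decides the class. The short sketch below illustrates only that generic semantics in plain Python, not the paper's P4 program.

```python
# TCAM-style ternary matching: a rule is (value, mask, label); a key
# matches when key & mask == value & mask; first match by priority wins.
def classify(key: int, rules):
    for value, mask, label in rules:          # rules sorted by priority
        if key & mask == value & mask:
            return label
    return 'default'

# '1*1*' (position 0 = most significant bit) becomes value=0b1010,
# mask=0b1010; wildcarded bits get mask 0.
rules = [
    (0b1010, 0b1010, 'attack'),
    (0b0000, 0b0000, 'benign'),               # catch-all entry
]
print(classify(0b1110, rules))                # positions 0 and 2 are 1 -> 'attack'
```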


Cited By

  • (2024) "Multi-task Aware Resource Efficient Traffic Classification via in-Network Inference," Proc. 2024 SIGCOMM Workshop on Networks for AI Computing, pp. 69–74. https://doi.org/10.1145/3672198.3673803 (online publication date: 4 Aug. 2024)
  • (2024) "A Machine Learning-Based Toolbox for P4 Programmable Data-Planes," IEEE Transactions on Network and Service Management, vol. 21, no. 4, pp. 4450–4465. https://doi.org/10.1109/TNSM.2024.3402074 (online publication date: 1 Aug. 2024)
  • (2023) "In-Network Machine Learning Using Programmable Network Devices: A Survey," IEEE Communications Surveys & Tutorials, vol. 26, no. 2, pp. 1171–1200. https://doi.org/10.1109/COMST.2023.3344351 (online publication date: 19 Dec. 2023)


Published In

IEEE/ACM Transactions on Networking, Volume 32, Issue 1, Feb. 2024, 916 pages

Publisher

IEEE Press

Publication History

Published: 21 June 2023, in TON Volume 32, Issue 1

