
Empowering In-Network Classification in Programmable Switches by Binary Decision Tree and Knowledge Distillation

Published: 21 June 2023

Abstract

Given the high packet-processing efficiency of programmable switches (e.g., P4 switches operating at Tbps), several works have proposed offloading the decision tree (DT) to P4 switches for in-network classification. Although the DT suits the match-action paradigm of P4 switches, the range match rules it relies on may not be supported across devices implementing different P4 standards. In addition, emerging models, including neural networks (NNs) and ensemble models, have shown superior performance on networking tasks, but their sophisticated operations pose new challenges for deployment in switches. In this paper, we propose Mousikav2 to address these drawbacks. First, we design a new tree model, the binary decision tree (BDT). Unlike the DT, the BDT consists of classification rules expressed in bits, which fit the standard ternary match supported by different hardware/software switches. Second, we introduce a teacher-student knowledge distillation architecture in Mousikav2, which enables the general transfer of knowledge from sophisticated models to the BDT. Through this transfer, sophisticated models are deployed in switches indirectly, sidestepping switch constraints. Finally, a lightweight P4 program performs classification tasks in switches using the BDT obtained after knowledge distillation. Experiments on three networking tasks and three commodity switches show that Mousikav2 not only improves classification accuracy by 3.27%, but also reduces switch stage and memory usage by $2.00\times$ and 28.67%, respectively. Code is available at https://github.com/xgr19/Mousika.
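
To make the BDT idea concrete, the following minimal Python sketch (illustrative only; names such as `BitNode` and `ternary_rules` are hypothetical and not from the paper's released code) shows how a tree that splits on individual feature bits maps to ternary match rules: each root-to-leaf path fixes the bits it tested and leaves every other bit as a wildcard.

```python
# Illustrative sketch only: extracting ternary match rules from a
# binary decision tree that splits on individual feature bits.
# BitNode and ternary_rules are hypothetical names, not the paper's code.

class BitNode:
    def __init__(self, bit=None, zero=None, one=None, label=None):
        self.bit = bit      # index of the feature bit tested here
        self.zero = zero    # subtree followed when the bit is 0
        self.one = one      # subtree followed when the bit is 1
        self.label = label  # class label when this node is a leaf

def ternary_rules(node, n_bits, prefix=None):
    """Walk every root-to-leaf path; untested bits stay '*' wildcards."""
    if prefix is None:
        prefix = ['*'] * n_bits
    if node.label is not None:                      # leaf: emit one rule
        return [(''.join(prefix), node.label)]
    rules = []
    for bit_value, child in (('0', node.zero), ('1', node.one)):
        branch = prefix.copy()
        branch[node.bit] = bit_value
        rules += ternary_rules(child, n_bits, branch)
    return rules

# A toy 4-bit tree: test bit 0; on the 1-branch, also test bit 2.
tree = BitNode(bit=0,
               zero=BitNode(label='benign'),
               one=BitNode(bit=2,
                           zero=BitNode(label='benign'),
                           one=BitNode(label='attack')))
for value_mask, label in ternary_rules(tree, n_bits=4):
    print(value_mask, '->', label)  # '0***' benign, '1*0*' benign, '1*1*' attack
```

One rule per leaf, so a tree with k leaves compiles to k ternary entries that a single match-action table can hold, with no range matching required.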
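The teacher-student transfer can be pictured with a generic distillation recipe: train a strong teacher, relabel the training data with the teacher's outputs, and fit the simple student on those labels so it imitates the teacher. The scikit-learn sketch below is an assumed stand-in for illustration; the paper's actual teachers, distillation loss, and BDT trainer are not reproduced here.

```python
# Generic teacher-student distillation sketch (scikit-learn), shown on
# toy data. Assumption-laden: the paper's real pipeline may differ.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 16))   # toy flows: 16 feature bits each
y = (X[:, 0] & X[:, 2]).astype(int)       # toy ground-truth labels

# 1) Train a sophisticated teacher on the raw labels.
teacher = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# 2) Relabel the training set with the teacher's predictions so the
#    simple student learns from the teacher rather than the raw labels.
y_teacher = teacher.predict_proba(X).argmax(axis=1)

# 3) Fit the deployable student (here a plain depth-limited tree).
student = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X, y_teacher)
fidelity = (student.predict(X) == y_teacher).mean()
print(f'student-teacher agreement: {fidelity:.1%}')
```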
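Once rules are in bit form, the switch's standard ternary match behaves like a TCAM lookup: each rule is a (value, mask) pair, and the first matching rule by priority decides the class. The short sketch below illustrates only that generic semantics in plain Python, not the paper's P4 program.

```python
# TCAM-style ternary matching: a rule is (value, mask, label); a key
# matches when key & mask == value & mask; first match by priority wins.
def classify(key: int, rules):
    for value, mask, label in rules:          # rules sorted by priority
        if key & mask == value & mask:
            return label
    return 'default'

# '1*1*' (position 0 = most significant bit) becomes value=0b1010,
# mask=0b1010; wildcarded bits get mask 0.
rules = [
    (0b1010, 0b1010, 'attack'),
    (0b0000, 0b0000, 'benign'),               # catch-all entry
]
print(classify(0b1110, rules))                # positions 0 and 2 are 1 -> 'attack'
```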


Cited By

  • (2024) "Multi-task Aware Resource Efficient Traffic Classification via in-Network Inference," Proc. 2024 SIGCOMM Workshop on Networks for AI Computing, pp. 69–74. https://doi.org/10.1145/3672198.3673803 (online publication date: 4 Aug. 2024)
  • (2024) "A Machine Learning-Based Toolbox for P4 Programmable Data-Planes," IEEE Transactions on Network and Service Management, vol. 21, no. 4, pp. 4450–4465. https://doi.org/10.1109/TNSM.2024.3402074 (online publication date: 1 Aug. 2024)
  • (2023) "In-Network Machine Learning Using Programmable Network Devices: A Survey," IEEE Communications Surveys & Tutorials, vol. 26, no. 2, pp. 1171–1200. https://doi.org/10.1109/COMST.2023.3344351 (online publication date: 19 Dec. 2023)


Published In

IEEE/ACM Transactions on Networking, Volume 32, Issue 1, Feb. 2024, 916 pages

Publisher

IEEE Press

Publication History

Published: 21 June 2023, in TON Volume 32, Issue 1

