DOI: 10.1145/3544216.3544229

LiteFlow: towards high-performance adaptive neural networks for kernel datapath

Published: 22 August 2022

Abstract

Adaptive neural networks (NNs) have been used to optimize OS kernel datapath functions because they can achieve superior performance under changing environments. However, deploying these NNs remains a challenge. One approach is to deploy them in userspace, but such deployments suffer from either high cross-space communication overhead or low responsiveness, significantly compromising function performance. Pure kernel-space deployments, on the other hand, also incur a large performance degradation because the computation logic of the model-tuning algorithm is typically complex and interferes with normal datapath execution.
This paper presents LiteFlow, a hybrid solution for building high-performance adaptive NNs for the kernel datapath. At its core, LiteFlow decouples the control path of adaptive NNs into (1) a kernel-space fast path for efficient model inference and (2) a userspace slow path for effective model tuning. We have implemented LiteFlow in the Linux kernel datapath and evaluated it with three popular datapath functions: congestion control, flow scheduling, and load balancing. Compared with prior work, LiteFlow achieves 44.4% better goodput for congestion control, and improves the completion time of long flows by 33.7% and 56.7% for flow scheduling and load balancing, respectively.
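
The fast-path/slow-path split described in the abstract can be illustrated with a small sketch. This is not LiteFlow's implementation: it runs entirely in one Python process, uses a toy linear model, and every name in it (Snapshot, FastPath, SlowPath, observe, install) is hypothetical. It only shows the general pattern, assuming the slow path tunes on batches of samples and then hands the fast path a complete new set of parameters.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable model parameters read by the fast path (hypothetical type)."""
    weights: Tuple[float, ...]
    version: int


class FastPath:
    """Stand-in for the kernel-space side: inference only, no tuning logic."""

    def __init__(self, snapshot: Snapshot) -> None:
        self._snapshot = snapshot

    def infer(self, features: List[float]) -> float:
        snap = self._snapshot  # read the reference once per call
        return sum(w * x for w, x in zip(snap.weights, features))

    def install(self, snapshot: Snapshot) -> None:
        # One reference assignment: infer() sees either the old or the new
        # snapshot, never a partially updated one.
        self._snapshot = snapshot

    @property
    def version(self) -> int:
        return self._snapshot.version


class SlowPath:
    """Stand-in for the userspace side: batches samples and tunes the model."""

    def __init__(self, fast: FastPath, weights: List[float],
                 lr: float = 0.1, batch_size: int = 4) -> None:
        self._fast = fast
        self._weights = list(weights)
        self._lr = lr
        self._batch_size = batch_size
        self._batch: List[Tuple[List[float], float]] = []
        self._version = fast.version

    def observe(self, features: List[float], target: float) -> None:
        """Collect one sample; tune only once a full batch is available."""
        self._batch.append((list(features), target))
        if len(self._batch) >= self._batch_size:
            self._tune()

    def _tune(self) -> None:
        # Plain per-sample gradient steps on squared error (toy tuning rule).
        for features, target in self._batch:
            pred = sum(w * x for w, x in zip(self._weights, features))
            err = pred - target
            self._weights = [w - self._lr * err * x
                             for w, x in zip(self._weights, features)]
        self._batch.clear()
        self._version += 1
        self._fast.install(Snapshot(tuple(self._weights), self._version))


if __name__ == "__main__":
    fast = FastPath(Snapshot((0.0,), 0))
    slow = SlowPath(fast, [0.0])
    for _ in range(4):
        slow.observe([1.0], 2.0)
    print(fast.version, fast.infer([1.0]))
```

In this sketch the inference call never executes tuning code, and the model it uses changes only when the slow path installs a finished snapshot. Batching is the knob that trades responsiveness against how often the two sides communicate.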

Supplementary Material

PDF File (p414-zhang-supp.pdf)
Supplemental material.

Cited By

  • (2024) Efficient DRL-Based Congestion Control With Ultra-Low Overhead. IEEE/ACM Transactions on Networking 32:3 (1888-1903). DOI: 10.1109/TNET.2023.3330737. Online publication date: Jun-2024
  • (2023) MDP: Model Decomposition and Parallelization of Vision Transformer for Distributed Edge Inference. 2023 19th International Conference on Mobility, Sensing and Networking (MSN) (570-578). DOI: 10.1109/MSN60784.2023.00086. Online publication date: 14-Dec-2023

    Published In

    SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference
    August 2022
    858 pages
    ISBN:9781450394208
    DOI:10.1145/3544216
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adaptive neural network
    2. deployment
    3. kernel datapath

    Qualifiers

    • Research-article

    Funding Sources

    • Key-Area Research and Development Program of Guangdong Province
    • NSFC
    • Hong Kong RGC TRS

    Conference

    SIGCOMM '22: ACM SIGCOMM 2022 Conference
    August 22 - 26, 2022
    Amsterdam, Netherlands

    Acceptance Rates

    Overall Acceptance Rate 462 of 3,389 submissions, 14%

    Article Metrics

    • Downloads (Last 12 months): 279
    • Downloads (Last 6 weeks): 34
    Reflects downloads up to 20 Nov 2024
