Abstract
TCP latency is critical to the performance of Web services. However, packet loss greatly impairs TCP performance because of TCP's poor loss recovery mechanisms. Recent work, FUSO, addressed this problem by leveraging multi-path diversity for proactive loss recovery, i.e., using “good” paths to proactively retransmit potentially lost packets from “bad” paths before they are retransmitted after duplicate ACKs or a timeout. Nevertheless, since FUSO has no clue about which packet is (or will be) lost, it simply retransmits the oldest unACKed packet whenever there is a chance for proactive loss recovery. Through analysis and comprehensive experiments, we show that although FUSO behaves well in data center networks, for which it was originally designed, in the Internet scenario this simple proactive retransmission of the oldest unACKed packet is not accurate enough to recover the lost packets, which incurs a performance penalty. To address the problem, this paper presents CoFUSO, a coding-based fast multi-path loss recovery scheme. Unlike FUSO, when there is a chance for proactive loss recovery, CoFUSO generates a coding packet that codes all (or multiple) unACKed packets together. As such, CoFUSO can always proactively retransmit the “right” lost packet, since the receiver can decode the lost packet by combining the coding packet with the other received packets. We implement CoFUSO in the Linux kernel with \(\sim\)2K lines of code. Testbed and simulation results show that, under lossy conditions, CoFUSO improves the average and 99th percentile flow completion time (FCT) by \(\sim\)12% and \(\sim\)59% in the testbed, and by up to \(\sim\)16.9% and \(\sim\)54.5% in the simulation, respectively.
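The difference between retransmitting the oldest unACKed packet and sending a coding packet can be illustrated with a minimal sketch. It is not the CoFUSO implementation: the paper's notes refer to RS codes, whereas this sketch substitutes a single XOR parity packet, which can recover only one lost packet, and every function name and payload below is hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_coding_packet(unacked: dict[int, bytes]) -> bytes:
    """Sender side (hypothetical helper): code all unACKed payloads together.

    XOR parity stands in for the RS code mentioned in the paper's notes.
    """
    return reduce(xor_bytes, unacked.values(), b"")


def recover_lost(coded: bytes, received: dict[int, bytes], lost_len: int) -> bytes:
    """Receiver side (hypothetical helper): cancel out every received payload.

    What remains after XORing is the single missing payload.
    """
    return reduce(xor_bytes, received.values(), coded)[:lost_len]


# Three unACKed packets on a "bad" path, keyed by sequence number.
unacked = {1: b"GET /a", 2: b"GET /b", 3: b"GET /c"}
lost_seq = 2  # the sender does not know this

# FUSO-style choice: retransmit the oldest unACKed packet.
oldest_seq = min(unacked)
fuso_hits_loss = oldest_seq == lost_seq

# Coding-based choice: send one coding packet over a "good" path.
coded = make_coding_packet(unacked)
received = {seq: p for seq, p in unacked.items() if seq != lost_seq}
recovered = recover_lost(coded, received, len(unacked[lost_seq]))
```

In this toy run the oldest unACKed packet (sequence 1) is not the lost one, so retransmitting it recovers nothing, while the coding packet combined with packets 1 and 3 yields packet 2. A real design would also need to tell the receiver which packets were coded and their lengths, which this sketch omits.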
Notes
“Sub-flows” and “paths” are used interchangeably in the paper.
For ease of presentation, in this paper, TCP refers to both TCP and multi-path transport such as MPTCP [21] used for accessing web services. They have the same basic loss recovery mechanism, i.e., through duplicate ACKs and RTOs.
Encoding is fast with RS codes, so we mainly consider decoding time.
RS coding uses online encoding, which requires no extra buffer at the sender side.
Note that we also adopt the receiving side optimization in FUSO to directly push the sub-flow data packet into the data-level receive buffer.
References
Flach, T., Dukkipati, N., Terzis, A., et al. (2013). Reducing web latency: the virtue of gentle aggression. In Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM (pp. 159–170). Hong Kong, China: ACM.
Chen, Y., Mahajan, R., Sridharan, B., et al. (2013). A provider-side view of web search response time. ACM SIGCOMM Computer Communication Review, 43(4), 243–254.
Liu, D., Zhao, Y., Sui, K., et al. (2016). FOCUS: shedding light on the high search response time in the wild. In The 35th annual IEEE international conference on computer communications (pp. 1–9). San Francisco, CA, USA: IEEE.
Arapakis, I., Bai, X., & Cambazoglu, B. B. (2014). Impact of response latency on user behavior in web search. In The 37th international ACM SIGIR conference on research & development in information retrieval (pp. 103–112). New York, USA: ACM.
Zhou, J., Wu, Q., Li, Z., et al. (2015). Demystifying and mitigating TCP stalls at the server side. In The 11th ACM conference on emerging networking experiments and technologies (pp. 1–13). Heidelberg, Germany: ACM.
Chen, G., Lu, Y., Meng, Y., et al. (2016). Fast and cautious: leveraging multi-path diversity for transport loss recovery in data centers. In 2016 USENIX annual technical conference (pp. 29–42). Berkeley, CA, USA: USENIX.
Chen, G., Lu, Y., Meng, Y., et al. (2018). FUSO: fast multi-path loss recovery for data center networks. IEEE/ACM Transactions on Networking, 26(3), 1376–1389.
Reed, I. S., & Solomon, G. (1960). Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics, 8(2), 300–304.
Xu, H., & Li, B. (2013). RepFlow: minimizing flow completion times with replicated flows in data centers. In IEEE INFOCOM 2014-IEEE conference on computer communications (pp. 1581–1589). Toronto, ON, Canada: IEEE.
Fan, X., Li, H., & (2014). Design and implementation of TCP with network coding. In The 2014 2nd international conference on systems and informatics (pp. 570–575). Shanghai, China: IEEE.
Bao, W., Shah-Mansouri, V., Wong, V. W. S., et al. (2012). TCP VON: joint congestion control and online network coding for wireless networks. In 2012 IEEE global communications conference (pp. 125–130). Anaheim, CA: IEEE.
Sun, J., Zhang, Y., Tang, D., et al. (2015). TCP-FNC: a novel TCP with network coding for wireless networks. In 2015 IEEE international conference on communications (pp. 2078–2084). London: IEEE.
Ageneau, P., Boukhatem, N., & Gerla, M. (2017). Practical random linear coding for MultiPath TCP: MPC-TCP. In 2017 24th international conference on telecommunications (pp. 1–6). Limassol: IEEE.
Li, M., Lukyanenko, A., & Cui, Y. (2012). Network coding based multipath TCP. In 2012 proceedings IEEE INFOCOM workshops (pp. 25–30). Orlando, FL: IEEE.
Cui, Y., Wang, L., Wang, X., et al. (2015). FMTCP: a fountain code-based multipath transmission control protocol. IEEE/ACM Transactions on Networking, 23(2), 465–478.
Lu, Y., Chen, G., Li, B., et al. (2018). Multi-path transport for RDMA in datacenters. In 15th USENIX symposium on networked systems design and implementation (NSDI 18) (pp. 357–371). Renton, USA: USENIX.
Chen, G., Lu, Y., Li, B., et al. (2019). MP-RDMA: enabling RDMA with multi-path transport in datacenters. IEEE/ACM Transactions on Networking, 27(6), 2308–2323.
Marketing Land. (2016). Mobile devices now driving 56 percent of traffic to top sites. https://marketingland.com/mobile-top-sites-165725. Accessed 23 Feb 2016.
Apple. (2016). Use multipath TCP to create backup connections for iOS. https://support.apple.com/en-us/HT201373. Accessed 10 Aug 2016.
Huawei. (2018). Exclusive: how link turbo works on the Honor V20. https://www.gizmochina.com/2018/12/26/exclusive-how-link-turbo-works-on-the-honor-v20/. Accessed 26 Dec 2018.
Ford, A., Raiciu, C., Handley, M., et al. (2013). TCP extensions for multipath operation with multiple addresses. RFC 6824.
Nie, X., Zhao, Y., & Pei, D. (2018). Reducing web latency through dynamically setting TCP initial window with reinforcement learning. In 2018 IEEE/ACM 26th international symposium on quality of service (pp. 1–10). Banff, AB, Canada: IEEE.
Paasch, C., Detal, G., & Nalayama, K., et al. (2019). MultiPath TCP: Linux kernel implementation. https://github.com/multipath-tcp/mptcp. Accessed 5 August 2019.
Chen, G. (2016). FUSO. https://github.com/1989chenguo/FUSO. Accessed 13 June 2016.
Legout, A., Amir, E., Balakrishnan, H., et al. (2011). The network Simulator-ns-2. http://www.isi.edu/nsnam/ns/. Accessed 4 Nov 2011.
Wu, H., Ju, J., Lu, G., et al. (2012). Tuning ECN for data center networks. In The 8th international conference on emerging networking experiments and technologies (pp. 25–36). Nice, France: ACM.
Dukkipati, N., Refice, T., Cheng, Y., et al. (2010). An argument for increasing TCP’s initial congestion window. ACM SIGCOMM Computer Communication Review, 40(3), 26–33.
Hopps, C. E. (2000). Analysis of an equal-cost multi-path algorithm. RFC 2992.
Dukkipati, N., Cardwell, N., Cheng, Y., et al. (2013). Tail loss probe (TLP): an algorithm for fast recovery of tail losses. https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. Accessed 25 Feb 2013.
Alizadeh, M., Edsall, T., Dharmapurikar, S., et al. (2014). CONGA: distributed congestion-aware load balancing for datacenters. In 2014 ACM conference on SIGCOMM (pp. 503–514). New York, USA: ACM.
Acknowledgements
We thank Xiaoning Zhan for his help on refining the paper. This work was supported in part by the National Natural Science Foundation of China under Grant 6187060280, in part by the Tencent Rhino-Bird Open Research Fund, and in part by the Fundamental Research Funds for the Central Universities.
About this article
Cite this article
Liu, Y., Zhou, G. & Chen, G. Reducing web latency with coding-based fast multi-path loss recovery. Wireless Netw 27, 195–209 (2021). https://doi.org/10.1007/s11276-020-02443-8
DOI: https://doi.org/10.1007/s11276-020-02443-8