Feudal Multiagent Reinforcement Learning for Interdomain Collaborative Routing Optimization

Published: 01 January 2022

Abstract

Traditional interdomain routing schemes cannot respond to sudden network changes or adapt their routing policies accordingly, so many optimization schemes have been proposed, such as tuning Border Gateway Protocol (BGP) parameters or using software-defined networking (SDN) to optimize interdomain routing decisions. However, as the demand for network data transmission changes and grows, the high latency and limited flexibility of these mechanisms have become increasingly prominent. Recent research has addressed these challenges through multiagent reinforcement learning (MARL), which can dynamically meet interdomain requirements, and the multiagent Markov Decision Process (MDP) has been introduced to formulate this routing optimization problem. Accordingly, this paper proposes an interdomain collaborative routing scheme within an interdomain collaborative architecture. The proposed Feudal Multiagent Actor-Critic (FMAAC) algorithm builds on multiagent actor-critic methods and feudal reinforcement learning to solve this mixed competitive-cooperative problem. The agents learn optimal interdomain routing decisions, each focused on different optimization objectives such as end-to-end delay, throughput, and average delivery rate. Experiments were carried out on an interdomain testbed to verify the convergence and effectiveness of the FMAAC algorithm. The results show that our approach significantly improves several Quality of Service (QoS) indicators, reducing end-to-end delay, increasing throughput, and guaranteeing an average delivery rate above 90%.
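The abstract's description suggests a two-level layout: feudal reinforcement learning supplies a manager that issues per-domain goals, while a multiagent actor-critic supplies per-domain workers and a centralized critic that scores the joint action during training. The sketch below illustrates that layout only; the class names, network sizes, goal-conditioning scheme, and observation/action interfaces are assumptions for illustration, not the paper's actual FMAAC implementation.

```python
# Minimal sketch of a feudal multiagent actor-critic layout (PyTorch).
# All dimensions, names, and the goal-embedding scheme are illustrative
# assumptions; they are not taken from the FMAAC paper.
import torch
import torch.nn as nn

class Manager(nn.Module):
    """High-level agent: maps the global network state to one goal
    vector per domain-level worker."""
    def __init__(self, state_dim, goal_dim, n_workers):
        super().__init__()
        self.n_workers, self.goal_dim = n_workers, goal_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, goal_dim * n_workers),
        )

    def forward(self, state):
        # Shape (batch, n_workers, goal_dim): one goal per worker.
        return self.net(state).view(-1, self.n_workers, self.goal_dim)

class Worker(nn.Module):
    """Low-level actor for one domain: chooses a routing action
    (e.g., a next-hop index) from its local observation and the
    manager's goal."""
    def __init__(self, obs_dim, goal_dim, n_actions):
        super().__init__()
        self.pi = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs, goal):
        logits = self.pi(torch.cat([obs, goal], dim=-1))
        return torch.distributions.Categorical(logits=logits)

class CentralCritic(nn.Module):
    """Centralized critic, as in multiagent actor-critic schemes:
    during training it sees the global state plus every worker's
    action; at execution time only the workers are needed."""
    def __init__(self, state_dim, n_workers, n_actions):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(state_dim + n_workers * n_actions, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, joint_actions_onehot):
        return self.q(torch.cat([state, joint_actions_onehot], dim=-1))

# Illustrative forward pass with toy sizes: 3 domains, 5 next hops each.
state = torch.randn(1, 16)                    # global network state
obs = [torch.randn(1, 8) for _ in range(3)]   # per-domain observations
manager = Manager(state_dim=16, goal_dim=4, n_workers=3)
workers = [Worker(obs_dim=8, goal_dim=4, n_actions=5) for _ in range(3)]
goals = manager(state)
actions = [w(o, goals[:, i]).sample()
           for i, (w, o) in enumerate(zip(workers, obs))]
```

At execution time each domain acts from local information plus its goal, which is what makes the centralized-training, decentralized-execution pattern a fit for interdomain routing, where domains cooperate on end-to-end QoS but keep internal state private.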


Information

Published In

Wireless Communications & Mobile Computing, Volume 2022

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley and Sons Ltd.

United Kingdom

Publication History

Published: 01 January 2022

Qualifiers

  • Research-article
