Feudal Multiagent Reinforcement Learning for Interdomain Collaborative Routing Optimization

Published: 01 January 2022

Abstract

Traditional interdomain routing schemes cannot respond to sudden network changes or adapt their routing policies accordingly, so many optimization schemes have been proposed, such as tuning Border Gateway Protocol (BGP) parameters or using software-defined networking (SDN) to optimize interdomain routing decisions. However, as the demand for network data transmission changes and grows, the high latency and limited flexibility of these mechanisms have become increasingly prominent. Recent research has addressed these challenges through multiagent reinforcement learning (MARL), which can dynamically meet interdomain requirements, and the multiagent Markov Decision Process (MDP) has been introduced to formulate this routing optimization problem. Accordingly, this paper proposes an interdomain collaborative routing scheme within an interdomain collaborative architecture. The proposed Feudal Multiagent Actor-Critic (FMAAC) algorithm builds on multiagent actor-critic methods and feudal reinforcement learning to solve this mixed competitive-cooperative problem. The agents learn optimal interdomain routing decisions, each focused on different optimization objectives such as end-to-end delay, throughput, and average delivery rate. Experiments were carried out on an interdomain testbed to verify the convergence and effectiveness of the FMAAC algorithm. The results show that our approach significantly improves several Quality of Service (QoS) indicators, reducing end-to-end delay, increasing throughput, and guaranteeing an average delivery rate above 90%.
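The abstract's description suggests a two-level layout: feudal reinforcement learning supplies a manager that issues per-domain goals, while a multiagent actor-critic supplies per-domain workers and a centralized critic that scores the joint action during training. The sketch below illustrates that layout only; the class names, network sizes, goal-conditioning scheme, and observation/action interfaces are assumptions for illustration, not the paper's actual FMAAC implementation.

```python
# Minimal sketch of a feudal multiagent actor-critic layout (PyTorch).
# All dimensions, names, and the goal-embedding scheme are illustrative
# assumptions; they are not taken from the FMAAC paper.
import torch
import torch.nn as nn

class Manager(nn.Module):
    """High-level agent: maps the global network state to one goal
    vector per domain-level worker."""
    def __init__(self, state_dim, goal_dim, n_workers):
        super().__init__()
        self.n_workers, self.goal_dim = n_workers, goal_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, goal_dim * n_workers),
        )

    def forward(self, state):
        # Shape (batch, n_workers, goal_dim): one goal per worker.
        return self.net(state).view(-1, self.n_workers, self.goal_dim)

class Worker(nn.Module):
    """Low-level actor for one domain: chooses a routing action
    (e.g., a next-hop index) from its local observation and the
    manager's goal."""
    def __init__(self, obs_dim, goal_dim, n_actions):
        super().__init__()
        self.pi = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs, goal):
        logits = self.pi(torch.cat([obs, goal], dim=-1))
        return torch.distributions.Categorical(logits=logits)

class CentralCritic(nn.Module):
    """Centralized critic, as in multiagent actor-critic schemes:
    during training it sees the global state plus every worker's
    action; at execution time only the workers are needed."""
    def __init__(self, state_dim, n_workers, n_actions):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(state_dim + n_workers * n_actions, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, joint_actions_onehot):
        return self.q(torch.cat([state, joint_actions_onehot], dim=-1))

# Illustrative forward pass with toy sizes: 3 domains, 5 next hops each.
state = torch.randn(1, 16)                    # global network state
obs = [torch.randn(1, 8) for _ in range(3)]   # per-domain observations
manager = Manager(state_dim=16, goal_dim=4, n_workers=3)
workers = [Worker(obs_dim=8, goal_dim=4, n_actions=5) for _ in range(3)]
goals = manager(state)
actions = [w(o, goals[:, i]).sample()
           for i, (w, o) in enumerate(zip(workers, obs))]
```

At execution time each domain acts from local information plus its goal, which is what makes the centralized-training, decentralized-execution pattern a fit for interdomain routing, where domains cooperate on end-to-end QoS but keep internal state private.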


Information

Published In

Wireless Communications & Mobile Computing, Volume 2022

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley and Sons Ltd.

United Kingdom

Publication History

Published: 01 January 2022

Qualifiers

  • Research-article
