
DOI: 10.1145/3452296.3472902
Research Article | Public Access

Network planning with deep reinforcement learning

Published: 09 August 2021

Abstract

Network planning is critical to the performance, reliability and cost of web services. This problem is typically formulated as an Integer Linear Programming (ILP) problem. Today's practice relies on hand-tuned heuristics from human experts to address the scalability challenge of ILP solvers.
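To make the ILP framing above concrete, the sketch below builds a toy capacity-planning ILP with the open-source PuLP modeling library. The link set, demand set, and cost model are hypothetical placeholders chosen for illustration only; the formulation is far simpler than the one used in the paper.

```python
import pulp

# Hypothetical toy instance: a cost per unit of capacity on each link, and
# demands that are pre-routed over fixed paths. This only illustrates the
# general shape of a network-planning ILP, not the paper's model.
link_cost = {"AB": 10.0, "BC": 8.0, "AC": 15.0}         # link -> cost per capacity unit
demands = {"d1": (5, ["AB", "BC"]), "d2": (3, ["AC"])}  # demand -> (volume, path links)

prob = pulp.LpProblem("toy_network_plan", pulp.LpMinimize)

# Integer decision variables: capacity units to install on each link.
cap = {l: pulp.LpVariable(f"cap_{l}", lowBound=0, cat="Integer") for l in link_cost}

# Objective: minimize total build cost.
prob += pulp.lpSum(link_cost[l] * cap[l] for l in link_cost)

# Capacity constraints: each link must carry all demand volume routed over it.
for l in link_cost:
    load = sum(vol for vol, path in demands.values() if l in path)
    prob += cap[l] >= load

prob.solve()
print(pulp.LpStatus[prob.status], {l: cap[l].value() for l in link_cost})
```

Real planning instances add routing choices, failure scenarios, and hardware constraints, which is what makes the ILP hard to scale and motivates the hand-tuned heuristics mentioned above.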
In this paper, we propose NeuroPlan, a deep reinforcement learning (RL) approach to the network planning problem. The problem involves multi-step decision making and cost minimization, which can be naturally cast as a deep RL problem. We develop two important domain-specific techniques. First, we use a graph neural network (GNN) and a novel domain-specific node-link transformation for state encoding, to handle the dynamic nature of the network topology as it evolves during planning. Second, we leverage a two-stage hybrid approach that first uses deep RL to prune the search space and then uses an ILP solver to find the optimal solution within it. This approach resembles today's practice, but replaces the human experts with an RL agent in the first stage. Evaluation on real topologies and setups from large production networks demonstrates that NeuroPlan scales to large topologies beyond the capability of ILP solvers and reduces the cost by up to 17% compared to hand-tuned heuristics.
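The two-stage hybrid described above can be sketched schematically as follows. Everything here is a stand-in under stated assumptions: `RandomAgent` takes the place of the trained GNN-based RL policy and `brute_force_search` takes the place of an ILP solver; neither reflects NeuroPlan's actual interfaces or problem model.

```python
import random

# Schematic of the two-stage hybrid: stage 1 prunes per-link capacity upper
# bounds, stage 2 searches exhaustively within the pruned space. All names
# and numbers are hypothetical stand-ins, not NeuroPlan's API.

LINKS = ["AB", "BC", "AC"]
MAX_CAP = 5           # per-link capacity units available in this toy instance
TOTAL_DEMAND = 8      # toy feasibility target


def feasible(plan):
    """Toy feasibility check: installed capacity must cover total demand."""
    return sum(plan.values()) >= TOTAL_DEMAND


def cost(plan):
    return sum(plan.values())


class RandomAgent:
    """Stage-1 stand-in for the RL policy: proposes per-link upper bounds."""

    def propose_bounds(self):
        return {l: random.randint(2, MAX_CAP) for l in LINKS}


def brute_force_search(bounds):
    """Stage-2 stand-in for the ILP solver: cheapest feasible plan within the bounds."""
    best = None

    def rec(i, plan):
        nonlocal best
        if i == len(LINKS):
            if feasible(plan) and (best is None or cost(plan) < cost(best)):
                best = dict(plan)
            return
        for c in range(bounds[LINKS[i]] + 1):
            plan[LINKS[i]] = c
            rec(i + 1, plan)

    rec(0, {})
    return best


agent = RandomAgent()
bounds = agent.propose_bounds()        # stage 1: prune the search space
plan = brute_force_search(bounds)      # stage 2: exact search in the pruned space
print("bounds:", bounds)
print("plan:", plan, "cost:", cost(plan) if plan else None)
```

If the proposed bounds make the instance infeasible, stage 2 returns None; a trained agent would learn to avoid such over-aggressive pruning.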

Supplementary Material

  • mahajan-public-review (162-public-review.pdf): Network Planning with Deep Reinforcement Learning: Public Review
  • MP4 File (video-presentation.mp4): Conference Presentation Video
  • MP4 File (video-long.mp4): Long Version Video



Information & Contributors

Information

Published In

SIGCOMM '21: Proceedings of the 2021 ACM SIGCOMM 2021 Conference
August 2021
868 pages
ISBN:9781450383837
DOI:10.1145/3452296
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 August 2021

Permissions

Request permissions for this article.

Author Tags

  1. graph neural network
  2. network planning
  3. reinforcement learning

Qualifiers

  • Research-article

Conference

SIGCOMM '21
Sponsor:
SIGCOMM '21: ACM SIGCOMM 2021 Conference
August 23 - 27, 2021
Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate: 462 of 3,389 submissions (14%)


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 1,224
  • Downloads (Last 6 weeks): 139
Reflects downloads up to 22 Feb 2025


Citations

Cited By

View all
  • (2025) Cooperative Graceful Degradation in Containerized Clouds. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 214-232. DOI: 10.1145/3669940.3707244. Online publication date: 30-Mar-2025.
  • (2025) Path-Based Graph Neural Network for Robust and Resilient Routing in Distributed Traffic Engineering. IEEE Journal on Selected Areas in Communications, 43(2), pp. 422-436. DOI: 10.1109/JSAC.2025.3528815. Online publication date: Feb-2025.
  • (2025) Network Digital Twin Toward Networking, Telecommunications, and Traffic Engineering: A Survey. IEEE Access, 13, pp. 16489-16538. DOI: 10.1109/ACCESS.2025.3531947. Online publication date: 2025.
  • (2025) Is your solution accurate? A fault-oriented performance prediction method for enhancing communication network reliability. Reliability Engineering & System Safety, 256, article 110793. DOI: 10.1016/j.ress.2024.110793. Online publication date: Apr-2025.
  • (2024) Learning-Based Optimisation for Integrated Problems in Intermodal Freight Transport: Preliminaries, Strategies, and State of the Art. Applied Sciences, 14(19), article 8642. DOI: 10.3390/app14198642. Online publication date: 25-Sep-2024.
  • (2024) Sequence labeling via reinforcement learning with aggregate labels. Frontiers in Artificial Intelligence, 7. DOI: 10.3389/frai.2024.1463164. Online publication date: 15-Nov-2024.
  • (2024) PRODIGY+: a robust progressive upgrade approach for elastic optical networks. Journal of Optical Communications and Networking, 16(9), p. E48. DOI: 10.1364/JOCN.525392. Online publication date: 15-Aug-2024.
  • (2024) OpticGAI: Generative AI-aided Deep Reinforcement Learning for Optical Networks Optimization. Proceedings of the 1st SIGCOMM Workshop on Hot Topics in Optical Technologies and Applications in Networking, pp. 1-6. DOI: 10.1145/3672201.3674119. Online publication date: 4-Aug-2024.
  • (2024) Joint Optimization of Task Offloading and Service Placement for Digital Twin empowered Mobile Edge Computing. Proceedings of the 2024 3rd International Conference on Networks, Communications and Information Technology, pp. 132-137. DOI: 10.1145/3672121.3672145. Online publication date: 7-Jun-2024.
  • (2024) Optimizing Irrigation Efficiency using Deep Reinforcement Learning in the Field. ACM Transactions on Sensor Networks, 20(4), pp. 1-34. DOI: 10.1145/3662182. Online publication date: 8-Jul-2024.
  • Show More Cited By
