Abstract
Mobile wireless networks present several challenges for any learning system, due to uncertain and variable device movement, a decentralized network architecture, and constraints on network resources. In this work, we use deep reinforcement learning (DRL) to learn a scalable and generalizable forwarding strategy for such networks. We make the following contributions: (i) we use hierarchical RL to design DRL packet agents, rather than device agents, to capture the packet-forwarding decisions made over time and to improve training efficiency; (ii) we use relational features to ensure that the learned forwarding strategy generalizes to a wide range of network dynamics and to enable offline training; and (iii) we incorporate both forwarding goals and network resource considerations into packet decision-making by designing a weighted reward function. Our results show that the forwarding strategy used by our DRL packet agent often achieves a per-packet delivery delay similar to that of the oracle forwarding strategy, and almost always outperforms all other strategies (including state-of-the-art strategies) in terms of delay, even on scenarios on which the DRL agent was not trained.
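To make contribution (iii) concrete, the following minimal Python sketch shows one way a weighted reward for a DRL packet agent might combine a forwarding-goal term with a network-resource term. The function name, reward terms, and weight below are illustrative assumptions for exposition only; they are not the reward design used in the paper.

def packet_reward(delivered, hop_delay, queue_occupancy, beta=0.5):
    # Reward for one forwarding decision by a packet agent (hypothetical).
    # delivered:       True if this hop delivered the packet to its destination
    # hop_delay:       time (in seconds) the packet spent reaching the next device
    # queue_occupancy: fraction in [0, 1] of the next device's buffer in use
    # beta:            weight trading off the forwarding goal vs. resource usage
    goal_term = 10.0 if delivered else -hop_delay   # favor fast delivery
    resource_term = -queue_occupancy                # penalize congested devices
    return beta * goal_term + (1.0 - beta) * resource_term

# Example: a non-delivering hop with 0.2 s delay toward a half-full queue
r = packet_reward(delivered=False, hop_delay=0.2, queue_occupancy=0.5)

Setting beta closer to 1 prioritizes delivery delay, while setting it closer to 0 prioritizes conserving network resources.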
Data Availability
Not currently available.
Code Availability
Not currently available. We are working on open-sourcing our code.
Acknowledgements
The authors thank the reviewers for their helpful and insightful comments.
Funding
This material is based upon work supported by the National Science Foundation (NSF) under award #2154190 and award #2154191. Results presented in this paper were obtained in part using CloudBank, which is partially supported by the NSF under award #2154190. The authors also acknowledge the MIT SuperCloud and Lincoln Laboratory Supercomputing Center for providing HPC and consultation resources that have contributed to the research results reported within this paper.
Author information
Contributions
Victoria Manfredi: Conceptualization; Methodology; Formal analysis and investigation; Software; Writing - original draft preparation; Writing - review and editing; Funding acquisition. Alicia P. Wolfe: Conceptualization; Methodology; Formal analysis and investigation; Software; Writing - review and editing; Funding acquisition. Xiaolan Zhang: Conceptualization; Methodology; Formal analysis and investigation; Software; Writing - review and editing. Bing Wang: Conceptualization; Methodology; Formal analysis and investigation; Writing - review and editing; Funding acquisition.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Editors: Emma Brunskill, Minmin Chen, Omer Gottesman, Lihong Li, Yuxi Li, Yao Liu, Zongqing Lu, Niranjani Prasad, Zhiwei Qin, Csaba Szepesvari, Matthew Taylor.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Manfredi, V., Wolfe, A.P., Zhang, X. et al. Learning an adaptive forwarding strategy for mobile wireless networks: resource usage vs. latency. Mach Learn 113, 7157–7193 (2024). https://doi.org/10.1007/s10994-024-06601-3