Abstract
In recent years, using reinforcement learning to learn effective algorithms for combinatorial optimization problems has become a popular topic in artificial intelligence. In this paper, we propose Dy-Drl2Op, a model that combines the multi-head attention mechanism with hierarchical reinforcement learning to solve the traveling salesman problem on a dynamic graph. In Dy-Drl2Op, we design a policy network that processes the feature vector of each node and, aided by a distributed reinforcement learning algorithm, quickly predicts the probability of each node being selected at the next step. Dy-Drl2Op also uses beam search to explore the solution space for the optimal solution. At inference time, the trained model generates, in real time, an action decision sequence that satisfies the specified reward function. Dy-Drl2Op is evaluated on a series of dynamic traveling salesman problems, and the experimental results show that it outperforms several baseline models in both solution quality and efficiency.
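To make the decoding step concrete, the following is a minimal Python sketch of beam-search decoding over per-step node-selection probabilities, in the spirit of what the abstract describes. The `policy_probs` function, the `beam_width` parameter, and all other names are illustrative assumptions: a simple distance-based softmax stands in for the trained Dy-Drl2Op policy network, whose actual architecture (multi-head attention plus hierarchical reinforcement learning) is not reproduced here.

```python
import numpy as np

# Hypothetical stand-in for the trained policy network: given node
# coordinates, the partial tour, and a visited mask, return a probability
# distribution over the next node. A distance-based softmax is used here
# purely as a placeholder for the paper's attention-based network.
def policy_probs(coords, tour, visited):
    dists = np.linalg.norm(coords - coords[tour[-1]], axis=1)
    scores = np.where(visited, -np.inf, -dists)       # mask out visited nodes
    exp = np.exp(scores - scores[~visited].max())     # stable softmax
    return exp / exp.sum()

def beam_search_tour(coords, beam_width=3):
    """Decode a tour with beam search over per-step selection probabilities."""
    n = len(coords)
    visited0 = np.zeros(n, dtype=bool)
    visited0[0] = True
    beams = [(0.0, [0], visited0)]                    # (log-prob, tour, mask)
    for _ in range(n - 1):
        candidates = []
        for logp, tour, visited in beams:
            probs = policy_probs(coords, tour, visited)
            for nxt in np.argsort(probs)[-beam_width:]:   # top-k expansions
                if visited[nxt]:
                    continue
                mask = visited.copy()
                mask[nxt] = True
                candidates.append(
                    (logp + np.log(probs[nxt]), tour + [int(nxt)], mask))
        # Keep only the beam_width most probable partial tours.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return max(beams, key=lambda b: b[0])[1]          # best complete tour

rng = np.random.default_rng(0)
coords = rng.random((10, 2))    # 10 random cities in the unit square
print(beam_search_tour(coords))
```

Beam search keeps the `beam_width` most probable partial tours at each step, trading a modest constant-factor cost over greedy decoding for a broader exploration of the solution space.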
Acknowledgment
This work was supported by the National Natural Science Foundation of China (No. 61972135), the Natural Science Foundation of Heilongjiang Province in China (No. LH2020F043), the Innovation Talents Project of Science and Technology Bureau of Harbin in China (No. 2017RAQXJ094), and the Postgraduate Innovative Scientific Research Project of Heilongjiang University in China (No. YJSCX2021-197HLJU).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, H., Fan, J., Liu, Y., Lv, X. (2021). Dy-Drl2Op: Learning Heuristics for TSP on the Dynamic Graph via Deep Reinforcement Learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol. 13110. Springer, Cham. https://doi.org/10.1007/978-3-030-92238-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92237-5
Online ISBN: 978-3-030-92238-2
eBook Packages: Computer Science, Computer Science (R0)