Abstract
In recent years, using reinforcement learning to learn effective algorithms for combinatorial optimization problems has become a popular topic in artificial intelligence. In this paper, we propose Dy-Drl2Op, a model that combines the multi-head attention mechanism with hierarchical reinforcement learning to solve the traveling salesman problem on a dynamic graph. In Dy-Drl2Op, we design a policy network that processes the feature vector of each node and, aided by a distributed reinforcement learning algorithm, quickly predicts the probability of each node being selected at the next step. Dy-Drl2Op also uses beam search to explore the solution space for the optimal solution. At inference time, the trained model generates, in real time, an action decision sequence that satisfies the specified reward function. Dy-Drl2Op is evaluated on a series of dynamic traveling salesman problems, and the experimental results show that it outperforms several baseline models in both solution quality and efficiency.
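To make the decoding step concrete, the following is a minimal Python sketch of beam-search decoding over per-step node-selection probabilities, in the spirit of what the abstract describes. The `policy_probs` function, the `beam_width` parameter, and all other names are illustrative assumptions: a simple distance-based softmax stands in for the trained Dy-Drl2Op policy network, whose actual architecture (multi-head attention plus hierarchical reinforcement learning) is not reproduced here.

```python
import numpy as np

# Hypothetical stand-in for the trained policy network: given node
# coordinates, the partial tour, and a visited mask, return a probability
# distribution over the next node. A distance-based softmax is used here
# purely as a placeholder for the paper's attention-based network.
def policy_probs(coords, tour, visited):
    dists = np.linalg.norm(coords - coords[tour[-1]], axis=1)
    scores = np.where(visited, -np.inf, -dists)       # mask out visited nodes
    exp = np.exp(scores - scores[~visited].max())     # stable softmax
    return exp / exp.sum()

def beam_search_tour(coords, beam_width=3):
    """Decode a tour with beam search over per-step selection probabilities."""
    n = len(coords)
    visited0 = np.zeros(n, dtype=bool)
    visited0[0] = True
    beams = [(0.0, [0], visited0)]                    # (log-prob, tour, mask)
    for _ in range(n - 1):
        candidates = []
        for logp, tour, visited in beams:
            probs = policy_probs(coords, tour, visited)
            for nxt in np.argsort(probs)[-beam_width:]:   # top-k expansions
                if visited[nxt]:
                    continue
                mask = visited.copy()
                mask[nxt] = True
                candidates.append(
                    (logp + np.log(probs[nxt]), tour + [int(nxt)], mask))
        # Keep only the beam_width most probable partial tours.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return max(beams, key=lambda b: b[0])[1]          # best complete tour

rng = np.random.default_rng(0)
coords = rng.random((10, 2))    # 10 random cities in the unit square
print(beam_search_tour(coords))
```

Beam search keeps the `beam_width` most probable partial tours at each step, trading a modest constant-factor cost over greedy decoding for a broader exploration of the solution space.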
Acknowledgment
This work was supported by the National Natural Science Foundation of China (No. 61972135), the Natural Science Foundation of Heilongjiang Province in China (No. LH2020F043), the Innovation Talents Project of Science and Technology Bureau of Harbin in China (No. 2017RAQXJ094), and the Postgraduate Innovative Scientific Research Project of Heilongjiang University in China (No. YJSCX2021-197HLJU).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, H., Fan, J., Liu, Y., Lv, X. (2021). Dy-Drl2Op: Learning Heuristics for TSP on the Dynamic Graph via Deep Reinforcement Learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol. 13110. Springer, Cham. https://doi.org/10.1007/978-3-030-92238-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92237-5
Online ISBN: 978-3-030-92238-2
eBook Packages: Computer Science, Computer Science (R0)