Dy-Drl2Op: Learning Heuristics for TSP on the Dynamic Graph via Deep Reinforcement Learning

  • Conference paper
Neural Information Processing (ICONIP 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13110)


Abstract

In recent years, learning effective algorithms for combinatorial optimization problems through reinforcement learning has become a popular topic in artificial intelligence. In this paper, we propose Dy-Drl2Op, a model that combines the multi-head attention mechanism with hierarchical reinforcement learning to solve the traveling salesman problem on a dynamic graph. In Dy-Drl2Op, we design a policy network that processes the feature vector of each node and, with the help of a distributed reinforcement learning algorithm, quickly predicts the probability of each node being selected at the next step. Dy-Drl2Op also uses beam search to explore the solution space for the optimal solution. The trained model generates, in real time, an action sequence that satisfies the given reward function. Dy-Drl2Op is evaluated on a series of dynamic traveling salesman problems. The experimental results show that it outperforms several baseline models in both solution quality and efficiency.
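
The abstract names two concrete mechanisms: a multi-head-attention policy network that scores each candidate next node, and beam search over the resulting per-step probabilities. The following is a minimal sketch of how these two pieces can fit together for TSP decoding, assuming PyTorch; all names and hyperparameters in it (PolicySketch, embed_dim, beam_width) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two mechanisms named in the abstract:
# (1) a multi-head-attention policy that scores each candidate next node,
# (2) beam search over the resulting per-step probabilities.
# All names and hyperparameters (PolicySketch, embed_dim, beam_width)
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class PolicySketch(nn.Module):
    def __init__(self, embed_dim: int = 128, num_heads: int = 8):
        super().__init__()
        self.embed = nn.Linear(2, embed_dim)  # embed 2-D node coordinates
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, coords, current, visited):
        """Log-probabilities over nodes for the next step of the tour."""
        h = self.embed(coords)                       # (n, d) node embeddings
        q = h[current].view(1, 1, -1)                # current node as the query
        ctx, _ = self.attn(q, h.unsqueeze(0), h.unsqueeze(0))
        scores = h @ ctx.reshape(-1)                 # compatibility with context
        scores = scores.masked_fill(visited, float("-inf"))  # forbid revisits
        return torch.log_softmax(scores, dim=-1)


def beam_search(policy, coords, beam_width=3):
    """Keep the beam_width best partial tours (by total log-prob) each step."""
    n = coords.size(0)
    beams = [([0], 0.0)]                             # every tour starts at node 0
    with torch.no_grad():
        for _ in range(n - 1):
            candidates = []
            for tour, logp in beams:
                visited = torch.zeros(n, dtype=torch.bool)
                visited[tour] = True
                step = policy(coords, tour[-1], visited)
                for j in torch.topk(step, beam_width).indices.tolist():
                    if not visited[j]:               # skip masked (-inf) picks
                        candidates.append((tour + [j], logp + step[j].item()))
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]                               # best complete tour


coords = torch.rand(10, 2)                           # random 10-city instance
print(beam_search(PolicySketch(), coords))
```

In the paper's setting the policy would additionally be trained against a tour-length reward on dynamic graphs; the sketch only illustrates the decoding loop.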



Acknowledgment

This work was supported by the National Natural Science Foundation of China (No. 61972135), the Natural Science Foundation of Heilongjiang Province in China (No. LH2020F043), the Innovation Talents Project of Science and Technology Bureau of Harbin in China (No. 2017RAQXJ094), and the Postgraduate Innovative Scientific Research Project of Heilongjiang University in China (No. YJSCX2021-197HLJU).

Author information

Correspondence to Yong Liu or Xingfeng Lv.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, H., Fan, J., Liu, Y., Lv, X. (2021). Dy-Drl2Op: Learning Heuristics for TSP on the Dynamic Graph via Deep Reinforcement Learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13110. Springer, Cham. https://doi.org/10.1007/978-3-030-92238-2_18


  • DOI: https://doi.org/10.1007/978-3-030-92238-2_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92237-5

  • Online ISBN: 978-3-030-92238-2

  • eBook Packages: Computer Science, Computer Science (R0)
