Learning-Based Optimisation for Integrated Problems in Intermodal Freight Transport: Preliminaries, Strategies, and State of the Art
Figure 2. Overview of key planning problems and areas in bimodal freight transport (Source: Own diagram).
Figure 3. A prototypical illustration of the NCO framework, including the models typically applied in each NCO design block (Source: Own diagram).
Figure 4. Three agent setups for solving integrated sequential decision problems with Multi-Agent Reinforcement Learning (MARL): single agent with centralised learning (left); multi-agent with decentralised learning (centre, (A)) and centralised learning (centre, (B)); and Hierarchical Multi-Agent RL (HRL) (right) (Source: Own diagram).
Figure 5. Illustration of the three NCO implementation strategies (Source: Own diagram).
Fifty-nine reviewed publications between 2018 and 2023 related to the six optimisation classes defined in Section 2.2: Network Optimisation [88,90]; Resource Assignment/Scheduling/Dispatching [13,68,69,71,79,82,87,89,91,92,93,94,119,120,121,122]; Resource Allocation/Balancing/Fleet Composition [77,97,98,99,123]; Batching and Sequencing [83,100,101]; Vehicle Routing Problem [55,56,58,59,80,102,104,105,106,107,108,109,110,111,112,113,118,124,125,126,127,128,129,130,131,132]; Prediction and Forecasting [23,114,115,116,117,133,134]. The reviewed studies were classified according to the learning algorithms, applied model architectures, agent environment, and simulator setup strategy (Source: Own diagram).
Featured Application
Abstract
1. Introduction
- IFT is a complex, integrated system involving multiple decision-makers with different objectives that need to be optimised in parallel. What are the key operational, tactical, and strategic planning problems in IFT and how can they be mapped to mathematical problem formulations commonly used from an OR perspective? How can these problems be categorised in a structured and systematic manner?
- Learning-based algorithms for combinatorial optimisation problems represent a very promising avenue for seamless and fast decision-making under uncertainty—a crucial factor for enhancing overall IFT performance. Over the past five years, research in computer science has converged on methods referred to as Neural Combinatorial Optimisation (NCO), which apply the Transformer architecture [10] to combinatorial problems. However, a unified and structured overview of the key methods, strategies, and framework setups is lacking in the current State of the Art. This raises the question: what are the key methodological components, algorithm setups, and solution strategies in NCO, and how can they be structured and abstracted to assist future research in building decision-making tools for integrated transport planning problems? (An illustrative sketch of such an attention-based construction policy, under stated assumptions, follows this list.)
- The current State of the Art in NCO is highly heterogeneous and primarily involves basic algorithm research with atomic applications to classical OR problems or representative use-cases. To derive insights for more general applications, the question arises: how can the current NCO research be systematised, and what insights can be deduced to support the development of more integrated, generalised NCO frameworks in the future?
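To make the NCO idea referred to above concrete, the following is a minimal sketch of an attention-based construction policy for the TSP trained with REINFORCE, in the spirit of [53,55]. It assumes PyTorch as the deep learning library; the class name, layer sizes, the single-layer encoder, the simple batch-mean baseline, and the random 20-city instances are illustrative placeholders rather than the setup of any reviewed study.

```python
# Minimal NCO-style sketch: attention-based construction policy for the TSP,
# trained with REINFORCE. Illustrative only; assumes PyTorch is available.
import torch
import torch.nn as nn

class AttentionPolicy(nn.Module):
    def __init__(self, emb_dim=128, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(2, emb_dim)                       # (x, y) coordinates -> node embeddings
        self.encoder = nn.MultiheadAttention(emb_dim, n_heads, batch_first=True)
        self.q_proj = nn.Linear(emb_dim, emb_dim)                # decoder query from the current context

    def forward(self, coords):
        # coords: (batch, n_nodes, 2)
        h = self.embed(coords)
        h, _ = self.encoder(h, h, h)                             # one self-attention encoder layer
        batch, n, _ = h.shape
        mask = torch.zeros(batch, n, dtype=torch.bool)           # visited-node mask
        context = h.mean(dim=1)                                  # graph embedding as the initial context
        tour, log_probs = [], []
        for _ in range(n):
            q = self.q_proj(context)                             # (batch, emb)
            scores = torch.einsum('be,bne->bn', q, h)            # compatibility with every node
            scores = scores.masked_fill(mask, float('-inf'))     # forbid revisiting nodes
            dist = torch.distributions.Categorical(logits=scores)
            node = dist.sample()                                 # sample the next city
            log_probs.append(dist.log_prob(node))
            mask[torch.arange(batch), node] = True
            context = h[torch.arange(batch), node]               # condition on the last visited node
            tour.append(node)
        return torch.stack(tour, dim=1), torch.stack(log_probs, dim=1).sum(dim=1)

def tour_length(coords, tour):
    # Total Euclidean length of the closed tour.
    idx = tour.unsqueeze(-1).expand(-1, -1, 2)
    ordered = coords.gather(1, idx)
    rolled = torch.roll(ordered, shifts=-1, dims=1)
    return (ordered - rolled).norm(dim=-1).sum(dim=1)

policy = AttentionPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
for step in range(100):                                          # toy training loop
    coords = torch.rand(64, 20, 2)                               # random 20-city instances
    tour, logp = policy(coords)
    cost = tour_length(coords, tour)
    baseline = cost.mean()                                       # simple batch baseline, not the rollout baseline of [55]
    loss = ((cost - baseline).detach() * logp).mean()            # REINFORCE gradient estimator
    opt.zero_grad(); loss.backward(); opt.step()
```

The reviewed end-to-end approaches typically use deeper multi-head encoders and greedy-rollout or critic baselines; the sketch only illustrates the overall construct-and-policy-gradient loop that Sections 3 and 4 discuss.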
2. Transport Planning Problems in IFT
2.1. Intermodal Transport Operations
2.2. Optimisation Problems in Transport Planning
3. Methodological Preliminaries
3.1. DNN Model
3.2. RL Training Algorithm
3.3. Agents’ Setup
3.4. Simulator Setup Strategies
4. Overview of the State of the Art in the Field of NCO
5. Discussion
Future Perspectives on NCO
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Optimisation Problem Class | Transport Problems | Related Problems in OR and Computer Science | Solution Techniques |
---|---|---|---|
Network Optimisation | Network/infrastructure planning, facility location planning, strategic network flow planning, long-term rail corridors and terminals, assignment of delivery zones or customers to facilities. | Linear Assignment Problem (LAP) and Quadratic Assignment Problem (QAP), Facility Location Problem (FLP), Location Routing Problem (LRP), Hub Location Problem, Arc-Routing Problem, Clustering, Covering, Centre and Median Problem. | Graph optimisation, MIP/MILP, heuristics and metaheuristics, econometric models, aggregated transport models, land-use models, Multi-Criteria Decision Analysis (MCDA), analytical methods, Geographic Information Systems (GIS). |
Resource Assignment/Scheduling/Dispatching | Capacity planning, mode selection, train routing, occupation time scheduling, timetabling (offline) and train dispatching (online), locomotive/vehicle/container assignment, drayage scheduling and assignment of pick-up and delivery points. | Linear Assignment Problem (LAP), Dynamic Quadratic Assignment Problem (DQAP), Job-Shop Scheduling and Flexible Job-Shop Problems, Resource Assignment, Scheduling and Allocation Problems, Shipping Point Assignment, Network Flow Problem. | Time-space network, graph optimisation, analytical methods, MIP/MILP, heuristics and metaheuristics, approximate dynamic programming, discrete event simulation, simple dispatching rules (e.g., First In First Out), behavioural models, Multi-Agent Systems (MAS). |
Resource Allocation/Redistribution and Fleet Composition | Empty/idle vehicle/container repositioning, vehicle redistribution, vehicle fleet composition at multiple locations, vehicle fleet management. | Linear Assignment Problem (LAP), Vehicle Fleet Optimisation, Vehicle Repositioning, Vehicle Fleet Composition. | Discrete event simulation, MIP/MILP, graph optimisation, heuristics and metaheuristics. |
Batching and Sequencing | Consolidation/shunting/sorting of containers/wagons, sorting and batching of orders in terminals, order consolidation, picker routing. | Fixed Time Window Batching (FTWB) and Variable Time Window Batching (VTWB), Bin Packing Problem, (container) Reordering, Resorting, Pre-marshalling Problem, Block(s) Relocation Problem, Load Planning Problem. | Simple sequencing and batching rules, heuristics and metaheuristics, discrete event simulation, MIP/MILP. |
Vehicle Routing Problem | Scheduling, dispatching of vehicles and vehicle routing. | Shortest Path Problem, TSP, VRP, Heterogeneous Fleet VRP (HF-VRP), VRP with Time Windows (VRPTW), Multi-Depot VRP (MDVRP), VRP with Pickup and Delivery (VRPPD), Capacitated VRP (CVRP), Two-Echelon VRP. | Graph optimisation, MIP/MILP, heuristics and metaheuristics, behavioural models, MAS. |
Prediction and Forecasting | Prediction of delays, demand, events, vehicles and resource occurrences. | Delay prediction and propagation, demand forecasting, traffic flow forecasting. | Analytical methods, classical and Bayesian statistics methods, regression, time-series analysis, ML and DL, behavioural models, simulation, System Dynamics, MAS. |
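To illustrate how the transport problems in the table map onto the listed OR formulations and solution techniques, the following is a small, illustrative sketch of an uncapacitated facility location problem (first row) written as a MILP. It assumes the open-source PuLP modelling library with its default CBC solver; the facility names and cost figures are invented for demonstration only.

```python
# Illustrative MILP for a tiny uncapacitated facility location problem (FLP),
# one of the formulations listed under "Network Optimisation" above.
# Assumes the PuLP modelling library; all data are made up.
import pulp

facilities = ["F1", "F2", "F3"]             # candidate terminal locations
customers = ["C1", "C2", "C3", "C4"]        # demand zones
fixed_cost = {"F1": 100, "F2": 120, "F3": 90}
serve_cost = {                               # transport cost of serving customer j from facility i
    ("F1", "C1"): 4, ("F1", "C2"): 6, ("F1", "C3"): 9, ("F1", "C4"): 5,
    ("F2", "C1"): 7, ("F2", "C2"): 3, ("F2", "C3"): 4, ("F2", "C4"): 8,
    ("F3", "C1"): 6, ("F3", "C2"): 7, ("F3", "C3"): 5, ("F3", "C4"): 3,
}

prob = pulp.LpProblem("facility_location", pulp.LpMinimize)
y = pulp.LpVariable.dicts("open", facilities, cat="Binary")                  # open facility i?
x = pulp.LpVariable.dicts("assign", (facilities, customers), cat="Binary")   # serve j from i?

# Objective: fixed opening costs plus assignment costs.
prob += (pulp.lpSum(fixed_cost[i] * y[i] for i in facilities)
         + pulp.lpSum(serve_cost[i, j] * x[i][j] for i in facilities for j in customers))

for j in customers:                          # every customer is served exactly once
    prob += pulp.lpSum(x[i][j] for i in facilities) == 1
for i in facilities:
    for j in customers:                      # customers can only be assigned to open facilities
        prob += x[i][j] <= y[i]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([i for i in facilities if y[i].value() == 1])   # facilities selected in the optimal plan
```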
Appendix B
Studies | Problem | Learning | Models | ma * | Solution Strategy |
---|---|---|---|---|---|
Network Optimisation | | | | | |
Kerkkamp et al. [88] | Network planning | DQL | GCN | | End-to-end |
Zhu et al. [90] | Network planning | Actor–Critic | GCN | | Hybrid end-to-end, where the DRL first prunes the search space and then heuristics improve the solution |
Resource Assignment/Scheduling/Dispatching | | | | | |
Agasucci et al. [13] | Train dispatching | Q-Learning | NN | | End-to-end. Decentralised and centralised Deep Q-Learning |
Obara et al. [68] | Train dispatching | Q-Learning | MLP | | End-to-end. Converts the train schedule to a graph and applies DQL |
Khadilkar [69] | Train scheduling | Q-Learning | Tabular | | End-to-end. Tabular Q-Learning with a simulator |
Adi et al. [71] | Truck assignment and dispatching | Q-Learning | Double DQN | | End-to-end. Centralised learning for truck routing/dispatching in intermodal terminals |
Chen et al. [79] | Dispatching and package matching | QMIX | DQN | X | QMIX (multi-agent Q-Learning with centralised training and decentralised execution) |
Chen et al. [82] | Dispatching (couriers are assigned to tasks) | PPO | MLP | X | End-to-end MARL with a decentralised value function for each vehicle and a common policy network with parameter sharing |
Ni et al. [87] | Generic scheduling problem | RL | GNN + GCN with attention pooling | | Learn-to-improve with reward shaping |
Popescu [89] | Train dispatching | Cross-entropy | NN | | RL agent executes actions in a simulator |
Song et al. [91] | Generic scheduling problem | PPO | GNN + GAN | | End-to-end |
Ren et al. [92] | Generic scheduling problem | A3C with baseline | Encoder-decoder with attention | | End-to-end |
Han and Yang [93] | Generic scheduling problem | Model-free DRL | Encoder-decoder with RNN | | End-to-end |
Oren et al. [94] | Generic scheduling problem | Q-Learning | GNN | | End-to-end. DQL for online and offline dispatching problems |
Iklassov and Medvedev [119] | Generic assignment | Actor–Critic | GNN, GCN | | End-to-end |
Porras-Valenzuela [120] | Network flow problem | Q-Learning | GCN | | End-to-end. A single agent assigns flow links to warehouses under uncertainty |
Zhang et al. [121] | Synchromodal re-planning | Q-Learning | DQN | | Learn-to-improve |
Zou et al. [122] | Dispatching/assignment | Q-Learning | Double DQN | | End-to-end with a simulator (SUMO) as the environment and a single centralised agent |
Resource Allocation/Balancing/Fleet Composition | | | | | |
Li et al. [77] | Resource balancing | DQL | MLP for each agent | X | End-to-end. Cooperative multi-agent RL (MARL) with shared state and rewards |
Ahn and Park [97] | Assignment and rebalancing | Factorised Actor–Critic | GCN | X | End-to-end. Decentralised policy for each zone with parameter sharing |
Pan et al. [98] | Vehicle repositioning | Actor–Critic | Q-Network for each agent | X | Hierarchical RL (HRL) where each sub-agent adopts an LSTM |
Xi et al. [99] | Vehicle repositioning | PPO | DQN and LSTM | X | Three-level HRL. DQN for workers and LSTM for prediction |
Zhang et al. [123] | Vehicle fleet repositioning | Q-Learning | NN and tabular Q | X | End-to-end |
Batching and Sequencing | | | | | |
Beeks et al. [83] | Order batching and sequencing problem | PPO | DNN | | End-to-end with reward shaping for two-objective problems |
Cals et al. [100] | Batching and sequencing of orders | PPO | DNN | | Hybrid method. Heuristics take the sequencing decisions and DRL takes the batching decisions |
Hottung et al. [101] | Reordering of containers | Supervised | DNN | | Learn-to-improve. Heuristic tree search utilises the learned model for improvement |
Vehicle Routing Problem | | | | | |
Kool et al. [55] | TSP | REINFORCE | Encoder-decoder with GNN and attention | | End-to-end. Prediction of the next node using a graph attention network, masking, and an RL baseline |
Deudon et al. [56] | TSP | REINFORCE | Encoder-decoder with self-attention and Pointer Network | | End-to-end |
Li et al. [58] | CVRP with heterogeneous vehicle fleet | Policy-based DRL with baseline | Encoder-decoder with Multi-Head Attention | | End-to-end |
Ren et al. [59] | VRP with TW | Policy gradient method | Encoder-decoder | X | Centralised training with shared parameters and observations |
Vera and Abad [80] | CMVRP | Actor–Critic | Encoder-decoder | X | End-to-end MARL with centralised training and decentralised execution |
Nazari et al. [102] | VRP | Policy-based RL | Encoder-decoder with RNN and attention | | End-to-end |
Joshi et al. [104] | TSP | Supervised | GNN | | End-to-end |
Li et al. [105] | Pick-up and delivery VRP | Policy-based DRL with baseline | Encoder-decoder with self-attention | | End-to-end |
Falkner and Schmidt-Thieme [106] | CVRP-TW | REINFORCE | Encoder-decoder with attention | | End-to-end. Extended encoder-decoder to embed rich problem context |
Foa et al. [107] | Assignment and routing | Modified PPO | CNN | X | End-to-end MARL with two actor networks for node selection and assignment |
Li et al. [108] | Dynamic pick-up and delivery | Double Q-Learning | Graph spatial–temporal attention network | | End-to-end. Predicts demand and formulates the environment as a graph; the attention network calculates the relationships between a vehicle and all neighbouring vehicles |
Ma et al. [109] | Dynamic pick-up and delivery | Q-Learning (upper policy) and REINFORCE (lower policy) | Upper-level agent: MLP; lower-level agent: GNN | X | Learn-to-configure. Multi-agent HRL for assigning orders and delivery |
Wu et al. [110] | TSP/CVRP | Actor–Critic | Encoder-decoder | | Learn-to-improve. Heuristics utilise RL for improvement |
Da Costa et al. [111] | TSP | Actor–Critic | Encoder-decoder with GCN and LSTM | | Learn-to-improve. Heuristics utilise RL for improvement |
Hottung et al. [112] | CVRP | REINFORCE | Encoder-decoder with attention and FNN | | Learn-to-improve. Heuristics utilise RL for improvement |
Li et al. [113] | Large-scale CVRP | Supervised | Encoder-decoder with self-attention | | Learn-to-improve. Clustering heuristics utilise ML for improvement |
Zhao et al. [118] | VRP | Actor–Critic | Encoder-decoder with graph embedding and attention layer | | Learn-to-configure. RL provides an initial solution to the heuristic search |
Lu et al. [124] | CVRP | REINFORCE | Encoder-decoder with self-attention | | Learn-to-improve. Heuristics utilise RL for improvement |
Kalakanti et al. [125] | VRP | Q-Learning | Clustering and tabular | | Learn-to-improve. Two-phase heuristic with Q-Learning: clustering to approximate vehicles and tours, then Q-Learning to learn the routes |
James et al. [126] | Dynamic VRP | A3C | Encoder-decoder with graph embeddings and Pointer Network | | End-to-end |
Wang [127] | VRP | REINFORCE | Encoder-decoder with GNN and Graph Reasoning Network (GRN) | | End-to-end. Decomposition and assembling of graphs with GNN and RL |
Xing and Tu [128] | TSP | Monte Carlo Tree Search | GNN | | End-to-end |
Xu et al. [129] | VRP | Policy-based DRL | Encoder-decoder with graph attention and MHA | | End-to-end |
Zhang et al. [130] | VRP with TW | Policy-based DRL | Attention-based encoder-decoder | X | End-to-end. Encoder-decoder MARL with centralised learning |
Zhang et al. [131] | Dynamic TSP | Policy gradient with rollout baseline | Encoder-decoder with MHA | | End-to-end |
Zong et al. [132] | Pick-up and delivery | Cooperative A2C | Encoder-decoder with MHA | X | End-to-end MARL. Utilises cooperative multi-agent decoders to leverage the decision dependence among different vehicle agents |
Prediction and Forecasting | | | | | |
Li et al. [23] | Delay prediction | Supervised | GNN | | End-to-end. GNN applied to a temporal multigraph (dynamic railway network) for dynamic spatial–temporal modelling |
Yu et al. [114] | Spatial–temporal forecasting | Supervised | GCN | | GCN for extracting spatial features and gated CNN for temporal features |
Hassan et al. [115] | Forecasting freight movement | RL | Not specified | | Learn-to-configure. RL adjusts the weights of forecast models |
Guo et al. [116] | Spatial–temporal forecasting | Supervised | Attention-based GCN | | Dynamic spatial–temporal forecasting with GCN and attention |
Zhao et al. [117] | Spatial–temporal forecasting | Supervised | Attention-based GCN | | Dynamic spatial–temporal modelling with GCN and attention |
Anguita and Olariaga [133] | Demand forecasting | Supervised | CNN + LSTM | | Hybrid LSTM + CNN for dynamic spatial–temporal modelling to extract spatial (CNN) and temporal (RNN) context |
Heglund et al. [134] | Delay prediction | Supervised | GCN | | End-to-end. Formulates a graph with delays on nodes and edges |
* ma: multi-agent setup.
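Several entries in the Solution Strategy column follow a "learn-to-improve" pattern, in which a learned model steers a classical local search (e.g., [110,111,124]). The following is a minimal, illustrative sketch of that pattern for the TSP, assuming PyTorch; the move-scoring network, the endpoint features, the REINFORCE-style update, and the hill-climbing acceptance rule are placeholders and do not reproduce any specific reviewed method.

```python
# Learn-to-improve sketch: a small neural scorer proposes 2-opt moves that a
# local-search loop applies to an existing TSP tour. Illustrative only.
import torch
import torch.nn as nn

def two_opt(tour, i, j):
    # Reverse the tour segment between positions i and j (a 2-opt move).
    new = tour.copy()
    new[i:j + 1] = new[i:j + 1][::-1]
    return new

def tour_cost(coords, tour):
    # Length of the closed tour given as a list of node indices.
    total = 0.0
    for a, b in zip(tour, tour[1:] + tour[:1]):
        total += float(torch.dist(coords[a], coords[b]))
    return total

class MoveScorer(nn.Module):
    # Scores a candidate 2-opt move from the coordinates of its four endpoints.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, feats):
        return self.net(feats).squeeze(-1)

n = 20
coords = torch.rand(n, 2)
tour = list(range(n))                        # initial solution from any constructive heuristic
scorer = MoveScorer()
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

for step in range(200):
    # Enumerate candidate 2-opt moves and build endpoint features for each.
    cands = [(i, j) for i in range(1, n - 2) for j in range(i + 1, n - 1)]
    feats = torch.stack([torch.cat([coords[tour[i - 1]], coords[tour[i]],
                                    coords[tour[j]], coords[tour[(j + 1) % n]]])
                         for i, j in cands])
    probs = torch.softmax(scorer(feats), dim=0)
    idx = int(torch.distributions.Categorical(probs).sample())
    i, j = cands[idx]
    new_tour = two_opt(tour, i, j)
    gain = tour_cost(coords, tour) - tour_cost(coords, new_tour)   # improvement acts as the reward
    loss = -gain * torch.log(probs[idx])                           # REINFORCE-style update of the scorer
    opt.zero_grad(); loss.backward(); opt.step()
    if gain > 0:
        tour = new_tour                      # accept improving moves (simple hill climbing)
```

By contrast, the "end-to-end" rows construct solutions directly with a learned policy, and the "learn-to-configure" rows use the learned model only to set parameters or starting points for a conventional solver.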
References
- Reis, V. Analysis of mode choice variables in short-distance intermodal freight transport using an agent-based model. Transp. Res. Part A Policy Pract. 2014, 61, 100–120. [Google Scholar] [CrossRef]
- Barua, L.; Zou, B.; Zhou, Y. Machine learning for international freight transportation management: A comprehensive review. Res. Transp. Bus. Manag. 2020, 34, 100453. [Google Scholar] [CrossRef]
- Bešinović, N.; Goverde, R.M. Capacity assessment in railway networks. In Handbook of Optimization in the Railway Industry; Springer: Berlin/Heidelberg, Germany, 2018; pp. 25–45. [Google Scholar]
- SteadieSeifi, M.; Dellaert, N.P.; Nuijten, W.; VanWoensel, T.; Raoufi, R. Multimodal freight transportation planning: A literature review. Eur. J. Oper. Res. 2014, 233, 1–15. [Google Scholar] [CrossRef]
- Tang, Y.; Agrawal, S.; Faenza, Y. Reinforcement learning for integer programming: Learning to cut. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; PMLR: Birmingham, UK, 2020; pp. 9367–9376. [Google Scholar]
- Nair, V.; Bartunov, S.; Gimeno, F.; Von Glehn, I.; Lichocki, P.; Zwols, Y. Solving mixed integer programs using neural networks. arXiv 2020, arXiv:2012.13349. [Google Scholar]
- Kotary, J.; Fioretto, F.; Van Hentenryck, P.; Wilder, B. End-to-end constrained optimization learning: A survey. arXiv 2021, arXiv:2103.16378. [Google Scholar]
- Mazyavkina, N.; Sviridov, S.; Ivanov, S.; Burnaev, E. Reinforcement learning for combinatorial optimization: A survey. Comput. Oper. Res. 2021, 134, 105400. [Google Scholar] [CrossRef]
- Karimi-Mamaghan, M.; Mohammadi, M.; Meyer, P.; Karimi-Mamaghan, A.M.; Talbi, E.G. Machine learning at the service of meta-heuristics for solving combinatorial optimization problems: A state-of-the-art. Eur. J. Oper. Res. 2022, 296, 393–422. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Polosukhin, I. Attention is All you Need. Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://dl.acm.org/doi/10.5555/3295222.3295349 (accessed on 20 September 2024).
- Agamez-Arias, A.D.M.; Moyano-Fuentes, J. Intermodal transport in freight distribution: A literature review. Transp. Rev. 2017, 37, 782–807. [Google Scholar] [CrossRef]
- Rail-Roadmap. Level Playing Field in the Transport Sector. 2021. Available online: https://www.railroadmap2030.be/wp-content/uploads/2021/09/BRFF-Level-playing-field-in-the-transport-sector.pdf (accessed on 1 June 2023).
- Agasucci, V.; Grani, G.; Lamorgese, L. Solving the single-track train scheduling problem via Deep Reinforcement Learning. arXiv 2020, arXiv:2009.00433. [Google Scholar]
- Escudero, A.; Muñuzuri, J.; Guadix, J.; Arango, C. Dynamic approach to solve the daily drayage problem with transit time uncertainty. Comput. Ind. 2013, 64, 165–175. [Google Scholar] [CrossRef]
- Bektas, T.; Crainic, T. A Brief Overview of Intermodal Transportation; CIRRELT: Montreal, QC, Canada, 2007; Volume 3, pp. 1–25. [Google Scholar]
- Giusti, R.; Manerba, D.; Bruno, G.; Tadei, R. Synchromodal logistics: An overview of critical success factors, enabling technologies, and open research issues. Transp. Res. Part E Logist. Transp. Rev. 2019, 129, 92–110. [Google Scholar] [CrossRef]
- Guihaire, V.; Hao, J.-K. Transit network design and scheduling: A global review. Transp. Res. Part A Policy Pract. 2008, 42, 1251–1273. [Google Scholar] [CrossRef]
- Stoilova, S.D.; Martinov, S.V. Selecting a location for establishing a rail-road intermodal terminal by using a hybrid SWOT/MCDM model. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2019; Volume 618, p. 012060. [Google Scholar]
- Wu, X.; Cao, L. Using heuristic MCMC method for terminal location planning in intermodal transportation. Int. J. Oper. Res. 2018, 32, 421–442. [Google Scholar] [CrossRef]
- Newman, A.M.; Yano, C.A. Centralized and decentralized train scheduling for intermodal operations. IIE Trans. 2000, 32, 743–754. [Google Scholar] [CrossRef]
- Behdani, B.; Fan, Y.; Wiegmans, B.; Zuidwijk, R. Multimodal schedule design for synchromodal freight transport systems. Eur. J. Transp. Infrastruct. Res. 2014, 16, 424–444. [Google Scholar] [CrossRef]
- Weik, N.; Bohlin, M.; Nießen, N. Long-Term Capacity Planning of Railway Infrastructure: A Stochastic Approach Capturing Infrastructure Unavailability. RWTH Aachen University. PhD Thesis No.RWTH-2020-06771. Lehrstuhl für Schienenbahnwesen und Verkehrswirtschaft und Verkehrswissenschaftliches Institut. 2020. Available online: https://publications.rwth-aachen.de/record/793271/files/793271.pdf (accessed on 20 September 2024).
- Li, Z.; Huang, P.; Wen, C.; Rodrigues, F. Railway Network Delay Evolution: A Heterogeneous Graph Neural Network Approach. arXiv 2023, arXiv:2303.15489. [Google Scholar]
- Mueller, J.P.; Elbert, R.; Emde, S. Integrating vehicle routing into intermodal service network design with stochastic transit times. EURO J. Transp. Logist. 2021, 10, 100046. [Google Scholar] [CrossRef]
- Mueller, J.P.; Elbert, R.; Emde, S. Intermodal service network design with stochastic demand and short-term schedule modifications. Comput. Ind. Eng. 2021, 159, 107514. [Google Scholar] [CrossRef]
- Jaržemskiene, I. The evolution of intermodal transport research and its development issues. Transport 2007, 22, 296–306. [Google Scholar] [CrossRef]
- Baykasoğlu, A.; Subulan, K.; Serdar Taşan, A.; Ülker, Ö. Development of a Web-Based Decision Support System for Strategic and Tactical Sustainable Fleet Management Problems in Intermodal Transportation Networks. In Lean and Green Supply Chain Management; Springer: Berlin/Heidelberg, Germany, 2018; pp. 189–230. [Google Scholar]
- Arabani, A.B.; Farahani, R.Z. Facility location dynamics: An overview of classifications and applications. Comput. Ind. Eng. 2012, 62, 408–420. [Google Scholar] [CrossRef]
- Gupta, A.; Könemann, J. Approximation algorithms for network design: A survey. Surv. Oper. Res. Manag. Sci. 2011, 16, 3–20. [Google Scholar] [CrossRef]
- Cordeau, J.F.; Toth, P.; Vigo, D. A survey of optimization models for train routing and scheduling. Transp. Sci. 1998, 32, 380–404. [Google Scholar] [CrossRef]
- Díaz-Parra, O.; Ruiz-Vanoye, J.A.; Bernábe Loranca, B.; Fuentes-Penna, A.; Barrera-Cámara, R.A. A survey of transportation problems. J. Appl. Math. 2014, 2014, 848129. [Google Scholar] [CrossRef]
- Feeney, G.J. The Distribution of Empty Freight Cars; Columbia University: New York, NY, USA, 1959. [Google Scholar]
- Beaujon, G.J.; Turnquist, M.A. A model for fleet sizing and vehicle allocation. Transp. Sci. 1991, 25, 19–45. [Google Scholar] [CrossRef]
- Baykasoğlu, A.; Subulan, K.; Taşan, A.S.; Dudaklı, N. A review of fleet planning problems in single and multimodal transportation systems. Transp. A Transp. Sci. 2019, 15, 631–697. [Google Scholar] [CrossRef]
- Zhang, H.; Ge, H.; Yang, J.; Tong, Y. Review of vehicle routing problems: Models, classification and solving algorithms. In Archives of Computational Methods in Engineering; Springer: Berlin/Heidelberg, Germany, 2021; pp. 1–27. [Google Scholar]
- Golden, B.; Wang, X.; Wasil, E. The Evolution of the Vehicle Routing Problem—A Survey of VRP Research and Practice from 2005 to 2022; Springer: Berlin/Heidelberg, Germany, 2023; pp. 1–64. [Google Scholar]
- Henn, S. Order batching and sequencing for the minimization of the total tardiness in picker-to-part warehouses. Flex. Serv. Manuf. J. 2015, 27, 86–114. [Google Scholar] [CrossRef]
- Hu, Q.; Corman, F.; Lodewijks, G. A review of intermodal rail freight bundling operations. In Proceedings of the Computational Logistics: 6th International Conference, ICCL 2015, Delft, The Netherlands, 23–25 September 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 451–463. [Google Scholar]
- Gao, B.; Ou, D.; Dong, D.; Wu, Y. A data-driven two-stage prediction model for train primary-delay recovery time. Int. J. Softw. Eng. Knowl. Eng. 2020, 30, 921–940. [Google Scholar] [CrossRef]
- Babai, M.Z.; Boylan, J.E.; Rostami-Tabar, B. Demand forecasting in supply chains: A review of aggregation and hierarchical approaches. Int. J. Prod. Res. 2022, 60, 324–348. [Google Scholar] [CrossRef]
- Shah, N.H.; Mittal, M. Optimization and Inventory Management; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
- Perez, H.D.; Hubbs, C.D.; Li, C.; Grossmann, I.E. Algorithmic approaches to inventory management optimization. Processes 2021, 9, 102. [Google Scholar] [CrossRef]
- Bello, I.; Pham, H.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural combinatorial optimization with reinforcement learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Simao, H.P.; Day, J.; George, A.P.; Gifford, T.; Nienow, J.; Powell, W.B. An approximate dynamic programming algorithm for large-scale fleet management: A case application. Transp. Sci. 2009, 43, 178–197. [Google Scholar] [CrossRef]
- Novoa, C.; Storer, R. An approximate dynamic programming approach for the vehicle routing problem with stochastic demands. Eur. J. Oper. Res. 2009, 196, 509–515. [Google Scholar] [CrossRef]
- Powell, W.B. A unified framework for stochastic optimization. Eur. J. Oper. Res. 2019, 275, 795–821. [Google Scholar] [CrossRef]
- Stimpson, D.; Ganesan, R. A reinforcement learning approach to convoy scheduling on a contested transportation network. Optim. Lett. 2015, 9, 1641–1657. [Google Scholar] [CrossRef]
- Goodson, J.C.; Ohlmann, J.W.; Thomas, B.W. Rollout policies for dynamic solutions to the multivehicle routing problem with stochastic demand and duration limits. Oper. Res. 2013, 61, 138–154. [Google Scholar] [CrossRef]
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. Openai gym. arXiv 2016, arXiv:1606.01540. [Google Scholar]
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2 (NIPS’14); MIT Press: Cambridge, MA, USA, 2014; pp. 3104–3112. Available online: https://proceedings.neurips.cc/paper_files/paper/2014/file/a14ac55a4f27472c5d894ec1c3c743d2-Paper.pdf (accessed on 20 September 2024).
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. In ICLR 2015. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. arXiv 2016, arXiv:1611.01578. [Google Scholar]
- Nowak, A.; Bruna, J. Divide and Conquer with Neural Networks. 2016. Available online: https://openreview.net/forum?id=Hy3_KuYxg (accessed on 1 August 2024).
- Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. Adv. Neural Inf. Process. Syst. 2015, 28. Available online: https://papers.nips.cc/paper_files/paper/2015/file/29921001f2f04bd3baee84a12e98098f-Paper.pdf (accessed on 1 September 2024).
- Kool, W.; Van Hoof, H.; Welling, M. Attention, learn to solve routing problems! arXiv 2018, arXiv:1803.08475. [Google Scholar]
- Deudon, M.; Cournut, P.; Lacoste, A.; Adulyasak, Y.; Rousseau, L.M. Learning heuristics for the tsp by policy gradient. In Proceedings of the Integration of Constraint Programming, Artificial Intelligence, and Operations Research: 15th International Conference, Delft, The Netherlands, 26–29 June 2018; pp. 170–181. [Google Scholar]
- Vinyals, O.; Bengio, S.; Kudlur, M. Order matters: Sequence to sequence for sets. arXiv 2015, arXiv:1511.06391. [Google Scholar]
- Li, J.; Ma, Y.; Gao, R.; Cao, Z.; Lim, A.; Zhang, J. Deep reinforcement learning for solving the heterogeneous capacitated vehicle routing problem. IEEE Trans. Cybern. 2021, 52, 13572–13585. [Google Scholar] [CrossRef]
- Ren, L.; Fan, X.; Cui, J.; Shen, Z.; Lv, Y.; Xiong, G. A multi-agent reinforcement learning method with route recorders for vehicle routing in supply chain management. IEEE Trans. Intell. Transp. Syst. 2022, 23, 16410–16420. [Google Scholar] [CrossRef]
- Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in time series: A survey. arXiv 2022, arXiv:2202.07125. [Google Scholar]
- Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80. [Google Scholar] [CrossRef] [PubMed]
- Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
- Jiang, W.; Luo, J. Graph neural network for traffic forecasting: A survey. Expert Syst. Appl. 2022, 207, 117921. [Google Scholar] [CrossRef]
- He, S.; Shin, K.G. Towards fine-grained flow forecasting: A graph attention approach for bike sharing systems. Proc. Web Conf. 2020, 88–98. [Google Scholar] [CrossRef]
- Fang, X.; Huang, J.; Wang, F.; Zeng, L.; Liang, H.; Wang, H. Constgat: Contextual spatial-temporal graph attention network for travel time estimation at baidu maps. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery Data Mining, Virtual, 6–10 July 2020; pp. 2697–2705. [Google Scholar]
- Sun, C.; Li, C.; Lin, X.; Zheng, T.; Meng, F.; Rui, X.; Wang, Z. Attention-based graph neural networks: A survey. Artif. Intell. Rev. 2023, 56 (Suppl. S2), 2263–2310. [Google Scholar] [CrossRef]
- Oroojlooyjadid, A.; Nazari, M.; Snyder, L.V.; Takáč, M. A deep q-network for the beer game: Deep reinforcement learning for inventory optimization. Manuf. Serv. Oper. Manag. 2022, 24, 285–304. [Google Scholar] [CrossRef]
- Obara, M.; Kashiyama, T.; Sekimoto, Y. Deep reinforcement learning approach for train rescheduling utilizing graph theory. In Proceedings of the 2018 IEEE International Conference on Big Data, Seattle, WA, USA, 10–13 December 2018; pp. 4525–4533. [Google Scholar]
- Khadilkar, H. A scalable reinforcement learning algorithm for scheduling railway lines. IEEE Trans. Intell. Transp. Syst. 2018, 20, 727–736. [Google Scholar] [CrossRef]
- Guo, W.; Atasoy, B.; Negenborn, R.R. Global synchromodal shipment matching problem with dynamic and stochastic travel times: A reinforcement learning approach. Ann. Oper. Res. 2022, 1–32. [Google Scholar] [CrossRef]
- Adi, T.N.; Iskandar, Y.A.; Bae, H. Interterminal truck routing optimization using deep reinforcement learning. Sensors 2020, 20, 5794. [Google Scholar] [CrossRef]
- Rolf, B.; Jackson, I.; Müller, M.; Lang, S.; Reggelin, T.; Ivanov, D. A review on reinforcement learning algorithms and applications in supply chain management. Int. J. Prod. Res. 2022, 61, 7151–7179. [Google Scholar] [CrossRef]
- Yan, Y.; Chow, A.H.; Ho, C.P.; Kuo, Y.H.; Wu, Q.; Ying, C. Reinforcement learning for logistics and supply chain management: Methodologies, state of the art, and future opportunities. Transp. Res. Part E Logist. Transp. Rev. 2022, 162, 102712. [Google Scholar] [CrossRef]
- Zong, Z.; Feng, T.; Xia, T.; Jin, D.; Li, Y. Deep Reinforcement Learning for Demand Driven Services in Logistics and Transportation Systems: A Survey. arXiv 2021, arXiv:2108.04462. [Google Scholar]
- Wang, Q.; Tang, C. Deep reinforcement learning for transportation network combinatorial optimization: A survey. Knowl.-Based Syst. 2021, 233, 107526. [Google Scholar] [CrossRef]
- Gokhale, A.; Trasikar, C.; Shah, A.; Hegde, A.; Naik, S.R. A Reinforcement Learning Approach to Inventory Management. In Advances in Artificial Intelligence and Data Engineering: Select Proceedings of AIDE; Springer: Singapore, 2019; pp. 281–297. [Google Scholar]
- Li, X.; Zhang, J.; Bian, J.; Tong, Y.; Liu, T.Y. A cooperative multi-agent reinforcement learning framework for resource balancing in complex logistics network. arXiv 2019, arXiv:1903.00714. [Google Scholar]
- Boute, R.N.; Gijsbrechts, J.; Van Jaarsveld, W.; Vanvuchelen, N. Deep reinforcement learning for inventory control: A roadmap. Eur. J. Oper. Res. 2022, 298, 401–412. [Google Scholar] [CrossRef]
- Chen, J.; Umrawal, A.K.; Lan, T.; Aggarwal, V. DeepFreight: A Model-free Deep-reinforcement-learning-based Algorithm for Multi-transfer Freight Delivery. In Proceedings of the International Conference on Automated Planning and Scheduling, Guangzhou, China, 2–13 August 2021; Volume 31, pp. 510–518. [Google Scholar]
- Vera, J.M.; Abad, A.G. Deep reinforcement learning for routing a heterogeneous fleet of vehicles. In Proceedings of the 2019 IEEE Latin American Conference on Computational Intelligence (LA-CCI), Guayaquil, Ecuador, 11–15 November 2019; pp. 1–6. [Google Scholar]
- Liu, X.; Hu, M.; Peng, Y.; Yang, Y. Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management; SSRN: Rochester, NY, USA, 2022. [Google Scholar]
- Chen, Y.; Qian, Y.; Yao, Y.; Wu, Z.; Li, R.; Xu, Y. Can sophisticated dispatching strategy acquired by reinforcement learning?—A case study in dynamic courier dispatching system. arXiv 2019, arXiv:1903.02716. [Google Scholar]
- Beeks, M.; Afshar, R.R.; Zhang, Y.; Dijkman, R.; van Dorst, C.; de Looijer, S. Deep reinforcement learning for a multi-objective online order batching problem. In Proceedings of the International Conference on Automated Planning and Scheduling, Virtual, 13–24 June 2022; Volume 32, pp. 435–443. [Google Scholar]
- Hutsebaut-Buysse, M.; Mets, K.; Latré, S. Hierarchical reinforcement learning: A survey and open research challenges. Mach. Learn. Knowl. Extr. 2022, 4, 172–221. [Google Scholar] [CrossRef]
- Bengio, Y.; Lodi, A.; Prouvost, A. Machine learning for combinatorial optimization: A methodological tour d’horizon. Eur. J. Oper. Res. 2021, 290, 405–421. [Google Scholar] [CrossRef]
- Biedenkapp, A.; Bozkurt, H.F.; Eimer, T.; Hutter, F.; Lindauer, M. Dynamic algorithm configuration: Foundation of a new meta-algorithmic framework. In ECAI 2020; IOS Press: Amsterdam, The Netherlands, 2020; pp. 427–434. [Google Scholar]
- Ni, F.; Hao, J.; Lu, J.; Tong, X.; Yuan, M.; He, K. A multi-graph attributed reinforcement learning based optimization algorithm for large-scale hybrid flow shop scheduling problem. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery Data Mining, Virtual, 14–18 August 2021; pp. 3441–3451. [Google Scholar]
- Kerkkamp, D.; Bukhsh, Z.A.; Zhang, Y.; Jansen, N. Grouping of Maintenance Actions with Deep Reinforcement Learning and Graph Convolutional Networks. In Proceedings of the ICAART, Virtual, 3–5 February 2022; pp. 574–585. [Google Scholar]
- Popescu, T. Reinforcement Learning for Train Dispatching: A Study on the Possibility to Use Reinforcement Learning to Optimize Train Ordering and Minimize Train Delays in Disrupted Situations, inside the Rail Simulator OSRD. KTH, School of Electrical Engineering and Computer Science (EECS). Dissertation. 2022. Available online: https://www.diva-portal.org/smash/get/diva2:1702837/FULLTEXT01.pdf (accessed on 20 September 2024).
- Zhu, H.; Gupta, V.; Ahuja, S.S.; Tian, Y.; Zhang, Y.; Jin, X. Network planning with deep reinforcement learning. In Proceedings of the 2021 ACM SIGCOMM 2021 Conference, Virtual, 23–27 August 2021; pp. 258–271. [Google Scholar]
- Song, W.; Chen, X.; Li, Q.; Cao, Z. Flexible Job-Shop Scheduling via Graph Neural Network and Deep Reinforcement Learning. IEEE Trans. Ind. Inform. 2022, 19, 1600–1610. [Google Scholar] [CrossRef]
- Ren, J.F.; Ye, C.M.; Yang, F. A novel solution to jsps based on long short-term memory and policy gradient algorithm. Int. J. Simul. Model. 2020, 19, 157–168. [Google Scholar] [CrossRef]
- Han, B.; Yang, J. A deep reinforcement learning based solution for flexible job shop scheduling problem. Int. J. Simul. Model. 2021, 20, 375–386. [Google Scholar] [CrossRef]
- Oren, J.; Ross, C.; Lefarov, M.; Richter, F.; Taitler, A.; Daniel, C. SOLO: Search online, learn offline for combinatorial optimization problems. In Proceedings of the International Symposium on Combinatorial Search, Virtual, 26–30 July 2021; Volume 12, pp. 97–105. [Google Scholar]
- Rashid, T.; Farquhar, G.; Peng, B.; Whiteson, S. Weighted qmix: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. Adv. Neural Inf. Process. Syst. 2020, 33, 10199–10210. [Google Scholar]
- Chen, H.; Li, Z.; Yao, Y. Multi-agent reinforcement learning for fleet management: A survey. In Proceedings of the 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing, AIAHPC 2022, Zhuhai, China, 25–27 February 2022; Volume 12348, pp. 611–624. [Google Scholar]
- Ahn, K.; Park, J. Cooperative zone-based rebalancing of idle overhead hoist transportations using multi-agent reinforcement learning with graph representation learning. IISE Trans. 2021, 53, 1140–1156. [Google Scholar] [CrossRef]
- Pan, L.; Cai, Q.; Fang, Z.; Tang, P.; Huang, L. A deep reinforcement learning framework for rebalancing dockless bike sharing systems. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1393–1400. [Google Scholar]
- Xi, J.; Zhu, F.; Ye, P.; Lv, Y.; Tang, H.; Wang, F.Y. HMDRL: Hierarchical Mixed Deep Reinforcement Learning to Balance Vehicle Supply and Demand. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21861–21872. [Google Scholar] [CrossRef]
- Cals, B.; Zhang, Y.; Dijkman, R.; van Dorst, C. Solving the online batching problem using deep reinforcement learning. Comput. Ind. Eng. 2021, 156, 107221. [Google Scholar] [CrossRef]
- Hottung, A.; Tanaka, S.; Tierney, K. Deep learning assisted heuristic tree search for the container pre-marshalling problem. Comput. Oper. Res. 2020, 113, 104781. [Google Scholar] [CrossRef]
- Nazari, M.; Oroojlooy, A.; Snyder, L.; Takác, M. Reinforcement learning for solving the vehicle routing problem. Adv. Neural Inf. Process. Syst. 2018, 31. Available online: https://dl.acm.org/doi/10.5555/3327546.3327651 (accessed on 20 September 2024).
- Khalil, E.; Dai, H.; Zhang, Y.; Dilkina, B.; Song, L. Learning combinatorial optimization algorithms over graphs. Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://dl.acm.org/doi/10.5555/3295222.3295382 (accessed on 20 September 2024).
- Joshi, C.K.; Laurent, T.; Bresson, X. An efficient graph convolutional network technique for the travelling salesman problem. arXiv 2019, arXiv:1906.01227. [Google Scholar]
- Li, J.; Xin, L.; Cao, Z.; Lim, A.; Song, W.; Zhang, J. Heterogeneous attentions for solving pickup and delivery problem via deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2021, 23, 2306–2315. [Google Scholar] [CrossRef]
- Falkner, J.K.; Schmidt-Thieme, L. Learning to solve vehicle routing problems with time windows through joint attention. arXiv 2020, arXiv:2006.09100. [Google Scholar]
- Foa, S.; Coppola, C.; Grani, G.; Palagi, L. Solving the vehicle routing problem with deep reinforcement learning. arXiv 2022, arXiv:2208.00202. [Google Scholar]
- Li, X.; Luo, W.; Yuan, M.; Wang, J.; Lu, J.; Wang, J.; Zeng, J. Learning to optimize industry-scale dynamic pickup and delivery problems. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; pp. 2511–2522. [Google Scholar]
- Ma, Y.; Hao, X.; Hao, J.; Lu, J.; Liu, X.; Xialiang, T.; Meng, Z. A hierarchical reinforcement learning based optimization framework for large-scale dynamic pickup and delivery problems. Adv. Neural Inf. Process. Syst. 2021, 34, 23609–23620. [Google Scholar]
- Wu, Y.; Song, W.; Cao, Z.; Zhang, J.; Lim, A. Learning improvement heuristics for solving routing problems. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 5057–5069. [Google Scholar] [CrossRef]
- Da Costa, P.R.D.O.; Rhuggenaath, J.; Zhang, Y.; Akcay, A. Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning. arXiv 2020, arXiv:2004.01608. [Google Scholar]
- Hottung, A.; Tierney, K. Neural large neighborhood search for the capacitated vehicle routing problem. arXiv 2019, arXiv:1911.09539. [Google Scholar]
- Li, S.; Yan, Z.; Wu, C. Learning to delegate for large-scale vehicle routing. Adv. Neural Inf. Process. Syst. 2021, 34, 26198–26211. [Google Scholar]
- Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875. [Google Scholar]
- Hassan, L.A.H.; Mahmassani, H.S.; Chen, Y. Reinforcement learning framework for freight demand forecasting to support operational planning decisions. Transp. Res. Part E Logist. Transp. Rev. 2020, 137, 101926. [Google Scholar] [CrossRef]
- Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. Proc. AAAI Conf. Artif. Intell. 2019, 33, 922–929. [Google Scholar] [CrossRef]
- Zhao, J.; Liu, Z.; Sun, Q.; Li, Q.; Jia, X.; Zhang, R. Attention-based dynamic spatial-temporal graph convolutional networks for traffic speed forecasting. Expert Syst. Appl. 2022, 204, 117511. [Google Scholar] [CrossRef]
- Zhao, J.; Mao, M.; Zhao, X.; Zou, J. A Hybrid of Deep Reinforcement Learning and Local Search for the Vehicle Routing Problems. IEEE Trans. Intell. Transp. Syst. 2020, 22, 7208–7218. [Google Scholar] [CrossRef]
- Iklassov, Z.; Medvedev, D. Robust Reinforcement Learning on Graphs for Logistics optimization. arXiv 2022, arXiv:2205.12888. [Google Scholar]
- Porras-Valenzuela, J.F. A Deep Reinforcement Learning Approach to Multistage Stochastic Network Flows for Distribution Problems. Instituto Tecnológico de Costa Rica; Thesis. 2022. Available online: https://repositoriotec.tec.ac.cr/handle/2238/13949 (accessed on 20 September 2024).
- Zhang, Y.; Negenborn, R.R.; Atasoy, B. Synchromodal freight transport re-planning under service time uncertainty: An online model-assisted reinforcement learning. Transp. Res. Part C Emerg. Technol. 2023, 156, 104355. [Google Scholar] [CrossRef]
- Zou, G.; Tang, J.; Yilmaz, L.; Kong, X. Online food ordering delivery strategies based on deep reinforcement learning. Appl. Intell. 2022, 52, 6853–6865. [Google Scholar] [CrossRef]
- Zhang, W.; Wang, Q.; Li, J.; Xu, C. Dynamic fleet management with rewriting deep reinforcement learning. IEEE Access 2020, 8, 143333–143341. [Google Scholar] [CrossRef]
- Lu, H.; Zhang, X.; Yang, S. A learning-based iterative method for solving vehicle routing problems. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
- Kalakanti, A.K.; Verma, S.; Paul, T.; Yoshida, T. RL SolVeR Pro: Reinforcement Learning for Solving Vehicle Routing Problem. In Proceedings of the 2019 1st International Conference on Artificial Intelligence and Data Sciences (AiDAS), Ipoh, Malaysia, 19 September 2019. [Google Scholar]
- James, J.Q.; Yu, W.; Gu, J. Online vehicle routing with neural combinatorial optimization and deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3806–3817. [Google Scholar]
- Wang, Q. VARL: A variational autoencoder-based reinforcement learning Framework for vehicle routing problems. Appl. Intell. 2021, 52, 8910–8923. [Google Scholar] [CrossRef]
- Xing, Z.; Tu, S. A graph neural network assisted monte-carlo tree search approach to traveling salesman problem. IEEE Access 2020, 8, 108418–108428. [Google Scholar] [CrossRef]
- Xu, Y.; Fang, M.; Chen, L.; Xu, G.; Du, Y.; Zhang, C. Reinforcement learning with multiple relational attention for solving vehicle routing problems. IEEE Trans. Cybern. 2021, 52, 11107–11120. [Google Scholar] [CrossRef]
- Zhang, Z.; Liu, H.; Zhou, M.; Wang, J. Solving dynamic traveling salesman problems with deep reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 2119–2132. [Google Scholar] [CrossRef]
- Zhang, K.; He, F.; Zhang, Z.; Lin, X.; Li, M. Multi-vehicle routing problems with soft time windows: A multi-agent reinforcement learning approach. Transp. Res. Part C Emerg. Technol. 2020, 121, 102861. [Google Scholar] [CrossRef]
- Zong, Z.; Zheng, M.; Li, Y.; Jin, D. Mapdp: Cooperative multi-agent reinforcement learning to solve pickup and delivery problems. Proc. AAAI Conf. Artif. Intell. 2022, 36, 9980–9988. [Google Scholar] [CrossRef]
- Anguita, J.G.M.; Olariaga, O.D. Air cargo transport demand forecasting using ConvLSTM2D, an artificial neural network architecture approach. Case Stud. Transp. Policy 2023, 12, 101009. [Google Scholar] [CrossRef]
- Heglund, J.S.; Taleongpong, P.; Hu, S.; Tran, H.T. Railway delay prediction with spatial-temporal graph convolutional networks. In Proceedings of the IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1–6. [Google Scholar]
- Heydaribeni, N.; Zhan, X.; Zhang, R.; Eliassi-Rad, T.; Koushanfar, F. Distributed constrained combinatorial optimization leveraging hypergraph neural networks. Nat. Mach. Intell. 2024, 6, 664–672. [Google Scholar] [CrossRef]
- Wan, C.P.; Li, T.; Wang, J.M. RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research. arXiv 2023, arXiv:2303.13117. [Google Scholar]
- Berto, F.; Hua, C.; Park, J.; Kim, M.; Kim, H.; Son, J.; Kim, H.; Angioni, D.; Kool, W.; Cao, Z.; et al. Rl4co: An extensive reinforcement learning for combinatorial optimization benchmark. arXiv 2023, arXiv:2306.17100. [Google Scholar]
- Zheng, X.; Yin, M.; Zhang, Y. Integrated optimization of location, inventory and routing in supply chain network design. Transp. Res. Part B Methodol. 2019, 121, 1–20. [Google Scholar] [CrossRef]
- Zheng, C.; Fan, X.; Wang, C.; Qi, J. Gman: A graph multi-attention network for traffic prediction. Proc. AAAI Conf. Artif. Intell. 2020, 34, 1234–1241. [Google Scholar] [CrossRef]
- Cheng, R.; Li, Q. Modeling the momentum spillover effect for stock prediction via attribute-driven graph attention networks. Proc. AAAI Conf. Artif. Intell. 2021, 35, 55–62. [Google Scholar] [CrossRef]
- Xin, L.; Song, W.; Cao, Z.; Zhang, J. Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. Proc. AAAI Conf. Artif. Intell. 2021, 35, 12042–12049. [Google Scholar] [CrossRef]
- Boffa, M.; Houidi, Z.B.; Krolikowski, J.; Rossi, D. Neural combinatorial optimization beyond the TSP: Existing architectures under-represent graph structure. arXiv 2022, arXiv:2201.00668. [Google Scholar]
- Liang, E.; Liaw, R.; Nishihara, R.; Moritz, P.; Fox, R.; Goldberg, K.; Gonzalez, J.; Jordan, M.; Stoica, I. RLlib: Abstractions for distributed reinforcement learning. In International Conference on Machine Learning; PMLR: Birmingham, UK, 2018; pp. 3053–3062. [Google Scholar]
- Kwon, Y.D.; Choo, J.; Kim, B.; Yoon, I.; Gwon, Y.; Min, S. Pomo: Policy optimization with multiple optima for reinforcement learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21188–21198. [Google Scholar]
- Fitzpatrick, J.; Ajwani, D.; Carroll, P. A Scalable Learning Approach for the Capacitated Vehicle Routing Problem; SSRN: Rochester, NY, USA, 2023; SSRN 4633199. [Google Scholar]
- Krasowski, H.; Thumm, J.; Müller, M.; Schäfer, L.; Wang, X.; Althoff, M. Provably safe reinforcement learning: Conceptual analysis, survey, and benchmarking. Trans. Mach. Learn. Res. 2023. [Google Scholar] [CrossRef]
- Kochdumper, N.; Krasowski, H.; Wang, X.; Bak, S.; Althoff, M. Provably safe reinforcement learning via action projection using reachability analysis and polynomial zonotopes. IEEE Open J. Control. Syst. 2023, 2, 79–92. [Google Scholar] [CrossRef]
- Berkenkamp, F.; Schoellig, A.P.; Turchetta, M.; Krause, A. Safe model-based reinforcement learning with stability guarantees. Proc. Int. Conf. Neural Inf. Process. Syst. 2017, 30, 908–918. [Google Scholar]
- Garmendia, A.I.; Ceberio, J.; Mendiburu, A. Neural combinatorial optimization: A new player in the field. arXiv 2022, arXiv:2205.01356. [Google Scholar]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).