Abstract
Multi-agent reinforcement learning is a diverse and highly active field of research. Parameter sharing and experience sharing have recently been introduced into multi-agent reinforcement learning to accelerate the training of multiple neural networks and to improve final returns. However, implementing parameter or experience sharing in multi-agent environments can impose additional constraints or computational costs. This work presents a preference-based experience sharing scheme that allows for different policies in environments with weakly homogeneous agents and requires almost no additional computation. In this scheme, the experience replay buffer is augmented with a choice vector indicating each agent's preferred target, so that an agent can learn from the experience collected by other agents that chose the same target. PSE-MADDPG, an off-policy algorithm built on this preference-based experience sharing scheme, is proposed and benchmarked on a multi-target assignment and cooperative navigation mission. Experimental results show that PSE-MADDPG solves the multi-target assignment problem and outperforms two classical deep reinforcement learning algorithms, learning in fewer steps and converging to higher episode rewards. Moreover, PSE-MADDPG relaxes the assumption of strongly homogeneous agents and incurs little additional computational cost.
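As an illustration of the sharing mechanism summarised above, the following minimal Python sketch (not the authors' released implementation; the class and method names are hypothetical) shows how a replay buffer can carry a per-transition choice vector so that agents preferring the same target may sample one another's experience.

```python
import random
from collections import deque


class PreferenceReplayBuffer:
    """Replay buffer whose transitions carry the collecting agent's target choice."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done, choice):
        # 'choice' identifies the preferred target, e.g. a one-hot vector.
        self.buffer.append((obs, action, reward, next_obs, done, tuple(choice)))

    def sample(self, batch_size, choice=None):
        # With a choice given, sample only transitions collected by agents that
        # preferred the same target; otherwise sample from the whole buffer.
        pool = list(self.buffer) if choice is None else [
            t for t in self.buffer if t[5] == tuple(choice)
        ]
        return random.sample(pool, min(batch_size, len(pool)))


# Two agents that both chose target 0 can reuse each other's transitions.
buf = PreferenceReplayBuffer()
buf.add([0.1, 0.2], [0.5], 1.0, [0.2, 0.3], False, choice=[1, 0])
buf.add([0.4, 0.1], [0.3], 0.5, [0.5, 0.2], False, choice=[1, 0])
batch = buf.sample(batch_size=2, choice=[1, 0])
```

In this sketch the filtering by choice vector is the only addition to an ordinary replay buffer, which is consistent with the claim that the scheme adds little computational cost.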
Data availability
The data and materials that support the findings of this study are available from the corresponding author upon reasonable request.
Code availability
The code is available in the GitHub repository 'PSE-MADDPG': https://github.com/guanzhongzx/PSE-MADDPG.
Funding
The first and second authors are supported by the National Natural Science Foundation of China (No. 61790552).
Author information
Authors and Affiliations
Contributions
XZ contributed to designing the method, running the experiments, and writing all sections. ZL contributed to providing feedback and guidance. PZ contributed to the conceptualisation and formal analysis. HL contributed to the methodology. All authors contributed to the results analysis and the manuscript revision.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Compatible function approximation
We consider off-policy MARL methods that learn a deterministic target policy \(\mu _{\theta }(s)\) from trajectories generated by an arbitrary stochastic behaviour policy \(\pi (s, a)\). As in the stochastic case, we seek a critic \(Q^{w}(s, a)\) such that the gradient \(\nabla _a Q^{\mu }(s, a)\) can be replaced by \(\nabla _a Q^w(s, a)\) without affecting the deterministic policy gradient. The following theorem applies to both the on-policy case, \({\mathbb {E}}[\cdot ]={\mathbb {E}}_{s \sim \rho ^{\mu }}[\cdot ]\), and the off-policy case, \({\mathbb {E}}[\cdot ]={\mathbb {E}}_{s \sim \rho ^{\beta }}[\cdot ]\).
Theorem 1
A function approximator \(Q^{w}(s, a)\) is compatible with a deterministic policy \(\mu _{\theta }(s)\), in the sense that \(\nabla _\theta J_{\beta }(\theta ) = {\mathbb {E}}\left[ \left. \nabla _\theta \mu _\theta (s) \nabla _a Q^w(s, a)\right| _{a=\mu _\theta (s)}\right]\), if:
(i) \(\left. \nabla _a Q^w(s, a)\right| _{a=\mu _\theta (s)}=\nabla _\theta \mu _\theta (s)^{\top } w\), and
(ii) \(w\) minimises the mean-squared error \(\text {MSE}(\theta , w)= {\mathbb {E}}\left[ \epsilon (s; \theta , w)^{\top } \epsilon (s; \theta , w)\right]\), where \(\epsilon (s; \theta , w) = \left. \nabla _a Q^w(s, a)\right| _{a=\mu _\theta (s)}-\left. \nabla _a Q^\mu (s, a)\right| _{a=\mu _\theta (s)}\).
Proof
If \(w\) minimises the MSE, then the gradient of \(\text {MSE}(\theta , w)\) with respect to \(w\) must be zero. By condition (i), \(\nabla _w \epsilon (s; \theta , w)=\nabla _\theta \mu _\theta (s)\), so \(0 = \nabla _w \text {MSE}(\theta , w) = 2\,{\mathbb {E}}\left[ \nabla _\theta \mu _\theta (s)\,\epsilon (s; \theta , w)\right]\). Rearranging gives \({\mathbb {E}}\left[ \left. \nabla _\theta \mu _\theta (s) \nabla _a Q^w(s, a)\right| _{a=\mu _\theta (s)}\right] = {\mathbb {E}}\left[ \left. \nabla _\theta \mu _\theta (s) \nabla _a Q^\mu (s, a)\right| _{a=\mu _\theta (s)}\right] = \nabla _\theta J_{\beta }(\theta )\). \(\square\)
For any deterministic policy \(\mu _{\theta }(s)\), there always exists a compatible function approximator of the form \(Q^w(s, a) = \left( a-\mu _\theta (s)\right) ^{\top } \nabla _\theta \mu _\theta (s)^{\top } w + V^v(s)\), where \(V^v(s)\) is any differentiable baseline function independent of the action \(a\); for example, a linear combination of state features, \(V^v(s) = v^{\top }\phi (s)\), with parameters \(v\). Note that such a linear function approximator is of limited use for predicting action values globally, since its value diverges to \(\pm \infty\) for large actions. Nevertheless, it is sufficient for selecting the direction in which the actor should adjust its policy parameters.
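To make the construction concrete, the following numerical sketch (illustrative only; the toy policy, the suppressed state dependence, and all names are assumptions, not part of the paper) checks condition (i) for a critic of this compatible form.

```python
import numpy as np

rng = np.random.default_rng(0)
n_theta, n_action = 5, 2
M = rng.normal(size=(n_action, n_theta))  # fixed mixing matrix of the toy policy


def mu(theta):
    # Toy deterministic policy: actions are a fixed linear mix of the parameters.
    return M @ theta


def jacobian_mu(theta, eps=1e-6):
    # Finite-difference Jacobian d(mu)/d(theta), shape (n_theta, n_action).
    J = np.zeros((n_theta, n_action))
    for i in range(n_theta):
        d = np.zeros(n_theta)
        d[i] = eps
        J[i] = (mu(theta + d) - mu(theta - d)) / (2 * eps)
    return J


def q_w(a, theta, w, baseline=0.0):
    # Compatible critic Q^w(s,a) = (a - mu_theta(s))^T J^T w + V^v(s).
    J = jacobian_mu(theta)
    return (a - mu(theta)) @ (J.T @ w) + baseline


theta = rng.normal(size=n_theta)
w = rng.normal(size=n_theta)
a0 = mu(theta)

# grad_a Q^w at a = mu_theta(s), by central finite differences.
eps = 1e-6
grad_a = np.array([
    (q_w(a0 + eps * e, theta, w) - q_w(a0 - eps * e, theta, w)) / (2 * eps)
    for e in np.eye(n_action)
])

# Condition (i): this gradient equals J^T w, independent of the baseline.
assert np.allclose(grad_a, jacobian_mu(theta).T @ w, atol=1e-5)
```

Because \(Q^w\) is linear in \(a\), the check holds for any baseline value; only the direction term \(\nabla _\theta \mu _\theta (s)^{\top } w\) influences the actor update.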
Appendix B: The pseudocode of RSE-MADDPG
The RSE-MADDPG algorithm is used as a baseline for the proposed method.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zuo, X., Zhang, P., Li, HY. et al. Preference-based experience sharing scheme for multi-agent reinforcement learning in multi-target environments. Evolving Systems 15, 1681–1699 (2024). https://doi.org/10.1007/s12530-024-09587-4