ASN: action semantics network for multiagent reinforcement learning

Abstract

In multiagent systems (MASs), each agent makes individual decisions but all contribute globally to the system’s evolution. Learning in MASs is difficult since each agent’s selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainties increase exponentially with the number of agents. Previous works have incorporated various multiagent coordination mechanisms into deep learning architectures to facilitate coordination. However, none of them explicitly consider that different actions can have different influences on other agents, which we call the action semantics. In this paper, we propose a novel network architecture, named Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes different actions’ influence on other agents using neural networks based on the action semantics between them. ASN can be easily combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II micromanagement and Neural MMO show that ASN significantly improves the performance of state-of-the-art DRL approaches, compared with several other network architectures. We also successfully deploy ASN to a popular online MMORPG game called Justice Online, which indicates a promising future for ASN to be applied in even more complex scenarios.

Notes

  1. More details can be found at https://sites.google.com/view/asn-intro; the source code is available at https://github.com/wwxFromTju/ASN_cloud.

  2. https://github.com/wwxFromTju/MA-RLlib

  3. Our ASN is compatible with extensions of QMIX since these methods follow the QMIX structure; we select QMIX as a representative baseline.

  4. The implementation details are not public since this is a commercial game and all details are closed source.

References

  1. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.

  2. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., & Ostrovski, G. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529.

  3. Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In Proceedings of the 4th international conference on learning representations.

  4. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T. P., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., & Hassabis, D. (2017). Mastering the game of go without human knowledge. Nature, 550(7676), 354–359.

  5. Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the fifteenth national conference on artificial intelligence and tenth innovative applications of artificial intelligence conference (pp. 746–752).

  6. Hu, J., Wellman, M. P. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the fifteenth international conference on machine learning (pp. 242–250).

  7. Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2), 156–172.

  8. Hauwere, Y. D., Devlin, S., Kudenko, D., & Nowé, A. (2016). Context-sensitive reward shaping for sparse interaction multi-agent systems. The Knowledge Engineering Review, 31(1), 59–76.

  9. Yang, T., Wang, W., Tang, H., Hao, J., Meng, Z., Mao, H., Li, D., Liu, W., Chen, Y., & Hu, Y. (2021). An efficient transfer learning framework for multiagent reinforcement learning. In Advances in neural information processing systems (vol. 34, pp. 17037–17048).

  10. Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O. P., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in neural information processing systems (pp. 6379–6390).

  11. Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2018). Counterfactual multi-agent policy gradients. In Proceedings of the thirty-second AAAI conference on artificial intelligence.

  12. Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., & Wang, J. (2018). Mean field multi-agent reinforcement learning. In Proceedings of the 35th international conference on machine learning (pp. 5567–5576).

  13. Stanley, H. E. (1971). Phase transitions and critical phenomena. Clarendon Press.

  14. Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J. Z., & Tuyls, K. (2018). Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th international conference on autonomous agents and multiagent systems (pp. 2085–2087).

  15. Rashid, T., Samvelyan, M., Witt, C. S., Farquhar, G., Foerster, J., & Whiteson, S. (2018). QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th international conference on machine learning (pp. 4292–4301).

  16. Sukhbaatar, S., Szlam, A., & Fergus, R. (2016). Learning multiagent communication with backpropagation. In Advances in neural information processing systems (vol. 29, pp. 2244–2252).

  17. Singh, A., Jain, T., & Sukhbaatar, S. (2019). Individualized controlled continuous communication model for multiagent cooperative and competitive tasks. In Proceedings of the 7th international conference on learning representations.

  18. Zambaldi, V. F., Raposo, D., Santoro, A., Bapst, V., Li, Y., Babuschkin, I., Tuyls, K., Reichert, D. P., Lillicrap, T. P., Lockhart, E., Shanahan, M., Langston, V., Pascanu, R., Botvinick, M., Vinyals, O., & Battaglia, P. W. (2019). Deep reinforcement learning with relational inductive biases. In Proceedings of the 7th international conference on learning representations.

  19. Tacchetti, A., Song, H. F., Mediano, P. A. M., Zambaldi, V. F., Kramár, J., Rabinowitz, N. C., Graepel, T., Botvinick, M., & Battaglia, P. W. (2019). Relational forward models for multi-agent learning. In Proceedings of the 7th international conference on learning representations.

  20. Pachocki, J., Brockman, G., Raiman, J., Zhang, S., Pondé, H., Tang, J., Wolski, F., Dennison, C., Jozefowicz, R., Debiak, P., et al. (2018). OpenAI Five. https://blog.openai.com/openai-five

  21. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347

  22. Samvelyan, M., Rashid, T., de Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G. J., Hung, C., Torr, P. H. S., Foerster, J. N., & Whiteson, S. (2019). The StarCraft multi-agent challenge. In Proceedings of the 18th international conference on autonomous agents and multiagent systems (pp. 2186–2188).

  23. Suarez, J., Du, Y., Isola, P., & Mordatch, I. (2019). Neural MMO: A massively multiagent game environment for training and evaluating intelligent agents. arXiv preprint arXiv:1903.00784

  24. Wang, W., Yang, T., Liu, Y., Hao, J., Hao, X., Hu, Y., Chen, Y., Fan, C., & Gao, Y. (2020). Action semantics network: Considering the effects of actions in multiagent systems. In Proceedings of the 8th international conference on learning representations.

  25. Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning (pp. 157–163).

  26. Hansen, E. A., Bernstein, D. S., & Zilberstein, S. (2004). Dynamic programming for partially observable stochastic games. In Proceedings of the nineteenth national conference on artificial intelligence (pp. 709–715).

  27. Watkins, C. J. C. H., & Dayan, P. (1992). Technical note Q-learning. Machine Learning, 8, 279–292.

  28. van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. In Proceedings of the thirtieth AAAI conference on artificial intelligence (pp. 2094–2100).

  29. Anschel, O., Baram, N., & Shimkin, N. (2017). Averaged-DQN: Variance reduction and stabilization for deep reinforcement learning. In Proceedings of the 34th international conference on machine learning (pp. 176–185).

  30. Bellemare, M. G., Dabney, W., & Munos, R. (2017). A distributional perspective on reinforcement learning. In Proceedings of the 34th international conference on machine learning (pp. 449–458).

  31. Dabney, W., Rowland, M., Bellemare, M. G., & Munos, R. (2018). Distributional reinforcement learning with quantile regression. In Proceedings of the thirty-second AAAI conference on artificial intelligence (pp. 2892–2901).

  32. Wang, Z., Schaul, T., Hessel, M., Hasselt, H., Lanctot, M., & Freitas, N. (2016). Dueling network architectures for deep reinforcement learning. In Proceedings of the 33rd international conference on machine learning (pp. 1995–2003).

  33. Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized experience replay. In Proceedings of the 4th international conference on learning representations.

  34. Hessel, M., Modayil, J., van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M. G., & Silver, D. (2018). Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the thirty-second AAAI conference on artificial intelligence (pp. 3215–3222).

  35. Sosic, A., KhudaBukhsh, W. R., Zoubir, A. M., & Koeppl, H. (2017). Inverse reinforcement learning in swarm systems. In Proceedings of the 16th conference on autonomous agents and multiagent systems (pp. 1413–1421).

  36. Oh, K.-K., Park, M.-C., & Ahn, H.-S. (2015). A survey of multi-agent formation control. Automatica, 53, 424–440.

  37. Wang, W., Yang, T., Liu, Y., Hao, J., Hao, X., Hu, Y., Chen, Y., Fan, C., & Gao, Y. (2020). From few to more: Large-scale dynamic multiagent curriculum learning. In Proceedings of the AAAI conference on artificial intelligence (vol. 34, pp. 7293–7300).

  38. Jiang, J., & Lu, Z. (2018). Learning attentional communication for multi-agent cooperation. In Advances in neural information processing systems (pp. 7254–7264).

  39. Subramanian, S. G., Taylor, M. E., Crowley, M., & Poupart, P. (2021). Decentralized mean field games. arXiv preprint arXiv:2112.09099

  40. Wolpert, D. H., & Tumer, K. (2001). Optimal payoff functions for members of collectives. Advances in Complex Systems, 4(2–3), 265–280.

  41. Foerster, J. N., de Witt, C. A. S., Farquhar, G., Torr, P. H. S., Boehmer, W., & Whiteson, S. (2018). Multi-agent common knowledge reinforcement learning. arXiv preprint arXiv:1810.11702

  42. Du, Y., Han, L., Fang, M., Liu, J., Dai, T., & Tao, D. (2019). LIIR: Learning individual intrinsic reward in multi-agent reinforcement learning. In Advances in neural information processing systems (pp. 4405–4416).

  43. Son, K., Kim, D., Kang, W. J., Hostallero, D., & Yi, Y. (2019). QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning (pp. 5887–5896).

  44. Rashid, T., Farquhar, G., Peng, B., & Whiteson, S. (2020). Weighted QMIX: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. In Advances in neural information processing systems.

  45. Yang, Y., Hao, J., Liao, B., Shao, K., Chen, G., Liu, W., & Tang, H. (2020). Qatten: A general framework for cooperative multiagent reinforcement learning. arXiv preprint arXiv:2002.03939

  46. Wang, J., Ren, Z., Liu, T., Yu, Y., & Zhang, C. (2021) QPLEX: Duplex dueling multi-agent Q-learning. In Proceedings of the International Conference on Learning Representations.

  47. Panait, L., Luke, S., & Wiegand, R. P. (2006). Biasing coevolutionary search for optimal multiagent behaviors. IEEE Transactions on Evolutionary Computation, 10(6), 629–645.

  48. Mahajan, A., Rashid, T., Samvelyan, M., & Whiteson, S. (2019). MAVEN: Multi-agent variational exploration. In Advances in neural information processing systems (pp. 7611–7622).

  49. Wang, T., Wang, J., Wu, Y., & Zhang, C. (2020). Influence-based multi-agent exploration. In Proceedings of the 8th international conference on learning representations.

  50. Yoo, B., Ningombam, D. D., Yi, S., Kim, H. W., Chung, E., Han, R., & Song, H. J. (2022). A novel and efficient influence-seeking exploration in deep multiagent reinforcement learning. IEEE Access, 10, 47741–47753.

  51. Pieroth, F. R., Fitch, K., & Belzner, L. (2022). Detecting influence structures in multi-agent reinforcement learning systems.

  52. Hu, S., Zhu, F., Chang, X., & Liang, X. (2021). UPDeT: Universal multi-agent RL via policy decoupling with transformers. In proceedings of the 9th international conference on learning representations.

  53. Zhou, T., Zhang, F., Shao, K., Li, K., Huang, W., Luo, J., Wang, W., Yang, Y., Mao, H., Wang, B., & Li, D. (2021). Cooperative multi-agent transfer learning with level-adaptive credit assignment. arXiv preprint arXiv:2106.00517

  54. Chai, J., Li, W., Zhu, Y., Zhao, D., Ma, Z., Sun, K., & Ding, J. (2021). UNMAS: Multiagent reinforcement learning for unshaped cooperative scenarios. IEEE Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/TNNLS.2021.3105869

  55. Wang, T., Gupta, T., Mahajan, A., Peng, B., Whiteson, S., & Zhang, C. (2021). RODE: Learning roles to decompose multi-agent tasks. In Proceedings of the international conference on learning representations.

  56. Cao, J., Yuan, L., Wang, J., Zhang, S., Zhang, C., Yu, Y., & Zhan, D. (2021). LINDA: Multi-agent local information decomposition for awareness of teammates. arXiv preprint arXiv:2109.12508

  57. Hao, X., Wang, W., Mao, H., Yang, Y., Li, D., Zheng, Y., Wang, Z., & Hao, J. (2022). API: Boosting multi-agent reinforcement learning via agent-permutation-invariant networks. arXiv preprint arXiv:2203.05285

  58. Palmer, G., Tuyls, K., Bloembergen, D., & Savani, R. (2018). Lenient multi-agent deep reinforcement learning. In Proceedings of the 17th international conference on autonomous agents and MultiAgent systems (pp. 443–451).

  59. Gupta, J. K., Egorov, M., & Kochenderfer, M. (2017). Cooperative multi-agent control using deep reinforcement learning. In Proceedings of the 16th international conference on autonomous agents and multiagent systems, workshops (pp. 66–83).

  60. Ha, D., Dai, A. M., & Le, Q. V. (2016). HyperNetworks. arXiv preprint arXiv:1609.09106

  61. Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., Aru, J., & Vicente, R. (2017). Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12(4), e0172395.

  62. Moritz, P., Nishihara, R., Wang, S., Tumanov, A., Liaw, R., Liang, E., Elibol, M., Yang, Z., Paul, W., & Jordan, M.I. (2018). Ray: A distributed framework for emerging AI applications. In Proceedings of the 13th USENIX symposium on operating systems design and implementation (OSDI 18) (pp. 561–577).

  63. Wu, Y., Mansimov, E., Grosse, R. B., Liao, S., & Ba, J. (2017). Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In Advances in Neural Information Processing Systems (vol. 30, pp. 5279–5288).

  64. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd international conference on machine learning (pp. 1928–1937).

  65. Yu, C., Velu, A., Vinitsky, E., Wang, Y., Bayen, A. M., & Wu, Y. (2021). The surprising effectiveness of MAPPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955

  66. Sarafian, E., Keynan, S., & Kraus, S. (2021). Recomposing the reinforcement learning building blocks with hypernetworks. In International Conference on Machine Learning (pp. 9301–9312). PMLR.

  67. Fu, J., Kumar, A., Soh, M., & Levine, S. (2019). Diagnosing bottlenecks in deep Q-learning algorithms. In International Conference on Machine Learning (pp. 2021–2030).

  68. Andrychowicz, M., Raichuk, A., Stanczyk, P., Orsini, M., Girgin, S., Marinier, R., Hussenot, L., Geist, M., Pietquin, O., Michalski, M., Gelly, S., & Bachem, O. (2021). What matters for on-policy deep actor-critic methods? A large-scale study. In Proceedings of the 9th international conference on learning representations.

  69. NetEase: Justice Online. (2018). https://n.163.com/index.html, https://www.mmobomb.com/news/netease-looks-to-bring-justice-online-west-this-year

  70. Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., & Dunning, I. (2018). IMPALA: Scalable distributed Deep-RL with importance weighted actor-learner architectures. In International conference on machine learning (pp. 1407–1416).

Acknowledgements

This work is supported by the Major Program of the National Natural Science Foundation of China (Grant No. 92370132) and the National Key R&D Program of China (Grant No. 2022ZD0116402). Part of this work has taken place in the Intelligent Robot Learning (IRL) Lab at the University of Alberta, which is supported in part by research grants from the Alberta Machine Intelligence Institute (Amii); a Canada CIFAR AI Chair, Amii; Compute Canada; Huawei; Mitacs; and NSERC.

Author information

Contributions

TY wrote the main manuscript text. WW and TY prepared all experimental results. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Tianpei Yang or Jianye Hao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: environmental settings

1.1 A.1 StarCraft II

State Description In StarCraft II, we follow the settings of previous works [15, 22]. The local observation of each agent is drawn from its field of view, the circular area of the map surrounding the unit with a radius equal to its sight range. Each agent receives as input a vector consisting of the following features for all units in its field of view (both allied and enemy): distance, relative x, relative y, and unit type. More details can be found at https://github.com/wwxFromTju/ASN_cloud or https://github.com/oxwhirl/smac.
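
As a rough illustration of this per-unit feature layout, the sketch below assembles an observation vector from the listed fields. The field names, the normalization, and the SIGHT_RANGE constant are our own illustrative assumptions, not the SMAC implementation.

```python
import numpy as np

SIGHT_RANGE = 9.0  # assumed sight radius, in game units (illustrative)

def unit_features(me, other, n_unit_types):
    """Relative features of one visible unit; zeroed if out of sight or dead."""
    dx, dy = other["x"] - me["x"], other["y"] - me["y"]
    dist = np.hypot(dx, dy)
    if dist > SIGHT_RANGE or other["hp"] <= 0:
        return np.zeros(3 + n_unit_types)           # invisible units are masked out
    one_hot_type = np.eye(n_unit_types)[other["type"]]
    return np.concatenate(([dist / SIGHT_RANGE,     # normalized distance
                            dx / SIGHT_RANGE,       # relative x
                            dy / SIGHT_RANGE],      # relative y
                           one_hot_type))           # unit type

def local_observation(me, allies, enemies, n_unit_types):
    """o_t^i as the concatenation of per-unit feature blocks o_t^{i,j}."""
    feats = [unit_features(me, u, n_unit_types) for u in (allies + enemies)]
    return np.concatenate(feats)
```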

1.2 A.2 Neural MMO

State Description The map is a 10 × 10 grid of tiles (each tile can be of a different kind, e.g., rock or grass), on which two teams of agents (green and red) compete, each with 3 agents. At the beginning of each episode, each agent spawns on one of the 10 × 10 tiles. The observation of an agent is a 43-dimensional vector, of which the first 8 dimensions are: time to live, HP, remaining food (set to 0), remaining water (set to 0), current position (x and y), the amount of damage suffered, and frozen state (1 or 0); the remaining 35 dimensions are divided equally to describe the other 5 agents. The first 14 of these dimensions describe the 2 teammates, followed by the descriptions of the 3 opponents. Each observed agent’s information includes the relative position (x and y), whether it is a teammate (1 or 0), HP, remaining food, remaining water, and the frozen state.
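
The layout above works out to 8 self features plus 5 blocks of 7 per-agent features (8 + 5 × 7 = 43). The sketch below splits such a vector for reference; the helper name and field comments are ours and not part of the Neural MMO API.

```python
import numpy as np

SELF_DIM, PER_AGENT_DIM, N_OTHERS = 8, 7, 5         # 8 + 5 * 7 = 43

def split_observation(obs):
    """Split the 43-dim observation into self, teammate, and opponent blocks."""
    assert obs.shape == (43,)
    self_feats = obs[:SELF_DIM]                      # ttl, HP, food, water, x, y, damage, frozen
    others = obs[SELF_DIM:].reshape(N_OTHERS, PER_AGENT_DIM)
    teammates, opponents = others[:2], others[2:]    # 2 teammates first, then 3 opponents
    # per-agent fields: rel_x, rel_y, is_teammate, HP, food, water, frozen
    return self_feats, teammates, opponents
```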

Each agent chooses an action from a set of 14 discrete actions: stop; move left, right, up, or down; and three types of attack (“Melee”, with attack distance 2 and damage 5; “Range”, with attack distance 4 and damage 2; “Mage”, with attack distance 10 and damage 1), each directed at one of the three opponents.
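
For reference, the 14 actions decompose as 1 stop, 4 moves, and 3 attack styles × 3 opponent targets. The enumeration below is illustrative; the action labels are our own.

```python
from itertools import product

MOVES = ["stop", "left", "right", "up", "down"]                  # 5 movement actions
ATTACKS = {"melee": (2, 5), "range": (4, 2), "mage": (10, 1)}    # style -> (distance, damage)
ATTACK_ACTIONS = [f"{style}_opponent{k}" for style, k in product(ATTACKS, range(3))]

ACTIONS = MOVES + ATTACK_ACTIONS
assert len(ACTIONS) == 14
```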

Each agent receives a penalty of \(-0.1\) if an attack fails, a reward of \(-0.01\) for each tick, and a penalty of \(-10\) for being killed. The game ends when one team of agents dies or the time exceeds a fixed limit, and agents in the same team receive the same final reward: the difference between their team’s total HP and the opposing team’s total HP.
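
A minimal sketch of this reward scheme, assuming the per-tick penalties are simply summed and the shared team reward is issued at the end of the episode:

```python
def step_reward(attack_failed, was_killed):
    """Per-agent shaping reward at each tick (values from the text above)."""
    r = -0.01                      # time penalty per tick
    if attack_failed:
        r -= 0.1                   # failed-attack penalty
    if was_killed:
        r -= 10.0                  # death penalty
    return r

def team_reward(own_team_hp, enemy_team_hp):
    """Shared terminal reward: HP difference between the two teams."""
    return sum(own_team_hp) - sum(enemy_team_hp)
```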

Appendix B: Network structure and parameter settings

1.1 B.1 StarCraft II

Network Structure The details of the different network structures for StarCraft II are shown in Fig. 19. The vanilla network (Fig. 19a) of each agent i contains two fully-connected hidden layers with 64 units and one GRU layer with 64 units, taking \(o_t^i\) as input; the output layer is a fully-connected layer that outputs the Q-value of each action. The attention network (Fig. 19b) of each agent i contains two isolated fully-connected layers with 64 units, taking \(o_t^i\) as input and computing a standard attention weight for each dimension of the input; the following hidden layer is a GRU with 64 units, and the output contains the Q-value of each action. The entity-attention network (Fig. 19c) is similar to Fig. 19b, except that the attention weight is calculated over each \(o_t^{i,j}\). The dueling network (Fig. 19d) is the same as the vanilla network except for the output layer, which outputs the advantage of each action together with the state value. Our homogeneous ASN (Fig. 19e) of each agent i contains two sub-modules: one is \(O2A^i\), which contains two fully-connected layers with 32 units, taking \(o_t^i\) as input, followed by a GRU layer with 32 units; the other is a parameter-sharing sub-module, which contains two fully-connected layers with 32 units, taking each \(o_t^{i,j}\) as input, followed by a GRU layer with 32 units. The output layer outputs the Q-value of each action.

Fig. 19 Various network structures on the StarCraft II 8m map
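
For concreteness, the sketch below shows one way to realize the homogeneous ASN just described in PyTorch. Layer sizes follow the text (two 32-unit fully-connected layers plus a 32-unit GRU per sub-module); combining the two embeddings with an inner product to score agent-directed actions is our reading of the architecture, so treat this as an illustrative sketch rather than the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class SubModule(nn.Module):
    """Two 32-unit FC layers followed by a 32-unit GRU cell."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, hidden), nn.ReLU())
        self.gru = nn.GRUCell(hidden, hidden)

    def forward(self, x, h):
        return self.gru(self.fc(x), h)     # new hidden state doubles as the embedding


class HomogeneousASN(nn.Module):
    def __init__(self, obs_dim, per_agent_dim, n_self_actions, hidden=32):
        super().__init__()
        self.o2a = SubModule(obs_dim, hidden)                # O2A^i: takes the whole o_t^i
        self.o2a_j = SubModule(per_agent_dim, hidden)        # shared across all o_t^{i,j}
        self.self_head = nn.Linear(hidden, n_self_actions)   # Q-values of non-directed actions

    def forward(self, obs, per_agent_obs, h_self, h_others):
        # obs: (B, obs_dim); per_agent_obs: list of (B, per_agent_dim), one per other agent j
        e_i = self.o2a(obs, h_self)                           # (B, hidden), embedding of o_t^i
        q_self = self.self_head(e_i)                          # (B, n_self_actions), e.g. move/stop
        e_ij = torch.stack([self.o2a_j(o_j, h_j)              # (B, J, hidden)
                            for o_j, h_j in zip(per_agent_obs, h_others)], dim=1)
        q_directed = torch.einsum("bh,bjh->bj", e_i, e_ij)    # Q-value of the action aimed at agent j
        return torch.cat([q_self, q_directed], dim=-1), e_i, e_ij
```

In this sketch, q_self covers actions with no direct effect on other agents (e.g., move and stop), while q_directed holds one Q-value per other agent j, produced by pairing the \(O2A^i\) embedding with the shared per-agent embedding of \(o_t^{i,j}\).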

Parameter Settings Here we provide the hyperparameters for StarCraft II, as shown in Table 4; more details can be found at https://github.com/wwxFromTju/ASN_cloud.

Table 4 Hyperparameter settings for StarCraft II

1.2 B.2 Neural MMO

Network structure The details of the vanilla, attention, and entity-attention networks for Neural MMO are shown in Fig. 20a–c; each contains an actor network and a critic network. All actors are similar to those for StarCraft II in Fig. 19, except that the GRU layer is excluded and the output is the probability of choosing each action. All critics are the same as the one shown in Fig. 20a. Since in Neural MMO each agent has multiple actions that directly influence each other agent (i.e., the three kinds of attack actions), we test two ASN variants: one (Fig. 20d) is the Multi-action ASN mentioned in the previous section, which shares the first-layer parameters among the multiple actions; the other (Fig. 20e) is the basic homogeneous ASN, which does not share the first-layer parameters among the multiple actions.

Fig. 20 Various network structures on Neural MMO
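
Below is a minimal sketch of the pairwise sub-module of the Multi-action ASN variant, assuming that “sharing the first-layer parameters” means a single shared first fully-connected layer followed by one head per attack type; the layer sizes and the omission of a GRU mirror the Neural MMO actors, but the exact wiring is our assumption.

```python
import torch
import torch.nn as nn

class MultiActionPairwise(nn.Module):
    """Pairwise sub-module whose first layer is shared across the three attack types."""
    def __init__(self, per_agent_dim, n_attack_types=3, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(per_agent_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
                                    for _ in range(n_attack_types)])

    def forward(self, o_ij):
        z = self.shared(o_ij)                                         # shared embedding of o_t^{i,j}
        return torch.stack([head(z) for head in self.heads], dim=1)   # (B, 3, hidden), one per attack type
```

The basic homogeneous variant in Fig. 20e would instead instantiate an entirely separate sub-module per attack type, without the shared first layer.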

Parameter settings Here we provide the hyperparameters for Neural MMO, as shown in Table 5.

Table 5 Parameters of all algorithms

Appendix C: Additional experimental results

The following results show the performance of QMIX-ASN and vanilla QMIX on different StarCraft II maps when adding a manual rule that forbids agents from choosing invalid actions (Fig. 21).

Fig. 21 Win rates of QMIX-ASN and vanilla QMIX on the 5m and 8m StarCraft II maps
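
The manual rule amounts to a standard action mask. A minimal sketch, assuming a boolean validity mask is available from the environment:

```python
import torch

def masked_argmax(q_values, valid_mask):
    """Greedy action selection with invalid actions excluded.

    q_values:   (B, n_actions) tensor of Q-values
    valid_mask: (B, n_actions) tensor with 1 = valid, 0 = invalid
    """
    masked_q = q_values.masked_fill(valid_mask == 0, float("-inf"))
    return masked_q.argmax(dim=-1)
```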

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, T., Wang, W., Hao, J. et al. ASN: action semantics network for multiagent reinforcement learning. Auton Agent Multi-Agent Syst 37, 45 (2023). https://doi.org/10.1007/s10458-023-09628-3
