DOI: 10.1007/978-3-030-92916-9_5
Article

Reinforcement Learning for Modeling and Capturing the Effect of Partner Selection Strategies on the Emergence of Cooperation

Published: 21 September 2021

Abstract

In this research we study the statistical mechanics of cooperation through a simple case of aspiration-driven dynamics in structured populations with mixed strategies. In contrast to the existing literature, we define a pool of possible behaviors for the agents based on bandit learning algorithms, and we highlight settings of the Iterated Prisoner’s Dilemma Game that may have a positive influence on the emergence of cooperation, from the perspective of both the entire population and the individual players. We report the level of cooperation and its variation in terms of the median (M) and the interquartile range (IQR), in relation to the observed topological characteristics of the network structures and the partner selection strategies. Our experimental results show that, regardless of the underlying network structure, it is difficult to maintain a fully cooperative society under the Random and Epsilon Greedy partner selection strategies: the reported median values are the lowest, and the IQRs show no sharp increase or decrease under either strategy. In contrast, a fully cooperative population emerges in a shorter time under the UCB, Epsilon First and Epsilon Decreasing strategies. Our observations across different network structures also show that a certain level of heterogeneity, in terms of both distance to others and clustering coefficient, is more conducive to the spread of cooperative behavior in a networked population.
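
The partner selection strategies named in the abstract are standard bandit rules, and a minimal Python sketch may make the comparison more concrete. It is an illustration under stated assumptions, not the authors' implementation: the function names (`select_partner`, `update`, `median_and_iqr`), the default `epsilon`, the `1/sqrt(t + 1)` decay schedule, and the UCB1 form of the confidence bonus are all hypothetical choices, since the abstract does not give these details. Each neighbour is treated as one arm, and the payoff from a round of the game is the reward.

```python
import math
import random
import statistics


def select_partner(strategy, values, counts, t, horizon, rng, epsilon=0.1):
    """Pick the index of the neighbour to play with at round t (0-based).

    values[i] is the running mean payoff earned with neighbour i and
    counts[i] is how often neighbour i has been chosen so far.
    """
    n = len(values)
    greedy = max(range(n), key=lambda i: values[i])

    if strategy == "random":
        return rng.randrange(n)
    if strategy == "ucb":
        # UCB1 (assumed variant): try every neighbour once, then add a
        # confidence bonus that shrinks as a neighbour is chosen more often.
        for i in range(n):
            if counts[i] == 0:
                return i
        total = sum(counts)
        return max(
            range(n),
            key=lambda i: values[i] + math.sqrt(2 * math.log(total) / counts[i]),
        )
    if strategy == "epsilon_greedy":
        explore = rng.random() < epsilon
    elif strategy == "epsilon_first":
        # Explore only during the first epsilon * horizon rounds.
        explore = t < epsilon * horizon
    elif strategy == "epsilon_decreasing":
        # Assumed schedule: exploration probability decays as 1 / sqrt(t + 1).
        explore = rng.random() < epsilon / math.sqrt(t + 1)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return rng.randrange(n) if explore else greedy


def update(values, counts, i, payoff):
    """Incrementally update the mean payoff observed with neighbour i."""
    counts[i] += 1
    values[i] += (payoff - values[i]) / counts[i]


def median_and_iqr(samples):
    """Summarise cooperation levels by median and interquartile range."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return statistics.median(samples), q3 - q1
```

In a simulation loop, an agent would call `select_partner`, play one round against the chosen neighbour, and pass the resulting payoff to `update`; `median_and_iqr` could then summarise the fraction of cooperators recorded across repeated runs.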


Published In

Economics of Grids, Clouds, Systems, and Services: 18th International Conference, GECON 2021, Virtual Event, September 21–23, 2021, Proceedings
Sep 2021
235 pages
ISBN: 978-3-030-92915-2
DOI: 10.1007/978-3-030-92916-9

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Iterated prisoner’s dilemma
2. Cooperation
3. Reinforcement learning
4. Network structure
5. Network measures
6. Agent-based modeling and simulation
7. Bandits learning algorithms
