Doubly optimal no-regret learning in monotone games

Published: 23 July 2023, in ICML'23: Proceedings of the 40th International Conference on Machine Learning. Publisher: JMLR.org.

Abstract

We consider online learning in multiplayer smooth monotone games. Existing algorithms have limitations such as (1) being applicable only to strongly monotone games; (2) lacking a no-regret guarantee; (3) having only an asymptotic or a slow O(1/√T) last-iterate convergence rate to a Nash equilibrium. While the O(1/√T) rate is tight for a large class of algorithms, including the well-studied extragradient and optimistic gradient algorithms, it is not optimal for all gradient-based algorithms. We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games. Namely, our algorithm achieves both (i) the optimal O(√T) regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal O(1/T) last-iterate convergence rate to a Nash equilibrium in multiplayer smooth monotone games. As a byproduct of the accelerated last-iterate convergence rate, we further show that each player suffers only an O(log T) individual worst-case dynamic regret, an exponential improvement over the previous state-of-the-art O(√T) bound.
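
The abstract does not spell out the AOG update rule, but a minimal sketch can illustrate the general idea behind anchoring-based acceleration of optimistic gradient methods, which is the standard route to O(1/T) last-iterate rates for smooth monotone problems. Everything below is an assumption for illustration: the function name anchored_optimistic_gradient, the anchoring schedule beta_t = 1/(t + 2), and the bilinear test game are hypothetical stand-ins, not the paper's exact method.

    import numpy as np

    def anchored_optimistic_gradient(F, z0, eta, T):
        # Hypothetical sketch: each step mixes an optimistic-gradient
        # update with the anchor point z0, using a vanishing weight
        # beta_t (Halpern-style anchoring). The schedule 1/(t + 2) is
        # an assumption for illustration, not the paper's schedule.
        z = z0.copy()
        g_prev = F(z0)
        for t in range(T):
            g = F(z)
            og = z - eta * (2.0 * g - g_prev)  # optimistic extrapolation
            beta = 1.0 / (t + 2)               # anchoring weight
            z = beta * z0 + (1.0 - beta) * og
            g_prev = g
        return z

    # Toy monotone game: the unconstrained bilinear zero-sum game
    # min_x max_y x^T A y. Its gradient field F(x, y) = (A y, -A^T x)
    # is monotone, and its unique Nash equilibrium is the origin.
    A = np.array([[1.0, -1.0], [2.0, 0.5]])

    def F(z):
        x, y = z[:2], z[2:]
        return np.concatenate([A @ y, -A.T @ x])

    z_T = anchored_optimistic_gradient(F, z0=np.ones(4), eta=0.1, T=2000)
    print(z_T)  # should be close to the equilibrium at the origin

On this bilinear example, plain simultaneous gradient descent-ascent would cycle or diverge; the optimistic extrapolation 2F(z_t) - F(z_{t-1}) stabilizes the dynamics, and the vanishing pull toward the anchor z0 is what anchoring-based analyses use to obtain accelerated last-iterate guarantees.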
