Doubly optimal no-regret learning in monotone games

Published: 23 July 2023, in ICML'23: Proceedings of the 40th International Conference on Machine Learning. Publisher: JMLR.org.

Abstract

We consider online learning in multiplayer smooth monotone games. Existing algorithms have limitations such as (1) being applicable only to strongly monotone games; (2) lacking a no-regret guarantee; (3) having only an asymptotic or a slow O(1/√T) last-iterate convergence rate to a Nash equilibrium. While the O(1/√T) rate is tight for a large class of algorithms, including the well-studied extragradient and optimistic gradient algorithms, it is not optimal for all gradient-based algorithms. We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games. Namely, our algorithm achieves both (i) the optimal O(√T) regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal O(1/T) last-iterate convergence rate to a Nash equilibrium in multiplayer smooth monotone games. As a byproduct of the accelerated last-iterate convergence rate, we further show that each player suffers only an O(log T) individual worst-case dynamic regret, an exponential improvement over the previous state-of-the-art O(√T) bound.
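
The abstract does not spell out the AOG update rule, but a minimal sketch can illustrate the general idea behind anchoring-based acceleration of optimistic gradient methods, which is the standard route to O(1/T) last-iterate rates for smooth monotone problems. Everything below is an assumption for illustration: the function name anchored_optimistic_gradient, the anchoring schedule beta_t = 1/(t + 2), and the bilinear test game are hypothetical stand-ins, not the paper's exact method.

    import numpy as np

    def anchored_optimistic_gradient(F, z0, eta, T):
        # Hypothetical sketch: each step mixes an optimistic-gradient
        # update with the anchor point z0, using a vanishing weight
        # beta_t (Halpern-style anchoring). The schedule 1/(t + 2) is
        # an assumption for illustration, not the paper's schedule.
        z = z0.copy()
        g_prev = F(z0)
        for t in range(T):
            g = F(z)
            og = z - eta * (2.0 * g - g_prev)  # optimistic extrapolation
            beta = 1.0 / (t + 2)               # anchoring weight
            z = beta * z0 + (1.0 - beta) * og
            g_prev = g
        return z

    # Toy monotone game: the unconstrained bilinear zero-sum game
    # min_x max_y x^T A y. Its gradient field F(x, y) = (A y, -A^T x)
    # is monotone, and its unique Nash equilibrium is the origin.
    A = np.array([[1.0, -1.0], [2.0, 0.5]])

    def F(z):
        x, y = z[:2], z[2:]
        return np.concatenate([A @ y, -A.T @ x])

    z_T = anchored_optimistic_gradient(F, z0=np.ones(4), eta=0.1, T=2000)
    print(z_T)  # should be close to the equilibrium at the origin

On this bilinear example, plain simultaneous gradient descent-ascent would cycle or diverge; the optimistic extrapolation 2F(z_t) - F(z_{t-1}) stabilizes the dynamics, and the vanishing pull toward the anchor z0 is what anchoring-based analyses use to obtain accelerated last-iterate guarantees.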
