Abstract
This paper explores the theoretical basis of the covariance matrix adaptation evolution strategy (CMA-ES) from the information geometry viewpoint.
To establish a theoretical foundation for the CMA-ES, we focus on the geometric structure of a Riemannian manifold of probability distributions equipped with the Fisher metric. We define a function on this manifold, namely the expectation of the fitness over the sampling distribution, and regard the goal of updating the parameters of the sampling distribution in the CMA-ES as the maximization of this expected fitness. We investigate steepest-ascent learning for expected fitness maximization, where the steepest-ascent direction is given by the natural gradient, that is, the product of the inverse of the Fisher information matrix and the conventional gradient of the function.
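In standard notation, this setup reads as follows (a restatement of the above, not an additional result): writing $J(\theta) = \mathbb{E}_{p_\theta}[f(X)]$ for the expected fitness under the sampling distribution $p_\theta$, steepest-ascent learning iterates

$$\theta \leftarrow \theta + \eta\, \tilde{\nabla} J(\theta), \qquad \tilde{\nabla} J(\theta) = F(\theta)^{-1}\, \nabla_{\theta} J(\theta),$$

where $F(\theta)$ is the Fisher information matrix of $p_\theta$ and $\eta > 0$ is the learning rate.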
Our first result is that, under certain parameterizations of the multivariate normal distribution, the natural gradient of the expected fitness can be obtained without inverting the Fisher information matrix. We find that the update of the distribution parameters in the CMA-ES coincides with natural gradient learning for expected fitness maximization. Our second result is a derivation of the range of learning rates for which a step in the exact natural gradient direction improves the parameters in terms of expected fitness. From the close relation between the CMA-ES and natural gradient learning, we see that the default learning rates in the CMA-ES appear suitable for monotone improvement of the expected fitness. Finally, we discuss the relation to the expectation-maximization framework and provide an information-geometric interpretation of the CMA-ES.
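As a purely illustrative sketch (not the authors' algorithm), the following NumPy code performs one Monte Carlo natural-gradient ascent step for the (mean, covariance) parameterization of a multivariate normal. The closed-form gradients in the docstring correspond to the inversion-free expression referred to above; the sample size lam, the learning rates eta_m and eta_C, and the baseline subtraction are illustrative assumptions, and eta_C must be small enough to keep C positive definite.

```python
import numpy as np

def natural_gradient_step(m, C, fitness, lam=20, eta_m=1.0, eta_C=0.2, rng=None):
    """One Monte Carlo natural-gradient ascent step on J(m, C) = E[f(X)]
    for X ~ N(m, C).  In the (m, C) parameterization the natural gradient
    has a closed form, so the Fisher matrix is never inverted explicitly:
        grad_m = E[f(X) (X - m)]
        grad_C = E[f(X) ((X - m)(X - m)^T - C)]
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.multivariate_normal(m, C, size=lam)       # lam samples, shape (lam, n)
    f = fitness(x)                                    # fitness values, shape (lam,)
    f = f - f.mean()                                  # baseline: reduces MC variance
    d = x - m                                         # deviations from the mean
    grad_m = d.T @ f / lam                            # MC estimate of grad_m
    grad_C = np.einsum('i,ij,ik->jk', f, d, d) / lam  # the -C term cancels exactly
                                                      # because the weights are centered
    return m + eta_m * grad_m, C + eta_C * grad_C
```

For instance, iterating this step with fitness = lambda x: -np.sum(x**2, axis=1) drives the mean toward the optimum at the origin while shrinking the covariance.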
Cite this article
Akimoto, Y., Nagata, Y., Ono, I. et al. Theoretical Foundation for CMA-ES from Information Geometry Perspective. Algorithmica 64, 698–716 (2012). https://doi.org/10.1007/s00453-011-9564-8