article

The Estimation of Distributions and the Minimum Relative Entropy Principle

Authors:

Heinz Mühlenbein,

Robin HönsAuthors Info & Claims

Evolutionary Computation, Volume 13, Issue 1

Pages 1 - 27

https://doi.org/10.1162/1063656053583469

Published: 01 January 2005 Publication History

Abstract

Estimation of Distribution Algorithms (EDA) have been proposed as an extension of genetic algorithms. In this paper we explain the relationship of EDA to algorithms developed in statistics, artificial intelligence, and statistical physics. The major design issues are discussed within a general interdisciplinary framework. It is shown that maximum entropy approximations play a crucial role. All proposed algorithms try to minimize the Kullback-Leibler divergence KLD between the unknown distribution p(x) and a class q(x) of approximations. However, the Kullback-Leibler divergence is not symmetric. Approximations which suppose that the function to be optimized is additively decomposed (ADF) minimize KLD(q||p), the methods which learn the approximate model from data minimize KLD(p||q). This minimization is identical to maximizing the log-likelihood. In the paper three classes of algorithms are discussed. FDA uses the ADF to compute an approximate factorization of the unknown distribution. The factors are marginal distributions, whose values are computed from samples. The second class is represented by the Bethe-Kikuchi approach which has recently been rediscovered in statistical physics. Here the values of the marginals are computed from a difficult constrained minimization problem. The third class learns the factorization from the data. We analyze our learning algorithm LFDA in detail. It is shown that learning is faced with two problems: first, to detect the important dependencies be- tween the variables, and second, to create an acyclic Bayesian network of bounded clique size.

References

[1]

Almond, R. G. (1995). Graphical Belief Modelling. Chapman & Hall, London.

[2]

Bertelé, U. and Brioschi, F. (1972). Nonserial Dynamic Programming. Academic Press, New York.

[3]

Bethe, H. (1935). Statistical theory of superlattices. Proc. Roy. Soc. London A, 150:552-558.

[4]

Cover, T. M. and Thomas, J. (1989). Elements of Information Theory. Wiley, New York.

[5]

Csiszár, I. (1975). I-divergence geometry of probability distributions and minimization problems. Annals of Probability, 3:146-158.

[6]

Huang, C. and Darwiche, A. (1996). Inference in belief networks: A procedural guide. International Journal of Approximate Reasoning, 15(3):225-263.

[7]

Jaynes, E. T. (1957). Information theory and statistical mechanics. Phys. Rev, 6:620-643.

[8]

Jaynes, E. T. (1978). Where do we stand on maximum entropy? In Levine, R. D. and Tribus, M., editors, The Maximum Entropy Formalism. MIT Press, Cambridge.

[9]

Jensen, F. V. and Jensen, F. (1994). Optimal junction trees. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, pages 360-366, Seattle.

[10]

Jordan, M. I. (1999). Learning in Graphical Models. MIT Press, Cambrigde.

[11]

Kikuchi, R. (1951). A theory of cooperative phenomena. Phys.Review, 115:988-1003.

[12]

Kullback, S. (1968). Probability densities with given marginals. Annals of Mathematical Statistics, 39(4):1236-1243.

[13]

Larrañaga, P. and Lozano, J. (2002). Estimation of Distribution Algorithms: A New Tool for Evolutionary Optimization. Kluwer Academic Press, Boston.

[14]

Lauritzen, S. L. (1996). Graphical Models. Clarendon Press, Oxford.

[15]

Lin, S. and Kernighan, B. (1973). An effective heuristic algorithm for the traveling-salesman. Operations Research, 21:498-516.

[16]

Martelli, A. and Montari, U. (1972). Nonserial dynamic programming: On the optimal strategy of variable elimination for the rectangular lattice. Jour. Math. Analyis and Applications, 40:226-242.

[17]

Mühlenbein, H. and Mahnig, T. (1999). FDA - a scalable evolutionary algorithm for the optimization of additively decomposed functions. Evolutionary Computation, 7(4):353-376.

[18]

Mühlenbein, H. and Mahnig, T. (2000). Evolutionary algorithms: From recombination to search distributions. In Kallel, L., Naudts, B., and Rogers, A., editors, Theoretical Aspects of Evolutionary Computing, Natural Computing, pages 137-176. Springer Verlag, Berlin.

[19]

Mühlenbein, H. and Mahnig, T. (2001). Evolutionary computation and beyond. In Uesaka, Y., Kanerva, P., and Asoh, H., editors, Foundations of Real-World Intelligence, pages 123-188. CSLI Publications, Stanford, California.

[20]

Mühlenbein, H. and Mahnig, T. (2002a). Evolutionary optimization and the estimation of search distributions with applications to graph bipartitioning. Journal of Approximate Reasoning, 31(3):157-192.

[21]

Mühlenbein, H. and Mahnig, T. (2002b). Mathematical analysis of evolutionary algorithms. In Ribeiro, C. C. and Hansen, R, editors, Essays and Surveys in Metaheuristics , Operations Research/Computer Science Interface Series, pages 525-556. Kluwer Academic Publisher, Norwell.

[22]

Mühlenbein, H. and Mahnig, T. (2003). Evolutionary algorithms and the Boltzmann distribution. In Jong, K. D., Poli, R., and Rowe, J. C., editors, Foundations of Genetic Algorithms 7, pages 525-556. Morgan Kaufmann Publishers, San Francisco.

[23]

Mühlenbein, H., Mahnig, T., and Ochoa, A. (1999). Schemata, distributions and graphical models in evolutionary optimization. Journal of Heuristics, 5(2):213-247.

[24]

Mühlenbein, H. and Paaß, G. (1996). From recombination of genes to the estimation of distributions I. binary parameters. In Voigt, H.-M., Ebeling, W., Rechenberg, I., and Schwefel, H.-P, editors, Lecture Notes in Computer Science 1141: Parallel Problem Solving from Nature - PPSN IV), pages 178-187, Springer Verlag, Berlin.

[25]

Ochoa, A., Höns, R., Soto, M., and Mühlenbein, H. (2003). A maximum entropy approach to sampling in EDA - the singly connected case. In Proceedings of the 8th Iberoamerican Congress on Pattern Recognition (CIARP 2003), volume 2905 of Lecture Notes in Computer Science, pages 683-690. Springer Verlag, Berlin.

[26]

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 7:461-464.

[27]

Xiang, Y., Wong, S., and Cercone, N. (1997). A 'microscopic' study of minimum entropy search in learning decomposable markov networks. Machine Learning, 26:65-92.

[28]

Yedidia, J. S., Freeman, W. T., and Weiss, Y. (2001). Understanding belief propagation and its generalizations. Technical Report 2001-22, Mitsubishi Electric Research Laboratories.

[29]

Yedidia, J. S., Freeman, W. T., and Weiss, Y. (2004). Constructing free energy approximations and generalized belief propagation algorithms. Technical Report 2004-40, Mitsubishi Electric Research Laboratories.

Cited By

Sukla ANaskar AGoel TSangwan SRai AShakir MVerma IDasgupta TDey L(2021)Concept Driven Search and Visualization System for Exploring Scientific RepositoriesProceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)10.1145/3430984.3430991(395-399)Online publication date: 2-Jan-2021
https://dl.acm.org/doi/10.1145/3430984.3430991
Wei MSun B(2017)Bi-level programming model for multi-modal regional bus timetable and vehicle dispatch with stochastic travel timeCluster Computing10.1007/s10586-016-0719-x20:1(401-411)Online publication date: 1-Mar-2017
https://dl.acm.org/doi/10.1007/s10586-016-0719-x
Iclanzan DMoore J(2012)Higher-order linkage learning in the ECGAProceedings of the 14th annual conference on Genetic and evolutionary computation10.1145/2330163.2330202(265-272)Online publication date: 7-Jul-2012
https://dl.acm.org/doi/10.1145/2330163.2330202
Show More Cited By

Recommendations

Approximate factorizations of distributions and the minimum relative entropy principle
GECCO '05: Proceedings of the 7th annual workshop on Genetic and evolutionary computation

Estimation of Distribution Algorithms (EDA) have been proposed as an extension of genetic algorithms. In this paper the major design issues of EDA's are discussed within a general interdisciplinary framework, the maximum entropy approximation. Our EDA ...
Probability distributions and the maximum entropy principle
Abstract
It is shown that every probability distribution with finite entropy can be characterized as the minimum relative entropy distribution respect to a given non-negative function within a non-trivial collection of probability ...
MUTUAL INFORMATION, METRIC ENTROPY, AND RISK IN ESTIMATION OF PROBABILITY DISTRIBUTIONS

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Evolutionary Computation

Evolutionary Computation Volume 13, Issue 1

January 2005

144 pages

ISSN:1063-6560

EISSN:1530-9304

Issue’s Table of Contents

Publisher

MIT Press

Cambridge, MA, United States

Publication History

Published: 01 January 2005

Published in EVOL Volume 13, Issue 1

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

26
Total Citations
View Citations
138
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 22 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Sukla ANaskar AGoel TSangwan SRai AShakir MVerma IDasgupta TDey L(2021)Concept Driven Search and Visualization System for Exploring Scientific RepositoriesProceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)10.1145/3430984.3430991(395-399)Online publication date: 2-Jan-2021
https://dl.acm.org/doi/10.1145/3430984.3430991
Wei MSun B(2017)Bi-level programming model for multi-modal regional bus timetable and vehicle dispatch with stochastic travel timeCluster Computing10.1007/s10586-016-0719-x20:1(401-411)Online publication date: 1-Mar-2017
https://dl.acm.org/doi/10.1007/s10586-016-0719-x
Iclanzan DMoore J(2012)Higher-order linkage learning in the ECGAProceedings of the 14th annual conference on Genetic and evolutionary computation10.1145/2330163.2330202(265-272)Online publication date: 7-Jul-2012
https://dl.acm.org/doi/10.1145/2330163.2330202
Teytaud OLanzi P(2011)Comparison-based complexity of multiobjective optimizationProceedings of the 13th annual conference on Genetic and evolutionary computation10.1145/2001576.2001685(801-806)Online publication date: 12-Jul-2011
https://dl.acm.org/doi/10.1145/2001576.2001685
Iclanzan DLanzi P(2011)Hierarchical allelic pairwise independent functionsProceedings of the 13th annual conference on Genetic and evolutionary computation10.1145/2001576.2001663(633-640)Online publication date: 12-Jul-2011
https://dl.acm.org/doi/10.1145/2001576.2001663
Chen YChen C(2010)Enabling the extended compact genetic algorithm for real-parameter optimization by using adaptive discretizationEvolutionary Computation10.1162/evco.2010.18.2.1820218:2(199-228)Online publication date: 1-Jun-2010
https://dl.acm.org/doi/10.1162/evco.2010.18.2.18202
Chuang CChen Y(2010)Sensibility of linkage information and effectiveness of estimated distributionsEvolutionary Computation10.1162/EVCO_a_0001018:4(547-579)Online publication date: 1-Dec-2010
https://dl.acm.org/doi/10.1162/EVCO_a_00010
Chuang CHsu WPelikan MBranke J(2010)Multivariate multi-model approach for globally multimodal problemsProceedings of the 12th annual conference on Genetic and evolutionary computation10.1145/1830483.1830544(311-318)Online publication date: 7-Jul-2010
https://dl.acm.org/doi/10.1145/1830483.1830544
Li SLi FMei Z(2010)A hybrid evolutionary algorithm based on alopex and estimation of distribution algorithm and its application for optimizationProceedings of the First international conference on Advances in Swarm Intelligence - Volume Part I10.1007/978-3-642-13495-1_67(549-557)Online publication date: 12-Jun-2010
https://dl.acm.org/doi/10.1007/978-3-642-13495-1_67
Carpentieri M(2009)On multivariate genetic systemsProceedings of the 18th international conference on Fuzzy Systems10.5555/1717561.1717568(35-40)Online publication date: 20-Aug-2009
https://dl.acm.org/doi/10.5555/1717561.1717568
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents