Estimation of Distribution Algorithms in Machine Learning: A Survey
Pages 1301–1321
Abstract
The automatic induction of machine learning models capable of addressing supervised learning, feature selection, clustering, and reinforcement learning problems requires sophisticated intelligent search procedures. These searches are usually performed in the space of possible model structures, leading to combinatorial optimization problems, and in the parameter spaces, where continuous optimization problems must be solved. This article reviews how estimation of distribution algorithms, a kind of evolutionary algorithm, can be used to address these problems. Topics include preprocessing, mining association rules, selecting variables, searching for the optimal supervised learning model (both probabilistic and nonprobabilistic), finding the best hierarchical, partitional, or probabilistic clustering, obtaining the optimal policy in reinforcement learning, and performing inference and structural learning in Bayesian networks for association discovery. Guidelines for future work in this area are also provided.
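As context for readers unfamiliar with the approach, the following minimal sketch (illustrative only, not taken from the article) shows the basic loop that all estimation of distribution algorithms share, using the simplest univariate model (UMDA) on a toy binary problem; all function names and parameter values here are assumptions chosen for the example.

```python
import numpy as np

def umda_onemax(n_vars=50, pop_size=100, top_k=50, generations=100, seed=0):
    """Illustrative univariate marginal distribution algorithm (UMDA) on OneMax.

    Basic EDA loop: sample a population from the current probabilistic model
    (here, independent Bernoulli marginals), select the fittest individuals,
    and re-estimate the model from the selected set.
    """
    rng = np.random.default_rng(seed)
    p = np.full(n_vars, 0.5)                      # start from uniform marginals
    for _ in range(generations):
        pop = rng.random((pop_size, n_vars)) < p  # sample a binary population
        fitness = pop.sum(axis=1)                 # OneMax fitness: number of ones
        best = pop[np.argsort(fitness)[-top_k:]]  # truncation selection
        p = best.mean(axis=0)                     # maximum-likelihood re-estimation
        p = np.clip(p, 0.02, 0.98)                # margins guard against genetic drift
    return p

print(umda_onemax().round(2))  # marginals should approach 1.0 on every variable
```

The multivariate EDAs surveyed in the article replace these independent marginals with richer probabilistic models, such as Bayesian networks, whose learning and sampling steps fill the same two slots in this loop.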
Information
© 2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
Publisher
IEEE Press
Publication History
Published: 12 September 2023
Qualifiers
- Research-article