Abstract
The EM algorithm performs maximum likelihood estimation for data in which some variables are unobserved. We present a function that resembles negative free energy and show that the M step maximizes this function with respect to the model parameters and the E step maximizes it with respect to the distribution over the unobserved variables. From this perspective, it is easy to justify an incremental variant of the EM algorithm in which the distribution for only one of the unobserved variables is recalculated in each E step. This variant is shown empirically to give faster convergence in a mixture estimation problem. A variant of the algorithm that exploits sparse conditional distributions is also described, and a wide range of other variants is seen to be possible.
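For concreteness, the function in question (written here in the paper's free-energy form; the notation is ours) is

F(\tilde{P}, \theta) = E_{\tilde{P}}[\log P(x, z \mid \theta)] + H(\tilde{P}) = \log P(x \mid \theta) - D_{\mathrm{KL}}\big(\tilde{P}(z) \,\|\, P(z \mid x, \theta)\big),

where x denotes the observed variables, z the unobserved ones, \tilde{P} a distribution over z, and H the entropy. The E step maximizes F over \tilde{P}, attained at \tilde{P}(z) = P(z \mid x, \theta), which drives the KL term to zero; the M step maximizes F over \theta. Since F never exceeds the log likelihood \log P(x \mid \theta), alternating the two steps cannot decrease the likelihood.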
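As an illustration of the incremental variant, the following minimal Python sketch fits a one-dimensional two-component Gaussian mixture; the synthetic data, initialization, and all variable names are our own assumptions, not the paper's experiment. Each E step recomputes the distribution (responsibility) for a single data point and patches the sufficient statistics, after which the M step is still exact:

import numpy as np

# Synthetic data: two well-separated Gaussian clusters (illustrative only).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 1.0, 150)])
n, K = len(x), 2

# Parameters: mixing weights, means, variances.
w = np.full(K, 1.0 / K)
mu = np.array([-1.0, 1.0])
var = np.ones(K)

# r[i, k] holds the current distribution over the unobserved component
# indicator of point i; initialised uniformly.
r = np.full((n, K), 1.0 / K)

# Sufficient statistics maintained incrementally:
# S0[k] = sum_i r[i,k], S1[k] = sum_i r[i,k]*x[i], S2[k] = sum_i r[i,k]*x[i]**2.
S0, S1, S2 = r.sum(axis=0), r.T @ x, r.T @ x**2

def responsibilities(xi):
    """Posterior over components for a single point under current parameters."""
    logp = np.log(w) - 0.5 * np.log(2 * np.pi * var) - (xi - mu) ** 2 / (2 * var)
    p = np.exp(logp - logp.max())  # subtract max for numerical stability
    return p / p.sum()

for sweep in range(20):
    for i in range(n):
        # Incremental E step: update one point's distribution only, removing
        # its old contribution to the statistics and adding the new one.
        r_new = responsibilities(x[i])
        d = r_new - r[i]
        S0 += d
        S1 += d * x[i]
        S2 += d * x[i] ** 2
        r[i] = r_new
        # M step: exact maximization from the patched statistics.
        w = S0 / n
        mu = S1 / S0
        var = np.maximum(S2 / S0 - mu**2, 1e-6)

print("weights:", w, "means:", mu, "variances:", var)

Because the parameters are re-estimated after every point rather than after a full pass, improved estimates take effect immediately, which is why the incremental variant can converge in fewer sweeps through the data.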
Copyright information
© 1998 Springer Science+Business Media Dordrecht
Cite this chapter
Neal, R.M., Hinton, G.E. (1998). A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. In: Jordan, M.I. (ed.), Learning in Graphical Models. NATO ASI Series, vol. 89. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5014-9_12
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-6104-9
Online ISBN: 978-94-011-5014-9