Training products of experts by minimizing contrastive divergence

Published: 01 August 2002

Abstract

It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
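To make the combination rule concrete: a PoE defines p(d | theta_1, ..., theta_n) = prod_m p_m(d | theta_m) / sum_c prod_m p_m(c | theta_m), and it is the derivatives of the normalizing sum over all possible data vectors c that are intractable. One well-known special case is the restricted Boltzmann machine (RBM), which can be viewed as a product of one expert per hidden unit. The sketch below shows a single CD-1 update for a binary RBM in NumPy, using the "data statistics minus one-step reconstruction statistics" form of the contrastive divergence gradient; the function names, learning rate, and batching here are illustrative assumptions, not code from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_step(v0, W, b_vis, b_hid, lr=0.05):
        # v0: (batch, n_vis) binary data matrix; W: (n_vis, n_hid) weights.
        # Illustrative sketch, not the paper's code.
        # Positive phase: hidden probabilities driven by the data.
        ph0 = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # One Gibbs step: stochastically reconstruct visibles, re-infer hiddens.
        pv1 = sigmoid(h0 @ W.T + b_vis)
        v1 = (rng.random(pv1.shape) < pv1).astype(float)
        ph1 = sigmoid(v1 @ W + b_hid)
        # CD-1 update: <v h> under the data minus <v h> after one Gibbs step.
        n = v0.shape[0]
        W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        b_vis += lr * (v0 - v1).mean(axis=0)
        b_hid += lr * (ph0 - ph1).mean(axis=0)
        return W, b_vis, b_hid

    # Example usage: 100 random 6-bit vectors, 4 hidden units (toy data).
    data = (rng.random((100, 6)) < 0.5).astype(float)
    W = 0.01 * rng.standard_normal((6, 4))
    b_vis, b_hid = np.zeros(6), np.zeros(4)
    for _ in range(10):
        W, b_vis, b_hid = cd1_step(data, W, b_vis, b_hid)

Iterating this update over minibatches approximately follows the gradient of the contrastive divergence KL(Q^0 || Q^inf) - KL(Q^1 || Q^inf), where Q^0 is the data distribution and Q^1 is the distribution after one full Gibbs step, rather than the gradient of the log likelihood itself.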

Published In

Neural Computation, Volume 14, Issue 8
August 2002
229 pages

Publisher

MIT Press

Cambridge, MA, United States

Cited By

  • (2024) Designing ecosystems of intelligence from first principles. Collective Intelligence, 3(1). DOI: 10.1177/26339137231222481. Online publication date: 1-Jan-2024.
  • (2024) Robust Visual Question Answering: Datasets, Methods, and Future Challenges. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5575-5594. DOI: 10.1109/TPAMI.2024.3366154. Online publication date: 1-Aug-2024.
  • (2024) Discrepancy and Structure-Based Contrast for Test-Time Adaptive Retrieval. IEEE Transactions on Multimedia, 26, 8665-8677. DOI: 10.1109/TMM.2024.3381337. Online publication date: 25-Mar-2024.
  • (2024) Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation With Interpretability. IEEE Transactions on Multimedia, 26, 7543-7554. DOI: 10.1109/TMM.2024.3369875. Online publication date: 28-Feb-2024.
  • (2024) Mutual Information Regularization for Weakly-Supervised RGB-D Salient Object Detection. IEEE Transactions on Circuits and Systems for Video Technology, 34(1), 397-410. DOI: 10.1109/TCSVT.2023.3285249. Online publication date: 1-Jan-2024.
  • (2024) A Comparative Study of Machine Learning Approaches for the Detection of SARS-CoV-2 and its Variants. Procedia Computer Science, 235(C), 1190-1201. DOI: 10.1016/j.procs.2024.04.113. Online publication date: 24-Jul-2024.
  • (2024) Bridging flexible goal-directed cognition and consciousness. Neural Networks, 176(C). DOI: 10.1016/j.neunet.2024.106292. Online publication date: 1-Aug-2024.
  • (2024) A multimodal dynamical variational autoencoder for audiovisual speech representation learning. Neural Networks, 172(C). DOI: 10.1016/j.neunet.2024.106120. Online publication date: 1-Apr-2024.
  • (2024) Boosting semi-supervised learning with Contrastive Complementary Labeling. Neural Networks, 170(C), 417-426. DOI: 10.1016/j.neunet.2023.11.052. Online publication date: 12-Apr-2024.
  • (2024) A principled framework for explainable multimodal disentanglement. Information Sciences: an International Journal, 675(C). DOI: 10.1016/j.ins.2024.120768. Online publication date: 1-Jul-2024.
