Training products of experts by minimizing contrastive divergence

Published: 01 August 2002

Abstract

It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
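To make the combination rule concrete: a PoE defines p(d | theta_1, ..., theta_n) = prod_m p_m(d | theta_m) / sum_c prod_m p_m(c | theta_m), and it is the derivatives of the normalizing sum over all possible data vectors c that are intractable. One well-known special case is the restricted Boltzmann machine (RBM), which can be viewed as a product of one expert per hidden unit. The sketch below shows a single CD-1 update for a binary RBM in NumPy, using the "data statistics minus one-step reconstruction statistics" form of the contrastive divergence gradient; the function names, learning rate, and batching here are illustrative assumptions, not code from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_step(v0, W, b_vis, b_hid, lr=0.05):
        # v0: (batch, n_vis) binary data matrix; W: (n_vis, n_hid) weights.
        # Illustrative sketch, not the paper's code.
        # Positive phase: hidden probabilities driven by the data.
        ph0 = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # One Gibbs step: stochastically reconstruct visibles, re-infer hiddens.
        pv1 = sigmoid(h0 @ W.T + b_vis)
        v1 = (rng.random(pv1.shape) < pv1).astype(float)
        ph1 = sigmoid(v1 @ W + b_hid)
        # CD-1 update: <v h> under the data minus <v h> after one Gibbs step.
        n = v0.shape[0]
        W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        b_vis += lr * (v0 - v1).mean(axis=0)
        b_hid += lr * (ph0 - ph1).mean(axis=0)
        return W, b_vis, b_hid

    # Example usage: 100 random 6-bit vectors, 4 hidden units (toy data).
    data = (rng.random((100, 6)) < 0.5).astype(float)
    W = 0.01 * rng.standard_normal((6, 4))
    b_vis, b_hid = np.zeros(6), np.zeros(4)
    for _ in range(10):
        W, b_vis, b_hid = cd1_step(data, W, b_vis, b_hid)

Iterating this update over minibatches approximately follows the gradient of the contrastive divergence KL(Q^0 || Q^inf) - KL(Q^1 || Q^inf), where Q^0 is the data distribution and Q^1 is the distribution after one full Gibbs step, rather than the gradient of the log likelihood itself.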

Published In

Neural Computation, Volume 14, Issue 8
August 2002
229 pages

Publisher

MIT Press

Cambridge, MA, United States

Cited By

  • (2024) Designing ecosystems of intelligence from first principles. Collective Intelligence, 3(1). DOI: 10.1177/26339137231222481. Online publication date: 1-Jan-2024.
  • (2024) Robust Visual Question Answering: Datasets, Methods, and Future Challenges. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5575-5594. DOI: 10.1109/TPAMI.2024.3366154. Online publication date: 1-Aug-2024.
  • (2024) Discrepancy and Structure-Based Contrast for Test-Time Adaptive Retrieval. IEEE Transactions on Multimedia, 26, 8665-8677. DOI: 10.1109/TMM.2024.3381337. Online publication date: 25-Mar-2024.
  • (2024) Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation With Interpretability. IEEE Transactions on Multimedia, 26, 7543-7554. DOI: 10.1109/TMM.2024.3369875. Online publication date: 28-Feb-2024.
  • (2024) Mutual Information Regularization for Weakly-Supervised RGB-D Salient Object Detection. IEEE Transactions on Circuits and Systems for Video Technology, 34(1), 397-410. DOI: 10.1109/TCSVT.2023.3285249. Online publication date: 1-Jan-2024.
  • (2024) A Comparative Study of Machine Learning Approaches for the Detection of SARS-CoV-2 and its Variants. Procedia Computer Science, 235(C), 1190-1201. DOI: 10.1016/j.procs.2024.04.113. Online publication date: 24-Jul-2024.
  • (2024) Bridging flexible goal-directed cognition and consciousness. Neural Networks, 176(C). DOI: 10.1016/j.neunet.2024.106292. Online publication date: 1-Aug-2024.
  • (2024) A multimodal dynamical variational autoencoder for audiovisual speech representation learning. Neural Networks, 172(C). DOI: 10.1016/j.neunet.2024.106120. Online publication date: 1-Apr-2024.
  • (2024) Boosting semi-supervised learning with Contrastive Complementary Labeling. Neural Networks, 170(C), 417-426. DOI: 10.1016/j.neunet.2023.11.052. Online publication date: 12-Apr-2024.
  • (2024) A principled framework for explainable multimodal disentanglement. Information Sciences: an International Journal, 675(C). DOI: 10.1016/j.ins.2024.120768. Online publication date: 1-Jul-2024.
