Neural autoregressive distribution estimation

Published: 01 January 2016

Abstract

We present Neural Autoregressive Distribution Estimation (NADE) models, which are neural network architectures applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a weight sharing scheme inspired by restricted Boltzmann machines to yield an estimator that is tractable and generalizes well. We discuss how these models achieve competitive performance in modeling both binary and real-valued observations. We also show how deep NADE models can be trained to be agnostic to the ordering of input dimensions used in the autoregressive product rule decomposition. Finally, we show how to exploit the topological structure of pixels in images using a deep convolutional architecture for NADE.
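
The factorization behind NADE is the exact product rule p(x) = p(x_1) p(x_2 | x_1) ... p(x_D | x_1, ..., x_{D-1}); NADE parameterizes each conditional with a feed-forward network whose input-to-hidden weights are shared across all conditionals, so the D conditionals can be evaluated in a single pass. The following is a minimal NumPy sketch of the binary case under assumed parameter shapes and names (W, V, b, c are illustrative choices, not the authors' reference implementation):

import numpy as np

def nade_log_likelihood(x, W, V, b, c):
    # Sketch of a binary NADE forward pass (assumed shapes):
    #   x: (D,) binary input vector
    #   W: (H, D) input-to-hidden weights, shared across all conditionals
    #   c: (H,) hidden biases
    #   V: (D, H) hidden-to-output weights
    #   b: (D,) output biases
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
    D = x.shape[0]
    a = c.copy()          # hidden pre-activation before any dimension is observed
    log_p = 0.0
    for d in range(D):
        h = sigm(a)                     # h_d depends only on x_1, ..., x_{d-1}
        p = sigm(V[d] @ h + b[d])       # p(x_d = 1 | x_{<d})
        log_p += x[d] * np.log(p) + (1.0 - x[d]) * np.log(1.0 - p)
        a += W[:, d] * x[d]             # weight sharing: extend the running sum incrementally
    return log_p

Because each update to the shared pre-activation a costs O(H), the full log-likelihood costs O(DH) rather than the O(D^2 H) of evaluating D independent networks; this is the tractability the abstract refers to.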

    Published In

    The Journal of Machine Learning Research, Volume 17, Issue 1
    January 2016
    8391 pages
    ISSN: 1532-4435
    EISSN: 1533-7928

    Publisher

    JMLR.org

    Publication History

    Published: 01 January 2016
    Published in JMLR Volume 17, Issue 1

    Author Tags

    1. deep learning
    2. density modeling
    3. neural networks
    4. unsupervised learning
