Neural autoregressive distribution estimation

Published: 01 January 2016

Abstract

We present Neural Autoregressive Distribution Estimation (NADE) models, which are neural network architectures applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a weight sharing scheme inspired by restricted Boltzmann machines to yield an estimator that is tractable and generalizes well. We discuss how these models achieve competitive performance in modeling both binary and real-valued observations. We also show how deep NADE models can be trained to be agnostic to the ordering of input dimensions used in the autoregressive product rule decomposition. Finally, we show how to exploit the topological structure of pixels in images using a deep convolutional architecture for NADE.
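
The factorization behind NADE is the exact product rule p(x) = p(x_1) p(x_2 | x_1) ... p(x_D | x_1, ..., x_{D-1}); NADE parameterizes each conditional with a feed-forward network whose input-to-hidden weights are shared across all conditionals, so the D conditionals can be evaluated in a single pass. The following is a minimal NumPy sketch of the binary case under assumed parameter shapes and names (W, V, b, c are illustrative choices, not the authors' reference implementation):

import numpy as np

def nade_log_likelihood(x, W, V, b, c):
    # Sketch of a binary NADE forward pass (assumed shapes):
    #   x: (D,) binary input vector
    #   W: (H, D) input-to-hidden weights, shared across all conditionals
    #   c: (H,) hidden biases
    #   V: (D, H) hidden-to-output weights
    #   b: (D,) output biases
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
    D = x.shape[0]
    a = c.copy()          # hidden pre-activation before any dimension is observed
    log_p = 0.0
    for d in range(D):
        h = sigm(a)                     # h_d depends only on x_1, ..., x_{d-1}
        p = sigm(V[d] @ h + b[d])       # p(x_d = 1 | x_{<d})
        log_p += x[d] * np.log(p) + (1.0 - x[d]) * np.log(1.0 - p)
        a += W[:, d] * x[d]             # weight sharing: extend the running sum incrementally
    return log_p

Because each update to the shared pre-activation a costs O(H), the full log-likelihood costs O(DH) rather than the O(D^2 H) of evaluating D independent networks; this is the tractability the abstract refers to.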

    Published In

    The Journal of Machine Learning Research, Volume 17, Issue 1
    January 2016
    8391 pages
    ISSN: 1532-4435
    EISSN: 1533-7928

    Publisher

    JMLR.org

    Publication History

    Published: 01 January 2016
    Published in JMLR Volume 17, Issue 1

    Author Tags

    1. deep learning
    2. density modeling
    3. neural networks
    4. unsupervised learning
