Abstract
The paper reviews and extends an emerging body of theoretical results on deep learning, including the conditions under which it can be exponentially better than shallow learning. A class of deep convolutional networks represents an important special case of these conditions, although weight sharing is not the main reason for their exponential advantage. Implications of a few key theorems are discussed, together with new results, open problems and conjectures.
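As a concrete illustration of the gap the theorems describe, the short Python sketch below tabulates the unit counts suggested by the review's approximation bounds for a function of n variables with constituent smoothness m: on the order of eps^(-n/m) units for a shallow network, versus on the order of (n-1)*eps^(-2/m) for a deep network matched to a binary-tree compositional structure. The sketch is not part of the paper; constants are omitted and the chosen values of n, m and eps are arbitrary, so the numbers convey only the scaling.

# Back-of-the-envelope comparison of the unit counts in the review's bounds.
# Constants are omitted; n = input dimension, m = smoothness of the
# constituent functions, eps = target approximation accuracy.

def shallow_units(n, m, eps):
    # generic n-variable function of smoothness m: ~ eps^(-n/m) units
    return eps ** (-n / m)

def deep_binary_tree_units(n, m, eps):
    # compositional function with binary-tree structure and bivariate
    # constituents of smoothness m: ~ (n - 1) * eps^(-2/m) units
    return (n - 1) * eps ** (-2 / m)

if __name__ == "__main__":
    m, eps = 2, 0.1
    for n in (4, 8, 16, 32):
        print(f"n={n:3d}  shallow ~ {shallow_units(n, m, eps):.2e}  "
              f"deep ~ {deep_binary_tree_units(n, m, eps):.2e}")

The exponential dependence on n appears only in the shallow count, which is the sense in which depth, rather than weight sharing, is the source of the advantage for compositional functions.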
References
F. Anselmi, L. Rosasco, C. Tan, T. Poggio. Deep Convolutional Networks are Hierarchical Kernel Machines, Center for Brains, Minds and Machines (CBMM) Memo No. 035, The Center for Brains, Minds and Machines, USA, 2015.
T. Poggio, L. Rosasco, A. Shashua, N. Cohen, F. Anselmi. Notes on Hierarchical Splines, DCLNs and i-theory, Center for Brains, Minds and Machines (CBMM) Memo No. 037, The Center for Brains, Minds and Machines, USA, 2015.
T. Poggio, F. Anselmi, L. Rosasco. I-theory on Depth vs Width: Hierarchical Function Composition, Center for Brains, Minds and Machines (CBMM) Memo No. 041, The Center for Brains, Minds and Machines, USA, 2015.
H. Mhaskar, Q. L. Liao, T. Poggio. Learning Real and Boolean Functions: When is Deep Better than Shallow, Center for Brains, Minds and Machines (CBMM) Memo No. 045, The Center for Brains, Minds and Machines, USA, 2016.
H. N. Mhaskar, T. Poggio. Deep Vs. Shallow Networks: An Approximation Theory Perspective, Center for Brains, Minds and Machines (CBMM) Memo No. 054, The Center for Brains, Minds and Machines, USA, 2016.
D. L. Donoho. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture, pp. 178–183, 2000.
Y. LeCun, Y. Bengio, G. Hinton. Deep learning. Nature, vol. 521, no. 7553, pp. 436–444, 2015.
K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, vol. 36, no. 4, pp. 193–202, 1980.
M. Riesenhuber, T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, vol. 2, no. 11, pp. 1019–1025, 1999.
H. N. Mhaskar. Approximation properties of a multilayered feedforward artificial neural network. Advances in Computational Mathematics, vol. 1, no. 1, pp. 61–80, 1993.
C. K. Chui, X. Li, H. Mhaskar. Neural networks for localized approximation. Mathematics of Computation, vol. 63, no. 208, pp. 607–623, 1994.
C. K. Chui, X. Li, H. N. Mhaskar. Limitations of the approximation capabilities of neural networks with one hidden layer. Advances in Computational Mathematics, vol. 5, no. 1, pp. 233–243, 1996.
A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, vol. 8, pp. 143–195, 1999.
T. Poggio, S. Smale. The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, vol. 50, no. 5, pp. 537–544, 2003.
B. Moore, T. Poggio. Representation properties of multilayer feedforward networks. Neural Networks, vol. 1, no. S1, p. 203, 1988.
R. Livni, S. Shalev-Shwartz, O. Shamir. A provably efficient algorithm for training deep networks. arXiv:1304.7045, 2013.
O. Delalleau, Y. Bengio. Shallow vs. deep sum-product networks. In Proceedings of Advances in Neural Information Processing Systems 24, NIPS, Granada, Spain, pp. 666–674, 2011.
G. F. Montufar, R. Pascanu, K. Cho, Y. Bengio. On the number of linear regions of deep neural networks. In Proceedings of Advances in Neural Information Processing Systems 27, NIPS, Montreal, Canada, pp. 2924–2932, 2014.
H. N. Mhaskar. Neural networks for localized approximation of real functions. In Proceedings of the IEEE-SP Workshop on Neural Networks for Signal Processing III, pp. 190–196, IEEE, Linthicum Heights, USA, 1993.
N. Cohen, O. Sharir, A. Shashua. On the expressive power of deep learning: A tensor analysis. arXiv:1509.05009v1, 2015.
F. Anselmi, J. Z. Leibo, L. Rosasco, J. Mutch, A. Tacchetti, T. Poggio. Unsupervised Learning of Invariant Representations With Low Sample Complexity: The Magic of Sensory Cortex or A New Framework for Machine Learning? Center for Brains, Minds and Machines (CBMM) Memo No. 001, The Center for Brains, Minds and Machines, USA, 2014.
F. Anselmi, J. Z. Leibo, L. Rosasco, J. Mutch, A. Tacchetti, T. Poggio. Unsupervised learning of invariant representations. Theoretical Computer Science, vol. 633, pp. 112–121, 2016.
T. Poggio, L. Rosasco, A. Shashua, N. Cohen, F. Anselmi. Notes on Hierarchical Splines, DCLNs and i-theory, Center for Brains, Minds and Machines (CBMM) Memo No. 037, The Center for Brains, Minds and Machines, USA, 2015.
Q. L. Liao, T. Poggio. Bridging the Gaps between Residual Learning, Recurrent Neural Networks and Visual Cortex, Center for Brains, Minds and Machines (CBMM) Memo No. 047, The Center for Brains, Minds and Machines, 2016.
M. Telgarsky. Representation benefits of deep feedforward networks. arXiv:1509.08101v2, 2015.
I. Safran, O. Shamir. Depth separation in ReLU networks for approximating smooth non-linear functions. arXiv:1610.09887v1, 2016.
H. N. Mhaskar. Neural networks for optimal approximation of smooth and analytic functions. Neural Computation, vol. 8, no. 1, pp. 164–177, 1996.
E. Corominas, F. S. Balaguer. Conditions for an infinitely differentiable function to be a polynomial. Revista Matemática Hispanoamericana, vol. 14, no. 1–2, pp. 26–43, 1954. (in Spanish)
T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, Q. L. Liao. Why and when can deep–but not shallow–networks avoid the curse of dimensionality: A review. arXiv:1611.00740v3, 2016.
R. A. DeVore, R. Howard, C. A. Micchelli. Optimal nonlinear approximation. Manuscripta Mathematica, vol. 63, no. 4, pp. 469–478, 1989.
H. N. Mhaskar. On the tractability of multivariate integration and approximation by neural networks. Journal of Complexity, vol. 20, no. 4, pp. 561–590, 2004.
F. Bach. Breaking the curse of dimensionality with convex neural networks. arXiv:1412.8690, 2014.
D. Kingma, J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
J. Bergstra, Y. Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, vol. 13, no. 1, pp. 281–305, 2012.
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. F. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Q. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Q. Zheng. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467, 2016.
R. Eldan, O. Shamir. The power of depth for feedforward neural networks. arXiv:1512.03965v4, 2016.
H. W. Lin, M. Tegmark. Why does deep and cheap learning work so well? arXiv:1608.08225, 2016.
J. T. Håstad. Computational Limitations of Small-Depth Circuits, Cambridge, MA, USA: MIT Press, 1987.
N. Linial, Y. Mansour, N. Nisan. Constant depth circuits, Fourier transform, and learnability. Journal of the ACM, vol. 40, no. 3, pp. 607–620, 1993.
Y. Bengio, Y. LeCun. Scaling learning algorithms towards AI. Large-Scale Kernel Machines, L. Bottou, O. Chapelle, D. DeCoste, J. Weston, Eds., Cambridge, MA, USA: MIT Press, 2007.
Y. Mansour. Learning Boolean functions via the Fourier transform. Theoretical Advances in Neural Computation and Learning, V. Roychowdhury, K. Y. Siu, A. Orlitsky, Eds., pp. 391–424, US: Springer, 1994.
M. Anthony, P. Bartlett. Neural Network Learning: Theoretical Foundations, Cambridge, UK: Cambridge University Press, 2002.
B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, vol. 350, no. 6266, pp. 1332–1338, 2015.
A. Maurer. Bounds for linear multi-task learning. Journal of Machine Learning Research, vol. 7, no. 1, pp. 117–139, 2006.
S. Soatto. Steps towards a theory of visual information: Active perception, signal-to-symbol conversion and the interplay between sensing and control. arXiv:1110.2053, 2011.
T. A. Poggio, F. Anselmi. Visual Cortex and Deep Networks: Learning Invariant Representations, Cambridge, MA, USA: MIT Press, 2016.
L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM Journal on Matrix Analysis and Applications, vol. 31, no. 4, pp. 2029–2054, 2010.
S. Shalev-Shwartz, S. Ben-David. Understanding Machine Learning: From Theory to Algorithms, Cambridge, UK: Cambridge University Press, 2014.
T. Poggio, W. Reichardt. On the representation of multi-input systems: Computational properties of polynomial algorithms. Biological Cybernetics, vol. 37, no. 3, pp. 167–186, 1980.
M. L. Minsky, S. A. Papert. Perceptrons: An Introduction to Computational Geometry, Cambridge, MA, USA: The MIT Press, 1972.
Acknowledgement
The authors thank O. Shamir for useful emails that prompted us to clarify our results in the context of lower bounds.
Author information
Additional information
This work was supported by the Center for Brains, Minds and Machines (CBMM), NSF STC award CCF (No. 1231216), and ARO (No. W911NF-15-1-0385).
Recommended by Associate Editor Hong Qiao
Tomaso Poggio received the Ph.D. degree in theoretical physics from the University of Genoa, Italy in 1971. He is the Eugene McDermott Professor in the Department of Brain and Cognitive Sciences, the director of the Center for Brains, Minds and Machines, and a member of the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology (MIT), USA. Since 2000, he has been a member of the faculty of the McGovern Institute for Brain Research. He was a Wissenschaftlicher Assistent at the Max Planck Institut für Biologische Kybernetik, Tübingen, Germany from 1972 until 1981, when he became an associate professor at MIT. He is an honorary member of the Neuroscience Research Program, a member of the American Academy of Arts and Sciences and a Founding Fellow of AAAI. He has received several awards, including the Otto-Hahn-Medaille of the Max-Planck-Society, the Max Planck Research Award (with M. Fahle) from the Alexander von Humboldt Foundation, the MIT 50K Entrepreneurship Competition Award, the Laurea Honoris Causa from the University of Pavia in 2000 (Volta Bicentennial), the 2003 Gabor Award, the 2009 Okawa Prize, the American Association for the Advancement of Science (AAAS) Fellowship (2009) and the Swartz Prize for Theoretical and Computational Neuroscience in 2014. He is one of the most cited computational neuroscientists, with an h-index greater than 100 based on Google Scholar.
Hrushikesh Mhaskar did his undergraduate studies at the Institute of Science, Nagpur, and received his first M.Sc. degree in mathematics from the Indian Institute of Technology, India in 1976. He received the Ph.D. degree in mathematics and M.Sc. degree in computer science from the Ohio State University, USA in 1980. He then joined California State University, Los Angeles, where he was promoted to full professor in 1990. Since retiring in 2012, he has been a visiting associate at the California Institute of Technology and a research professor at Claremont Graduate University, and has occasionally served as a consultant for Qualcomm. He has published more than 135 refereed articles in the areas of approximation theory, potential theory, neural networks, wavelet analysis, and data processing. His book Weighted Polynomial Approximation was published in 1997 by World Scientific, and his book with Dr. D. V. Pai, Fundamentals of Approximation Theory, was published by Narosa Publishers, CRC, and Alpha Science in 2000. He serves on the editorial boards of Journal of Approximation Theory, Applied and Computational Harmonic Analysis, and Jaen Journal on Approximation. In addition, he was a co-editor of a special issue of Advances in Computational Mathematics on mathematical aspects of neural networks, two volumes of Journal of Approximation Theory dedicated to the memory of G. G. Lorentz, and two edited collections of research articles: Wavelet Analysis and Applications (Narosa Publishers, 2001) and Frontiers in Interpolation and Approximation (Chapman and Hall/CRC, 2006). He has held visiting positions and given several invited lectures throughout North America, Europe, and Asia. He was awarded the Humboldt Fellowship for research in Germany four times, and was John von Neumann Distinguished Professor at the Technical University of Munich in 2011. He is listed in Outstanding Young Men of America (1985) and Who's Who in America's Teachers (1994). His research has been supported by the National Science Foundation, the U.S. Army Research Office, the Air Force Office of Scientific Research, the National Security Agency, and the Research and Development Laboratories.
Lorenzo Rosasco received the Ph.D. degree from the University of Genova, Italy in 2006, where he worked under the supervision of Alessandro Verri and Ernesto De Vito in the Statistical Learning and Image Processing Research Unit (SLIPGURU). He is an assistant professor at the University of Genova, Italy. He is also affiliated with the Massachusetts Institute of Technology (MIT), USA, where he is a visiting professor, and with the Istituto Italiano di Tecnologia (IIT), Italy, where he is an external collaborator. He is leading the effort to establish the Laboratory for Computational and Statistical Learning (LCSL), born from a collaborative agreement between IIT and MIT. During his Ph.D. studies, he was a visiting student at the Toyota Technological Institute at Chicago, USA (working with Steve Smale) and at the Center for Biological and Computational Learning (CBCL) at MIT (working with Tomaso Poggio). Between 2006 and 2009, he was a postdoctoral fellow at CBCL working with Tomaso Poggio.
His research interests include theory and algorithms for machine learning. He has developed and analyzed methods to learn from small as well as large samples of high dimensional data, using analytical and probabilistic tools, within a multidisciplinary approach drawing concepts and techniques primarily from computer science but also from statistics, engineering and applied mathematics.
Brando Miranda received the B.Sc. degree in electrical engineering and computer science (EECS) and the M.Eng. degree in machine learning (supervised by Professor Tomaso Poggio) from the Massachusetts Institute of Technology (MIT), USA in 2014 and 2016, respectively.
His research interests include machine learning, statistics, neural networks, theories in deep learning and applied mathematics.
Qianli Liao is a Ph.D. candidate in electrical engineering and computer science (EECS) at the Massachusetts Institute of Technology (MIT), USA, supervised by Professor Tomaso Poggio.
His research interests include machine learning, optimization, neural networks, theoretical deep learning, computer vision, visual object/face recognition, biologically-plausible and brain-inspired learning.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Poggio, T., Mhaskar, H., Rosasco, L. et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. Int. J. Autom. Comput. 14, 503–519 (2017). https://doi.org/10.1007/s11633-017-1054-2