
Newton Methods for Convolutional Neural Networks

Published: 25 January 2020

Abstract

Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some situations. Recently, Newton methods have been investigated as an alternative optimization technique, but most existing studies consider only fully connected feedforward neural networks and do not cover more widely used architectures such as convolutional neural networks (CNNs). One reason is that Newton methods for CNNs involve complicated operations, and no prior work has investigated them thoroughly. In this work, we give details of all building blocks, including the evaluation of the function, gradient, Jacobian, and Gauss-Newton matrix-vector products. These basic components are important not only for practical implementation but also for developing variants of Newton methods for CNNs. We show that an efficient MATLAB implementation takes just several hundred lines of code. Preliminary experiments indicate that Newton methods are less sensitive to parameters than the stochastic gradient approach.
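A central operation the abstract mentions is the Gauss-Newton matrix-vector product Gv = J^T B J v, which is computed matrix-free so that G is never formed explicitly. The sketch below (in Python, not the paper's MATLAB code) illustrates the idea on a deliberately tiny linear layer z = Wx with squared loss, for which B = I; the model, dimensions, and finite-difference Jacobian are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Tiny model: z = W @ x, squared loss L = 0.5 * ||z - y||^2.
# Gauss-Newton matrix: G = J^T B J, where J = dz/dvec(W) and
# B = d^2 L / dz^2, which is the identity for squared loss.
rng = np.random.default_rng(0)
m, n = 3, 4
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)

def forward(wvec):
    # Network output as a function of the flattened parameters.
    return wvec.reshape(m, n) @ x

# Jacobian of the output w.r.t. the flattened parameters, built
# column by column with finite differences (fine at this toy scale).
eps = 1e-6
w0 = W.ravel()
z0 = forward(w0)
J = np.empty((m, m * n))
for j in range(m * n):
    e = np.zeros(m * n)
    e[j] = eps
    J[:, j] = (forward(w0 + e) - z0) / eps

# Matrix-free Gauss-Newton product: Gv = J^T (B (J v)), with B = I here.
v = rng.standard_normal(m * n)
Gv = J.T @ (J @ v)

# For z = W x with row-major vec(W), the exact Jacobian is
# J = kron(I_m, x^T), so G = kron(I_m, x x^T); check against it.
G_exact = np.kron(np.eye(m), np.outer(x, x))
print(np.allclose(Gv, G_exact @ v, atol=1e-4))  # True
```

In Newton-CG-style methods, such products are fed to an iterative solver like conjugate gradient, so the (often huge) Gauss-Newton matrix never needs to be stored; for a real CNN the Jacobian-vector and transposed-Jacobian-vector products would of course be computed by forward and backward passes rather than finite differences.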



Published In

ACM Transactions on Intelligent Systems and Technology, Volume 11, Issue 2 (Survey Paper and Regular Paper), April 2020, 274 pages.
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3379210

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 January 2020
Accepted: 01 October 2019
Revised: 01 September 2019
Received: 01 July 2019
Published in TIST Volume 11, Issue 2

Author Tags

  1. Convolutional neural networks
  2. large-scale classification
  3. Newton methods
  4. subsampled Hessian


Funding Sources

  • MOST of Taiwan

