Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Among the many few-shot learning algorithms, model-agnostic meta-learning (MAML) stands out for its ability to adapt quickly to new tasks from only a small amount of labeled training data. However, its computational cost is high, because the MAML algorithm generates a large number of second-order terms in the second gradient update. In addition, owing to the non-convex nature of neural networks, the loss landscape contains many flat regions, which slows convergence and makes training excessively long. In this paper, a second-order optimization method, Kronecker-factored Approximate Curvature (K-FAC), is applied to approximate Natural Gradient Descent. K-FAC reduces the computational complexity by approximating the large Fisher information matrix as the Kronecker product of two much smaller matrices, and it fully exploits the second-order information to accelerate convergence. Moreover, to address the sensitivity of Natural Gradient Descent to the learning rate, this paper proposes Kronecker-factored Approximate Curvature with an adaptive learning rate for optimizing model-agnostic meta-learning (AK-MAML), which automatically adjusts the learning rate according to the curvature and improves training efficiency. Experimental results show that AK-MAML converges faster, requires less computation, and achieves higher accuracy on few-shot datasets.
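As a rough illustration of the two ideas summarized above, the sketch below is not the authors' code: the function names, tensor shapes, and the trust-region-style step rule are assumptions. It shows how a K-FAC preconditioner approximates a fully-connected layer's Fisher block by two small Kronecker factors, and how a step size could be shrunk when the local curvature is large.

import numpy as np

def kfac_precondition(a, g, grad_W, damping=1e-3):
    # a: (batch, d_in) layer inputs; g: (batch, d_out) pre-activation gradients;
    # grad_W: (d_out, d_in) weight gradient.
    # The Fisher block is approximated as A kron G, so only a d_in x d_in and a
    # d_out x d_out matrix are ever formed and inverted.
    batch = a.shape[0]
    A = a.T @ a / batch + damping * np.eye(a.shape[1])   # input second moment
    G = g.T @ g / batch + damping * np.eye(g.shape[1])   # gradient second moment
    # (A kron G)^(-1) vec(grad_W)  ==  G^(-1) grad_W A^(-1)
    return np.linalg.solve(G, grad_W) @ np.linalg.inv(A)

def curvature_step_size(delta, a, g, base_lr=1.0, trust=1e-2):
    # Trust-region-style rule: shrink the step when the local curvature
    # delta^T F delta is large (the paper's exact AK-MAML schedule may differ).
    batch = a.shape[0]
    A = a.T @ a / batch
    G = g.T @ g / batch
    quad = np.trace(delta @ A @ delta.T @ G)  # = vec(delta)^T (A kron G) vec(delta)
    return min(base_lr, float(np.sqrt(2.0 * trust / max(quad, 1e-12))))

# Example use on one layer during an adaptation step:
# nat_grad = kfac_precondition(a, g, grad_W)
# W -= curvature_step_size(nat_grad, a, g) * nat_grad

In a MAML-style inner loop, such a preconditioned step would replace the plain gradient step on each task's support set; the adaptive rate is what distinguishes the AK-MAML variant from fixed-rate natural gradient updates.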


Data availability

All data are available from the authors upon reasonable request.


Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities B220202019, Changzhou Sci&Tech Program (Grant No. CJ20210092), Young Talent Development Plan of Changzhou Health Commission (Grant No. CZQM2020025), and the Key Research and Development Program of Jiangsu under grants BK20192004, BE2018004-04.

Author information

Contributions

All the authors contributed to the research in this paper. Ce Zhang proposed the scheme and carried out the experiments under the guidance of Xiao Yao. Xiao Yao and Ce Zhang wrote the first draft together. Changfeng Shi and Min Gu were responsible for data organization and analysis. All the authors read the article and put forward suggestions for revision.

Corresponding author

Correspondence to Xiao Yao.

Ethics declarations

Conflict of interest

The authors certify that they have no conflict of interest.

Additional information

Communicated by A. Sur.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, C., Yao, X., Shi, C. et al. Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning. Multimedia Systems 29, 3169–3177 (2023). https://doi.org/10.1007/s00530-023-01159-x
