Abstract
Among few-shot learning algorithms, model-agnostic meta-learning (MAML) stands out for its ability to adapt quickly to new tasks with only a small amount of labeled training data. However, its computational cost is high, because MAML generates a large number of second-order terms in the second (outer-loop) gradient update. In addition, owing to the non-convexity of neural networks, the loss landscape contains many flat regions, which slows convergence and prolongs training. In this paper, the second-order optimization method Kronecker-factored Approximate Curvature (K-FAC) is adopted to approximate natural gradient descent. K-FAC reduces computational complexity by approximating the large Fisher information matrix as the Kronecker product of two much smaller matrices, and it exploits this second-order information to accelerate convergence. Moreover, because natural gradient descent is sensitive to the learning rate, this paper proposes Kronecker-factored Approximate Curvature with an adaptive learning rate for optimizing model-agnostic meta-learning (AK-MAML), which adjusts the learning rate automatically according to the curvature and improves training efficiency. Experimental results show that AK-MAML converges faster, requires less computation, and achieves higher accuracy on few-shot datasets.
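The following is a minimal NumPy sketch of the Kronecker factorization described above, applied to a single fully connected layer. The layer sizes, damping constant, and fixed learning rate are illustrative assumptions rather than values from the paper, and AK-MAML's curvature-based learning-rate adaptation is only indicated by a placeholder comment.

```python
# Sketch of the K-FAC idea for one fully connected layer s = W a:
# the layer's Fisher block is approximated as A (x) G, where A is the second
# moment of the layer inputs and G that of the back-propagated output
# gradients. Sizes, damping, and learning rate below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 64, 20, 10

a = rng.standard_normal((batch, d_in))    # layer inputs (activations)
g = rng.standard_normal((batch, d_out))   # gradients w.r.t. layer outputs
grad_W = g.T @ a / batch                  # ordinary gradient, shape (d_out, d_in)

# Kronecker factors of the Fisher block, with Tikhonov damping for stability.
damping = 1e-3
A = a.T @ a / batch + damping * np.eye(d_in)    # input covariance, (d_in, d_in)
G = g.T @ g / batch + damping * np.eye(d_out)   # output-gradient covariance, (d_out, d_out)

# Natural-gradient step: (A (x) G)^{-1} vec(grad_W) = vec(G^{-1} grad_W A^{-1}),
# so only two small matrices are inverted instead of the full Fisher block.
nat_grad_W = np.linalg.solve(G, grad_W) @ np.linalg.inv(A)

lr = 0.01                                  # AK-MAML would adapt this from curvature
W_new_step = -lr * nat_grad_W
print(W_new_step.shape)                    # (10, 20)
```

The benefit of the factorization is that only the two small factors A and G are ever inverted, rather than the full (d_in·d_out) × (d_in·d_out) Fisher matrix of the layer.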
Data availability
All data are available from the authors upon reasonable request.
Acknowledgements
This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. B220202019), the Changzhou Sci&Tech Program (Grant No. CJ20210092), the Young Talent Development Plan of Changzhou Health Commission (Grant No. CZQM2020025), and the Key Research and Development Program of Jiangsu (Grant Nos. BK20192004 and BE2018004-04).
Author information
Contributions
All authors contributed to the research reported in this paper. Ce Zhang proposed the scheme and carried out the experiments under the guidance of Xiao Yao. Xiao Yao and Ce Zhang wrote the first draft together. Changfeng Shi and Min Gu were responsible for data collation and analysis. All authors read the manuscript and suggested revisions.
Ethics declarations
Conflict of interest
The authors certify that they have no conflict of interest.
Additional information
Communicated by A. Sur.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, C., Yao, X., Shi, C. et al. Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning. Multimedia Systems 29, 3169–3177 (2023). https://doi.org/10.1007/s00530-023-01159-x