Abstract
In this paper, we explore whether the final layer of the network needs to be meta-trained in model-agnostic meta-learning (MAML) for few-shot learning. Previous research has shown that updating only the final layer during fine-tuning can improve performance. We go beyond this by randomly re-initializing the final layer before optimizing the inner loop and excluding its weights from the meta-update, which lets us probe whether pre-training the last layer is necessary at all. Our findings indicate that pre-training the final layer is slightly beneficial when the task distribution does not change between training and testing. In cross-domain adaptation, however, where the tasks change at test time, our approach excels: re-initializing the final layer forces the body of the network to learn better representations. We perform experiments on various in-domain, cross-domain, and mixed-way setups and conduct a representation similarity analysis of the resulting networks.
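The core modification is small enough to illustrate in a few lines. Below is a minimal first-order sketch in PyTorch, not the authors' code: the names (`body`, `fomaml_step_reinit_head`), the data layout, and the hyperparameters are illustrative assumptions. It shows a head drawn at random for every task and simply discarded afterwards, so only the body receives meta-gradients.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def fomaml_step_reinit_head(body, meta_opt, tasks, n_way, feat_dim,
                            inner_steps=5, inner_lr=0.01):
    """One first-order meta-update in which the final layer is randomly
    re-initialized per task and never meta-trained (illustrative sketch)."""
    meta_opt.zero_grad()
    for support_x, support_y, query_x, query_y in tasks:
        fast = copy.deepcopy(body)            # task-specific copy of the body
        head = nn.Linear(feat_dim, n_way)     # fresh random head for this task
        inner = torch.optim.SGD(
            list(fast.parameters()) + list(head.parameters()), lr=inner_lr)
        for _ in range(inner_steps):          # inner-loop fine-tuning
            loss = F.cross_entropy(head(fast(support_x)), support_y)
            inner.zero_grad()
            loss.backward()
            inner.step()
        inner.zero_grad()                     # clear leftover inner-loop grads
        query_loss = F.cross_entropy(head(fast(query_x)), query_y)
        query_loss.backward()                 # grads land on the fast weights
        # First-order approximation: copy the query gradients of the adapted
        # body onto the meta-parameters (summed over the task batch); the
        # head is discarded, so it gets no meta-update.
        for p, fp in zip(body.parameters(), fast.parameters()):
            p.grad = fp.grad.clone() if p.grad is None else p.grad + fp.grad
    meta_opt.step()
```

Full second-order MAML would instead backpropagate through the inner loop (e.g., with the cited learn2learn or Torchmeta libraries); the first-order variant keeps the sketch short while preserving the point that the head's initialization carries no meta-learned information.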
Notes
1. Since ANIL does not change the earlier parts of the network, we could not interpret changes there if we did not update the body.
2. Labels are assigned randomly during task generation, so the label assignment always has to be fine-tuned; a sketch of such task sampling follows below.
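For illustration, here is a minimal sketch of episodic task sampling in Python; the function name and data layout are assumptions, not the paper's code. Because the sampled classes are mapped to output units in a fresh random order each episode, the head's weights carry no stable class semantics and must always be adapted on the support set.

```python
import random

def sample_task(examples_by_class, n_way=5, k_shot=1, k_query=15):
    """Sample one N-way episode; `examples_by_class` maps class -> examples.

    random.sample returns the chosen classes in random order, so output
    unit i is bound to a different class in every episode -- the reason
    the final layer always has to be fine-tuned."""
    classes = random.sample(list(examples_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):   # label i is arbitrary per episode
        shots = random.sample(examples_by_class[cls], k_shot + k_query)
        support += [(x, label) for x in shots[:k_shot]]
        query += [(x, label) for x in shots[k_shot:]]
    return support, query
```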
References
Arnold, S.M.R., Mahajan, P., Datta, D., Bunner, I., Zarkias, K.S.: learn2learn: a library for meta-learning research (2020)
Chen, W.-Y., Liu, Y.-C., Kira, Z., Wang, Y.-C.F., Huang, J.-B.: A closer look at few-shot classification. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019. OpenReview.net (2019)
Deleu, T., Würfl, T., Samiei, M., Cohen, J.P., Bengio, Y.: Torchmeta: a meta-learning library for PyTorch (2019)
Devos, A., Chatel, S., Grossglauser, M.: Reproducing meta-learning with differentiable closed-form solvers. In: Reproducibility in Machine Learning, ICLR 2019 Workshop, New Orleans, Louisiana, United States, May 6, 2019. OpenReview.net (2019)
Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, pp. 1126–1135 (2017)
Gidaris, S., Komodakis, N.: Dynamic few-shot visual learning without forgetting. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 4367–4375. Computer Vision Foundation/IEEE Computer Society (2018)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Mike Titterington, D. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2010, vol. 9. JMLR Proceedings, pp. 249–256. JMLR.org (2010)
Goerttler, T., Obermayer, K.: Exploring the similarity of representations in model-agnostic meta-learning. In: Learning to Learn - Workshop at ICLR 2021 (2021)
Hochreiter, S., Steven Younger, A., Conwell, P.R.: Learning to learn using gradient descent. In: Artificial Neural Networks - ICANN 2001, International Conference Vienna, Austria, August 21–25, 2001 Proceedings, pp. 87–94 (2001)
Kao, C.-H., Chiu, W.-C., Chen, P.-Y.: MAML is a noisy contrastive learner in classification. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25–29, 2022. OpenReview.net (2022)
Kornblith, S., Norouzi, M., Lee, H., Hinton, G.: Similarity of neural network representations revisited. In: International Conference on Machine Learning, pp. 3519–3529. PMLR (2019)
Maclaurin, D., Duvenaud, D., Adams, R.P.: Gradient-based hyperparameter optimization through reversible learning. In: Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015, pp. 2113–2122 (2015)
Mishra, N., Rohaninejad, M., Chen, X., Abbeel, P.: A simple neural attentive meta-learner. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings (2018)
Nichol, A., Achiam, J., Schulman, J.: On first-order meta-learning algorithms. CoRR (2018)
Oh, J., Yoo, H., Kim, C., Yun, S.-Y.: BOIL: towards representation change for few-shot learning. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021. OpenReview.net (2021)
Oreshkin, B.N., López, P.R., Lacoste, A.: TADAM: task dependent adaptive metric for improved few-shot learning. In: Bengio, S., Wallach, H.M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, pp. 719–729 (2018)
Qi, H., Brown, M., Lowe, D.G.: Low-shot learning with imprinted weights. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, pp. 5822–5830. Computer Vision Foundation/IEEE Computer Society (2018)
Raghu, A., Raghu, M., Bengio, S., Vinyals, O.: Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. arXiv preprint arXiv:1909.09157 (2019)
Rajeswaran, A., Finn, C., Kakade, S.M., Levine, S.: Meta-learning with implicit gradients. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019 (2019)
Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: 5th International Conference on Learning Representations, ICLR 2017 (2017)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.S., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Salakhutdinov, R., Tenenbaum, J.B., Torralba, A.: One-shot learning with a hierarchical nonparametric Bayesian model. In: Unsupervised and Transfer Learning - Workshop held at ICML 2011 (2012)
Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.P.: Meta-learning with memory-augmented neural networks. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016 (2016)
Snell, J., Swersky, K., Zemel, R.S.: Prototypical networks for few-shot learning. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, pp. 4077–4087 (2017)
Thrun, S., Pratt, L.Y.: Learning to learn: Introduction and overview. In: Learning to Learn, pp. 3–17. Springer (1998)
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, pp. 3630–3638 (2016)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology (2011)
Ye, H.-J., Chao, W.-L.: How to train your MAML to excel in few-shot classification. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25–29, 2022. OpenReview.net (2022)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Goerttler, T., Pirlet, P., Obermayer, K. (2024). Towards the Necessity of Pre-trained Heads in Model-Agnostic Meta-Learning. In: Arai, K. (eds) Advances in Information and Communication. FICC 2024. Lecture Notes in Networks and Systems, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-031-53960-2_31
DOI: https://doi.org/10.1007/978-3-031-53960-2_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53959-6
Online ISBN: 978-3-031-53960-2
eBook Packages: Intelligent Technologies and Robotics (R0)