Abstract
In this paper, we explore whether the final layer of the network needs to be meta-trained in model-agnostic meta-learning (MAML) for few-shot learning. Previous research has shown that updating only the final layer during fine-tuning can improve performance. We go beyond this by randomly re-initializing the final layer before optimizing the inner loop and excluding its weights from the meta-update, which lets us probe whether pre-training the last layer is necessary at all. Our findings indicate that pre-training the final layer is slightly beneficial when the task distribution does not change between training and testing. In cross-domain adaptation, however, where the tasks change at test time, our approach excels: re-initializing the final layer forces the body of the network to learn better representations. We perform experiments on various in-domain, cross-domain, and mixed-way setups and conduct a representation similarity analysis of the resulting networks.
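The core modification is small enough to illustrate in a few lines. Below is a minimal first-order sketch in PyTorch, not the authors' code: the names (`body`, `fomaml_step_reinit_head`), the data layout, and the hyperparameters are illustrative assumptions. It shows a head drawn at random for every task and simply discarded afterwards, so only the body receives meta-gradients.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def fomaml_step_reinit_head(body, meta_opt, tasks, n_way, feat_dim,
                            inner_steps=5, inner_lr=0.01):
    """One first-order meta-update in which the final layer is randomly
    re-initialized per task and never meta-trained (illustrative sketch)."""
    meta_opt.zero_grad()
    for support_x, support_y, query_x, query_y in tasks:
        fast = copy.deepcopy(body)            # task-specific copy of the body
        head = nn.Linear(feat_dim, n_way)     # fresh random head for this task
        inner = torch.optim.SGD(
            list(fast.parameters()) + list(head.parameters()), lr=inner_lr)
        for _ in range(inner_steps):          # inner-loop fine-tuning
            loss = F.cross_entropy(head(fast(support_x)), support_y)
            inner.zero_grad()
            loss.backward()
            inner.step()
        inner.zero_grad()                     # clear leftover inner-loop grads
        query_loss = F.cross_entropy(head(fast(query_x)), query_y)
        query_loss.backward()                 # grads land on the fast weights
        # First-order approximation: copy the query gradients of the adapted
        # body onto the meta-parameters (summed over the task batch); the
        # head is discarded, so it gets no meta-update.
        for p, fp in zip(body.parameters(), fast.parameters()):
            p.grad = fp.grad.clone() if p.grad is None else p.grad + fp.grad
    meta_opt.step()
```

Full second-order MAML would instead backpropagate through the inner loop (e.g., with the cited learn2learn or Torchmeta libraries); the first-order variant keeps the sketch short while preserving the point that the head's initialization carries no meta-learned information.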
Notes
1. Since ANIL does not change the earlier parts of the network, we could not interpret changes there if we did not update the body.
2. Labels are assigned randomly during task generation, so the label assignment always has to be fine-tuned; a sketch of such task sampling follows below.
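For illustration, here is a minimal sketch of episodic task sampling in Python; the function name and data layout are assumptions, not the paper's code. Because the sampled classes are mapped to output units in a fresh random order each episode, the head's weights carry no stable class semantics and must always be adapted on the support set.

```python
import random

def sample_task(examples_by_class, n_way=5, k_shot=1, k_query=15):
    """Sample one N-way episode; `examples_by_class` maps class -> examples.

    random.sample returns the chosen classes in random order, so output
    unit i is bound to a different class in every episode -- the reason
    the final layer always has to be fine-tuned."""
    classes = random.sample(list(examples_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):   # label i is arbitrary per episode
        shots = random.sample(examples_by_class[cls], k_shot + k_query)
        support += [(x, label) for x in shots[:k_shot]]
        query += [(x, label) for x in shots[k_shot:]]
    return support, query
```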
References
Arnold, S.M.R., Mahajan, P., Datta, D., Bunner, I., Zarkias, K.S.: learn2learn: a library for meta-learning research (2020)
Chen, W.-Y., Liu, Y.-C., Kira, Z., Wang, Y.-C.F., Huang, J.-B.: A closer look at few-shot classification. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019. OpenReview.net (2019)
Deleu, T., Würfl, T., Samiei, M., Cohen, J.P., Bengio, Y.: Torchmeta: a meta-learning library for PyTorch (2019)
Devos, A., Chatel, S., Grossglauser, M.: Reproducing meta-learning with differentiable closed-form solvers. In: Reproducibility in Machine Learning, ICLR 2019 Workshop, New Orleans, Louisiana, United States, May 6, 2019. OpenReview.net (2019)
Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, pp. 1126–1135 (2017)
Gidaris, S., Komodakis, N.: Dynamic few-shot visual learning without forgetting. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 4367–4375. Computer Vision Foundation/IEEE Computer Society (2018)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Mike Titterington, D. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2010, vol. 9. JMLR Proceedings, pp. 249–256. JMLR.org (2010)
Goerttler, T., Obermayer, K.: Exploring the similarity of representations in model-agnostic meta-learning. In: Learning to Learn - Workshop at ICLR 2021 (2021)
Hochreiter, S., Steven Younger, A., Conwell, P.R.: Learning to learn using gradient descent. In: Artificial Neural Networks - ICANN 2001, International Conference Vienna, Austria, August 21–25, 2001 Proceedings, pp. 87–94 (2001)
Kao, C.-H., Chiu, W.-C., Chen, P.-Y.: MAML is a noisy contrastive learner in classification. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25–29, 2022. OpenReview.net (2022)
Kornblith, S., Norouzi, M., Lee, H., Hinton, G.: Similarity of neural network representations revisited. In: International Conference on Machine Learning, pp. 3519–3529. PMLR (2019)
Maclaurin, D., Duvenaud, D., Adams, R.P.: Gradient-based hyperparameter optimization through reversible learning. In: Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015, pp. 2113–2122 (2015)
Mishra, N., Rohaninejad, M., Chen, X., Abbeel, P.: A simple neural attentive meta-learner. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings (2018)
Nichol, A., Achiam, J., Schulman, J.: On first-order meta-learning algorithms. CoRR (2018)
Oh, J., Yoo, H., Kim, C., Yun, S.-Y.: BOIL: towards representation change for few-shot learning. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021. OpenReview.net (2021)
Oreshkin, B.N., López, P.R., Lacoste, A.: TADAM: task dependent adaptive metric for improved few-shot learning. In: Bengio, S., Wallach, H.M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, pp. 719–729 (2018)
Qi, H., Brown, M., Lowe, D.G.: Low-shot learning with imprinted weights. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, pp. 5822–5830. Computer Vision Foundation/IEEE Computer Society (2018)
Raghu, A., Raghu, M., Bengio, S., Vinyals, O.: Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. arXiv preprint arXiv:1909.09157 (2019)
Rajeswaran, A., Finn, C., Kakade, S.M., Levine, S.: Meta-learning with implicit gradients. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019 (2019)
Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: 5th International Conference on Learning Representations, ICLR 2017 (2017)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.S., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Salakhutdinov, R., Tenenbaum, J.B., Torralba, A.: One-shot learning with a hierarchical nonparametric Bayesian model. In: Unsupervised and Transfer Learning - Workshop held at ICML 2011 (2012)
Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.P.: Meta-learning with memory-augmented neural networks. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016 (2016)
Snell, J., Swersky, K., Zemel, R.S.: Prototypical networks for few-shot learning. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, pp. 4077–4087 (2017)
Thrun, S., Pratt, L.Y.: Learning to learn: Introduction and overview. In: Learning to Learn, pp. 3–17. Springer (1998)
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, pp. 3630–3638 (2016)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology (2011)
Ye, H.-J., Chao, W.-L.: How to train your MAML to excel in few-shot classification. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25–29, 2022. OpenReview.net (2022)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Goerttler, T., Pirlet, P., Obermayer, K. (2024). Towards the Necessity of Pre-trained Heads in Model-Agnostic Meta-Learning. In: Arai, K. (eds) Advances in Information and Communication. FICC 2024. Lecture Notes in Networks and Systems, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-031-53960-2_31
DOI: https://doi.org/10.1007/978-3-031-53960-2_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53959-6
Online ISBN: 978-3-031-53960-2
eBook Packages: Intelligent Technologies and Robotics (R0)