Towards the Necessity of Pre-trained Heads in Model-Agnostic Meta-Learning

  • Conference paper
  • First Online:
Advances in Information and Communication (FICC 2024)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 919)


Abstract

In this paper, we explore the necessity of meta-training the final layer of the network in model-agnostic meta-learning (MAML) for few-shot learning. Previous research has shown that updating only the final layer during fine-tuning can improve performance. We go beyond this by randomly re-initializing the final layer before optimizing the inner loop and not updating its weights in the meta-step, in order to probe the necessity of pre-training the last layer. Our findings indicate that pre-training the final layer is slightly beneficial when the task distribution does not change between training and testing. However, our novel approach excels in cross-domain adaptation, where the tasks change during testing. Re-initializing the final layer forces the body of the network to learn better representations. We perform experiments on various in-domain setups, cross-domain setups, and mixed-way scenarios, and we conduct a representation similarity analysis of these networks.
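To make the setup described in the abstract concrete, the following is a minimal, first-order sketch (in the spirit of MAML/FOMAML): the head is randomly re-initialized for every task and adapted only in the inner loop, while the meta-update is applied to the body alone. All names, layer sizes, and hyperparameters here are illustrative assumptions, not the authors' actual architecture or training configuration.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model(in_dim=784, hidden=64, n_way=5):
    """Body (feature extractor) and head (classifier) as separate modules (sizes are placeholders)."""
    body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU())
    head = nn.Linear(hidden, n_way)
    return body, head

def inner_adapt(body, head, x_s, y_s, steps=5, lr=0.4):
    """Fine-tune a task-specific copy of the network on the support set."""
    fast_body, fast_head = copy.deepcopy(body), copy.deepcopy(head)
    nn.init.xavier_uniform_(fast_head.weight)  # head is re-initialized per task ...
    nn.init.zeros_(fast_head.bias)             # ... so no pre-trained head is used
    opt = torch.optim.SGD(list(fast_body.parameters()) + list(fast_head.parameters()), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(fast_head(fast_body(x_s)), y_s).backward()
        opt.step()
    return fast_body, fast_head

def meta_step(body, head, task_batch, meta_opt):
    """First-order meta-update: query-set gradients are accumulated into the body only."""
    meta_opt.zero_grad()
    for x_s, y_s, x_q, y_q in task_batch:
        fast_body, fast_head = inner_adapt(body, head, x_s, y_s)
        query_loss = F.cross_entropy(fast_head(fast_body(x_q)), y_q)
        grads = torch.autograd.grad(query_loss, list(fast_body.parameters()))
        for p, g in zip(body.parameters(), grads):  # the head receives no meta-update
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```

A meta-training loop would repeatedly call meta_step(body, head, task_batch, meta_opt) with a meta-optimizer built over body.parameters() only, e.g. torch.optim.Adam(body.parameters(), lr=1e-3); the task batch is a placeholder for an N-way, K-shot sampler. A full second-order MAML variant would differentiate through the inner-loop updates instead of using this first-order shortcut, but the point illustrated here is only which parameters receive a meta-update.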


Notes

  1. Since ANIL does not update the earlier layers, we could not interpret changes there if we did not update the body.

  2. The label assignment is done randomly during task generation, so the label assignment always has to be fine-tuned (see the sketch below).
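As a hypothetical illustration of note 2, the sketch below shows what random label assignment during task generation might look like: the sampled classes are mapped to task-local labels 0..N-1 in a random order, so the meaning of each output unit changes from task to task and the head must always be fine-tuned. The function name and class ids are made up for illustration.

```python
import random

def assign_labels(sampled_class_ids):
    """Map sampled global class ids to randomly permuted task-local labels 0..N-1."""
    local_labels = list(range(len(sampled_class_ids)))
    random.shuffle(local_labels)
    return dict(zip(sampled_class_ids, local_labels))

# One possible outcome:
# assign_labels([17, 3, 42, 8, 25]) -> {17: 2, 3: 0, 42: 4, 8: 1, 25: 3}
```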


Author information

Corresponding author

Correspondence to Thomas Goerttler.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Goerttler, T., Pirlet, P., Obermayer, K. (2024). Towards the Necessity of Pre-trained Heads in Model-Agnostic Meta-Learning. In: Arai, K. (eds) Advances in Information and Communication. FICC 2024. Lecture Notes in Networks and Systems, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-031-53960-2_31
