
Using Multiple Heads to Subsize Meta-memorization Problem

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2022 (ICANN 2022)

Abstract

The memorization problem is a meta-level overfitting phenomenon in meta-learning: the trained model prefers to memorize the tasks it has already seen rather than adapt to new tasks. This issue limits the ability of many meta-learning approaches to generalize. In this paper, we mitigate this limitation by providing multiple supervisions through a multi-objective optimization process. The design leads to a Multi-Input Multi-Output (MIMO) configuration for meta-learning. The model produces multiple outputs through different heads, and each head is supervised by a different ordering of labels for the same task. The heads therefore form different memories, and the resulting meta-level conflicts act as a regularizer against meta-overfitting. The MIMO configuration is applicable to all MAML-like algorithms with only a minor increase in training computation; at inference, the cost can be reduced through an early-exit policy, or better performance can be obtained through a low-cost ensemble. In experiments using identical models and training settings across all test cases, the proposed design suppresses the meta-overfitting issue, achieves smoother loss landscapes, and improves generalization.
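The multi-head supervision described above can be sketched in a few lines of PyTorch. The snippet below is a hypothetical illustration, not the authors' implementation: names such as `body`, `heads`, `perms`, and `multi_head_loss`, the 5-way toy task, and the single plain SGD step are all assumptions; the paper's actual method wraps this kind of supervision inside a MAML-like inner/outer loop.

```python
# Minimal sketch (assumed names, not the authors' code): a shared body feeds H heads,
# and each head is trained against its own permutation of the task's labels, so the
# heads build conflicting "memories" of the same task.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_WAY, H, FEAT = 5, 3, 64                       # classes per task, number of heads, feature width

body = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, FEAT), nn.ReLU())
heads = nn.ModuleList([nn.Linear(FEAT, N_WAY) for _ in range(H)])
perms = [torch.randperm(N_WAY) for _ in range(H)]  # one label ordering per head (assumption: fixed)

def multi_head_loss(x, y):
    """Sum of per-head cross-entropy losses, each head supervised by its own label ordering."""
    feats = body(x)
    return sum(F.cross_entropy(heads[h](feats), perms[h][y]) for h in range(H))

# Toy stand-in for a sampled few-shot task: random "images" and labels.
x_query = torch.randn(10, 1, 28, 28)
y_query = torch.randint(0, N_WAY, (10,))

opt = torch.optim.SGD(list(body.parameters()) + list(heads.parameters()), lr=1e-2)
opt.zero_grad()
multi_head_loss(x_query, y_query).backward()     # outer-loop-style update on the summed loss
opt.step()

# At inference, a single head can serve as an early exit, or the heads' softmax outputs
# can be averaged as a low-cost ensemble (after undoing each head's label permutation).
```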

K. L. E. Law gratefully acknowledges the financial support provided by Macao Polytechnic University through the research funding programme (#RP/ESCA-09/2021).

Author information

Corresponding author

Correspondence to Lu Wang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, L., Eddie Law, K.L. (2022). Using Multiple Heads to Subsize Meta-memorization Problem. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_42

  • DOI: https://doi.org/10.1007/978-3-031-15937-4_42

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15936-7

  • Online ISBN: 978-3-031-15937-4

  • eBook Packages: Computer Science, Computer Science (R0)
