Abstract
The memorization problem is a meta-level overfitting phenomenon in meta-learning: the trained model prefers to recall the tasks it has already learned instead of adapting to new tasks, which limits the ability of many meta-learning approaches to generalize. In this paper, we mitigate this limitation by providing multiple supervisions through a multi-objective optimization process. The design leads to a Multi-Input Multi-Output (MIMO) configuration for meta-learning in which the model produces multiple outputs through different heads, and each head is supervised by a different ordering of the labels for the same task. The heads therefore form different memories, and the resulting meta-level conflicts act as a regularizer against meta-overfitting. The MIMO configuration is applicable to all MAML-like algorithms with only a minor increase in training computation; at inference, the computation can be reduced through an early-exit policy, or better performance can be achieved through a low-cost ensemble. In experiments using identical models and training settings across all test cases, our proposed design suppresses the meta-overfitting issue, achieves smoother loss landscapes, and improves generalization.
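To make the multi-head idea concrete, the following is a minimal PyTorch sketch (not the authors' implementation): a shared body feeds several output heads, and each head is supervised with the same task data under a different fixed permutation of the class labels. The class name, layer sizes, and the summed multi-head loss are illustrative assumptions; in a MAML-like setup this loss would be used inside the inner- and outer-loop updates.

```python
# Minimal sketch of multi-head supervision with permuted labels (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadMetaLearner(nn.Module):
    def __init__(self, in_dim=784, hidden=64, n_classes=5, n_heads=3):
        super().__init__()
        # Shared feature extractor ("body") used by every head.
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One classification head per supervision signal.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_classes) for _ in range(n_heads)]
        )
        # One fixed label permutation per head; head 0 keeps the original order.
        self.register_buffer(
            "perms",
            torch.stack([torch.randperm(n_classes) if h else torch.arange(n_classes)
                         for h in range(n_heads)]),
        )

    def forward(self, x):
        z = self.body(x)
        return [head(z) for head in self.heads]

    def multi_head_loss(self, x, y):
        # Each head sees the same inputs but a permuted label assignment,
        # so the heads cannot all memorize one input-to-label mapping.
        logits = self.forward(x)
        return sum(F.cross_entropy(l, perm[y]) for l, perm in zip(logits, self.perms))


if __name__ == "__main__":
    model = MultiHeadMetaLearner()
    x, y = torch.randn(10, 784), torch.randint(0, 5, (10,))
    loss = model.multi_head_loss(x, y)  # summed multi-objective loss over heads
    loss.backward()
    print(float(loss))
```

At inference, one could evaluate only the first head for an early exit, or average the (un-permuted) head predictions as a low-cost ensemble, matching the two options mentioned in the abstract.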
K. L. E. Law gratefully acknowledges the financial support provided by Macao Polytechnic University through the research funding programme (#RP/ESCA-09/2021).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, L., Eddie Law, K.L. (2022). Using Multiple Heads to Subsize Meta-memorization Problem. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_42
DOI: https://doi.org/10.1007/978-3-031-15937-4_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15936-7
Online ISBN: 978-3-031-15937-4
eBook Packages: Computer Science, Computer Science (R0)