research-article

MetaKernel: Learning Variational Random Features With Limited Labels

Published: 28 February 2022

Abstract

Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples while generalizing well to new tasks. The crux of few-shot learning is to extract prior knowledge from related tasks to enable fast adaptation to a new task with a limited amount of data. In this paper, we propose meta-learning kernels with random Fourier features for few-shot learning, which we call MetaKernel. Specifically, we propose learning variational random features in a data-driven manner to obtain task-specific kernels, leveraging the shared knowledge provided by related tasks in a meta-learning setting. We treat the random feature basis as a latent variable, which is estimated by variational inference. The shared knowledge from related tasks is incorporated into context inference of the posterior, which we achieve via a long short-term memory module. To establish more expressive kernels, we deploy conditional normalizing flows based on coupling layers to achieve a richer posterior distribution over random Fourier bases. The resultant kernels are more informative and discriminative, which further improves few-shot learning. To evaluate our method, we conduct extensive experiments on both few-shot image classification and regression tasks. A thorough ablation study demonstrates the effectiveness of each component introduced in our method. The benchmark results on fourteen datasets demonstrate that MetaKernel consistently delivers at least comparable, and often better, performance than state-of-the-art alternatives.
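The kernel machinery the abstract refers to builds on the classical random Fourier feature approximation: by Bochner's theorem, a shift-invariant kernel can be approximated by an inner product of randomized cosine features whose frequencies are drawn from the kernel's spectral density. The sketch below (an illustrative example, not the paper's task-adaptive method, which instead *learns* a posterior over the frequencies) approximates an RBF kernel with NumPy:

```python
import numpy as np

def rff_features(X, omega, b):
    """Random Fourier feature map: z(x) = sqrt(2/D) * cos(x @ omega + b),
    so that z(x) . z(y) approximates k(x, y) in expectation."""
    D = omega.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

rng = np.random.default_rng(0)
d, D, gamma = 5, 2000, 0.5

# For the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2), the spectral
# density is Gaussian: omega ~ N(0, 2 * gamma * I). Phases are uniform.
omega = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

x, y = rng.normal(size=d), rng.normal(size=d)
zx = rff_features(x[None, :], omega, b)
zy = rff_features(y[None, :], omega, b)

approx = float(zx @ zy.T)                     # Monte Carlo kernel estimate
exact = np.exp(-gamma * np.sum((x - y) ** 2)) # closed-form RBF value
```

The approximation error shrinks at the usual Monte Carlo rate of O(1/sqrt(D)); with D = 2000 features the two values agree to within a few hundredths. MetaKernel replaces the fixed Gaussian sampling distribution above with a task-conditioned variational posterior.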

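The abstract also mentions conditional normalizing flows built from coupling layers, used to enrich the posterior over random Fourier bases. The core idea of an affine coupling layer (in the style of NICE/RealNVP) is that half of the variables pass through unchanged while the other half receives an affine transform conditioned on the first half, making both the inverse and the log-determinant of the Jacobian cheap to compute. A minimal sketch follows, with toy conditioners standing in for the small neural networks a real flow would use:

```python
import numpy as np

def coupling_forward(z, shift, log_scale):
    """Affine coupling layer: split z into halves, keep z1 fixed, and
    transform z2 elementwise conditioned on z1. Because the Jacobian is
    triangular, log|det J| = sum(log_scale(z1))."""
    d = z.shape[-1] // 2
    z1, z2 = z[..., :d], z[..., d:]
    s, t = log_scale(z1), shift(z1)
    y2 = z2 * np.exp(s) + t
    return np.concatenate([z1, y2], axis=-1), s.sum(axis=-1)

def coupling_inverse(y, shift, log_scale):
    """Exact inverse: z1 is available unchanged, so s and t can be
    recomputed and the affine transform undone analytically."""
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    s, t = log_scale(y1), shift(y1)
    z2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, z2], axis=-1)

# Hypothetical stand-ins for learned conditioner networks.
shift = lambda h: np.tanh(h)
log_scale = lambda h: 0.5 * np.tanh(h)

z = np.random.default_rng(1).normal(size=(4, 6))
y, logdet = coupling_forward(z, shift, log_scale)
z_back = coupling_inverse(y, shift, log_scale)
```

Stacking several such layers (permuting which half is held fixed between layers) yields an expressive yet exactly invertible transform; conditioning the shift/scale networks on task context is what makes the flow "conditional" in the sense used here.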

Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 46, Issue 3, March 2024, 579 pages

Publisher: IEEE Computer Society, United States
