MetaKernel: Learning Variational Random Features With Limited Labels
Pages 1464 - 1478
Abstract
Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples while generalizing well to new tasks. The crux of few-shot learning is to extract prior knowledge from related tasks to enable fast adaptation to a new task with a limited amount of data. In this paper, we propose meta-learning kernels with random Fourier features for few-shot learning, which we call MetaKernel. Specifically, we propose learning variational random features in a data-driven manner to obtain task-specific kernels by leveraging the shared knowledge provided by related tasks in a meta-learning setting. We treat the random feature basis as the latent variable, which is estimated by variational inference. The shared knowledge from related tasks is incorporated into a context inference of the posterior, which we achieve via a long short-term memory module. To establish more expressive kernels, we deploy conditional normalizing flows based on coupling layers to achieve a richer posterior distribution over random Fourier bases. The resultant kernels are more informative and discriminative, which further improves few-shot learning. To evaluate our method, we conduct extensive experiments on both few-shot image classification and regression tasks. A thorough ablation study demonstrates the effectiveness of each component of our method. The benchmark results on fourteen datasets demonstrate that MetaKernel consistently delivers at least comparable, and often better, performance than state-of-the-art alternatives.
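The random Fourier feature construction the abstract builds on can be sketched as follows. This is a minimal NumPy illustration of approximating a fixed RBF kernel with randomly sampled bases, not the paper's learned variational posterior over bases; all names and parameters here are our own illustrative choices.

```python
import numpy as np

def random_fourier_features(X, omega, b):
    """Map inputs X (n, d) to D random features: sqrt(2/D) * cos(X @ omega + b)."""
    D = omega.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

def rff_kernel(X, Y, omega, b):
    """Approximate k(x, y) via inner products of random feature maps."""
    return random_fourier_features(X, omega, b) @ random_fourier_features(Y, omega, b).T

rng = np.random.default_rng(0)
d, D = 3, 2000
gamma = 0.5  # target RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)

# By Bochner's theorem, bases for this RBF kernel are Gaussian with std sqrt(2 * gamma);
# MetaKernel instead infers a task-specific posterior over these bases.
omega = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0.0, 2 * np.pi, size=D)

X = rng.normal(size=(5, d))
K_approx = rff_kernel(X, X, omega, b)

# Compare against the exact RBF kernel on the same points.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-gamma * sq_dists)
print(np.abs(K_approx - K_exact).max())  # shrinks as D grows
```

The approximation error decays at rate O(1/sqrt(D)), which is why a few thousand bases suffice in practice; the paper's contribution is making the distribution these bases are drawn from task-dependent rather than fixed.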
Published In
0162-8828 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Publisher
IEEE Computer Society
United States
Publication History
Published: 28 February 2022