Abstract
Fine-tuning large pre-trained image and language models on small, customized datasets has become increasingly popular as a way to improve predictions and make efficient use of limited resources. Fine-tuning requires identifying the best models to transfer-learn from, and quantifying transferability avoids expensive re-training on every candidate model/task pair. In this paper, we show that statistical problems with covariance estimation drive the poor performance of H-score, a common baseline for newer metrics, and propose a shrinkage-based estimator. This yields up to an \(80\%\) absolute gain in H-score correlation performance, making it competitive with the state-of-the-art LogME measure; our shrinkage-based H-score is also 3–10 times faster to compute than LogME. Additionally, we examine the less common setting of target (as opposed to source) task selection. We demonstrate previously overlooked problems in such settings, with differing numbers of labels, class-imbalance ratios, etc., that led some recent metrics, e.g., NCE and LEEP, to be misrepresented as leading measures. We propose a correction and recommend measuring correlation performance against relative accuracy in such settings. We support our findings with \(\sim\)164,000 experiments (fine-tuning trials) on both vision models and graph neural networks.
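To make the shrinkage idea concrete, the sketch below computes an H-score-style quantity in which the sample feature covariance is shrunk toward a scaled identity before inversion. This is an illustrative simplification, not the paper's exact estimator: the paper uses a Ledoit-Wolf-type shrinkage intensity, whereas here `alpha` is a fixed, hand-picked parameter.

```python
import numpy as np

def shrinkage_hscore(features, labels, alpha=0.1):
    """Sketch of a shrinkage-based H-score (illustrative, not the
    paper's estimator).

    H-score is the trace of inv(cov(features)) times the covariance
    of the class-conditional feature means; here the feature
    covariance is shrunk toward (tr(S)/d) * I with fixed intensity
    `alpha` to improve conditioning.
    """
    n, d = features.shape
    # sample covariance of the features
    cov_f = np.cov(features, rowvar=False)
    # shrink toward a scaled identity target
    target = np.trace(cov_f) / d * np.eye(d)
    cov_f_shrunk = (1 - alpha) * cov_f + alpha * target
    # replace each row by its class-conditional mean
    g = np.empty_like(features, dtype=float)
    for c in np.unique(labels):
        mask = labels == c
        g[mask] = features[mask].mean(axis=0)
    cov_g = np.cov(g, rowvar=False)  # between-class covariance
    # trace of inv(cov_f_shrunk) @ cov_g, via a linear solve
    return float(np.trace(np.linalg.solve(cov_f_shrunk, cov_g)))
```

On well-separated classes this score is large (the class means carry most of the variance), and it drops when labels are shuffled, which is the behavior a transferability metric should exhibit.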
S. Ibrahim—This work was completed as an Intern and Student Researcher at Google.
Notes
- 1. The condition number of a positive semidefinite matrix \(A\) is the ratio of its largest and smallest eigenvalues.
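The definition in the note above can be checked numerically. This small NumPy sketch (not from the paper) computes the eigenvalue ratio for a diagonal PSD matrix and compares it against NumPy's built-in 2-norm condition number, which agrees for symmetric PSD matrices:

```python
import numpy as np

# PSD matrix with eigenvalues 4 and 1
A = np.array([[4.0, 0.0],
              [0.0, 1.0]])
eigvals = np.linalg.eigvalsh(A)      # eigenvalues, sorted ascending
cond = eigvals[-1] / eigvals[0]      # largest / smallest = 4.0
# matches np.linalg.cond(A), since for symmetric PSD matrices the
# singular values equal the eigenvalues
```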
References
Bao, Y., Li, Y., Huang, S., et al.: An information-theoretic approach to transferability in task transfer learning. In: 2019 IEEE ICIP, pp. 2309–2313 (2019)
Chen, Y., Wiesel, A., Eldar, Y.C., et al.: Shrinkage algorithms for MMSE covariance estimation. IEEE Trans. Signal Process. 58(10), 5016–5029 (2010)
Chollet, F., et al.: Keras. https://github.com/fchollet/keras (2015)
Cui, Y., Song, Y., Sun, C., et al.: Large scale fine-grained categorization and domain-specific transfer learning. CoRR abs/1806.06193 (2018)
Deshpande, A., Achille, A., Ravichandran, A., et al.: A linearized framework and a new benchmark for model selection for fine-tuning (2021)
Devlin, J., Chang, M., Lee, K., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018)
Fey, M., Lenssen, J.E.: Fast graph representation learning with PyTorch Geometric. In: ICLR Workshop on Representation Learning on Graphs and Manifolds (2019)
Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors). Ann. Stat. 28(2), 337–407 (2000)
Guyon, I.: Design of experiments for the NIPS 2003 variable selection benchmark (2003)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. SSS, Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7
He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015)
Huang, L.K., Wei, Y., Rong, Y., et al.: Frustratingly easy transferability estimation. CoRR abs/2106.09362 (2021)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR 2017, Toulon, France, 24–26 April 2017. OpenReview.net (2017)
Kornblith, S., Shlens, J., Le, Q.V.: Do better ImageNet models transfer better? In: 2019 IEEE/CVF CVPR, pp. 2656–2666 (2019)
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems 25 (2012)
Ledoit, O., Wolf, M.: A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88(2), 365–411 (2004)
Li, H., Chaudhari, P., Yang, H., et al.: Rethinking the hyperparameters for fine-tuning. CoRR abs/2002.11770 (2020)
Li, Y., Jia, X., Sang, R., et al.: Ranking neural checkpoints. In: Proceedings of the IEEE/CVF CVPR, pp. 2663–2673 (2021)
Mahajan, D., Girshick, R.B., Ramanathan, V., et al.: Exploring the limits of weakly supervised pretraining. CoRR abs/1805.00932 (2018)
Woodbury, M.A.: Inverting modified matrices. Memorandum Rept. 42, Statistical Research Group, p. 4. Princeton University (1950)
Nguyen, C.V., Hassner, T., Seeger, M., Archambeau, C.: LEEP: a new measure to evaluate transferability of learned representations (2020)
Pedregosa, F., Varoquaux, G., Gramfort, A., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Pourahmadi, M.: High-dimensional covariance estimation: with high-dimensional data, vol. 882. John Wiley & Sons (2013)
Rabanser, S., Günnemann, S., Lipton, Z.C.: Failing loudly: an empirical study of methods for detecting dataset shift. In: NeurIPS (2019)
Rozemberczki, B., Allen, C., Sarkar, R.: Multi-scale attributed node embedding. J. Complex Netw. 9(2), cnab014 (2021)
Rozemberczki, B., Sarkar, R.: Characteristic functions on graphs: birds of a feather, from statistical descriptors to parametric models. In: Proceedings of the 29th ACM CIKM, pp. 1325–1334. CIKM 2020, ACM, New York, NY, USA (2020)
Rozemberczki, B., Sarkar, R.: Twitch gamers: a dataset for evaluating proximity preserving and structural role-based node embeddings (2021)
Schäfer, J., Strimmer, K.: A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. Biol. 4, 32 (2005)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2015)
Tan, Y., Li, Y., Huang, S.: OTCE: a transferability metric for cross-domain cross-task representations. CoRR abs/2103.13843 (2021)
Tran, A., Nguyen, C., Hassner, T.: Transferability and hardness of supervised classification tasks. In: 2019 IEEE/CVF ICCV, pp. 1395–1405 (2019)
Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11(95), 2837–2854 (2010)
You, K., Liu, Y., Wang, J., Long, M.: LogME: practical assessment of pre-trained models for transfer learning. In: ICML (2021)
You, K., Liu, Y., Zhang, Z., Wang, J., Jordan, M.I., Long, M.: Ranking and tuning pre-trained models: a new paradigm of exploiting model hubs (2021)
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ibrahim, S., Ponomareva, N., Mazumder, R. (2023). Newer is Not Always Better: Rethinking Transferability Metrics, Their Peculiarities, Stability and Performance. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science(), vol 13713. Springer, Cham. https://doi.org/10.1007/978-3-031-26387-3_42
DOI: https://doi.org/10.1007/978-3-031-26387-3_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26386-6
Online ISBN: 978-3-031-26387-3
eBook Packages: Computer Science, Computer Science (R0)