Abstract
Strong image search models can be learned for a specific domain, i.e., a specific set of labels, provided that some labeled images of that domain are available. A practical visual search model, however, should be versatile enough to solve multiple retrieval tasks simultaneously, even when those tasks cover very different specialized domains. It should also be able to benefit from unlabeled images drawn from these various retrieval tasks. This is the more practical scenario that we consider in this paper. We address it with a proposed approach that starts from a strong pretrained model and adapts it to tackle multiple retrieval tasks concurrently, using only unlabeled images from the different task domains. We extend the pretrained model with multiple independently trained sets of adaptors that use pseudo-label sets of different sizes, effectively mimicking different pseudo-granularities. We reconcile all adaptor sets into a single unified model suited for all retrieval tasks by learning fusion layers, which we guide by propagating pseudo-granularity attentions across neighbors in the feature space. Results on a benchmark composed of six heterogeneous retrieval tasks show that the unsupervised model improves the zero-shot performance of a state-of-the-art self-supervised learning model and, in some cases, reaches or even surpasses a task-label-aware oracle that selects the most fitting pseudo-granularity per task.
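The core idea of mimicking different pseudo-granularities can be illustrated with a minimal sketch (not the authors' implementation): cluster unlabeled features from a pretrained backbone with k-means at several vocabulary sizes, and treat each clustering as one pseudo-label set that could supervise one adaptor set. All function names and the vocabulary sizes below are illustrative assumptions.

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    # Plain Lloyd's k-means: returns one pseudo-label per feature vector.
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each feature to its nearest center (squared L2 distance).
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Recompute each center as the mean of its assigned features.
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(0)
    return labels

def multi_granularity_pseudo_labels(feats, vocab_sizes=(8, 64, 512)):
    # One pseudo-label set per vocabulary size: small vocabularies mimic
    # coarse class labels, large ones mimic fine-grained or instance labels.
    return {k: kmeans(feats, k) for k in vocab_sizes}

# Toy usage: 1000 random 16-D stand-ins for pretrained backbone features.
feats = np.random.default_rng(1).normal(size=(1000, 16))
pseudo = multi_granularity_pseudo_labels(feats, vocab_sizes=(4, 16))
```

In this sketch, each entry of `pseudo` would play the role of a distinct pseudo-granularity for training one adaptor set; the paper's contribution lies in reconciling those adaptor sets afterwards with attention-guided fusion layers.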
Notes
- 1.
We use the term classes to refer to sets of images with the same label, whether the latter represents object instances or fine-grained classes.
- 2.
We chose DINO over the CLIP [45] model because the training set of CLIP is not public; the data in MRT might therefore be part of its 400M image-text training pairs.
References
Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. Tech. rep., Stanford (2006)
Avrithis, Y., Kalantidis, Y.: Approximate gaussian mixtures for large scale vocabularies. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 15–28. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33712-3_2
Avrithis, Y., Kalantidis, Y., Anagnostopoulos, E., Emiris, I.Z.: Web-scale image clustering revisited. In: Proceedings of ICCV (2015)
Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 446–461. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_29
Boudiaf, M., et al.: A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12351, pp. 548–564. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58539-6_33
Cao, B., Araujo, A., Sim, J.: Unifying deep local and global features for image search. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12365, pp. 726–743. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58565-5_43
Caron, M., Bojanowski, P., Joulin, A., Douze, M.: Deep clustering for unsupervised learning of visual features. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 139–156. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_9
Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised learning of visual features by contrasting cluster assignments. In: Proceedings of NeurIPS (2020)
Caron, M., et al.: Emerging properties in self-supervised vision transformers. In: Proceedings of ICCV (2021)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: Proceedings of ICML (2020)
Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of CVPR (2005)
Csurka, G. (ed.): Domain Adaptation in Computer Vision Applications. ACVPR, Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58347-1
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: Proceedings of CVPR (2009)
Deng, J., Guo, J., Xue, N., Zafeiriou, S.: Arcface: Additive angular margin loss for deep face recognition. In: Proceedings of CVPR (2019)
Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Proceedings of ICLR (2021)
Elezi, I., Vascon, S., Torcinovich, A., Pelillo, M., Leal-Taixé, L.: The group loss for deep metric learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 277–294. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_17
Fehervari, I., Ravichandran, A., Appalaraju, S.: Unbiased evaluation of deep metric learning algorithms. arXiv preprint arXiv:1911.12528 (2019)
Gidaris, S., Singh, P., Komodakis, N.: Unsupervised representation learning by predicting image rotations. In: Proceedings of ICLR (2018)
Gordo, A., Almazán, J., Revaud, J., Larlus, D.: Deep image retrieval: Learning global representations for image search. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 241–257. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_15
Goyal, P., et al.: Self-supervised pretraining of visual features in the wild. arXiv preprint arXiv:2103.01988 (2021)
Grill, J.B., et al.: Bootstrap your own latent: A new approach to self-supervised learning. In: Proceedings of NeurIPS (2020)
Gu, G., Ko, B.: Symmetrical synthesis for deep metric learning. In: Proceedings of AAAI (2020)
Gu, G., Ko, B., Kim, H.G.: Proxy synthesis: Learning with synthetic classes for deep metric learning. In: Proceedings of AAAI (2021)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of CVPR (2020)
Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv:1606.08415 (2016)
Hershey, J.R., Chen, Z., Le Roux, J., Watanabe, S.: Deep clustering: Discriminative embeddings for segmentation and separation. In: Proceedings of ICASSP (2016)
Houlsby, N., et al.: Parameter-efficient transfer learning for NLP. In: Proceedings of ICML (2019)
Jia, C., et al.: Scaling up visual and vision-language representation learning with noisy text supervision. In: Proceedings of ICML (2021)
Kalantidis, Y., Lassance, C., Almazán, J., Larlus, D.: TLDR: Twin learning for dimensionality reduction. In: TMLR (2022)
Ko, B., Gu, G.: Embedding expansion: Augmentation in embedding space for deep metric learning. In: Proceedings of CVPR (2020)
Ko, B., Gu, G., Kim, H.G.: Learning with memory-based virtual classes for deep metric learning. In: Proceedings of ICCV (2021)
Krause, J., Deng, J., Stark, M., Li, F.F.: Collecting a large-scale dataset of fine-grained cars. In: Proceedings of ICCV-W (2013)
Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., Song, L.: Sphereface: Deep hypersphere embedding for face recognition. In: Proceedings of CVPR (2017)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv:1306.5151 (2013)
McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 104–169 (1989)
Musgrave, K., Belongie, S., Lim, S.-N.: A metric learning reality check. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12370, pp. 681–699. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58595-2_41
Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: Proceedings of ICCVGIP (2008)
Noroozi, M., Favaro, P.: Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 69–84. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_5
Oh Song, H., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: Proceedings of CVPR (2016)
Pfeiffer, J., Kamath, A., Rücklé, A., Cho, K., Gurevych, I.: AdapterFusion: Non-destructive task composition for transfer learning. In: Proceedings of EACL (2021)
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: Proceedings of CVPR (2007)
Philip, J., Berard, A., Gallé, M., Besacier, L.: Monolingual adapters for zero-shot neural machine translation. In: Proceedings of EMNLP (2020)
Puigcerver, J., et al.: Scalable transfer learning with expert models. In: Proceedings of ICLR (2021)
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: Proceedings of ICML (2021)
Rebuffi, S.A., Bilen, H., Vedaldi, A.: Learning multiple visual domains with residual adapters. In: Proceedings of NeurIPS (2017)
Revaud, J., Almazán, J., Rezende, R., de Souza, C.: Learning with average precision: Training image retrieval with a listwise loss. In: Proceedings of ICCV (2019)
Riquelme, C., et al.: Scaling vision with sparse mixture of experts. In: Proceedings of NeurIPS (2021)
Sariyildiz, M.B., Kalantidis, Y., Larlus, D., Alahari, K.: Concept generalization in visual representation learning. In: Proceedings of ICCV (2021)
Seidenschwarz, J., Elezi, I., Leal-Taixé, L.: Learning intra-batch connections for deep metric learning. In: Proceedings of ICML (2021)
Sharif Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: An astounding baseline for recognition. In: Proceedings of CVPR-W (2014)
Shazeer, N., et al.: Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In: Proceedings of ICLR (2017)
Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Proceedings of NeurIPS (2016)
Tian, Y., Henaff, O.J., van den Oord, A.: Divide and contrast: Self-supervised learning from uncurated data. In: Proceedings of ICCV (2021)
Vaswani, A., et al.: Attention is all you need. In: Proceedings of NeurIPS (2017)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 Dataset. Tech. rep., California Institute of Technology (2011)
Wang, F., Cheng, J., Liu, W., Liu, H.: Additive margin softmax for face verification. IEEE Signal Process. Lett. 25, 926–930 (2018)
Wang, F., Xiang, X., Cheng, J., Yuille, A.L.: Normface: L2 hypersphere embedding for face verification. In: Proceedings of ACM Multimedia (2017)
Wang, R., et al.: K-adapter: Infusing knowledge into pre-trained models with adapters. In: ACL/IJCNLP (Findings) (2021)
Yan, X., Misra, I., Gupta, A., Ghadiyaram, D., Mahajan, D.: Clusterfit: Improving generalization of visual representations. In: Proceedings of CVPR (2020)
You, Y., Gitman, I., Ginsburg, B.: Large batch training of convolutional networks. In: Proceedings of ICLR (2018)
Yuksel, S.E., Wilson, J.N., Gader, P.D.: Twenty years of mixture of experts. IEEE Trans. Neural Netw. Learn. Syst. 23, 1177–1193 (2012)
Zamir, A., Sax, A., Shen, W., Guibas, L., Malik, J., Savarese, S.: Taskonomy: Disentangling task transfer learning. In: Proceedings of CVPR (2018)
Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: Self-supervised learning via redundancy reduction. In: Proceedings of ICML (2021)
Zhai, A., Wu, H.Y.: Classification is a strong baseline for deep metric learning. In: Proceedings of BMVC (2019)
Zhai, X., et al.: LiT: Zero-shot transfer with locked-image text tuning. In: Proceedings of CVPR (2022)
Acknowledgements
MIAI@Grenoble Alpes (ANR-19-P3IA-0003).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Almazán, J., Ko, B., Gu, G., Larlus, D., Kalantidis, Y. (2022). Granularity-Aware Adaptation for Image Retrieval Over Multiple Tasks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13674. Springer, Cham. https://doi.org/10.1007/978-3-031-19781-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19780-2
Online ISBN: 978-3-031-19781-9
eBook Packages: Computer Science (R0)