Granularity-Aware Adaptation for Image Retrieval Over Multiple Tasks

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13674)

Abstract

Strong image search models can be learned for a specific domain, i.e., a set of labels, provided that some labeled images of that domain are available. A practical visual search model, however, should be versatile enough to solve multiple retrieval tasks simultaneously, even if those cover very different specialized domains. Additionally, it should be able to benefit even from unlabeled images from these various retrieval tasks. This is the more practical scenario that we consider in this paper. We address it with the proposed Grappa, an approach that starts from a strong pretrained model and adapts it to tackle multiple retrieval tasks concurrently, using only unlabeled images from the different task domains. We extend the pretrained model with multiple independently trained sets of adaptors that use pseudo-label sets of different sizes, effectively mimicking different pseudo-granularities. We reconcile all adaptor sets into a single unified model suited for all retrieval tasks by learning fusion layers that we guide by propagating pseudo-granularity attentions across neighbors in the feature space. Results on a benchmark composed of six heterogeneous retrieval tasks show that the unsupervised model improves the zero-shot performance of a state-of-the-art self-supervised learning model, and in some cases reaches or improves over a task label-aware oracle that selects the most fitting pseudo-granularity per task.
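
For intuition, the short Python sketch below illustrates the two unsupervised ingredients described in the abstract: clustering the same unlabeled features with several numbers of clusters to obtain pseudo-label sets of different pseudo-granularities, and smoothing per-image fusion weights by propagating them across nearest neighbors in feature space. It is only a toy illustration built on generic numpy/scikit-learn primitives; the function names, the granularity values, and the neighborhood size are assumptions made for the example, not the authors' implementation, which additionally trains one adaptor set per pseudo-label set and learns the fusion layers.

# Illustrative sketch (not the paper's code): multi-granularity pseudo-labels
# and neighbor-propagated fusion weights on top of frozen backbone features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def pseudo_label_sets(features, granularities=(5, 20, 50)):
    # One k-means pseudo-label set per pseudo-granularity (number of clusters).
    return {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
            for k in granularities}

def propagate_attention(features, attn, n_neighbors=5, n_iters=2):
    # Smooth per-image attention over granularities by averaging it with the
    # attention of its nearest neighbors in feature space, then renormalizing.
    _, idx = NearestNeighbors(n_neighbors=n_neighbors).fit(features).kneighbors(features)
    for _ in range(n_iters):
        attn = attn[idx].mean(axis=1)
        attn = attn / attn.sum(axis=1, keepdims=True)
    return attn

def fuse(adaptor_outputs, attn):
    # Weighted combination of per-granularity embeddings: (G, N, D), (N, G) -> (N, D).
    return np.einsum('gnd,ng->nd', adaptor_outputs, attn)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 64)).astype(np.float32)    # stand-in backbone features
    labels = pseudo_label_sets(feats)                         # pseudo-labels per granularity
    outs = np.stack([feats + rng.normal(scale=0.01, size=feats.shape)
                     for _ in labels])                        # stand-in adaptor outputs
    attn = np.full((len(feats), len(labels)), 1.0 / len(labels))
    fused = fuse(outs, propagate_attention(feats, attn))
    print(fused.shape)                                        # (200, 64)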

Notes

  1. We will use the term classes to refer to sets of images with the same label, whether the latter represents object instances or fine-grained classes.

  2. We chose DINO over the CLIP [45] model because the training set of CLIP is not public; the data in MRT might therefore be part of its 400M image-text training pairs.

References

  1. Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. Tech. rep, Stanford (2006)

  2. Avrithis, Y., Kalantidis, Y.: Approximate gaussian mixtures for large scale vocabularies. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 15–28. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33712-3_2

  3. Avrithis, Y., Kalantidis, Y., Anagnostopoulos, E., Emiris, I.Z.: Web-scale image clustering revisited. In: Proceedings of ICCV (2015)

  4. Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 446–461. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_29

  5. Boudiaf, M., et al.: A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12351, pp. 548–564. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58539-6_33

  6. Cao, B., Araujo, A., Sim, J.: Unifying deep local and global features for image search. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12365, pp. 726–743. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58565-5_43

  7. Caron, M., Bojanowski, P., Joulin, A., Douze, M.: Deep clustering for unsupervised learning of visual features. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 139–156. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_9

  8. Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised learning of visual features by contrasting cluster assignments. In: Proceedings of NeurIPS (2020)

  9. Caron, M., et al.: Emerging properties in self-supervised vision transformers. In: Proceedings of ICCV (2021)

  10. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: Proceedings of ICML (2020)

  11. Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of CVPR (2005)

  12. Csurka, G. (ed.): Domain Adaptation in Computer Vision Applications. ACVPR, Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58347-1

  13. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: Proceedings of CVPR (2009)

  14. Deng, J., Guo, J., Xue, N., Zafeiriou, S.: Arcface: Additive angular margin loss for deep face recognition. In: Proceedings of CVPR (2019)

  15. Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Proceedings of ICLR (2021)

  16. Elezi, I., Vascon, S., Torcinovich, A., Pelillo, M., Leal-Taixé, L.: The group loss for deep metric learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 277–294. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_17

  17. Fehervari, I., Ravichandran, A., Appalaraju, S.: Unbiased evaluation of deep metric learning algorithms. arXiv preprint arXiv:1911.12528 (2019)

  18. Gidaris, S., Singh, P., Komodakis, N.: Unsupervised representation learning by predicting image rotations. In: Proceedings of ICLR (2018)

  19. Gordo, A., Almazán, J., Revaud, J., Larlus, D.: Deep image retrieval: Learning global representations for image search. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 241–257. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_15

  20. Goyal, P., et al.: Self-supervised pretraining of visual features in the wild. arXiv preprint arXiv:2103.01988 (2021)

  21. Grill, J.B., et al.: Bootstrap your own latent: A new approach to self-supervised learning. In: Proceedings of NeurIPS (2020)

  22. Gu, G., Ko, B.: Symmetrical synthesis for deep metric learning. In: Proceedings of AAAI (2020)

  23. Gu, G., Ko, B., Kim, H.G.: Proxy synthesis: Learning with synthetic classes for deep metric learning. In: Proceedings of AAAI (2021)

  24. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of CVPR (2020)

  25. Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv:1606.08415 (2016)

  26. Hershey, J.R., Chen, Z., Le Roux, J., Watanabe, S.: Deep clustering: Discriminative embeddings for segmentation and separation. In: Proceedings of ICASSP (2016)

  27. Houlsby, N., et al.: Parameter-efficient transfer learning for NLP. In: Proceedings of ICML (2019)

  28. Jia, C., et al.: Scaling up visual and vision-language representation learning with noisy text supervision. In: Proceedings of ICML (2021)

  29. Kalantidis, Y., Lassance, C., Almazán, J., Larlus, D.: TLDR: Twin learning for dimensionality reduction. In: TMLR (2022)

  30. Ko, B., Gu, G.: Embedding expansion: Augmentation in embedding space for deep metric learning. In: Proceedings of CVPR (2020)

  31. Ko, B., Gu, G., Kim, H.G.: Learning with memory-based virtual classes for deep metric learning. In: Proceedings of ICCV (2021)

  32. Krause, J., Deng, J., Stark, M., Li, F.F.: Collecting a large-scale dataset of fine-grained cars. In: Proceedings of ICCV-W (2013)

  33. Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., Song, L.: Sphereface: Deep hypersphere embedding for face recognition. In: Proceedings of CVPR (2017)

  34. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)

  35. Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv:1306.5151 (2013)

  36. McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 104–169 (1989)

  37. Musgrave, K., Belongie, S., Lim, S.-N.: A metric learning reality check. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12370, pp. 681–699. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58595-2_41

  38. Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: Proceedings of ICCVGIP (2008)

  39. Noroozi, M., Favaro, P.: Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 69–84. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_5

  40. Oh Song, H., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: Proceedings of CVPR (2016)

  41. Pfeiffer, J., Kamath, A., Rücklé, A., Cho, K., Gurevych, I.: AdapterFusion: Non-destructive task composition for transfer learning. In: Proceedings of EACL (2021)

  42. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: Proceedings of CVPR (2007)

  43. Philip, J., Berard, A., Gallé, M., Besacier, L.: Monolingual adapters for zero-shot neural machine translation. In: Proceedings of EMNLP (2020)

  44. Puigcerver, J., et al.: Scalable transfer learning with expert models. In: Proceedings of ICLR (2021)

  45. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: Proceedings of ICML (2021)

  46. Rebuffi, S.A., Bilen, H., Vedaldi, A.: Learning multiple visual domains with residual adapters. In: Proceedings of NeurIPS (2017)

  47. Revaud, J., Almazán, J., Rezende, R., de Souza, C.: Learning with average precision: Training image retrieval with a listwise loss. In: Proceedings of ICCV (2019)

  48. Riquelme, C., et al.: Scaling vision with sparse mixture of experts. In: Proceedings of NeurIPS (2021)

  49. Sariyildiz, M.B., Kalantidis, Y., Larlus, D., Alahari, K.: Concept generalization in visual representation learning. In: Proceedings of ICCV (2021)

  50. Seidenschwarz, J., Elezi, I., Leal-Taixé, L.: Learning intra-batch connections for deep metric learning. In: Proceedings of ICML (2021)

  51. Sharif Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: An astounding baseline for recognition. In: Proceedings of CVPR-W (2014)

  52. Shazeer, N., et al.: Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In: Proceedings of ICLR (2017)

  53. Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Proceedings of NeurIPS (2016)

  54. Tian, Y., Henaff, O.J., van den Oord, A.: Divide and contrast: Self-supervised learning from uncurated data. In: Proceedings of ICCV (2021)

  55. Vaswani, A., et al.: Attention is all you need. In: Proceedings of NeurIPS (2017)

  56. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 Dataset. Tech. rep, California Institute of Technology (2011)

  57. Wang, F., Cheng, J., Liu, W., Liu, H.: Additive margin softmax for face verification. IEEE Signal Process. Lett. 25, 926–930 (2018)

  58. Wang, F., Xiang, X., Cheng, J., Yuille, A.L.: Normface: L2 hypersphere embedding for face verification. In: Proceedings of ACM Multimedia (2017)

  59. Wang, R., et al.: K-adapter: Infusing knowledge into pre-trained models with adapters. In: ACL/IJCNLP (Findings) (2021)

  60. Yan, X., Misra, I., Gupta, A., Ghadiyaram, D., Mahajan, D.: Clusterfit: Improving generalization of visual representations. In: Proceedings of CVPR (2020)

  61. You, Y., Gitman, I., Ginsburg, B.: Large batch training of convolutional networks. In: Proceedings of ICLR (2018)

  62. Yuksel, S.E., Wilson, J.N., Gader, P.D.: Twenty years of mixture of experts. IEEE Trans. Neural Netw. Learn. Syst. 23, 1177–1193 (2012)

  63. Zamir, A., Sax, A., Shen, W., Guibas, L., Malik, J., Savarese, S.: Taskonomy: Disentangling task transfer learning. In: Proceedings of CVPR (2018)

  64. Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: Self-supervised learning via redundancy reduction. In: Proceedings of ICML (2021)

  65. Zhai, A., Wu, H.Y.: Classification is a strong baseline for deep metric learning. In: Proceedings of BMVC (2019)

  66. Zhai, X., et al.: LiT: Zero-shot transfer with locked-image text tuning. In: Proceedings of CVPR (2022)

Acknowledgements

This work was supported by MIAI@Grenoble Alpes (ANR-19-P3IA-0003).

Author information

Corresponding author

Correspondence to Jon Almazán.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 225 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Almazán, J., Ko, B., Gu, G., Larlus, D., Kalantidis, Y. (2022). Granularity-Aware Adaptation for Image Retrieval Over Multiple Tasks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13674. Springer, Cham. https://doi.org/10.1007/978-3-031-19781-9_23

  • DOI: https://doi.org/10.1007/978-3-031-19781-9_23

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19780-2

  • Online ISBN: 978-3-031-19781-9

  • eBook Packages: Computer Science, Computer Science (R0)
