Deep hashing for multi-label image retrieval: a survey

Artificial Intelligence Review

Abstract

Content-based image retrieval (CBIR) aims to display, as a result of a search, images with the same visual contents as a query. This problem has attracted increasing attention in the area of computer vision. Learning-based hashing techniques are amongst the most studied search approaches for approximate nearest neighbors in large-scale image retrieval. With the advance of deep neural networks in image representation, hashing methods for CBIR have started using deep learning to build binary codes. Such strategies are generally known as deep hashing techniques. In this paper, we present a comprehensive deep hashing survey for the task of image retrieval with multiple labels, categorizing the methods according to how the input images are treated: pointwise, pairwise, tripletwise and listwise, as well as their relationships. In addition, we present discussions regarding the cost of space, efficiency and search quality of the described models, as well as open issues and future work opportunities.

Notes

  1. Cross-modal approaches take two or more different modalities of signal representation as input to a neural network (Jiang and Li 2016).

  2. The Hadamard product is a binary operation between matrices of the same dimension such that \(A = B \odot C\) implies that \(A_{i,j} = B_{i, j} C_{i,j}\).

  3. The Jaccard coefficient measures the similarity between finite sample sets and is defined as the size of their intersection divided by the size of their union.

  4. https://pytorch.org/.
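
The definitions in notes 2 and 3 can be sketched in a few lines of plain Python. This is only an illustration, not code from the survey: the function names `hadamard` and `jaccard`, the example matrices, and the label sets are all hypothetical, and the value returned for two empty sets is a convention chosen for this sketch.

```python
def hadamard(B, C):
    """Element-wise (Hadamard) product of two same-shaped matrices (nested lists)."""
    if len(B) != len(C) or any(len(rb) != len(rc) for rb, rc in zip(B, C)):
        raise ValueError("matrices must have the same dimensions")
    # A[i][j] = B[i][j] * C[i][j]
    return [[b * c for b, c in zip(rb, rc)] for rb, rc in zip(B, C)]


def jaccard(labels_a, labels_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two finite sets."""
    a, b = set(labels_a), set(labels_b)
    union = a | b
    if not union:
        # Assumption of this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical example data (not taken from the survey).
B = [[1, 2], [3, 4]]
C = [[0, 1], [-1, 2]]
A = hadamard(B, C)  # [[0, 2], [-3, 8]]

query_labels = {"sky", "sea", "person"}
image_labels = {"sky", "person", "dog", "tree"}
similarity = jaccard(query_labels, image_labels)  # 2 shared / 5 distinct = 0.4
```

In a multi-label setting, a coefficient like this could serve as a graded similarity between the label sets of two images, in contrast to a binary "shares at least one label" criterion.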

References

  • Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: 2006 47th annual IEEE symposium on foundations of computer science (FOCS’06), pp 459–468. https://doi.org/10.1109/FOCS.2006.49

  • Baeza-Yates R, Ribeiro-Neto B et al (1999) Modern information retrieval, vol 463. ACM Press, New York

  • Bezerra E (2016) Introdução à aprendizagem profunda. XXXI Simposio Brasileiro de Banco de Dados

  • Buda M, Maki A, Mazurowski MA (2018) A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw 106:249–259

  • Cakir F, He K, Bargal SA, Sclaroff S (2018) Hashing with mutual information. arXiv preprint arXiv:1803.00974

  • Canziani A, Paszke A, Culurciello E (2016) An analysis of deep neural network models for practical applications. arXiv:1605.07678

  • Cao Y, Long M, Wang J, Zhu H, Wen Q (2016) Deep quantization network for efficient image retrieval. In: AAAI, pp 3457–3463

  • Chen Z, Cai R, Lu J, Feng J, Zhou J (2018) Order-sensitive deep hashing for multimorbidity medical image retrieval. In: International conference on medical image computing and computer-assisted intervention. Springer, Berlin, pp 620–628

  • Chua TS, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of the ACM international conference on image and video retrieval. ACM, p 48

  • Courbariaux M, Hubara I, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks: Training deep neural networks with weights and activations constrained to \(+1\) or \(-1\). arXiv preprint arXiv:1602.02830

  • Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition (CVPR 2009). IEEE, pp 248–255

  • Do TT, Doan AD, Cheung NM (2016) Learning to hash with binary deep neural network. In: European conference on computer vision. Springer, Berlin, pp 219–234

  • Erin Liong V, Lu J, Wang G, Moulin P, Zhou J (2015) Deep hashing for compact binary codes learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2475–2483

  • Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The PASCAL visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338

  • Gong Y, Kumar S, Verma V, Lazebnik S (2012) Angular quantization-based binary codes for fast similarity search. In: Advances in neural information processing systems, pp 1196–1204

  • Gong Y, Kumar S, Rowley HA, Lazebnik S (2013a) Learning binary codes for high-dimensional data using bilinear projections. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 484–491

  • Gong Y, Lazebnik S, Gordo A, Perronnin F (2013b) Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans Pattern Anal Mach Intell 35(12):2916–2929

  • Grubinger M, Clough P, Müller H, Deselaers T (2006) The IAPR TC-12 benchmark: a new evaluation resource for visual information systems. In: International workshop OntoImage, vol 5

  • Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol 2. IEEE, pp 1735–1742

  • He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European conference on computer vision. Springer, Berlin, pp 346–361

  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  • Hijazi S, Kumar R, Rowen C (2015) Using convolutional neural networks for image recognition

  • Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861

  • Huang CQ, Yang SM, Pan Y, Lai HJ (2018) Object-location-aware hashing for multi-label image retrieval via automatic mask learning. IEEE Trans Image Process 27(9):4490–4502

  • Huiskes MJ, Lew MS (2008) The MIR Flickr retrieval evaluation. In: Proceedings of the 1st ACM international conference on multimedia information retrieval. ACM, pp 39–43

  • Jain P, Kulis B, Grauman K (2008) Fast image search for learned metrics. In: 2008 IEEE conference on computer vision and pattern recognition. IEEE. https://doi.org/10.1109/CVPR.2008.4587841

  • Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, pp 41–48

  • Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Inf Syst 20(4):422–446

  • Jiang QY, Li WJ (2016) Deep cross-modal hashing. In: CoRR

  • Khan SH, Hayat M, Bennamoun M, Sohel FA, Togneri R (2018) Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Trans Neural Netw Learn Syst 29(8):3573–3587

  • Krähenbühl P, Koltun V (2014) Geodesic object proposals. In: European conference on computer vision. Springer, Cham, pp 725–739

  • Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  • Kulis B, Grauman K (2009) Kernelized locality-sensitive hashing for scalable image search. In: 2009 IEEE 12th international conference on computer vision. IEEE, pp 2130–2137

  • Kulis B, Jain P, Grauman K (2009) Fast similarity search for learned metrics. IEEE Trans Pattern Anal Mach Intell 31(12):2143–2157

  • Lai H, Pan Y, Liu Y, Yan S (2015) Simultaneous feature learning and hash coding with deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3270–3278

  • Lai H, Yan P, Shu X, Wei Y, Yan S (2016) Instance-aware hashing for multi-label image retrieval. IEEE Trans Image Process 25(6):2469–2479

  • Li WJ, Wang S, Kang WC (2015) Feature learning based deep supervised hashing with pairwise labels. arXiv preprint arXiv:1511.03855

  • Li T, Gao S, Xu Y (2017) Deep multi-similarity hashing for multi-label image retrieval. In: Proceedings of the 2017 ACM on conference on information and knowledge management. ACM, pp 2159–2162

  • Li Y, Miao Z, He M, Zhang Y, Li H (2018) Deep attention residual hashing. IEICE Trans Fundam Electron Commun Comput Sci 101(3):654–657

  • Liang D, Yan K, Wang Y, Zeng W, Yuan Q, Bao X, Tian Y (2017) Deep hashing with multi-task learning for large-scale instance-level vehicle search. In: 2017 IEEE international conference on multimedia and Expo workshops (ICMEW). IEEE, pp 192–197

  • Lin G, Shen C, Suter D, Van Den Hengel A (2013) A general two-step approach to learning-based hashing. In: Proceedings of the IEEE international conference on computer vision, pp 2552–2559

  • Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: European conference on computer vision. Springer, Berlin, pp 740–755

  • Lin K, Yang HF, Hsiao JH, Chen CS (2015) Deep learning of binary hash codes for fast image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 27–35

  • Lin K, Lu J, Chen CS, Zhou J (2016) Learning compact binary descriptors with unsupervised deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1183–1192

  • Liu L, Qi H (2018) Discriminative cross-view binary representation learning. In: 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1736–1744

  • Liu Y, Zhang D, Lu G, Ma WY (2007) A survey of content-based image retrieval with high-level semantics. Pattern Recogn 40(1):262–282

  • Liu TY et al (2009) Learning to rank for information retrieval. Found Trends® Inf Retr 3(3):225–331

  • Liu H, Wang R, Shan S, Chen X (2016) Deep supervised hashing for fast image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2064–2072

  • Liu L, Rahimpour A, Taalimi A, Qi H (2017a) End-to-end binary representation learning via direct binary embedding. In: 2017 IEEE international conference on image processing (ICIP). IEEE, pp 1257–1261

  • Liu W, Ma H, Qi H, Zhao D, Chen Z (2017b) Deep learning hashing for mobile visual search. EURASIP J Image Video Process. https://doi.org/10.1186/s13640-017-0167-4

  • Lu J, Liong VE, Zhou X, Zhou J (2015) Learning compact binary face descriptor for face recognition. IEEE Trans Pattern Anal Mach Intell 37(10):2041–2056

  • Lu J, Liong VE, Zhou J (2017) Deep hashing for scalable image search. IEEE Trans Image Process 26(5):2352–2367

  • Ma C, Chen Z, Lu J, Zhou J (2018) Rank-consistency multi-label deep hashing. In: 2018 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6

  • Norouzi M, Fleet DJ (2011) Minimal loss hashing for compact binary codes. In: Proceedings of the 28th international conference on machine learning (ICML-11). Citeseer, pp 353–360

  • Pennington J, Socher R, Manning C (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

  • Raginsky M, Lazebnik S (2009) Locality-sensitive binary codes from shift-invariant kernels. In: Advances in neural information processing systems, pp 1509–1517

  • Rahmani R, Goldman SA, Zhang H, Krettek J, Fritts JE (2005) Localized content based image retrieval. In: Proceedings of the 7th ACM SIGMM international workshop on multimedia information retrieval. ACM, pp 227–236

  • Rehman M, Iqbal M, Sharif M, Raza M (2012) Content based image retrieval: survey. World Appl Sci J 19(3):404–412

  • Schroff F, Kalenichenko D, Philbin J (2015) FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 815–823

  • Shen F, Gao X, Liu L, Yang Y, Shen HT (2017) Deep asymmetric pairwise hashing. In: Proceedings of the 25th ACM international conference on multimedia. ACM, pp 1522–1530

  • Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: ICLR

  • Singhai N, Shandilya SK (2010) A survey on: content based image retrieval systems. Int J Comput Appl 4(2):22–26

  • Song G, Tan X (2018) Learning multilevel semantic similarity for large-scale multi-label image retrieval. In: Proceedings of the 2018 ACM on international conference on multimedia retrieval. ACM, pp 64–72

  • Stutz D (2014) Understanding convolutional neural networks. In: Seminar report, Fakultät für Mathematik, Informatik und Naturwissenschaften Lehr-und Forschungsgebiet Informatik VIII computer vision

  • Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  • Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016. IEEE Computer Society, pp 2818–2826. https://doi.org/10.1109/CVPR.2016.308

  • Wan J, Wu P, Hoi SC, Zhao P, Gao X, Wang D, Zhang Y, Li J (2015) Online learning to rank for content-based image retrieval. In: Twenty-fourth international joint conference on artificial intelligence

  • Wang J, Kumar S, Chang SF (2012) Semi-supervised hashing for large-scale search. IEEE Trans Pattern Anal Mach Intell 34(12):2393–2406

  • Wang J, Shen HT, Song J, Ji J (2014) Hashing for similarity search: a survey. arXiv preprint arXiv:1408.2927

  • Wang J, Liu W, Kumar S, Chang SF (2016a) Learning to hash for indexing big data—a survey. Proc IEEE 104(1):34–57

  • Wang X, Shi Y, Kitani KM (2016b) Deep supervised hashing with triplet labels. In: Asian conference on computer vision. Springer, Berlin, pp 70–84

  • Wang D, Huang H, Lin HK, Mao XL (2017a) Supervised hashing for multi-labeled data with order-preserving feature. In: Chinese national conference on social media processing. Springer, Berlin, pp 16–28

  • Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017b) Residual attention network for image classification. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 6450–6458. https://doi.org/10.1109/CVPR.2017.683

  • Wang J, Zhang T, Song J, Sebe N, Shen HT (2018) A survey on learning to hash. IEEE Trans Pattern Anal Mach Intell 40(4):769–790. https://doi.org/10.1109/TPAMI.2017.2699960

  • Weiss Y, Torralba A, Fergus R (2009) Spectral hashing. In: Advances in neural information processing systems, pp 1753–1760

  • Wu D, Lin Z, Li B, Ye M, Wang W (2017) Deep supervised hashing for multi-label and large-scale image retrieval. In: Proceedings of the 2017 ACM on international conference on multimedia retrieval. ACM, pp 150–158

  • Wu D, Lin Z, Li B, Liu J, Wang W (2018) Deep uniqueness-aware hashing for fine-grained multi-label image retrieval. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 1683–1687

  • Xia R, Pan Y, Lai H, Liu C, Yan S (2014) Supervised hashing for image retrieval via image representation learning. In: Twenty-eighth AAAI conference on artificial intelligence

  • Xu J, Wang P, Tian G, Xu B, Zhao J, Wang F, Hao H (2015) Convolutional neural networks for text hashing. In: IJCAI, pp 1369–1375

  • Yang H, Lin K, Chen C (2018) Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Trans Pattern Anal Mach Intell 40(2):437–451. https://doi.org/10.1109/TPAMI.2017.2666812

  • Zhang H, Liu L, Long Y, Shao L (2017) Unsupervised deep hashing with pseudo labels for scalable image retrieval. IEEE Trans Image Process 27(4):1626–1638

  • Zhang Z, Zou Q, Wang Q, Lin Y, Li Q (2018) Instance similarity deep hashing for multi-label image retrieval. arXiv preprint arXiv:1803.02987

  • Zhao F, Huang Y, Wang L, Tan T (2015) Deep semantic ranking based hashing for multi-label image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1556–1564

  • Zhong C, Yu Y, Tang S, Satoh S, Xing K (2017) Deep multi-label hashing for large-scale visual search based on semantic graph. In: Asia-Pacific Web (APWeb) and web-age information management (WAIM) joint conference on web and big data. Springer, Berlin, pp 169–184

  • Zhou Y, Huang S, Zhang Y, Wang Y (2017) Deep hashing with triplet quantization loss. In: Visual communications and image processing (VCIP), 2017 IEEE. IEEE, pp 1–4

  • Zhu H, Long M, Wang J, Cao Y (2016) Deep hashing network for efficient similarity retrieval. In: AAAI, pp 2415–2421

  • Zhu Y, Li Y, Wang S (2019) Unsupervised deep hashing with adaptive feature learning for image retrieval. IEEE Signal Process Lett 26(3):395–399

  • Zhuang B, Lin G, Shen C, Reid I (2016) Fast training of triplet-based deep binary embedding networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5955–5964

Author information

Corresponding author

Correspondence to Josiane Rodrigues.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rodrigues, J., Cristo, M. & Colonna, J.G. Deep hashing for multi-label image retrieval: a survey. Artif Intell Rev 53, 5261–5307 (2020). https://doi.org/10.1007/s10462-020-09820-x

  • DOI: https://doi.org/10.1007/s10462-020-09820-x
