Abstract
Content-based image retrieval (CBIR) aims to return, in response to a search, images whose visual content matches that of a query. This problem has attracted increasing attention in computer vision. Learning-based hashing techniques are among the most studied approaches to approximate nearest neighbor search in large-scale image retrieval. With the advance of deep neural networks in image representation, hashing methods for CBIR have started using deep learning to build binary codes. Such strategies are generally known as deep hashing techniques. In this paper, we present a comprehensive survey of deep hashing for multi-label image retrieval, categorizing the methods according to how the input images are treated (pointwise, pairwise, tripletwise and listwise) and discussing the relationships among them. In addition, we discuss the storage cost, efficiency and search quality of the described models, as well as open issues and opportunities for future work.
Notes
Cross-modal approaches use two or more different modalities of signal representation as input to a neural network (Jiang and Li 2016).
The Hadamard product is a binary operation between matrices of the same dimension such that \(A = B \odot C\) implies that \(A_{i,j} = B_{i, j} C_{i,j}\).
The Jaccard coefficient measures the similarity between two finite sample sets and is defined as the size of their intersection divided by the size of their union: \(J(S, T) = |S \cap T| / |S \cup T|\).
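The two operations defined in the notes above can be illustrated with a minimal sketch in plain Python. The function names `hadamard` and `jaccard`, the example matrices and the example label sets are assumptions made for illustration and do not come from the survey; returning 1.0 for two empty sets is likewise an assumed convention, since the definition leaves that case open.

```python
from typing import List, Set

Matrix = List[List[float]]


def hadamard(b: Matrix, c: Matrix) -> Matrix:
    """Element-wise (Hadamard) product: A[i][j] = B[i][j] * C[i][j]."""
    # Both matrices must have the same number of rows and the same row lengths.
    if len(b) != len(c) or any(len(rb) != len(rc) for rb, rc in zip(b, c)):
        raise ValueError("matrices must have the same dimensions")
    return [[x * y for x, y in zip(rb, rc)] for rb, rc in zip(b, c)]


def jaccard(s: Set[str], t: Set[str]) -> float:
    """Jaccard coefficient: size of the intersection over size of the union."""
    union = s | t
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(s & t) / len(union)


if __name__ == "__main__":
    # Hypothetical 2x2 matrices.
    print(hadamard([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[5, 12], [21, 32]]

    # Hypothetical label sets of two multi-label images:
    # 2 shared labels out of 4 distinct labels.
    print(jaccard({"sky", "sea", "boat"}, {"sea", "boat", "person"}))  # 0.5
```

Applied to the label sets of two images, a coefficient of 1 means the sets are identical and 0 means they share no label, which gives a graded notion of similarity rather than a binary one.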
References
Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: 2006 47th annual IEEE symposium on foundations of computer science (FOCS’06), pp 459–468. https://doi.org/10.1109/FOCS.2006.49
Baeza-Yates R, Ribeiro-Neto B et al (1999) Modern information retrieval, vol 463. ACM Press, New York
Bezerra E (2016) Introdução à aprendizagem profunda. XXXI Simposio Brasileiro de Banco de Dados
Buda M, Maki A, Mazurowski MA (2018) A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw 106:249–259
Cakir F, He K, Bargal SA, Sclaroff S (2018) Hashing with mutual information. arXiv preprint arXiv:1803.00974
Canziani A, Paszke A, Culurciello E (2016) An analysis of deep neural network models for practical applications. arXiv:1605.07678
Cao Y, Long M, Wang J, Zhu H, Wen Q (2016) Deep quantization network for efficient image retrieval. In: AAAI, pp 3457–3463
Chen Z, Cai R, Lu J, Feng J, Zhou J (2018) Order-sensitive deep hashing for multimorbidity medical image retrieval. In: International conference on medical image computing and computer-assisted intervention. Springer, Berlin, pp 620–628
Chua TS, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) Nus-wide: a real-world web image database from National University of Singapore. In: Proceedings of the ACM international conference on image and video retrieval. ACM, p 48
Courbariaux M, Hubara I, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks: Training deep neural networks with weights and activations constrained to \(+1\) or \(-1\). arXiv preprint arXiv:1602.02830
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, 2009. CVPR 2009. IEEE, pp 248–255
Do TT, Doan AD, Cheung NM (2016) Learning to hash with binary deep neural network. In: European conference on computer vision. Springer, Berlin, pp 219–234
Erin Liong V, Lu J, Wang G, Moulin P, Zhou J (2015) Deep hashing for compact binary codes learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2475–2483
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338
Gong Y, Kumar S, Verma V, Lazebnik S (2012) Angular quantization-based binary codes for fast similarity search. In: Advances in neural information processing systems, pp 1196–1204
Gong Y, Kumar S, Rowley HA, Lazebnik S (2013a) Learning binary codes for high-dimensional data using bilinear projections. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 484–491
Gong Y, Lazebnik S, Gordo A, Perronnin F (2013b) Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans Pattern Anal Mach Intell 35(12):2916–2929
Grubinger M, Clough P, Müller H, Deselaers T (2006) The iapr tc-12 benchmark: a new evaluation resource for visual information systems. In: International workshop OntoImage, vol 5
Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol 2. IEEE, pp 1735–1742
He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European conference on computer vision. Springer, Berlin, pp 346–361
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hijazi S, Kumar R, Rowen C (2015) Using convolutional neural networks for image recognition
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
Huang CQ, Yang SM, Pan Y, Lai HJ (2018) Object-location-aware hashing for multi-label image retrieval via automatic mask learning. IEEE Trans Image Process 27(9):4490–4502
Huiskes MJ, Lew MS (2008) The MIR flickr retrieval evaluation. In: Proceedings of the 1st ACM international conference on multimedia information retrieval. ACM, pp 39–43
Jain P, Kulis B, Grauman K (2008) Fast image search for learned metrics. In: 2008 IEEE conference on computer vision and pattern recognition. IEEE. https://doi.org/10.1109/CVPR.2008.4587841
Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, pp 41–48
Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Inf Syst 20(4):422–446
Jiang QY, Li WJ (2016) Deep cross-modal hashing. In: CoRR
Khan SH, Hayat M, Bennamoun M, Sohel FA, Togneri R (2018) Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Trans Neural Netw Learn Syst 29(8):3573–3587
Krähenbühl P, Koltun V (2014) Geodesic object proposals. In: European conference on computer vision. Springer, Cham, pp 725–739
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Kulis B, Grauman K (2009) Kernelized locality-sensitive hashing for scalable image search. In: 2009 IEEE 12th international conference on computer vision. IEEE, pp 2130–2137
Kulis B, Jain P, Grauman K (2009) Fast similarity search for learned metrics. IEEE Trans Pattern Anal Mach Intell 31(12):2143–2157
Lai H, Pan Y, Liu Y, Yan S (2015) Simultaneous feature learning and hash coding with deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3270–3278
Lai H, Yan P, Shu X, Wei Y, Yan S (2016) Instance-aware hashing for multi-label image retrieval. IEEE Trans Image Process 25(6):2469–2479
Li WJ, Wang S, Kang WC (2015) Feature learning based deep supervised hashing with pairwise labels. arXiv preprint arXiv:1511.03855
Li T, Gao S, Xu Y (2017) Deep multi-similarity hashing for multi-label image retrieval. In: Proceedings of the 2017 ACM on conference on information and knowledge management. ACM, pp 2159–2162
Li Y, Miao Z, He M, Zhang Y, Li H (2018) Deep attention residual hashing. IEICE Trans Fundam Electron Commun Comput Sci 101(3):654–657
Liang D, Yan K, Wang Y, Zeng W, Yuan Q, Bao X, Tian Y (2017) Deep hashing with multi-task learning for large-scale instance-level vehicle search. In: 2017 IEEE international conference on multimedia and Expo workshops (ICMEW). IEEE, pp 192–197
Lin G, Shen C, Suter D, Van Den Hengel A (2013) A general two-step approach to learning-based hashing. In: Proceedings of the IEEE international conference on computer vision, pp 2552–2559
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, Berlin, pp 740–755
Lin K, Yang HF, Hsiao JH, Chen CS (2015) Deep learning of binary hash codes for fast image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 27–35
Lin K, Lu J, Chen CS, Zhou J (2016) Learning compact binary descriptors with unsupervised deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1183–1192
Liu L, Qi H (2018) Discriminative cross-view binary representation learning. In: 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1736–1744
Liu Y, Zhang D, Lu G, Ma WY (2007) A survey of content-based image retrieval with high-level semantics. Pattern Recogn 40(1):262–282
Liu TY et al (2009) Learning to rank for information retrieval. Found Trends® Inf Retr 3(3):225–331
Liu H, Wang R, Shan S, Chen X (2016) Deep supervised hashing for fast image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2064–2072
Liu L, Rahimpour A, Taalimi A, Qi H (2017a) End-to-end binary representation learning via direct binary embedding. In: 2017 IEEE international conference on image processing (ICIP). IEEE, pp 1257–1261
Liu W, Ma H, Qi H, Zhao D, Chen Z (2017b) Deep learning hashing for mobile visual search. EURASIP J Image Video Process. https://doi.org/10.1186/s13640-017-0167-4
Lu J, Liong VE, Zhou X, Zhou J (2015) Learning compact binary face descriptor for face recognition. IEEE Trans Pattern Anal Mach Intell 37(10):2041–2056
Lu J, Liong VE, Zhou J (2017) Deep hashing for scalable image search. IEEE Trans Image Process 26(5):2352–2367
Ma C, Chen Z, Lu J, Zhou J (2018) Rank-consistency multi-label deep hashing. In: 2018 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6
Norouzi M, Blei DM (2011) Minimal loss hashing for compact binary codes. In: Proceedings of the 28th international conference on machine learning (ICML-11). Citeseer, pp 353–360
Pennington J, Socher R, Manning C (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Raginsky M, Lazebnik S (2009) Locality-sensitive binary codes from shift-invariant kernels. In: Advances in neural information processing systems, pp 1509–1517
Rahmani R, Goldman SA, Zhang H, Krettek J, Fritts JE (2005) Localized content based image retrieval. In: Proceedings of the 7th ACM SIGMM international workshop on multimedia information retrieval. ACM, pp 227–236
Rehman M, Iqbal M, Sharif M, Raza M (2012) Content based image retrieval: survey. World Appl Sci J 19(3):404–412
Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 815–823
Shen F, Gao X, Liu L, Yang Y, Shen HT (2017) Deep asymmetric pairwise hashing. In: Proceedings of the 25th ACM international conference on multimedia. ACM, pp 1522–1530
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: ICLR
Singhai N, Shandilya SK (2010) A survey on: content based image retrieval systems. Int J Comput Appl 4(2):22–26
Song G, Tan X (2018) Learning multilevel semantic similarity for large-scale multi-label image retrieval. In: Proceedings of the 2018 ACM on international conference on multimedia retrieval. ACM, pp 64–72
Stutz D (2014) Understanding convolutional neural networks. In: Seminar report, Fakultät für Mathematik, Informatik und Naturwissenschaften Lehr-und Forschungsgebiet Informatik VIII computer vision
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016. IEEE Computer Society, pp 2818–2826. https://doi.org/10.1109/CVPR.2016.308
Wan J, Wu P, Hoi SC, Zhao P, Gao X, Wang D, Zhang Y, Li J (2015) Online learning to rank for content-based image retrieval. In: Twenty-fourth international joint conference on artificial intelligence
Wang J, Kumar S, Chang SF (2012) Semi-supervised hashing for large-scale search. IEEE Trans Pattern Anal Mach Intell 34(12):2393–2406
Wang J, Shen HT, Song J, Ji J (2014) Hashing for similarity search: a survey. arXiv preprint arXiv:1408.2927
Wang J, Liu W, Kumar S, Chang SF (2016a) Learning to hash for indexing big data—a survey. Proc IEEE 104(1):34–57
Wang X, Shi Y, Kitani KM (2016b) Deep supervised hashing with triplet labels. In: Asian conference on computer vision. Springer, Berlin, pp 70–84
Wang D, Huang H, Lin HK, Mao XL (2017a) Supervised hashing for multi-labeled data with order-preserving feature. In: Chinese national conference on social media processing. Springer, Berlin, pp 16–28
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017b) Residual attention network for image classification. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 6450–6458. https://doi.org/10.1109/CVPR.2017.683
Wang J, Zhang T, Song J, Sebe N, Shen HT (2018) A survey on learning to hash. IEEE Trans Pattern Anal Mach Intell 40(4):769–790. https://doi.org/10.1109/TPAMI.2017.2699960
Weiss Y, Torralba A, Fergus R (2009) Spectral hashing. In: Advances in neural information processing systems, pp 1753–1760
Wu D, Lin Z, Li B, Ye M, Wang W (2017) Deep supervised hashing for multi-label and large-scale image retrieval. In: Proceedings of the 2017 ACM on international conference on multimedia retrieval. ACM, pp 150–158
Wu D, Lin Z, Li B, Liu J, Wang W (2018) Deep uniqueness-aware hashing for fine-grained multi-label image retrieval. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 1683–1687
Xia R, Pan Y, Lai H, Liu C, Yan S (2014) Supervised hashing for image retrieval via image representation learning. In: Twenty-eighth AAAI conference on artificial intelligence
Xu J, Wang P, Tian G, Xu B, Zhao J, Wang F, Hao H (2015) Convolutional neural networks for text hashing. In: IJCAI, pp 1369–1375
Yang H, Lin K, Chen C (2018) Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Trans Pattern Anal Mach Intell 40(2):437–451. https://doi.org/10.1109/TPAMI.2017.2666812
Zhang H, Liu L, Long Y, Shao L (2017) Unsupervised deep hashing with pseudo labels for scalable image retrieval. IEEE Trans Image Process 27(4):1626–1638
Zhang Z, Zou Q, Wang Q, Lin Y, Li Q (2018) Instance similarity deep hashing for multi-label image retrieval. arXiv preprint arXiv:1803.02987
Zhao F, Huang Y, Wang L, Tan T (2015) Deep semantic ranking based hashing for multi-label image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1556–1564
Zhong C, Yu Y, Tang S, Satoh S, Xing K (2017) Deep multi-label hashing for large-scale visual search based on semantic graph. In: Asia-Pacific Web (APWeb) and web-age information management (WAIM) joint conference on web and big data. Springer, Berlin, pp 169–184
Zhou Y, Huang S, Zhang Y, Wang Y (2017) Deep hashing with triplet quantization loss. In: Visual communications and image processing (VCIP), 2017 IEEE. IEEE, pp 1–4
Zhu H, Long M, Wang J, Cao Y (2016) Deep hashing network for efficient similarity retrieval. In: AAAI, pp 2415–2421
Zhu Y, Li Y, Wang S (2019) Unsupervised deep hashing with adaptive feature learning for image retrieval. IEEE Signal Process Lett 26(3):395–399
Zhuang B, Lin G, Shen C, Reid I (2016) Fast training of triplet-based deep binary embedding networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5955–5964
Cite this article
Rodrigues, J., Cristo, M. & Colonna, J.G. Deep hashing for multi-label image retrieval: a survey. Artif Intell Rev 53, 5261–5307 (2020). https://doi.org/10.1007/s10462-020-09820-x
DOI: https://doi.org/10.1007/s10462-020-09820-x