
A Survey on Deep Hashing Methods

Published: 20 February 2023

Abstract

Nearest neighbor search aims at obtaining the samples in the database with the smallest distances to the queries, which is a basic task in a range of fields, including computer vision and data mining. Hashing is one of the most widely used methods owing to its computational and storage efficiency. With the development of deep learning, deep hashing methods show more advantages than traditional methods. In this survey, we thoroughly investigate current deep hashing algorithms, including deep supervised hashing and deep unsupervised hashing. Specifically, we categorize deep supervised hashing methods into pairwise methods, ranking-based methods, pointwise methods, and quantization according to how the similarities of the learned hash codes are measured. Moreover, deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods, and prediction-free self-supervised learning-based methods based on their semantic learning manners. We also introduce three related important topics: semi-supervised deep hashing, domain adaptation deep hashing, and multi-modal deep hashing. Meanwhile, we present some commonly used public datasets and the scheme to measure the performance of deep hashing algorithms. Finally, we discuss some potential research directions in conclusion.

1 Introduction

Nearest neighbor search is among the most basic tasks in various domains, including data mining [196] and image retrieval [16]. There have been a variety of algorithms for exact nearest neighbor search, such as the KD-tree [42, 165]. Unfortunately, when it comes to high-dimensional and large-scale data, the time cost of accurately identifying the sample nearest to the query is substantial. To tackle this challenge, approximate nearest neighbor search has received ever-increasing attention, since it can significantly decrease the search complexity under most circumstances [36, 120, 165]. Hashing is one of the most widely used methods because it is very efficient in terms of computation and storage [10]. Its purpose is to convert high-dimensional feature vectors into low-dimensional hash codes, so that the hash codes of similar objects are as close as possible and the hash codes of dissimilar objects are as different as possible. Existing hashing methods consist of locality-sensitive hashing [22, 75] and learning to hash. The purpose of locality-sensitive hashing is to map the original data into several hash buckets such that the closer two objects are in the original space, the greater the probability that they fall into the same bucket. Following this mechanism, many algorithms based on locality-sensitive hashing have been proposed [6, 7, 32, 33, 127, 131], which show great superiority in both computation and storage. However, in order to improve the recall of search, these methods usually need to build many different hash tables, so their applications on particularly large datasets are still limited.
Since locality-sensitive hashing is data-independent, researchers have tried to obtain high-quality hash codes by learning good hash functions. Since the two pioneering methods, spectral hashing and semantic hashing, were proposed [136, 171], learning to hash has sparked considerable academic interest in both machine learning and data mining. With the development of deep learning [93], obtaining hash codes through deep learning has attracted more and more attention for two reasons. The first reason is that the powerful representation capability of deep learning makes it possible to learn very complex hash functions. The second reason is that deep learning can produce hash codes in an end-to-end manner, which is very useful in many applications. In this survey, we mainly focus on deep supervised hashing methods and deep unsupervised hashing methods, which are the two mainstreams in hashing research. Moreover, three related important topics, i.e., semi-supervised deep hashing, domain adaptation deep hashing, and cross-modal deep hashing, are also included.
Deep supervised hashing has been explored over a long period. The design of a deep supervised hashing method mainly includes two parts: the design of the network structure and the design of the loss function. For small datasets like MNIST [94] and CIFAR-10 [89], shallow architectures such as AlexNet [90] and CNN-F [23] are widely used, while for complex datasets like NUS-WIDE [29] and COCO [109], deeper architectures such as VGG [149] and ResNet50 [61] are needed. The loss objectives are designed with the intention of maintaining similarity structures. These methods [15, 104] usually aim at narrowing the difference between the similarity structures in the original and Hamming spaces. Researchers usually obtain the similarities in the original space from label information in supervised scenarios, which is widely studied in different deep hashing methods. Hence, how the similarities of the learned hash codes are measured is important for different algorithms. We further categorize the deep supervised hashing algorithms into four classes according to how the similarities of the learned hash codes are measured, i.e., pairwise methods, ranking-based methods, pointwise methods, and quantization. For each manner, we comprehensively analyze how the related articles design the optimization objective and take advantage of semantic labels, as well as what additional tricks are used.
Another area of research along this line is deep unsupervised hashing, which does not require any label information. Deep unsupervised hashing has drawn widespread attention recently, since it is easily applied in practice. In unsupervised settings, the semantic information is usually derived from the relationships in the original space. Based on the manner of learning semantic information, we categorize the deep unsupervised hashing algorithms into pseudo-label-based methods, similarity reconstruction-based methods, and prediction-free self-supervised learning-based methods. In addition, we also introduce some other related important topics such as semi-supervised deep hashing, domain adaptation deep hashing, and multi-modal deep hashing methods. The overall structure of this survey is shown in Figure 1. Meanwhile, we also present some commonly used public datasets and the scheme to measure the performance of deep hashing algorithms. Finally, a comparison of some key algorithms is given.
Fig. 1.
Fig. 1. The overall structure of this survey.
Compared to other surveys on hashing methods [18, 163, 164, 165], our article mainly centers on recent deep hashing methods, rather than traditional hashing methods, and on how they optimize the hashing network. Moreover, we study both deep supervised hashing and deep unsupervised hashing extensively. Finally, we classify the two topics from a brand-new perspective based on the different manners of optimization. As far as we know, this is the most comprehensive survey on deep hashing, which is beneficial to researchers in understanding the mechanisms and trends of deep hashing.

2 Background

2.1 Nearest Neighbor Search

Given a \(d\)-dimensional Euclidean space \(\mathbb {R}^{d}\), the nearest neighbor search aims at finding the sample \(\text{NN}(\mathbf {x}_r)\) in a finite set \(\Pi \subset \mathbb {R}^{d}\) such that
\begin{equation} \text{NN}(\mathbf {x}_r) = \arg \min _{\mathbf {x}_b \in \Pi } \rho (\mathbf {x}_r,\mathbf {x}_b), \end{equation}
(1)
in which \(\mathbf {x}_r\in \mathbb {R}^{d}\) represents the query point. Note that \(\rho\) could be any metric, such as the Euclidean distance, the cosine distance, or the general \(\ell _{p}\) distance. Many exact nearest neighbor search methods [42] have been developed, which work quite well when \(d\) is small. However, nearest neighbor search is intrinsically costly due to the curse of dimensionality [3, 4]. Although the KD-tree can be extended to high-dimensional situations, its efficiency is far from satisfactory.
To solve this problem, a series of algorithms for approximate nearest neighbor (ANN) search have been proposed [33, 46, 79, 128]. The principle of these methods is to find the nearest point with a high probability, rather than to find it exactly. These ANN algorithms are mainly divided into three categories: hashing-based methods [1, 33, 123], product quantization-based methods [44, 79, 86, 190], and graph-based methods [56, 125, 126]. These algorithms greatly improve the efficiency of searching while ensuring relatively high accuracy, so they are widely used in industry. Compared to the other two types of methods, hashing-based algorithms have been studied the longest and the most, because they have great potential in improving computing efficiency and reducing memory cost.

2.2 Hashing Algorithms

For nearest neighbor search, hashing algorithms are very efficient in terms of both computing and storage. Two main types of hashing-based search methods have been developed, i.e., hash table lookup [96, 151] and hash code ranking [80, 116].
The primary goal of hash table lookup is to decrease the number of distance calculations so as to speed up the search. The hash table contains various buckets, each of which is indexed by a separate hash code, and each point is assigned to the bucket that shares its hash code. Thus, the way to learn hash codes for this kind of algorithm is to increase the likelihood of producing the same hash codes for adjacent points in the original space. When a query is given, we can find the corresponding hash bucket according to the hash code of the query and thus obtain the corresponding candidate set. After this step, we usually re-rank the points in the candidate set to get the final search results. However, the recall of selecting a single hash bucket as the candidate set is relatively low. Two methods are usually adopted to overcome this problem. The first is to also select buckets that are close to the target bucket. The second is to independently build multiple different hash tables from different hash codes and then select the corresponding target bucket from each hash table.
Hash code ranking is a simpler approach than hash table lookup. When a query arrives, we compute the Hamming distance between the query and each point in the search database and select the points with the smallest Hamming distances as candidates for the nearest neighbor. After that, a re-ranking step using the original features usually follows to obtain the final nearest neighbors. Different from hash table lookup methods, hash code ranking methods prefer hash codes that preserve the similarities or distances of the original space. A minimal sketch of this procedure is given below.
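To make hash code ranking concrete, the following NumPy sketch packs \(\lbrace 0,1\rbrace\) codes into bytes and ranks database items by Hamming distance using XOR and a popcount table. It is an illustrative implementation rather than the procedure of any particular method; the function names and the toy data are our own.

```python
import numpy as np

def pack_codes(codes):
    # codes: (N, L) array of {0, 1} bits -> (N, ceil(L/8)) packed uint8 codes
    return np.packbits(codes.astype(np.uint8), axis=1)

def hamming_rank(query_bits, db_bits, k=10):
    # XOR the packed codes and count differing bits per byte with a popcount table
    popcount = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(1)
    dists = popcount[np.bitwise_xor(query_bits[None, :], db_bits)].sum(axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# toy usage: 1,000 database codes and one query, L = 64 bits
rng = np.random.default_rng(0)
db = pack_codes(rng.integers(0, 2, size=(1000, 64)))
query = pack_codes(rng.integers(0, 2, size=(1, 64)))[0]
candidate_idx, candidate_dists = hamming_rank(query, db, k=5)
# in practice, the candidates would then be re-ranked with the original features
```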

2.3 Deep Neural Networks

Deep neural networks [137] have achieved significant success in various areas, including computer vision [38, 61] and natural language processing [52, 154]. Early works such as the deep belief network [63] and the autoencoder [129] are mostly based on multi-layer perceptrons. However, these networks do not show much better performance than traditional methods such as the support vector machine and the k-nearest neighbors algorithm. Since convolutional neural networks were introduced to process image data, various popular deep networks have been proposed and have achieved promising results. AlexNet [90] consists of five convolutional layers followed by three fully connected layers. VGGNet [149] increases the model depth and improves the performance of image classification. NIN [107] was further proposed to promote the discriminability of image patches within the receptive field. Researchers have found that the depth of representations is the key to high performance on various visual recognition tasks. However, the problem of vanishing/exploding gradients makes it difficult to build very deep neural networks. ResNet [61] tackles this problem by leveraging residual learning to deepen the network and benefits from very deep models. Recently, the Vision Transformer [38] has achieved great success on image classification tasks due to its high model capacity and easy scalability. These powerful neural network architectures have become the backbone networks in various applications, including semantic segmentation [155] and object detection [57]. By virtue of the strong representation ability of deep neural networks, deep hashing has shown great performance in image retrieval and has drawn increasing attention recently.

2.4 Learning to Hash

Given an input item \(\mathbf {x}\), learning to hash aims at obtaining a hash function \(f\) that maps \(\mathbf {x}\) to a binary code \(\mathbf {b}\) for the convenience of nearest neighbor search. The hash codes obtained by a good hash function should preserve the distance order in the original space as much as possible, i.e., items that are close to a specific query in the Hamming space should also be close to the query in the original space. Traditional hash functions include spherical functions, linear projections, and even non-parametric functions. A wide range of traditional hashing methods [46, 48, 58, 113, 140, 153, 165, 171, 196] have been proposed to learn compact hash codes and have achieved significant progress. For instance, AQBC [47] utilizes the angle between two vectors to measure similarity and maps feature vectors onto the most similar vertices of a binary hypercube. FSDH [53] regresses the semantic labels of samples to their binary codes and optimizes the hash codes in an alternating manner. For a more comprehensive understanding, refer to the survey article [163]. However, these simple hash functions do not scale well to huge datasets. Owing to the strong representation ability of deep learning, more and more researchers have paid attention to deep supervised hashing and developed a range of promising methods, which generally achieve better performance than traditional methods.

3 Deep Supervised Hashing

In this section, we first discuss deep supervised hashing methods, which are the basis of the subsequent deep unsupervised hashing techniques.

3.1 Overview

Deep supervised hashing uses deep neural networks as hash functions, which can generate hash codes in an end-to-end manner. We focus on the following four key problems: (1) what deep neural network architecture is adopted; (2) how to design the loss function for preserving the similarity structure; (3) how to optimize the deep neural network with the discretization problem; and (4) what other skills can be used to improve the performance. We first answer the first three problems in a nutshell and the last problem is left in the subsequent detailed introduction. Figure 2 shows a representative framework of deep supervised hashing.
Fig. 2.
Fig. 2. Basic framework of deep supervised hashing with pairwise similarity measurement. The hash codes are produced by a hashing network. Afterwards, the pairwise similarity of the hash codes is matched against the ground-truth similarity to obtain the similarity-preserving loss. More details will be discussed in Section 3.2.

3.1.1 Network Architecture.

Traditional hashing methods usually utilize linear projections and kernels, which show poor representation ability. After AlexNet and VGGNet [90, 149] were proposed, deep learning showed its superiority in computer vision, especially for classification problems, and more and more experiments proved that the deeper the network, the better the performance. ResNet [61] takes advantage of residual learning, which allows very deep networks to be trained, and achieves significantly better results. Since then, ResNet and its variants have become basic architectures in deep learning [61, 71, 175]. Recent studies often utilize these popular architectures with weights pre-trained on large datasets such as ImageNet, following the idea of transfer learning. Most researchers utilize shallower architectures such as AlexNet and CNN-F, or design stacked convolutional neural networks, for simple datasets, e.g., MNIST and CIFAR-10. Deeper architectures such as VGGNet and ResNet50 are often utilized for complex datasets such as NUS-WIDE and COCO. To be more precise, for deep supervised hashing methods, the hashing network is usually modified from these standard networks by replacing the classification head with a fully-connected layer containing \(L\) units for hash code learning. The network outputs are usually continuous codes, and the hash codes can be obtained using a sign activation. Graph neural networks, which capture the dependence between the nodes of graphs via message passing, have become popular in various applications and have also been adopted in recent hashing methods to learn the correlations within datasets [30, 157, 167].
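The following PyTorch-style sketch illustrates the architectural modification just described: a standard backbone whose classification head is replaced by an \(L\)-unit hashing layer, with a sign activation applied only at evaluation time. The class name and hyper-parameters are illustrative, and real systems usually load ImageNet-pretrained weights for the backbone.

```python
import torch
import torch.nn as nn
from torchvision import models

class HashingNetwork(nn.Module):
    """A backbone with its classification head replaced by an L-unit hashing layer."""
    def __init__(self, code_length=64):
        super().__init__()
        backbone = models.resnet50(weights=None)  # pretrained weights are usually loaded in practice
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()               # drop the original 1000-way classifier
        self.backbone = backbone
        self.hash_layer = nn.Linear(feat_dim, code_length)

    def forward(self, x):
        # continuous codes in (-1, 1), used during training
        return torch.tanh(self.hash_layer(self.backbone(x)))

    @torch.no_grad()
    def encode(self, x):
        # binary codes in {-1, +1}, used at evaluation time
        return torch.sign(self.forward(x))

net = HashingNetwork(code_length=48)
codes = net.encode(torch.randn(2, 3, 224, 224))
```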
The architecture of the hashing network is one of the most important factors for deep supervised hashing, since it affects both the search accuracy and the inference time. If the architecture degenerates into an MLP or linear projections, deep supervised hashing degrades into traditional hashing. Although a deeper architecture generally yields higher search accuracy, it also increases the time cost, so the architecture needs to be chosen according to the complexity of the dataset. Since the majority of existing deep hashing methods can use any network architecture as needed, we do not use the network architecture to categorize deep supervised hashing algorithms.

3.1.2 Similarity Measurement and Objective Function.

We first provide formal notations and key concepts in Table 1 for the sake of clarity. \(\mathcal {X}=\lbrace \mathbf {x}_i\rbrace _{i=1}^{N}\) denotes the training set. \(\mathcal {H}=\lbrace \mathbf {h}_i\rbrace _{i=1}^{N}\) denotes the outputs of the hashing network, i.e., \(\mathbf {h}_i= \Psi (\mathbf {x}_i)\). \(\mathcal {B}=\lbrace \mathbf {b}_i\rbrace _{i=1}^{N}\) denotes the obtained binary codes. We denote the similarity between a pair of items \((\mathbf {x}_i,\mathbf {x}_j)\) in the input space and in the Hamming space as \(s_{ij}^o\) and \(s_{ij}^h\), respectively. In the input space, the similarity is the ground truth, which is mainly based on the sample distance \(d_{ij}^o\) or on semantic labels. The former refers to the distance between features, e.g., the Euclidean distance \(||\mathbf {x}_i-\mathbf {x}_j||_2\), and the similarity can be computed using a Gaussian function or a characteristic function, i.e., \(\exp (-\tfrac{(d_{i j}^{o})^{2}}{2 \sigma ^{2}})\) or \(I_{d_{ij}^o\lt \tau }\), where \(\tau\) is a given threshold. The cosine similarity is also a popular measurement. The latter is more common in deep supervised hashing, where the similarity is 1 if two examples share a common semantic label and 0 otherwise.
Table 1.
Symbol | Description
\(\mathbf {x}_i\) (\(\mathbf {X}\)) | input images (in matrix form)
\(\mathbf {b}_i\) (\(\mathbf {B}\)) | output hash codes (in matrix form)
\(\mathbf {h}_i\) (\(\mathbf {H}\)) | network outputs (in matrix form)
\(\mathbf {y}_i\) (\(\mathbf {Y}\)) | one-hot image labels (in matrix form)
\(\Psi (\cdot)\) | hashing network
\(N\) | the number of input images
\(L\) | hash code length
\(\mathcal {E}\) | a set of item pairs
\(s_{ij}^o\) | the similarity of item pair \((\mathbf {x}_i, \mathbf {x}_j)\) in the input space
\(s_{ij}^h\) | the similarity of item pair \((\mathbf {x}_i, \mathbf {x}_j)\) in the Hamming space
\(d_{ij}^o\) | the distance of item pair \((\mathbf {x}_i, \mathbf {x}_j)\) in the input space
\(d_{ij}^h\) | the distance of item pair \((\mathbf {x}_i, \mathbf {x}_j)\) in the Hamming space
\(\epsilon\) | margin threshold parameter
\(\mathbf {W}\) | weight parameter matrix
\(\Theta\) | set of neural network parameters
Table 1. Summary of Symbols and Notation
The pairwise distance \(d_{ij}^h\) in the Hamming space is Hamming distance naturally, which is defined as follows:
\begin{equation} d_{i j}^{h}=\sum _{l=1}^{L} \delta \left[\mathbf {b}_{i}(l) \ne \mathbf {b}_{j}(l) \right]\!. \end{equation}
(2)
If the hash code is valued by 1 and 0, we have:
\begin{equation} d_{i j}^{h}=\Vert \mathbf {b}_{i}-\mathbf {b}_{j}\Vert _{1}\!, \end{equation}
(3)
and it varies from 0 to \(L\). As a result, the similarity in this circumstance is denoted as \(s_{ij}^h=(L-d_{ij}^h)/L\). If the code is valued by 1 and \(-\)1, we have:
\begin{equation} d_{ij}^h=\frac{1}{2}(L-\mathbf {b}_i^T\mathbf {b}_j). \end{equation}
(4)
The similarity is defined using the inner product, i.e., \(s_{i j}^{h}=(\mathbf {b}_{i}^{\top } \mathbf {b}_{j}+L)/2L\). We can also extend this to the weighted circumstance. In formulation,
\begin{equation} d_{i j}^{h}=\sum _{l=1}^{L} \lambda _{l} \delta \left[\mathbf {b}_{i}(l) \ne \mathbf {b}_{j}(l) \right]\!, \end{equation}
(5)
where each bit is associated with a weight \(\lambda _l\), and if the values of codes are 1 and \(-\)1, we have
\begin{equation} s_{i j}^{h}=(\mathbf {b}_{i}^{\top } \Lambda \mathbf {b}_{j}+ tr(\Lambda))/2tr(\Lambda), \end{equation}
(6)
in which \(\mathbf {\Lambda }=\text{diag}(\lambda _{1}, \lambda _{2}, \ldots , \lambda _{L})\) is a diagonal matrix and \(tr(\cdot)\) denotes the trace of a matrix. Each diagonal element of \(\mathbf {\Lambda }\) is the weight of the associated hash bit.
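As a quick sanity check of Equations (4) and (6), the following sketch computes the Hamming distance and the (weighted) similarity for codes valued in \(\lbrace -1, +1\rbrace\); the function names are ours.

```python
import torch

def hamming_distance(b_i, b_j):
    # Eq. (4): for codes valued in {-1, +1}, d = (L - b_i^T b_j) / 2
    L = b_i.shape[-1]
    return 0.5 * (L - (b_i * b_j).sum(-1))

def weighted_similarity(b_i, b_j, lam):
    # Eq. (6): s = (b_i^T diag(lam) b_j + tr(diag(lam))) / (2 tr(diag(lam)))
    tr = lam.sum()
    return ((b_i * lam * b_j).sum(-1) + tr) / (2 * tr)

b1 = torch.tensor([1., -1., 1., 1.])
b2 = torch.tensor([1., 1., -1., 1.])
print(hamming_distance(b1, b2))                    # tensor(2.)  -- two bits differ
print(weighted_similarity(b1, b2, torch.ones(4)))  # tensor(0.5000), i.e., (b1^T b2 + L) / 2L
```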
After defining the similarity measurements, we focus on the objective functions in deep supervised methods. A well-designed objective function is one of the most important factors for the performance of deep supervised hashing. The main guideline for designing the objective function is to keep the similarity structure, i.e., to minimize the difference between the similarities in the original and Hamming spaces. As a result, most objective functions contain terms based on similarity information. Among them, the typical loss functions are in a pairwise manner, making similar pairs of images have similar hash codes (small Hamming distance) and dissimilar pairs have dissimilar hash codes (large Hamming distance). Besides, a variety of researchers adopt ranking-based similarity-preserving loss terms. For example, the triplet loss is often used to keep the ordering of items computed in the Hamming space as consistent as possible with the ordering computed in the original space. There are also several listwise loss terms that consider the whole dataset for similarity preserving.
Besides similarity information, pointwise label information is also well-explored in the design of the objective function. There are three popular ways to take advantage of label information, summarized below. The first is regression from hash codes to labels: the labels are encoded into a one-hot matrix and a regression loss, i.e., \(||\mathbf {Y}-\mathbf {W}\mathbf {H}||_F\), is added to the loss function. The second is adding a classification layer after the hashing network, together with a classification loss (e.g., the cross-entropy loss). The last one is utilizing LabNet, which was first proposed in [99] and aims at capturing the ample semantic relationships among example pairs.
The quantization loss term is also commonly used in deep supervised hashing, especially in quantization-based hashing methods. The typical form of quantization loss penalizes the distance between continuous codes (i.e., network outputs) and binary codes. Another common technique in deep hashing is the bit balance loss, which penalizes bits that are much more likely to be 1 than \(-1\) (or vice versa) across the whole dataset. Several regularization losses can also be added to the loss function, which is important for improving the performance.
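A minimal sketch of these two auxiliary terms is given below, assuming continuous network outputs in \((-1, 1)\); the exact formulation and weighting differ from method to method.

```python
import torch

def quantization_loss(h):
    # penalize the gap between continuous outputs and binary codes: || |h| - 1 ||_1
    return (h.abs() - 1).abs().sum(dim=1).mean()

def bit_balance_loss(h):
    # encourage every bit to be +1 for roughly half of the samples:
    # the per-bit mean activation over the batch should be close to 0
    return h.mean(dim=0).pow(2).sum()

h = torch.tanh(torch.randn(128, 64))   # a batch of continuous network outputs
reg = quantization_loss(h) + 0.1 * bit_balance_loss(h)
```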

3.1.3 Optimization Algorithm.

It is difficult to optimize the hashing network parameters because of the vanishing gradient issue caused by the sign activation function, which is used to obtain binary hash codes. Specifically, the sign function is non-differentiable at zero and its gradient is zero for all nonzero inputs, which is fatal for a hashing network optimized with gradient descent.
Almost all works adopt continuous relaxation, smoothing the sign function with the sigmoid or hyperbolic tangent function during training and applying the sign function to obtain the final binary codes in the evaluation phase. The first typical way is to add a quantization penalty term to the loss function, often formulated as \(|||\mathbf {h}_i|-\mathbf {1}||_1\), or as \(-||\mathbf {h}_{i}||\) with the tanh activation. This penalty encourages the network outputs to satisfy \(sgn(\mathbf {h}_i)\approx \mathbf {h}_i\). Note that this loss can be considered as a prior on every code \(\mathbf {h}_i\) based on a variant of a certain distribution, e.g., the bimodal Laplacian or the Cauchy distribution. From this view, we can obtain a few variants, e.g., pairwise quantization [200] and the Cauchy quantization loss [15]. If the loss function is non-smooth and its derivative is hard to calculate, a modified version can be adopted instead, e.g., \(|x| \approx \log (\cosh x)\) [200]. The second way is an alternating scheme, which decomposes the optimization into several sub-problems that are solved iteratively by alternating minimization. In this alternating process, backpropagation works on only one sub-problem, and the other sub-problems are solved by other optimization methods; for example, DSDH [100] utilizes the discrete cyclic coordinate descent algorithm. These methods keep the discrete constraint during the whole optimization process, but they do not allow end-to-end training and are limited by the difficulty of solving the remaining sub-problems. The third method, named continuation, utilizes a smoothed function \(y= \tanh (\beta x)\) to approach the discrete sign activation by increasing \(\beta\) [19]. There are also other ways to solve this problem by changing the calculation and propagation of gradients, e.g., Greedy Hash [156] and the Gradient Attention Network [72], which improve the effectiveness and accuracy of deep supervised hashing.
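The sketch below illustrates two of these strategies: a continuation-style activation \(\tanh(\beta_t x)\) with a growing \(\beta_t\) (the concrete schedule is only an example), and a straight-through-style sign surrogate in the spirit of Greedy Hash, which keeps the sign in the forward pass but passes the gradient through unchanged. Both are simplified illustrations rather than the original implementations.

```python
import torch

class SignWithIdentityGrad(torch.autograd.Function):
    """Straight-through style surrogate: sign in the forward pass, identity
    gradient in the backward pass (in the spirit of Greedy Hash [156];
    the original method differs in details)."""
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

def continuation_codes(x, step, beta0=1.0, gamma=0.005):
    # Continuation in the style of HashNet [19]: tanh(beta_t * x) with beta_t
    # growing during training so the activation approaches sign(x).
    # This particular schedule is only an example.
    beta = beta0 * (1.0 + gamma * step) ** 0.5
    return torch.tanh(beta * x)

x = torch.randn(8, 32, requires_grad=True)
b = SignWithIdentityGrad.apply(x)       # binary in the forward pass, gradients flow as identity
h = continuation_codes(x, step=1000)    # nearly binary continuous codes
```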

3.1.4 Summarization.

In this survey, we divide the current methods into the following four classes, mainly based on how the similarities are measured in the Hamming space: pairwise methods, ranking-based methods, pointwise methods, and quantization methods. The quantization methods are separated from the pairwise methods due to their specificity. The key reason we categorize by how the similarities of the learned hash codes are measured is that the fundamental core of learning to hash is to maintain the similarity structure, and the manner of similarity measurement decides the loss function in deep supervised hashing. Additionally, neural network architectures, optimization manners, and other skills are also significant for retrieval performance. For each class, we will discuss the corresponding deep hashing methods in detail one by one. A detailed summarization of these methods is shown in Table 2.
Table 2.
Approach | Pairwise | Ranking-based | Pointwise | Binarization | Other skills
SDH [40] | Prod. | - | - | Quan. | Bit Bal. + Orthogonality
DSH [111] | Prod. + Margin | - | - | Quan. | -
PCDH [27] | Prod. + Margin | - | Cla. Layer | Drop | Pairwise Correlation
WMRSH [98] | Prod. + Margin | - | Cla. Layer | Quan. | Bit and Table Weight
SHBDNN [36] | Diff. | - | - | Quan. + Alternation | Bit Bal. + Independence
DDSH [82] | Diff. | - | - | Alternation | Splitting Training Set
CNNH [173] | Diff. | - | Part of Hash Codes | - | Two-step
ADSH [84] | Diff. | - | - | Quan. + Alternation | Asymmetry
DIH [172] | Diff. | - | - | Quan. + Alternation | Incremental Part + Bit Bal.
HBMP [9] | Diff. | - | - | Drop | Bit Weight + Two-step
DOH [85] | Diff. | - | - | Ranking | FCN
DPSH [101] | Like. | - | - | Quan. | -
DHN [200] | Like. | - | - | Quan. + Smooth | -
HashNet [19] | Weighted Like. | - | - | Tanh + Continuation | -
DSDH [100] | Like. | - | Linear Reg. + L2 | Quan. + Alternation | -
DAPH [139] | Like. | - | - | Quan. + Alternation | Bit Bal. + Ind.
DAgH [180] | Like. + Diff. | - | - | - | Two-step
DCH [15] | Cauchy Like. | - | - | Cauchy Quan. | -
DJSEH [99] | Like. | - | LabNet + Linear Reg. | Quan. | Two-step + Asymmetry
ADSQ [181] | Diff. + Like. | - | LabNet | Quan. + Alternation | Bit Bal. + Two-step
MMHH [87] | t-Distribution Like. | - | - | Quan. | Semi-Batch Optimization
DAGH [26] | Like. | - | Linear Reg. | Drop + Alternation | Reg. with Anchor Graph
HashGAN [14] | Weighted Like. | - | - | Cosine Quan. | GAN
DPH [20] | Priority Like. | - | - | Priority Quan. | Priority CE Loss
DFH [103] | Like. + Margin | - | - | Quan. + Alternation | Quantized Center Loss
DRSCH [189] | Diff. + Margin | Triplet + Margin | - | Drop | Bit Weight
DNNH [92] | - | Triplet + Margin | - | Piecewise Thresholding | -
DSRH [197] | - | Weighted Triplet | - | Quan. | Bit Bal.
DTSH [169] | - | Triplet + Like. | - | Quan. | -
DSHGAN [133] | - | Triplet + Margin | Cla. Layer | Drop | GAN
AnDSH [198] | - | Matrix Optimization | Angular-softmax | Drop | Bit Bal.
HashMI [8] | - | Mutual Information | - | Drop | -
TALR [59] | - | Relaxed AP + NDCG | - | Tie-Awareness | -
MLRDH [117] | - | - | Multi-linear Reg. | Alternation | Hash Boosting
HCBDH [25] | - | - | Cla. Layer | - | Hadamard Loss
DBH [106] | - | - | Cla. Layer | - | Transform Learning
SSDpH [180] | - | - | Cla. Layer | Quan. | Bit Bal.
VDSH [195] | - | - | Linear Reg. | Drop + Alternation | -
PMLR [144] | - | - | Cla. Layer | - | Distribution Regu.
CSQ [185] | - | - | Center + Binary CE | Quan. | -
DPH [41] | - | - | Center + Polarization | - | -
OrthHash [64] | - | - | Center + CE | - | -
PSLDH [158] | - | - | Center + Partial Loss | Quan. | -
DVsQ [16] | - | Triplet Loss + Margin | - | Inner-Product Quan. | Label Embeddings
DPQ [88] | - | - | Cla. Layer | - | Joint Central Loss
DSQ [39] | - | - | Cla. Layer | Quan. + Alternation | Joint Central Loss
SPDAQ [24] | Diff. | - | Cla. Layer | Drop + Alternation | Asymmetry
DQN [17] | Diff. | - | - | Product Quan. | Asymmetric Quan. Distance
DTQ [110] | - | Triplet + Margin | - | Weak-Orthogonal Quan. | Group Hard
Table 2. A Summary of Deep Supervised Hashing Methods w.r.t. the Different Manners of Similarity Measurement (Pairwise Methods, Ranking-based Methods, and Pointwise Methods), Binarization, as well as Other Skills
Drop = Drop the sign operator in the neural network and treat the binary code as an approximation of the network output, Two-step = Two-step optimization. Reg. = Regression, Quan. = Quantization Loss, Cla. = Classification, Ind. = Independence, Regu. = Regularization, Bal. = Balance, Diff. = Difference Loss, Prod. = Product Loss, and Like. = Likelihood Loss.

3.2 Pairwise Methods

We further divide the techniques that match the distances or similarities of image pairs derived from the two spaces, i.e., the original space and the Hamming space, into the two groups below.
Difference loss minimization. This kind of loss minimizes the difference between the similarities, i.e., \(\min \sum _{(i, j) \in \mathcal {E}}(s_{i j}^{o}-s_{i j}^{h})^{2}\) [9, 36, 82, 84, 85, 172, 173]. \(s_{i j}^{h}\) can be derived from the inner product of binary codes, i.e., \(s_{i j}^{h} = \mathbf {b}_i^T\mathbf {b}_j/L\), and \(s_{i j}^{o}\) is now valued by 1 or \(-1\). However, binary optimization is difficult to implement. Early methods utilize the relaxed outputs of neural networks to replace the hash codes, i.e., \(s_{i j}^{h} = \mathbf {h}_i^T\mathbf {h}_j/L\) [36, 173]. Subsequent methods utilize an asymmetric manner to calculate the similarity, i.e., \(s_{i j}^{h} = \mathbf {b}_i^T\mathbf {h}_j/L\), which reduces the impact of the quantization error [82, 85, 172]. There are also works combining both symmetric and asymmetric similarities [84]. Weighted bits are also introduced for adaptive similarity calculation [9]. Note that the difference losses can be transformed into a product form; hence, we also categorize the methods minimizing a product loss into this group. They usually adopt a loss in the product form, i.e., \(\min \sum _{(i, j) \in \mathcal {E}} s_{i j}^{o} d_{i j}^{h}\) [40], which expects that if the similarities in the original space are higher, the distances in the Hamming space should be smaller. Subsequent methods usually involve a margin in the loss for better relaxation [27, 98, 111].
Likelihood loss minimization. This kind of loss is derived from a probabilistic model. Given the similarity matrix \(\mathbf {S}=\lbrace s_{ij}^o\rbrace _{(i,j)\in \mathcal {E}}\) and hash codes \(\mathbf {B} = [\mathbf {b}_1, \ldots , \mathbf {b}_N]^T\), the posterior estimation of the binary codes is formulated as follows:
\begin{equation} p(\mathbf {B}| \mathbf {S}) \propto p(\mathbf {S} |\mathbf {B}) p(\mathbf {B})=\prod _{(i, j) \in \mathcal {E}} p\left(s_{i j}^o | \mathbf {B}\right) p(\mathbf {B}), \end{equation}
(7)
where \(p(\mathbf {B})\) denotes a prior distribution and \(p(\mathbf {S}|\mathbf {B})\) is the likelihood. The conditional probability of \(s_{ij}^o\) given their hash codes is denoted by \(p(s_{i j}^o |\mathbf {B})\). Note that the \(s_{ij}^h\) is derived from \(\mathbf {B}\). In formulation,
\begin{equation} p\left(s_{i j}^{o} | \mathbf {B}\right) = p \left(s_{i j}^{o} | s_{i j}^{h} \right)=\left\lbrace \begin{array}{cc}\sigma \left(s_{i j}^{h}\right)\!, & s_{i j}^{o} = 1 \\ 1-\sigma \left(s_{i j}^{h}\right)\!, & s_{i j}^{o} = 0 \end{array}\right.\!\!\!, \end{equation}
(8)
in which \(\sigma (x)=1/(1+e^{-x})\). From Equation (8), the probabilistic model expects the similarities in the Hamming space to be larger if the similarities in the original space are larger. The loss function is the negative log-likelihood (NLL) [101, 200], i.e.,
\begin{equation} \mathcal {L}_{NLL} = -\log p(\mathbf {S}|\mathbf {H})=\sum _{(i, j) \in \mathcal {E}}\log (1+e^ {s_{ij}^h})-s_{i j}^o s_{ij}^h. \end{equation}
(9)
Similarly, since the hashing network usually cannot directly output binary hash codes, the codes \(\mathbf {B}\) are replaced by the network outputs \(\mathbf {H}\) to generate \(s_{ij}^h\). The majority of methods adopt symmetric similarities, while several methods utilize an asymmetric form for similarity calculation [99, 139]. However, the sigmoid function in Equation (8) is not optimal, and a number of works utilize different tools to design valid probability functions, e.g., priority weighting [20], the Cauchy distribution [16], imbalance learning [19], and the t-Distribution [87]. Subsequent works combine label information with pairwise similarity learning for better semantic preservation [26, 99, 100]. Li et al. [103] associate the likelihood loss with Fisher's Linear Discriminant and introduce a margin for discriminative hash codes. Chen et al. [26] reduce the computational cost by introducing anchors for similarity calculation. There are also some works combining both difference loss minimization and likelihood loss minimization for comprehensive optimization [182].
Although these methods in each group could utilize the same pairwise loss term, they may involve different architectures, optimization manners and regularization terms, as shown in Table 2. These details of different variants will be shown below.
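Before diving into those variants, the following PyTorch-style sketch contrasts the two loss families introduced above: a relaxed difference loss and the negative log-likelihood of Equation (9) with \(s_{ij}^h = \tfrac{1}{2}\mathbf {h}_i^T\mathbf {h}_j\). It is a simplified illustration; concrete methods add their own weighting, asymmetry, and regularizers.

```python
import torch
import torch.nn.functional as F

def difference_loss(h, s_pm):
    # min sum (s_ij - h_i^T h_j / L)^2 with relaxed codes h and s_ij in {-1, +1}
    L = h.shape[1]
    sim_h = h @ h.t() / L
    return F.mse_loss(sim_h, s_pm)

def likelihood_loss(h, s01):
    # Eq. (9): sum log(1 + exp(s_ij^h)) - s_ij^o * s_ij^h, with s_ij^h = 0.5 * h_i^T h_j
    sim_h = 0.5 * (h @ h.t())
    return (F.softplus(sim_h) - s01 * sim_h).mean()

h = torch.tanh(torch.randn(32, 64, requires_grad=True))
labels = torch.randint(0, 10, (32,))
s01 = (labels[:, None] == labels[None, :]).float()  # {0, 1} similarity from labels
s_pm = 2 * s01 - 1                                   # {-1, +1} similarity
loss = difference_loss(h, s_pm) + likelihood_loss(h, s01)
```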

3.2.1 Difference Loss Minimization.

Deep Supervised Hashing (DSH) [111]. DSH utilizes a network consisting of three convolutional-pooling layers and two fully connected layers. Recall that the outputs of the hashing network are \(\lbrace \mathbf {h}_i \rbrace _{i=1}^N\). The original pairwise loss function is defined as follows:
\begin{equation} \begin{aligned}\quad & \mathcal {L}_{\text{DSH}}=\sum _{(i, j) \in \mathcal {E}}\frac{1}{2}s_{ij}^od_{ij}^h+\frac{1}{2}(1-s_{ij}^o)[\epsilon -d_{ij}^h]_+\\ & \mbox{s.t.}\ \forall \quad \mathbf {h}_i,\mathbf {h}_j \in \lbrace -1,1\rbrace ^L, \end{aligned} \end{equation}
(10)
where \(d_{ij}^h = ||\mathbf {h}_i-\mathbf {h}_j||_2^2\), \([\cdot ]_+\) denotes \(\max (\cdot ,0)\), and \(\epsilon \gt 0\) is a given margin threshold. The loss function follows a distance-similarity product minimization formulation, which expects similar examples to be mapped to similar binary codes and penalizes dissimilar examples whose binary codes have a Hamming distance smaller than the margin threshold \(\epsilon\). Note that when \(d_{ij}^h\) is larger than \(\epsilon\) for a dissimilar pair, the loss does not produce gradients. This idea is similar to the hinge loss function.
As discussed before, DSH relaxes the binary constraints, and a regularizer is added to the continuous outputs of the hashing network to approximate the binary codes, i.e., \(\mathbf {h}\approx sgn(\mathbf {h})\). The pairwise loss is rewritten as
\begin{equation} \mathcal {L}_{\text{DSH}}=\frac{1}{2}s_{ij}^o||\mathbf {h}_i-\mathbf {h}_j||_2^2+\frac{1}{2}(1-s_{ij}^o)[\epsilon -||\mathbf {h}_i-\mathbf {h}_j||_2^2]_+ +\lambda _1\sum _{k=i,j}|||\mathbf {h}_k|-\mathbf {1}||_1, \end{equation}
(11)
where \(\mathbf {1}\) denotes an all-one vector, \(||\cdot ||_p\) denotes the \(\ell _{p}\)-norm of a vector, and \(\lambda _1\) is a parameter to balance the effect of the regularization loss. DSH does not utilize saturating non-linearities because they may slow down the training process. With the above loss function, the neural network can be trained end-to-end with the back-propagation algorithm. During evaluation, the binary codes are obtained with the sign activation function. DSH is a straightforward deep supervised hashing method from the early period, and its idea originates from Spectral Hashing [171] but within a deep learning framework.
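A sketch of the relaxed DSH objective in Equation (11) is given below; the margin and the balance coefficient are placeholders rather than the values used in the paper.

```python
import torch

def dsh_loss(h_i, h_j, s, eps=8.0, lam=0.01):
    # Eq. (11): pull similar pairs together, push dissimilar pairs beyond the margin eps,
    # plus the || |h| - 1 ||_1 regularizer that drives outputs toward {-1, +1}
    d = (h_i - h_j).pow(2).sum(dim=1)
    contrastive = 0.5 * s * d + 0.5 * (1 - s) * torch.clamp(eps - d, min=0)
    quant = (h_i.abs() - 1).abs().sum(1) + (h_j.abs() - 1).abs().sum(1)
    return (contrastive + lam * quant).mean()

h_i, h_j = torch.randn(16, 48), torch.randn(16, 48)
s = torch.randint(0, 2, (16,)).float()   # 1 for similar pairs, 0 for dissimilar pairs
loss = dsh_loss(h_i, h_j, s)
```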
Pairwise Correlation Discrete Hashing (PCDH) [27]. PCDH utilizes four fully connected layers after the convolutional-pooling layers, named the deep feature layer, the hash-like layer, the discrete hash layer, and the classification layer, respectively. The third layer can directly generate discrete hash codes. Different from DSH, PCDH leverages the \(\ell _2\) norm of deep features and hash-like codes. Besides, a classification loss is included in the final objective:
\begin{equation} \begin{aligned}\mathcal {L}_{\text{PCDH}} &= \mathcal {L}_{s}+\lambda _1 \mathcal {L}_{p}+\lambda _2 \mathcal {L}_{l} \\ &=\sum _{(i, j) \in \mathcal {E}}\left(\frac{1}{2}(1-s_{i j}^o)[\epsilon -\Vert \mathbf {h}_{i}-\mathbf {h}_{j}\Vert _{2}^{2}]_+^{2}+\frac{1}{2} s_{i j}^o\Vert \mathbf {h}_{i}-\mathbf {h}_{j}\Vert _{2}^{2}\right) \\ &\quad +\lambda _1\sum _{(i, j)\in \mathcal {E}}\left(\frac{1}{2}\left(1-s_{i j}^o\right)[\epsilon -||\mathbf {z}_i-\mathbf {z}_j||_{2}^{2}]_+^{2}+\frac{1}{2} s_{i j}^o\Vert \mathbf {z}_i-\mathbf {z}_j\Vert _{2}^{2}\right)\\ &\quad +\lambda _2\left(\sum _{i=1}^{N} \phi \left(\mathbf {w}_{i}^{T} \mathbf {b}_{i}, \mathbf {y}_{i}\right)+\sum _{j=1}^{N} \phi \left(\mathbf {w}_{j}^{T} \mathbf {b}_{j}, \mathbf {y}_{j}\right)\right), \end{aligned} \end{equation}
(12)
where \(\mathbf {z}_i,\mathbf {h}_i,\) and \(\mathbf {b}_i\) denote the outputs of the first three fully connected layers. The last term is the classification cross-entropy loss. Note that the second term is called the pairwise correlation loss, which guides the similarity learning of deep features to avoid overfitting, while the classification loss provides semantic supervision, which helps the model achieve competitive performance. Besides, PCDH proposes a pair construction module named Pairwise Hard, which samples the positive pairs with the maximum distances between deep features and randomly samples negative pairs whose distances are smaller than the threshold. In effect, Pairwise Hard chooses the hard pairs with large losses for effective hash code learning.
Supervised Deep Hashing (SDH) [40]. SDH utilizes a fully-connected neural network for deep hashing and has a similar loss function, except for a term that enforces a relaxed orthogonality constraint on all projection matrices (i.e., the weight matrices of the neural network), owing to the properties of fully-connected layers. A bit balance regularization is also included, which will be introduced in Equation (14).
Supervised Hashing with Binary Deep Neural Network (SH-BDNN) [36]. The architecture of SH-BDNN is stacked by fully connected layers, in which \(\mathbf {W}^{(k)}\) denotes the weights of the \(k\)th layer. SH-BDNN considers not only the bit balance, i.e., each bit should have an equal chance of being 1 or \(-1\), but also the independence of different hash bits. Given the hash code matrix \(\mathbf {B} = [\mathbf {b}_1, \ldots , \mathbf {b}_N]^T\), the two conditions are formulated as
\begin{equation} \mathbf {B}^T\mathbf {1}=\mathbf {0}, \frac{1}{N}\mathbf {B}^T\mathbf {B}=\mathbf {I}, \end{equation}
(13)
where \(\mathbf {1}\) is an \(N\)-dimensional vector whose elements are all one, and \(\mathbf {I}\) is the \(L \times L\) identity matrix. The loss function is
\begin{equation} \begin{aligned}\mathcal {L}_{\text{SH-BDNN}} &= \frac{1}{2 N}\left\Vert \frac{1}{L}\mathbf {H}\mathbf {H}^{T} -\mathbf {S}\right\Vert ^{2}\\ &\quad +\frac{\lambda _{1}}{2} \sum _{k=1}^{K-1}||\mathbf {W}^{(k)}||^{2} +\frac{\lambda _{2}}{2N}||\mathbf {H}-\mathbf {B}||^{2} \\ &\quad +\frac{\lambda _{3}}{2}\left\Vert \frac{1}{N} \mathbf {H}^{T}\mathbf {H}-\mathbf {I}\right\Vert ^{2} +\frac{\lambda _{4}}{2 N}||\mathbf {H}^T \mathbf {1}||^{2} \\ &\ \mbox{s.t.} \quad \mathbf {B} \in \lbrace -1,1\rbrace ^{N \times L}\!. \end{aligned} \end{equation}
(14)
\(\mathbf {H}\) is stacked by the outputs of the network and \(\mathbf {B}\) is stacked by the binary codes to be optimized in Equation (14). \(\mathbf {S}\) is the pairwise similarity matrix valued 1 or \(-\)1. The first term is the similarity difference loss, the second term is the \(\ell _{2}\) regularization, the third term is the quantization loss, and the last two terms punish the correlation and the imbalance of bits, respectively. Note that \(\mathbf {B}\) is not simply the sign of \(\mathbf {H}\); as a result, the loss function is optimized by updating the network parameters and \(\mathbf {B}\) alternately. To be specific, \(\mathbf {B}\) is optimized with the neural network fixed, and the neural network is trained with \(\mathbf {B}\) fixed. SH-BDNN has a well-designed loss function, which follows Kernel-based Supervised Hashing [113]. However, the architecture does not include the popular convolutional neural network and the model is not end-to-end; as a result, its efficiency is low on large-scale datasets.
Convolutional Neural Network Hashing (CNNH) [173]. CNNH is, to our knowledge, the earliest deep supervised hashing framework. It adopts a two-step strategy. In the first step, it optimizes the following objective function using a coordinate descent strategy:
\begin{equation} \mathcal {L}_{\text{CNNH}}=\left\Vert \frac{1}{L}\mathbf {H}\mathbf {H}^{T} -\mathbf {S}\right\Vert ^{2}, \end{equation}
(15)
which generates approximate binary codes. In the second step, CNNH utilizes the obtained hash codes to train a convolutional neural network with \(L\) output units. Besides, if class labels are available, a fully connected layer with \(K\) output units is added, corresponding to the \(K\) class labels of the images, and a classification loss is added to the loss function. Although CNNH uses labels in a clumsy manner, this two-step strategy is still popular in deep supervised hashing and has inspired many other state-of-the-art methods.
Deep Discrete Supervised Hashing (DDSH) [82]. DDSH uses column sampling to partition the training data into \(\lbrace \mathbf {x}_i\rbrace _{i\in \Omega }\) and \(\lbrace \mathbf {x}_i\rbrace _{i\in \Gamma }\), where \(\Omega\) and \(\Gamma\) are the corresponding index sets. The loss function is designed in an asymmetric form:
\begin{equation} \mathcal {L}_{\text{DDSH}}= \sum _{i\in \Omega ,j\in \Gamma }\mathcal {L}\left(s_{ij}^o - \mathbf {b}_i^T \mathbf {h}_j\right)^2 +\sum _{i,j\in \Omega }\mathcal {L}\left(s_{ij}^o - \mathbf {b}_i^T \mathbf {b}_j\right)^2, \end{equation}
(16)
where \(\mathbf {b}_i\) and \(\mathbf {h}_i\) are the binary code to be optimized and the output of the network, respectively, and they are updated alternately following [36]. DDSH is notable because it takes an asymmetric strategy for learning to hash, which aids both binary code generation and continuous feature learning through the pairwise similarity structure.
Hashing with Binary Matrix Pursuit (HBMP) [9]. HBMP also takes advantage of the two-step strategy introduced above. Different from CNNH, HBMP utilizes weighted Hamming distances and adopts a different traditional hashing algorithm, called binary code inference, to obtain the hash codes. In the first step, the objective function is written as
\begin{equation} \mathcal {L}_{\text{HBMP}}=\frac{1}{4}\sum _{i,j}\left(\mathbf {b}_i^T\Lambda \mathbf {b}_j -s_{ij}^o\right)^2\!, \end{equation}
(17)
where \(\mathbf {\Lambda }\) is a diagonal weight matrix. The similarity matrix with elements \(S^h_{ij}=\mathbf {b}_i^T\Lambda \mathbf {b}_j\) can be approximated by a step-wise algorithm. HBMP then trains a convolutional neural network on the obtained hash codes with a point-wise hinge loss and shows that deep neural networks help to simplify the optimization problem and obtain robust hash codes.
Asymmetric Deep Supervised Hashing (ADSH) [84]. ADSH treats the samples in the database and the query set in an asymmetric manner, which helps to train the model more effectively, especially for large-scale nearest neighbor search. ADSH contains two critical components, i.e., a feature learning part and a loss function part. The former utilizes a hashing network to learn binary codes for the queries, while the latter directly learns binary codes for the database points by minimizing the same objective function with supervised information. The loss function is formulated as
\begin{equation} \begin{aligned}& \mathcal {L}_{\text{ADSH}}=\sum _{i \in \Omega , j\in \Gamma } \left(\mathbf {h}_i^T\mathbf {b}_j-L s_{ij}^o\right)^2\!, \\ & \mbox{s.t.}\quad \mathbf {b}_j\in \lbrace -1,1\rbrace ^L, \end{aligned} \end{equation}
(18)
where \(\Omega\) is the index set of query points and \(\Gamma\) is the index set of database points. The network parameters \(\Theta\) and the binary codes \(\mathbf {b}_j\) are updated alternately following SH-BDNN [36] during the optimization process. If only database points are available, we let \(\Omega \subset \Gamma\) and add a quantization loss \(\sum _{i \in \Omega }(\mathbf {b}_i-\mathbf {h}_i)^2\) with coefficient \(\gamma\). This asymmetric strategy combines deep hashing and traditional hashing, which helps achieve better performance.
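The following sketch illustrates the asymmetric objective of Equation (18) from the network side; the alternating update of the database codes is only summarized in a comment, since ADSH derives a specialized bit-wise solution for it.

```python
import torch

def adsh_query_loss(h_query, b_db, s, L):
    # Eq. (18): (h_i^T b_j - L * s_ij)^2, with h from the network (queries)
    # and b the fixed binary codes of the database points
    return ((h_query @ b_db.t()) - L * s).pow(2).mean()

h_query = torch.tanh(torch.randn(8, 32))               # network outputs for sampled queries
b_db = torch.sign(torch.randn(100, 32))                # current database codes, fixed in this step
s = torch.randint(0, 2, (8, 100)).float() * 2 - 1      # {-1, +1} supervision
loss = adsh_query_loss(h_query, b_db, s, L=32)
# In the alternating step with the network fixed, ADSH updates b_db to minimize the
# same objective with a closed-form, bit-by-bit rule (omitted here).
```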
Deep Incremental Hashing Network (DIHN) [172]. DIHN learns hash codes in an incremental manner. Similar to ADSH [84], the dataset is divided into two parts, i.e., an original database and an incremental database. When a new image from the incremental database arrives, its hash code is learned while keeping the hash codes of the original database unchanged. The optimization still uses the strategy of alternately updating the parameters.
Deep Ordinal Hashing (DOH) [85]. DOH generates ordinal hash codes by taking advantage of both local and global features. Specifically, two subnetworks learn the local semantics with a fully convolutional network enhanced by a spatial attention module and the global semantics with a convolutional neural network, respectively. Afterward, the two outputs are combined to produce \(R\) ordinal segments \(\lbrace \mathbf {h}_i^r\rbrace _{r=1}^R\). For each segment \(\mathbf {h}_i^r\), the corresponding hash code can be obtained as follows:
\begin{equation} \begin{aligned}\mathbf {b}_i^r &= \arg \max _{\mathbf {\theta }} \mathbf {\theta }^T\mathbf {h}_i^r,\\ \text{s.t.}&\ \mathbf {\theta }\in \lbrace 0,1\rbrace ^L, \Vert \mathbf {\theta }\Vert _{1}=1. \end{aligned} \end{equation}
(19)
The full hash code is obtained by concatenating \(\lbrace \mathbf {b}_i^r\rbrace _{r=1}^R\). DOH adopts an end-to-end ranking-to-hashing framework, which avoids using the non-differentiable sign function. Furthermore, it uses a relatively complex network that is able to handle large datasets with higher performance.

3.2.2 Likelihood Loss Minimization.

Deep Pairwise Supervised Hashing (DPSH) [101]. DPSH adopts CNN-F [23] as the backbone of the hashing network and the standard form of likelihood loss based on similarity information. Besides similarity information, quantization loss is also introduced to the final loss function, i.e.,
\begin{equation} \mathcal {L}_{DPSH}=-\sum _{(i,j)\in \mathcal {E}}\left(s_{i j}^o s_{ij}^h-\log \left(1+e^{s_{ij}^h}\right)\right) +\lambda _1 \sum _{i=1}^{N}||\mathbf {h}_{i}-sgn(\mathbf {h}_{i})||_{2}^{2}, \end{equation}
(20)
where \(s_{ij}^h=\tfrac{1}{2}\mathbf {h}_{i}^T\mathbf {h}_{j}\) and \(\mathbf {h}_{i}\) is the output of the hashing network. Although the triplet loss was popular at that time, DPSH adopts the pairwise form to simultaneously learn deep features and hash codes, which improves both accuracy and efficiency. This likelihood loss can easily incorporate different Bayesian priors, making it flexible in applications, and it generally achieves better performance than difference losses.
Deep Hashing Network (DHN) [200]. DHN has a likelihood loss function similar to that of DPSH. Differently, DHN treats the quantization loss as a Bayesian prior and proposes a bimodal Laplacian prior for the output \(\mathbf {h}_i\), i.e.,
\begin{equation} p\left(\mathbf {h}_{i}\right)=\frac{1}{2 \epsilon } \exp \left(-\frac{\left\Vert \left|\mathbf {h}_{i}\right|-\mathbf {1}\right\Vert _{1}}{\epsilon }\right), \end{equation}
(21)
and the negative log likelihood (i.e., quantization loss) is
\begin{equation} \mathcal {L}_{Quan}=\sum _{i=1}^N \Vert |\mathbf {h}_i|-\mathbf {1}\Vert _1, \end{equation}
(22)
which can be smoothed by a smooth surrogate [74] into
\begin{equation} \mathcal {L}_{Quan}=\sum _{i=1}^N \sum _{l=1}^L \log (\cosh (|h_{il}|-1)), \end{equation}
(23)
where \(h_{il}\) is the \(l\)th element of \(\mathbf {h}_i\). We notice that DHN replaces the \(\ell _2\) norm (the ITQ quantization error [48]) with the \(\ell _1\) norm. [200] also shows that the \(\ell _{1}\) norm is an upper bound of the \(\ell _{2}\) norm, and that the \(\ell _{1}\) norm encourages sparsity and is easier to optimize.
HashNet [19]. As a variant of DHN, HashNet considers the imbalanced training problem that the positive (similar) pairs are usually far fewer than the negative (dissimilar) pairs. Hence, it adopts a Weighted Maximum Likelihood (WML) loss with a different weight for each image pair. The weight is formulated as
\begin{equation} \begin{aligned}w_{i j}=c_{i j} \cdot \left\lbrace \begin{array}{ll}{|\mathcal {S}| /\left|\mathcal {S}_{1}\right|,} & {s_{i j}^o=1} \\ {|\mathcal {S}| /\left|\mathcal {S}_{0}\right|,} & {s_{i j}^o=0} \end{array}\right.\!\!\!, \end{aligned} \end{equation}
(24)
where \(\mathcal {S}_{1}=\lbrace (i,j)\in \mathcal {E}: s_{i j}^o=1\rbrace\) comprises the similar image pairs and \(\mathcal {S}_0 = \mathcal {E}\setminus \mathcal {S}_1\) comprises the dissimilar image pairs. \(c_{i j}=\tfrac{|\mathbf {y}_{i} \cap \mathbf {y}_{j}|}{|\mathbf {y}_{i} \cup \mathbf {y}_{j}|}\) for multi-label datasets and equals 1 for single-label datasets. Besides, the sigmoid function in the conditional probability is substituted by the adaptive sigmoid function \(1/(1+e^{-\alpha x})\), which is equivalent to adding a hyper-parameter to the hash code similarity computation, i.e., \(s_{ij}^h=\alpha \mathbf {b}_i^T\mathbf {b}_j\). Different from other methods, HashNet continuously approximates the sign function through the hyperbolic tangent function
\begin{equation} \lim _{\beta \rightarrow \infty } \tanh (\beta z)=\operatorname{sgn}(z). \end{equation}
(25)
The activation function for the outputs is \(\tanh (\beta _t \cdot)\), where \(\beta _t\) is increased step-wise toward \(\infty\), so that the optimal network with the \(\operatorname{sgn}(\cdot)\) activation can be derived. This operation can also be interpreted as multi-stage pretraining, i.e., the network using activation function \(\tanh (\beta _{t+1}\cdot)\) is initialized with the well-trained network using activation function \(\tanh (\beta _t\cdot)\). These two skills proposed by HashNet greatly increase the performance of deep supervised hashing.
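The sketch below illustrates HashNet's weighting of Equation (24) for single-label data together with the adaptive-sigmoid likelihood; the multi-label weight \(c_{ij}\) and the continuation schedule are omitted, and the hyper-parameter values are placeholders.

```python
import torch
import torch.nn.functional as F

def hashnet_weights(s01):
    # Eq. (24) for single-label data: rebalance similar vs. dissimilar pairs in the batch
    n_total = s01.numel()
    n_pos = s01.sum().clamp(min=1)
    n_neg = (n_total - n_pos).clamp(min=1)
    return torch.where(s01 > 0, n_total / n_pos, n_total / n_neg)

def weighted_likelihood_loss(h, s01, alpha=0.5):
    # adaptive sigmoid: s_ij^h = alpha * h_i^T h_j, plugged into the NLL of Eq. (9)
    sim_h = alpha * (h @ h.t())
    w = hashnet_weights(s01)
    return (w * (F.softplus(sim_h) - s01 * sim_h)).mean()
```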
Deep Priority Hashing (DPH) [20]. DPH also assigns different weights to different image pairs, but reduces the weights of pairs with higher confidence, which is similar in spirit to AdaBoost [138]. The quantity \(q_{ij}\) measures how confidently a pair is classified as similar when \(s_{ij}^o=1\) or as dissimilar when \(s_{ij}^o=0\). In formulation,
\begin{equation} \begin{aligned}q\left(s_{i j}^o | \mathbf {h}_{i}, \mathbf {h}_{j}\right) &=\left\lbrace \begin{array}{ll}{\frac{1+s_{ij}^h}{2},} & {s_{i j}^o=1} \\ {\frac{1-s_{ij}^h}{2},} & {s_{i j}^o=0} \end{array}\right.\\ &=\left(\frac{1+s_{ij}^h}{2}\right)^{s_{i j}^o}\left(\frac{1-s_{ij}^h}{2}\right)^{1-s_{i j}^o}\!\!. \end{aligned} \end{equation}
(26)
Besides, the weight characterizing class imbalance is measured by \(\alpha _{ij}\):
\begin{equation} \alpha _{i j}=\left\lbrace \begin{array}{l}\frac{\left|\mathcal {S}_{i}\right|\left|\mathcal {S}_{j}\right|}{\sqrt {\left|\mathcal {S}_{i}^{1}\right|\left|\mathcal {S}_{j}^{1}\right|}}, s_{i j}^o=1 \\ \frac{\left|\mathcal {S}_{i}\right|\left|\mathcal {S}_{j}\right|}{\sqrt {\left|\mathcal {S}_{i}^{0}\right|\left|\mathcal {S}_{j}^{0}\right|}}, s_{i j}^o=0 \end{array}\right.\!\!\!, \end{equation}
(27)
where \(\mathcal {S}_{i}=\lbrace (i,j)\in \mathcal {E}:\forall j\rbrace\), and
\begin{equation} \begin{aligned}&\mathcal {S}_{i}^{1}=\left\lbrace (i, j) \in \mathcal {E}: \forall j, s_{i j}^{o}=1\right\rbrace \\ &\mathcal {S}_{i}^{0}=\left\lbrace (i, j) \in \mathcal {E}: \forall j, s_{i j}^{o}=0\right\rbrace \!. \end{aligned} \end{equation}
(28)
The final priority weight is formulated as
\begin{equation} w_{i j}=\alpha _{i j}(1-q_{i j})^{\gamma }\!\!\!, \end{equation}
(29)
where \(\gamma\) is a hyper-parameter. With the priority cross-entropy loss, DPH down-weighs confident image pairs and prioritizes difficult image pairs with low confidence. Similarly, the priority quantization loss changes the weight of each image to \(w_i^{\prime }=(1-q_i)^{\gamma }\), where \(q_i\) measures how likely a continuous output can be perfectly quantized into a binary code. In this way, DPH achieves better performance than HashNet.
Deep Supervised Discrete Hashing (DSDH) [100]. Besides leveraging the pairwise similarity information, DSDH also takes advantage of label information by adding a linear regression loss with regularization to the loss function. By dropping the binary restrictions, the loss is formulated as
\begin{equation} \mathcal {L}_{DSDH}=-\sum _{(i,j)\in \mathcal {E}}\left(s_{i j}^o s_{ij}^h-\log \left(1+e^{s_{ij}^h}\right)\right) +\lambda _1 \sum _{i=1}^{N}||\mathbf {h}_{i}-sgn(\mathbf {h}_{i})||_{2}^{2} +\lambda _2\sum _{i=1}^{N}||\mathbf {y}_i-\mathbf {W}^T\mathbf {b}_i||_2^2 +\lambda _3||\mathbf {W}||_F^2, \end{equation}
(30)
where \(s_{ij}^h=\tfrac{1}{2}\mathbf {h}_i^T\mathbf {h}_j\) and the label is encoded in the one-hot format \(\mathbf {y}_i\). The \(\lambda _2\) term in Equation (30) is the linear regression term and the \(\lambda _3\) term is an \(\ell _2\) regularization on \(\mathbf {W}\). \(\lbrace \mathbf {h}_i\rbrace _{i=1}^N\), \(\lbrace \mathbf {b}_i\rbrace _{i=1}^N\), and \(\mathbf {W}\) are updated alternately using the gradient descent method and the discrete cyclic coordinate descent method. DSDH greatly increases the performance of image retrieval since it takes advantage of both label information and pairwise similarity information. It should be noted that in the linear regression term, the binary codes are updated by discrete cyclic coordinate descent, so the discreteness constraint is satisfied.
Deep Cauchy Hashing (DCH) [15]. DCH is a Bayesian learning framework similar to DHN, but it replaces the sigmoid function in the conditional probability with a function based on the Cauchy distribution. DCH aims at improving the search accuracy within a Hamming radius of 2. The probability based on the generalized sigmoid function can still be large when the Hamming distance is greater than 2, which is detrimental to Hamming-ball retrieval. DCH tackles this problem by incorporating the Cauchy distribution, whose probability drops rapidly when the Hamming distance is greater than 2. The Cauchy distribution is formulated as
\begin{equation} \sigma \left(d_{ij}^h\right)=\frac{\gamma }{\gamma + d_{ij}^h}, \end{equation}
(31)
where \(\gamma\) is a hyper-parameter and \(d_{ij}^h\) is measured by the normalized Euclidean distance, i.e., \(d_{ij}^h= d(\mathbf {h}_i,\mathbf {h}_j)=\tfrac{L}{2}(1-\cos (\mathbf {h}_i,\mathbf {h}_j))\). Besides, the prior is based on a variant of the Cauchy distribution, i.e.,
\begin{equation} P\left(\mathbf {h}_{i}\right)=\frac{\gamma }{\gamma +d\left(\left|\mathbf {h}_{i}\right|, \mathbf {1}\right)}. \end{equation}
(32)
The final loss function is the negative log-likelihood plus the quantization loss based on the prior. However, this loss function tends to produce almost identical hash codes for images with the same label. Even worse, the relationship between dissimilar pairs is not considered.
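For reference, a sketch of the Cauchy probability of Equation (31) with the cosine-based distance is given below; the value of \(\gamma\) is a placeholder.

```python
import torch
import torch.nn.functional as F

def cauchy_probability(h_i, h_j, gamma=10.0):
    # Eq. (31): sigma(d) = gamma / (gamma + d), with the normalized Euclidean
    # (cosine-based) distance d = L/2 * (1 - cos(h_i, h_j))
    L = h_i.shape[-1]
    d = 0.5 * L * (1 - F.cosine_similarity(h_i, h_j, dim=-1))
    return gamma / (gamma + d)

p_sim = cauchy_probability(torch.randn(16, 48), torch.randn(16, 48))
```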
Maximum-Margin Hamming Hashing (MMHH) [87]. In view of the shortcomings of DCH, MMHH utilizes the t-Distribution and adopts different objective functions for similar and dissimilar pairs, the total loss being the weighted sum of the two. Besides, a margin \(\zeta\) is utilized to avoid producing exactly the same hash codes. The Cauchy distribution in DCH is replaced by
\begin{equation} \sigma \left(d_{i j}^{h}\right)=\left\lbrace \begin{array}{ll}\frac{1}{1+\max \left(0, d_{i j}^{h}-\zeta \right)}, s_{i j}^{o}=1 \\ \frac{1}{1+\max \left(\zeta , d_{i j}^{h}\right)}, \quad s_{i j}^{o}=0 \end{array}\right. \end{equation}
(33)
The loss function is the weighted log-likelihood of conditional probability, i.e.,
\begin{align} \mathcal {L}_{MMHH} &=\sum _{(i,j)\in \mathcal {E}} w_{i j}\left(s_{i j}^o\right) \log \left(1+\max \left(0, d_{ij}^h-\zeta \right)\right) \\ &\quad +\sum _{(i,j)\in \mathcal {E}} w_{i j}\left(1-s_{i j}^o\right) \log \left(1+\frac{1}{\max \left(\zeta , d_{ij}^h\right)}\right)\\ &\quad +\lambda _1\sum _{i=1}^N ||\mathbf {h}_i-sgn(\mathbf {h}_i)||_2^2 \end{align}
(34)
The last term is a standard quantization loss. MMHH also proposes a semi-batch optimization strategy to alleviate the data imbalance problem. Specifically, the binary codes of the training data are stored in an extra memory bank. The pairwise loss is calculated between the new codes computed in the current epoch and the stored codes of their similar and dissimilar counterparts, and the memory bank is then updated for the next epoch. In general, MMHH addresses the shortcomings of DCH and thereby greatly improves search performance.
Deep Fisher Hashing (DFH) [103]. DFH points out that the pairwise loss minimization is similar to Fisher’s Linear discriminant, which maximizes the gaps between inter-class examples whilst minimizing the gaps between the intra-class examples. Its logistic loss function is similar to MMHH and the final loss function is formulated as
\begin{equation} \mathcal {L}_{DFH} =\sum _{(i,j)\in \mathcal {E}} s_{i j}^o \log \left(1+ e^{d_{ij}^h+\epsilon }\right) +\sum _{(i,j)\in \mathcal {E}} \left(1-s_{i j}^o\right) \log \left(1+e^{-d_{ij}^h+\epsilon }\right) +\lambda _1\sum _{i=1}^N ||\mathbf {h}_i-sgn(\mathbf {h}_i)||_2^2, \end{equation}
(35)
in which \(\epsilon\) is a margin parameter. Besides, the quantized center loss is added to the objective function, which not only minimizes intra-class distances but also maximizes inter-class distances between binary hash codes of each image.
Deep Asymmetric Pairwise Hashing (DAPH) [139]. Similar to ADSH, DAPH also adopts an asymmetric strategy. The difference is that DAPH uses two networks with different parameters for the database and the queries. Besides, the bit independence, bit balance, and quantization losses are added to the objective following SH-BDNN. The loss function is optimized by updating the two neural networks alternately.
Deep Attention-guided Hashing (DAgH) [182]. DAgH adopts a two-step framework similar to CNNH, but it utilizes neural networks to learn hash codes in both steps. In the first step, the objective function is the combination of a log-likelihood loss and a difference loss with a margin. In the second step, DAgH utilizes a binary point-wise cross-entropy loss for optimization. Besides, the backbone of DAgH includes a fully convolutional network with an attention module for obtaining accurate deep features.
Deep Joint Semantic-Embedding Hashing (DSEH) [99]. DSEH is the first work to introduce a LabNet into deep supervised hashing. It adopts a two-step framework consisting of a LabNet and an ImgNet. LabNet is a neural network designed to capture abundant semantic correlations between image pairs, which helps guide the hash code learning in the second step. Let \(\mathbf {f}_i\) denote the label embedding produced from the one-hot label \(\mathbf {y}_i\); LabNet replaces the image inputs with their labels and learns hash codes from the labels with a general hashing loss function. In the second step, ImgNet utilizes an asymmetric loss between the label features from the first step and the newly obtained image features \(\mathbf {h}_j\) from ImgNet, i.e., \(s_{ij}^h={\mathbf {f}_i}^T\mathbf {h}_j\), along with a binary cross-entropy loss similar to DAgH [182]. DSEH makes full use of the label information from the perspectives of both the pairwise loss and the cross-entropy loss, which helps generate discriminative and similarity-preserving hash codes.
Asymmetric Deep Semantic Quantization (ADSQ) [181]. ADSQ improves the performance by utilizing two hashing networks and reducing the gap between the continuous network outputs and the desired hash codes; a difference loss is also involved.
Deep Anchor Graph Hashing (DAGH) [26]. In the anchor graph, a minimal number of anchors are used to link the whole dataset, allowing the similarities between distinct examples to be computed implicitly. DAGH first samples a number of anchors and builds an anchor graph between training samples and anchors. Besides a general pairwise likelihood loss and a linear regression loss, DAGH minimizes the distances between the deep features of training samples and the binary codes of anchors belonging to the same class. This method fully utilizes the remaining labeled data during mini-batch training and helps to obtain efficient binary hash codes.

3.3 Ranking-based Methods

In this section, we review the category of deep supervised hashing algorithms that use ranking to preserve the similarity structure. Specifically, these methods attempt to preserve the similarity relationships among more than two examples, as measured in the original and Hamming spaces. We further divide ranking-based methods into two groups:
Triplet methods. Due to the ease with which triplet-based similarities could be obtained, triplet ranking losses are popular in deep supervised hashing. These losses attempt to keep the rankings consistent in the Hamming space and the original space for each sampled triplet. For each triplet \((\mathbf {x}_i,\mathbf {x}_j,\mathbf {x}_k)\) with \(s_{ij}^o\gt s_{ik}^o\), they usually attempt to minimize a difference loss with margin [92], i.e.,
\begin{equation} \mathcal {L}_{Triplet}(\mathbf {h}_i,\mathbf {h}_j,\mathbf {h}_k)=max\left(0,m+d_{ij}^h-d_{ik}^h\right), \end{equation}
(36)
where \(m\) is a margin parameter. Subsequent works introduce the weights based on ranking for each triplet [197] or utilize the likelihood loss for preserving triplet ranking [169]. The triplet loss can also be combined with the pairwise loss above [189].
List-wise methods. This sub-class usually considers the rankings over the whole dataset rather than within sampled triplets. One example is to directly optimize ranking-based metrics, e.g., Average Precision and Normalized Discounted Cumulative Gain [59]. Other works utilize mutual information [8] or matrix optimization [198] to optimize the hashing network from the view of the whole dataset. These methods can reduce the bias introduced by triplet sampling but usually suffer from lower efficiency.

3.3.1 Triplet Methods.

Deep Neural Network Hashing (DNNH) [92]. DNNH modifies the popular triplet ranking objective [130] to preserve the relative relationships of samples. To be more precise, given a triplet \((\mathbf {x}_i,\mathbf {x}_j,\mathbf {x}_k)\) with \(s_{ij}^o\gt s_{ik}^o\), the ranking loss with margin is formulated as
\begin{equation} \mathcal {L}_{DNNH}(\mathbf {h}_i,\mathbf {h}_j,\mathbf {h}_k)=max\left(0,1+d_{ij}^h-d_{ik}^h\right). \end{equation}
(37)
The loss encourages the hash code of \(\mathbf {x}_i\) to be closer to that of \(\mathbf {x}_j\) than to that of \(\mathbf {x}_k\). By substituting the Euclidean distance for the Hamming distance, the loss function becomes amenable to straightforward gradient-based optimization:
\begin{equation} \mathcal {L}_{DNNH}(\mathbf {h}_i,\mathbf {h}_j,\mathbf {h}_k)=max(0,1+||\mathbf {h}_i-\mathbf {h}_j||_2^2-||\mathbf {h}_i-\mathbf {h}_k||_2^2). \end{equation}
(38)
Besides, DNNH introduces a sigmoid activation function along with a piece-wise threshold function, which encourages the continuous outputs to approach discrete codes. The piece-wise threshold function is defined as
\begin{equation} \begin{aligned}g(s)=\left\lbrace \begin{array}{lr}{0,} & {s\lt 0.5-\epsilon } \\ {s,} & {0.5-\epsilon \le s \le 0.5+\epsilon } \\ {1,} & {s\gt 0.5+\epsilon } \end{array}\right. \end{aligned}\!\!, \end{equation}
(39)
in which \(\epsilon\) denotes a positive hyper-parameter. Most elements of the outputs will be exactly 0 or 1 after this piece-wise threshold function, thus resulting in less quantization loss.
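The following sketch (an illustrative approximation, not the official implementation) shows the relaxed triplet ranking loss of Equation (38) together with the piece-wise threshold function of Equation (39); the margin value, the hyper-parameter `eps`, and the omission of any gradient surrogate for the threshold are assumptions.

```python
import torch

def piecewise_threshold(s, eps=0.1):
    """Piece-wise threshold of Equation (39): pushes sigmoid outputs toward {0, 1}.

    Gradient handling around the saturated regions is ignored in this sketch.
    """
    out = s.clone()
    out[s < 0.5 - eps] = 0.0
    out[s > 0.5 + eps] = 1.0
    return out

def dnnh_triplet_loss(h_anchor, h_pos, h_neg, margin=1.0):
    """Relaxed triplet ranking loss of Equation (38) on (N, L) continuous codes."""
    d_pos = ((h_anchor - h_pos) ** 2).sum(dim=1)
    d_neg = ((h_anchor - h_neg) ** 2).sum(dim=1)
    return torch.clamp(margin + d_pos - d_neg, min=0).mean()
```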
Deep Regularized Similarity Comparison Hashing (DRSCH) [189]. Besides the triplet loss, DRSCH also takes advantage of pairwise information by introducing a difference loss as a regularization term. Bit weights are also incorporated when calculating the distances in the Hamming space.
Deep Triplet Supervised Hashing (DTSH) [169]. DTSH replaces the ranking loss by the negative log triplet label likelihood as
\begin{equation} \mathcal {L}_{DTSH}(\mathbf {h}_i,\mathbf {h}_j,\mathbf {h}_k)=log\left(1+e^{s_{ij}^h-s_{ik}^h-m}\right)-\left(s_{ij}^h-s_{ik}^h-m\right), \end{equation}
(40)
which considers the conditional probability [100], and \(m\) is a margin parameter.
Deep Semantic Ranking-based Hashing (DSRH) [197]. DSRH leverages a surrogate loss based on the triplet loss. Given a query \(\mathbf {q}\) and database \(\lbrace \mathbf {x}_i\rbrace _{i=1}^N\), the ranking \(r_i\) of each database item is defined as the number of labels it shares with the query. The ranking loss is defined in a triplet form
\begin{equation} \mathcal {L}_{DSRH}=\sum _{i=1}^N\sum _{j:r_j\lt r_i} w(r_i,r_j)\delta max\left(0,\epsilon +d_{qi}^h-d_{qj}^h\right), \end{equation}
(41)
where \(\delta\) and \(\epsilon\) are two hyper-parameters, and \(w(r_i,r_j)\) is the weight for each triplet:
\begin{equation} \omega (r_{i}, r_{j})=\frac{2^{r_{i}}-2^{r_{j}}}{Z}. \end{equation}
(42)
The form of the weights comes from the Normalized Discounted Cumulative Gain [78] score, and \(Z\) is a normalization constant, which can be omitted. Besides, a bit balance loss and weight regularization are added to the objective. DSRH improves deep hashing through this surrogate loss, especially on multi-label image datasets.

3.3.2 Listwise Methods.

Hashing as Tie-Aware Learning to Rank (HALR) [59]. HALR explicitly optimizes popular ranking-based evaluation metrics, including average precision and normalized discounted cumulative gain, which improves ranking-based retrieval performance. Since Hamming distances are integer-valued, tied ranks frequently occur. Hence, HALR introduces a tie-aware formulation of these metrics and trains the hashing network using their continuous relaxations for effective optimization.
Hashing with Mutual Information (HashMI) [8]. HashMI follows the idea of minimizing neighborhood ambiguity and derives a loss term based on mutual information, which is closely connected to the aforementioned ranking-based evaluation metrics. Given an image \(\mathbf {x}_i\), the random variable \(\mathcal {V}_{i,\Phi }\) is defined as a mapping from \(\mathbf {x}_j\) to \(d_{ij}^h\), where \(\Phi\) is the hashing network. \(\mathcal {C}_i\) is the set of images that share the same label with \(\mathbf {x}_i\), i.e., the neighbors of \(\mathbf {x}_i\). The mutual information is defined as
\begin{equation} \mathcal {I}_{HashMI}(\mathcal {V}_{i,\Phi };\mathcal {C}_{i})= H(\mathcal {C}_{i})-H(\mathcal {C}_{i}|\mathcal {V}_{i,\Phi }). \end{equation}
(43)
The mutual information is integrated over the sample space for a given hashing network \(\Phi\), yielding a quality measurement that is to be maximized
\begin{equation} \mathcal {O}=-\int _\Omega \mathcal {I}(\mathcal {V}_{i,\Phi };\mathcal {C}_{i})p_id\mathbf {x}_i, \end{equation}
(44)
where \(\Omega\) is the sample space and \(p_i\) denotes the prior distribution, which can be dropped. After discretization, the loss function becomes:
\begin{equation} \mathcal {L}_{HashMI}= -\sum _{i=1}^N \mathcal {I}(\mathcal {V}_{i,\Phi };\mathcal {C}_{i}), \end{equation}
(45)
whose gradient can be calculated by relaxing the binary constraint and performing efficient minibatch back-propagation. Within a minibatch, each example is cyclically retrieved against the other examples, similar to leave-one-out validation.
Angular Deep Supervised Hashing (AnDSH) [198]. AnDSH calculates the Hamming distances between images of different classes to form an upper triangular matrix of size \(K\) by \(K\), where \(K\) is the number of categories. The mean of the Hamming distance matrix is maximized while its variance is minimized, so that all elements in the matrix are well covered by the hash codes and, from the view of bucket theory, there is no weak bit, i.e., bit balance is achieved. Besides, this method utilizes a classification loss similar to PCDH but replaces the softmax loss with the A-softmax objective [115], which yields smaller intra-class variation along with larger inter-class separation.

3.4 Pointwise Methods

In this section, we review pointwise methods that directly take advantage of label information instead of similarity information. Early methods usually add a classification layer to map the hash-like representations into label distributions [76, 106, 124, 180, 193, 194, 195]. Then, the hash codes are enhanced with the standard classification loss in label space. Further works include probabilistic models for better binary optimization [144]. Recent methods usually build the classification loss in the Hamming space instead. Specifically, they first generate some central hash codes,2 each of which is associated with a class label. These methods enforce the network outputs to approach their corresponding hash centers with different loss terms, i.e., binary cross-entropy [185], difference loss [25], polarization loss [41], softmax loss [64], and partial softmax loss [158]. These hash centers are mostly produced by Hadamard matrices [25, 64, 185], random sampling [64, 185], as well as adaptive optimization [158], with the last achieving better performance compared with the former two manners.
Deep Binary Hashing (DBH) [106]. After pre-training a convolutional neural network on ImageNet, DBH adds a latent layer with sigmoid activation, whose neurons are utilized to learn hash-like representations while fine-tuning with a classification loss on the target dataset. The outputs of the latent layer are discretized into binary hash codes. DBH also emphasizes that the obtained hash codes are for coarse-level search because the quality of the hash codes is limited.
Supervised Semantics-preserving Deep Hashing (SSDpH) [180]. SSDpH utilizes a similar architecture to DBH and adds the quantization loss and the bit balance loss for regularization. In this way, SSDpH can produce high-quality hash codes for better retrieval performance.
Very Deep Supervised Hashing (VDSH) [195]. VDSH builds a very deep hashing network and trains it with an efficient layer-wise strategy motivated by the alternating direction method of multipliers (ADMM) [5]. By virtue of the strong representation ability of deeper neural networks, VDSH can produce better hash codes for effective image retrieval.
SUBIC [76]. SUBIC generates structured binary hash codes consisting of the concatenation of several one-hot encoded vectors (i.e., blocks) and obtains each one-hot encoded vector with a block-wise softmax function (i.e., block softmax). Besides the classification loss and bit balance regularization, SUBIC utilizes the mean entropy of each block as the quantization loss. SUBIC can also be applied to a range of downstream search tasks including instance retrieval and image classification.
Just Maximizing Likelihood Hashing (PMLR) [144]. PMLR integrates two dense layers on top of the hashing network. It utilizes probability models to parameterize the hashing network under binary constraints. Then, PMLR utilizes a classification loss along with a regularization term for better hash code distributions in the Hamming space.
Central Similarity Quantization (CSQ) [185]. CSQ also utilizes a classification model but in a different way. First, CSQ generates central hash codes from the rows of a Hadamard matrix or by random sampling from Bernoulli distributions, such that the distance between each pair of centroids is large enough. Each label corresponds to a centroid in the Hamming space, and thus each image has its corresponding semantic hash center according to its label. Afterward, the model is trained with the central similarity loss (i.e., binary cross-entropy) using the supervised label information, as well as the quantization loss. In formulation,
\begin{equation} \mathcal {L}_{CSQ}=-\sum _{i=1}^{N} \sum _{l=1}^L [\mathbf {c}_{i,l}\log \mathbf {h}_{i,l}+(1-\mathbf {c}_{i,l})\log (1-\mathbf {h}_{i,l})] + \lambda _1 \sum _{i=1}^N || |2\mathbf {h}_i-\mathbf {1}|-\mathbf {1}||_1, \end{equation}
(46)
where \(\mathbf {c}_i \in \lbrace 0,1\rbrace ^L\) is the hash center generated from the labels and \(\mathbf {h}_i\in (0,1)^L\) is the output of the hashing network. CSQ directly enforces the generated hash codes to approach the corresponding centroids with some relaxations. The core of CSQ is to map the semantic labels into the Hamming space to guide hash code learning directly. Thus, samples with comparable labels are converted into similar hash codes, maintaining the global similarities between image pairs and resulting in effective hash codes for image retrieval.
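A minimal sketch of the central-similarity idea is given below (our own illustration under stated assumptions, not the CSQ release): hash centers are built from a Hadamard matrix, assuming the code length is a power of two, and the network outputs are pulled toward their class centers with a binary cross-entropy plus a common quantization relaxation. The helper names and the use of `scipy.linalg.hadamard` are assumptions.

```python
import torch
from scipy.linalg import hadamard  # assumes SciPy is available

def hadamard_centers(num_classes, L):
    """Build one L-bit center per class (L must be a power of two).

    Rows of H and -H are mutually far apart in Hamming distance, which is the
    property CSQ exploits; random Bernoulli sampling is the usual fallback.
    """
    H = torch.from_numpy(hadamard(L)).float()          # entries in {-1, +1}
    centers = torch.cat([H, -H], dim=0)[:num_classes]  # up to 2L candidate rows
    return (centers + 1) / 2                           # map to {0, 1} as in Equation (46)

def central_similarity_loss(h, labels, centers, lam=1e-4):
    """h: (N, L) network outputs in (0, 1); labels: (N,) class indices (sketch only)."""
    c = centers[labels]                                # (N, L) hash centers
    bce = -(c * torch.log(h + 1e-7) + (1 - c) * torch.log(1 - h + 1e-7)).mean()
    quant = (1.0 - torch.abs(2 * h - 1)).mean()        # pushes each output toward 0 or 1
    return bce + lam * quant
```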
Hadamard Codebook-based Deep Hashing (HCDH) [25]. HCDH also utilizes the Hadamard matrix, minimizing the \(\ell _2\) difference between the hash-like outputs and the target hash codes corresponding to their labels (i.e., the Hadamard loss). Different from CSQ, HCDH trains the classification loss and the Hadamard loss simultaneously. The Hadamard loss can be interpreted as learning the hash centers guided by their supervised labels under the \(\ell _2\) norm. Note that HCDH is able to yield discriminative and balanced binary codes owing to the properties of the Hadamard codebook.
Deep Polarized Network (DPN) [41]. DPN combines a metric learning framework with learning to hash and develops a novel polarization loss which minimizes the distance between the hash centers and the hashing network outputs. In formulation,
\begin{equation} \mathcal {L}_{DPN}=\sum _{i=1}^{N} \sum _{l=1}^L \max \left(\epsilon -\mathbf {h}_{il} \cdot \mathbf {c}_{il}, 0\right), \end{equation}
(47)
where \(\mathbf {c}_i \in \lbrace -1,1\rbrace ^L\) is the hash center and \(\mathbf {h}_i\in (-1,1)^L\) is the output of the hashing network. Different from CSQ, the hash centers can be updated after a few epochs. It has been proved theoretically that minimizing the polarization loss simultaneously minimizes intra-class and maximizes inter-class Hamming distances. In this way, the hash codes can be easily derived for effective image retrieval.
OrthHash [64]. OrthHash is a one-loss model that gets rid of the hassles of tuning the balance coefficients of various losses. Similar to CSQ, OrthHash generates hash centers using Bernoulli distributions. Then, it maximizes the cosine similarity between the hashing network outputs and their corresponding hash centers. In formulation,
\begin{equation} \mathcal {L}_{OrthHash}=- \sum _{i=1}^{N} \log \frac{\exp \left(\mathbf {c}_i^{\top } \mathbf {h}_{i}\right)}{\sum _{\mathbf {c}\in \mathcal {C}} \exp \left(\mathbf {c}^{\top } \mathbf {h}_{i}\right)}\!, \end{equation}
(48)
where \(\mathcal {C}\) denotes the set of all hash centers. Compared with CSQ and DPN, OrthHash not only compares the network outputs with their corresponding hash centers, but also considers the hash centers of the other labels. In this way, OrthHash improves the discriminativeness of the hash codes. Moreover, since the Hamming distance is equivalent to the cosine distance for hash codes, OrthHash can promise quantization error minimization. With a single classification objective, it realizes end-to-end training of deep hashing with promising performance.
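Since Equation (48) is simply a cross-entropy whose logits are inner products with fixed centers, a hedged sketch is very short; the center matrix and label indices below are assumptions for illustration, not the released OrthHash code.

```python
import torch
import torch.nn.functional as F

def orthohash_loss(h, labels, centers):
    """Sketch of the single objective in Equation (48).

    h: (N, L) continuous codes; centers: (K, L) hash centers in {-1, +1};
    labels: (N,) class indices. The centers act as a fixed classification layer,
    so the loss reduces to an ordinary cross-entropy over all hash centers.
    """
    logits = h @ centers.t()   # (N, K) inner products c^T h
    return F.cross_entropy(logits, labels)
```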
Partial-Softmax Loss based Deep Hashing (PSLDH) [158]. PSLDH generates a semantic-preserving hash center for each label instead of using a Hadamard matrix or random sampling [185]. Specifically, it not only minimizes the inner product of each hash center pair, but also maximizes the information of each hash bit with a bit balance loss term. Moreover, PSLDH trains the hashing network with a partial-softmax loss, which compares the network outputs with both their corresponding hash centers and the centers of a subset of the other categories in the dataset. Let \(\mathbf {c}^j\) denote the hash center associated with the \(j\)th category. The loss is formulated as
\begin{equation} \mathcal {L}_{PSLDH}=\sum _{i=1}^{N} \sum _{j \in \Gamma _{i}}-\log \frac{\exp \left(\eta \left(\mathbf {h}_{i}^{T} \mathbf {c}^{j}-\mu L\right)\right)}{\exp \left(\eta \left(\mathbf {h}_{i}^{T} \mathbf {c}^{j}-\mu L\right)\right)+\sum _{q \in \Psi _{i}} \exp \left(\eta \mathbf {h}_{i}^{T} \mathbf {c}^{q}\right)}, \end{equation}
(49)
where \(\Gamma _{i}\) denotes the index set of categories associated with \(\mathbf {x}_i\), and \(\Psi _{i}\) denotes the index set of categories not associated with \(\mathbf {x}_i\).

3.5 Quantization

Quantization techniques can be derived from the difference loss minimization described in Section 3.2 [165]. From a statistical standpoint, the quantization error bounds the distance reconstruction error [79]. Consequently, quantization can be utilized for deep supervised hashing. These methods usually leverage deep neural networks to generate deep features and then adopt product quantization approaches for the subsequent quantization. Hence, they optimize the deep features with a pairwise difference loss [17], a pairwise likelihood loss [39], or a triplet loss [110] for better retrieval performance. Further works combine label semantic information for discriminative deep features [16]. Recent works [39, 88] integrate deep neural networks into the process of product quantization itself rather than only feature generation and achieve better performance. Other than product quantization, composite quantization can also be enhanced by deep learning [24]. We now review typical deep supervised hashing methods based on quantization.
Deep Quantization Network (DQN) [17]. DQN generates the hash code \(\mathbf {b}_i\) from the learned representation \(\mathbf {z}_i\in \mathbb {R}^D\) with semantics preserved using the product quantization method. First, it decomposes the feature space into a Cartesian product of \(M\) low-dimensional subspaces, and each subspace is quantized into \(T\) codewords via clustering. More precisely, the original feature is partitioned into \(M\) sub-vectors, i.e., \(\mathbf {z}_i=[\mathbf {z}_{i1};\dots ;\mathbf {z}_{iM}], i=1,\ldots ,N\), where \(\mathbf {z}_{im}\in \mathbb {R}^{D/M}\) is the sub-vector of \(\mathbf {z}_i\) in the \(m\)th subspace. All sub-vectors in each subspace are then quantized into \(T\) codewords using K-means, independently of the other subspaces. The total loss is defined as follows:
\begin{equation} \begin{gathered}\mathcal {L}_{D Q N}=\sum _{i, j} \left(s_{i j}^{o}-\cos (z_{i}, z_{j})\right)^2+\lambda _{1} \sum _{m=1}^{M} \sum _{i=1}^{N}\left\Vert z_{i m}-C_{m} b_{i m}\right\Vert _{2}^{2}, \\ s.t.\ \left\Vert b_{i m}\right\Vert _{0}=1, b_{i m} \in \lbrace 0,1\rbrace ^{T}, \end{gathered} \end{equation}
(50)
where \(\cos (\cdot ,\cdot)\) denotes the cosine similarity metric, \(\mathbf {C}_m=[\mathbf {c}_{m1},\ldots ,\mathbf {c}_{mT}]\) represents the \(T\) codewords of the \(m\)th subspace, and \(\mathbf {b}_{im}\) is the one-hot indicator of which codeword in \(\mathbf {C}_m\) is used to approximate the sub-vector \(\mathbf {z}_{im}\). Mathematically, the second term, i.e., product quantization, can be reformulated as
\begin{equation} \sum _{i=1}^N||\mathbf {z}_i-\mathbf {C}\mathbf {b}_i||_2^2, \end{equation}
(51)
where \(\mathbf {C}\) is a \(D\times MT\) block-diagonal matrix written as \(\mathbf {C}=diag(\mathbf {C}_1,\ldots ,\mathbf {C}_M)\). The quantization error of converting the feature \(\mathbf {z}_i\) into the binary code \(\mathbf {b}_i\) is controlled by minimizing Equation (51). Besides the quantization term, quantization-based hashing also includes a pairwise similarity-preserving loss in the final objective, i.e., the first term of Equation (50). Finally, the Asymmetric Quantizer Distance (AQD) is widely used for approximate nearest neighbor search, which is formulated as
\begin{equation} AQD(\mathbf {q},\mathbf {x}_i)=\sum _{m=1}^M||\mathbf {z}_{qm}-\mathbf {C}_m\mathbf {b}_{im}||_2^2, \end{equation}
(52)
where \(\mathbf {z}_{qm}\) is the \(m\)th sub-vector for the feature of query \(\mathbf {q}\).
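The quantization step and the Asymmetric Quantizer Distance can be illustrated with the following sketch (assuming off-the-shelf K-means from scikit-learn and a feature dimension divisible by \(M\)); it mirrors the second term of Equation (50) and the lookup of Equation (52), but it is not the DQN training code, which learns the features and codebooks jointly.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq_codebooks(Z, M=4, T=256, seed=0):
    """Learn product-quantization codebooks on deep features Z of shape (N, D)."""
    subs = np.split(Z, M, axis=1)                      # M sub-matrices of width D/M
    codebooks, codes = [], []
    for zm in subs:
        km = KMeans(n_clusters=T, n_init=4, random_state=seed).fit(zm)
        codebooks.append(km.cluster_centers_)          # (T, D/M) codewords C_m
        codes.append(km.predict(zm))                   # (N,) codeword indices
    return codebooks, np.stack(codes, axis=1)          # codes: (N, M)

def aqd(query, codebooks, codes):
    """Asymmetric Quantizer Distance of Equation (52) from one query to all items."""
    M = len(codebooks)
    q_subs = np.split(query, M)
    dist = np.zeros(codes.shape[0])
    for m in range(M):
        # Distance table from the query sub-vector to every codeword, then a lookup.
        table = ((codebooks[m] - q_subs[m]) ** 2).sum(axis=1)   # (T,)
        dist += table[codes[:, m]]
    return dist
```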
Deep Triplet Quantization (DTQ) [110]. DTQ uses a triplet loss to preserve similarity information, and a smooth orthogonality regularization, which is similar to bit independence, is imposed on the codebooks. Let \(\mathcal {T}\) denote the set of all triplets, where each triplet \((\mathbf {x}_i,\mathbf {x}_j,\mathbf {x}_k)\) satisfies \(s_{ij}^o\gt s_{ik}^o\). The total loss function is as follows:
\begin{equation} \begin{aligned}\mathcal {L}_{DTQ} =& \sum _{(\mathbf {x}_i,\mathbf {x}_j,\mathbf {x}_k)\in \mathcal {T}} \max \left(0, \epsilon +||\mathbf {z}_i- \mathbf {z}_j ||_2^2-||\mathbf {z}_i- \mathbf {z}_k ||_2^2\right)+\lambda _{1} \sum _{m=1}^{M} \sum _{i=1}^{N}\left\Vert \mathbf {z}_{i m}-\mathbf {C}_{m} \mathbf {b}_{i m}\right\Vert _{2}^{2} \\ &+\lambda _2 \sum _{m=1}^M\sum _{m^{\prime }=1}^M||\mathbf {C}_m^T\mathbf {C}_{m^{\prime }}-\mathbf {I}||^2. \end{aligned} \end{equation}
(53)
The last term is the orthogonality penalty. In addition, DTQ selects triplets by Group Hard sampling to make sure that the number of explored valid triplets is suitable for optimization. Specifically, the training data are split into several groups, and for each anchor-positive image pair a hard negative example (i.e., one with positive triplet loss) is picked randomly from its group.
Deep Visual-semantic Quantization (DVsQ) [16]. DVsQ optimizes the quantization network using labeled image samples along with semantic information from their latent text domains. Specifically, using the image representations \(\mathbf {z}_i\) from the pre-trained network, it produces deep visual-semantic representations. These are then trained to predict the word embeddings \(\mathbf {v}\) (i.e., \(\mathbf {v}_i\) for label \(i\)) estimated by a skip-gram model. The loss function includes an adaptive margin ranking loss and a quantization loss:
\begin{equation} \mathcal {L}_{DVsQ}=\sum _{i=1}^N\sum _{j\in \mathbf {y}_i}\sum _{k\notin \mathbf {y}_i}max(0,\delta _{jk}-cos(\mathbf {v}_j,\mathbf {z}_i)+cos(\mathbf {v}_k,\mathbf {z}_i)) + \lambda _1 \sum _{i=1}^N\sum _{j=1}^{|\mathbf {y}|}||\mathbf {v}_j^T (\mathbf {z}_i-\mathbf {C}\mathbf {b}_i)||_2^2, \end{equation}
(54)
where \(\mathbf {y}_i\) is the label set of the \(i\)th image, \(\delta _{jk}\) is an adaptive margin, and the quantization loss is inspired by maximum inner-product search. DVsQ adopts the same strategy as LabNet and combines visual information and semantic quantization in a unified framework instead of a two-step approach. By this means, DVsQ greatly improves the retrieval performance.
Deep Product Quantization (DPQ) [88]. DPQ leverages both the powerful capability of product quantization (PQ) and the end-to-end learning ability of deep learning to optimize the clustering results of product quantization through classification tasks. Specifically, for each input \(\mathbf {x}_i\), it first uses an embedding layer and an MLP to obtain the deep representation \(\mathbf {z}_i \in \mathbb {R}^{MF}\). Then, the representation is sliced into \(M\) sub-vectors \(\mathbf {z}_{i,m} \in \mathbb {R}^F\), similar to PQ. Different from DQN, an MLP turns each sub-vector into a probabilistic vector with \(T\) elements \(p_m(t), t=1,\ldots ,T\), via the softmax function. The matrix \(\mathbf {C}_m\in \mathbb {R}^{T\times D}\) denotes the \(T\) centroids, and \(p_m(t)\) denotes the probability that the \(m\)th sub-vector is quantized by the \(t\)th row of \(\mathbf {C}_m\). The soft representation of the \(m\)th sub-vector is calculated by combining the row vectors of \(\mathbf {C}_m\):
\begin{equation} soft_m=\sum _{t=1}^Tp_m(t)\mathbf {C}_m(t). \end{equation}
(55)
Given \(t^*=\arg \max _t p_m(t)\), the hard assignment is the one-hot vector \(e_m(t)=\delta _{tt^*}\), and we have
\begin{equation} hard_m=\sum _{t=1}^Te_m(t)\mathbf {C}_m(t). \end{equation}
(56)
The obtained sub-vectors of the soft and hard representations are then concatenated to produce the final representations, i.e., \(\text{soft}=[\text{soft}_1, \ldots , \text{soft}_M]\) and \(\text{hard}=[\text{hard}_1, \ldots , \text{hard}_M] \in \mathbb {R}^{MD}\). Each representation is followed by a fully-connected classification layer. Besides the two classification losses, a joint central loss is added, which first learns a center vector for each category and then minimizes the distances between the deep features and their class centers. Note that both the soft and hard representations share the same centers in DPQ, which encourages both representations to approach the centers and reduces the disparity between them. This helps to improve the discriminative power of the features and contributes to the retrieval performance. A Gini batch loss and a Gini sample loss are also introduced for class balance and to encourage the two representations of the same image to be closer. Overall, DPQ replaces the K-means step in PQ and DQN with deep learning combined with a classification model, and is able to create compressed representations for fast classification and fast image retrieval.
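The soft and hard assignments of Equations (55) and (56) can be sketched as follows; for brevity we assume the codeword dimension equals the sub-vector dimension and replace the probability MLP of DPQ with negative distances, so the scoring function here is our assumption rather than the paper's design.

```python
import torch
import torch.nn.functional as F

def dpq_soft_hard(z, codebooks):
    """Soft / hard quantization sketch for Equations (55)-(56).

    z: (N, M, F) sliced sub-vectors; codebooks: list of M tensors of shape (T, F).
    DPQ obtains p_m(t) from an MLP; negative distances stand in for those logits here.
    """
    soft_parts, hard_parts = [], []
    for m in range(len(codebooks)):
        logits = -torch.cdist(z[:, m, :], codebooks[m])            # (N, T)
        p = F.softmax(logits, dim=1)                               # p_m(t)
        soft_parts.append(p @ codebooks[m])                        # Equation (55)
        one_hot = F.one_hot(p.argmax(dim=1), p.size(1)).float()    # e_m(t)
        hard_parts.append(one_hot @ codebooks[m])                  # Equation (56)
    # Concatenate the M sub-representations into the final soft / hard vectors.
    return torch.cat(soft_parts, dim=1), torch.cat(hard_parts, dim=1)
```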
Deep Spherical Quantization (DSQ) [39]. DSQ first uses the deep neural network to obtain \(\ell _{2}\)-normalized features and then quantizes these features on the unit hypersphere with an elaborate quantization manner. After constraining the continuous representations to stay on the unit hypersphere, DSQ reduces the reconstruction loss using multi-codebook quantization (MCQ). Different from PQ, MCQ approximates the representation vectors with the summation of multiple codewords instead of their concatenation. \(\hat{\mathbf {y}}_i\) denotes the predicted label distribution and \(\phi _{y_i}\) denotes the feature center of the \(y_i\)th class. The overall loss for training the model is as follows:
\begin{equation} \begin{aligned}\mathcal {L}_{DSQ}=& -\sum _{i=1}^N \mathbf {y}_i^T \log \hat{\mathbf {y}}_i+\lambda _1 \sum _{i=1}^N||\mathbf {z}_{i}-[\mathbf {C}_1,\ldots ,\mathbf {C}_M]\mathbf {b}_{i}||_2^2\\ &+ \lambda _2 \sum _{i=1}^N ||\mathbf {z}_i - \phi _{y_i} ||_2^2 + \lambda _3 \sum _{i=1}^N ||\phi _{y_i}-\mathbf {C}\mathbf {b}_i||_2^2\\ & s.t.\ ||\mathbf {b}_{im}||_0=1, \mathbf {b}_i \in \lbrace 0,1\rbrace ^{MT}, \mathbf {b}_i=[\mathbf {b}_{i,1}^T,\ldots ,\mathbf {b}_{i,M}^T]^T\!, \end{aligned} \end{equation}
(57)
where the first, second, third, and last terms are the softmax loss, the quantization loss, the center loss, and the discriminative loss, respectively. The last two losses encourage the quantized vectors and the deep features, respectively, to approach their class centers.
Similarity Preserving Deep Asymmetric Quantization (SPDAQ) [24]. SPDAQ adopts the Asymmetric Quantizer Distance to approximate the desired similarity metric, which is similar to ADSH. Moreover, it uses composite quantization instead of product quantization, and the representations of the training set come from the deep neural network in an unquantized form. SPDAQ also takes advantage of both similarity information and label information to achieve better retrieval performance.

3.6 Other Techniques for Deep Hashing

3.6.1 Hashing with Generative Adversarial Networks.

Generative Adversarial Networks (GANs) [49] are popular neural network models that generate virtual examples without requiring supervised knowledge. Several hashing methods leverage GANs to enhance retrieval performance.
Deep Semantic Hashing with GAN (DSH-GAN) [133]. DSH-GAN is the first hashing method that takes advantage of GANs for image retrieval. It includes four components: a neural network to produce image representations, an adversarial discriminator for differentiating between synthetic and real images, a hashing network for projecting representations into binary codes, and a classification head. The generator network attempts to generate synthetic images from the concatenation of a label embedding and a noise embedding. The discriminator jointly differentiates between real and synthetic samples and categorizes the inputs into the proper semantic labels. The overall framework is optimized with the adversarial loss, which distinguishes the two sources, and the classification loss, which recovers the ground-truth labels, in a classic minimax game. The input of the network is image triplets: the first image is a real image treated as a query, the second is a synthetic image created by the generator with the same label as the query, and the third is a synthetic image with different semantics. The GAN endows the hashing model with strong generalization ability in terms of preserving semantics and similarity, which improves the quality of the hash codes.
HashGAN [14]. HashGAN augments the training data with images synthesized by a pair-conditional Wasserstein GAN (WGAN) inspired by [54], which sufficiently exploits the pairwise semantic relationships. In this module, the training samples along with the pairwise similarities are taken as inputs, and a generator and a discriminator are trained simultaneously by adding the pairwise similarity to the WGAN loss function. The hash encoder then produces high-quality binary codes for all images using a likelihood objective similar to HashNet. HashGAN is also capable of coping with datasets that lack class labels but provide pairwise similarity information.

3.6.2 Ensemble Learning.

Guo et al. [55] point out that for current deep supervised hashing models, simply increasing the length of the hash code with a single hashing model cannot significantly enhance the performance. The potential cause is that the loss functions adopted by existing methods are prone to producing highly correlated and redundant hash codes. Inspired by this, several methods leverage ensemble learning to increase the retrieval performance with more hash bits.
Ensemble-based Deep Supervised Hashing (EbDSH) [55]. EbDSH leverages an ensemble learning strategy for better retrieval performance. Specifically, it trains a number of deep hashing models with different training data, initializations, and network architectures, then concatenates their outputs into the final hash codes. It is noticed that this ensemble strategy is suitable for parallelization and incremental learning.
Weighted Multi-deep Ranking Supervised hashing (WMRSH) [98]. WMRSH attempts to generate a high-quality hash function using multiple hash tables derived from different hashing networks. To be specific, WMRSH assigns a bit-wise weight and a table-wise weight to each bit in each hash table. For each bit in a table, similarity preservation is measured by a product loss, and bit independence is measured by the correlation between bits. The table-wise weight is derived from the mean average precision of each hash table. The final weight is the product of the three terms above, and the final hash code is the concatenation of the weighted hash tables. A similar strategy called Hash Boosting is introduced in [117].
Apart from these methods, NMLayer [43] balances the importance of each bit and merges redundant bits together to learn more compact hash codes.

3.6.3 Training Strategy for Deep Hashing.

In this subsection, we will introduce two methods that adopt different training strategies from most other methods.
Greedy Hash [156]. Greedy Hash adopts a greedy algorithm for fast discrete optimization in hashing by introducing a hash layer with the sign function instead of relying on a quantization penalty. To overcome the ill-posed gradient problem [177], the gradients are transmitted entirely to the front layer, which effectively prevents the vanishing gradients of the sign function and updates all bits together. This strategy is also adopted in recent works [134].
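The core trick can be written as a custom autograd function whose backward pass copies the incoming gradient straight through the sign operation; this is a minimal sketch of the idea rather than the released Greedy Hash code.

```python
import torch

class GreedySign(torch.autograd.Function):
    """Sign activation with a straight-through backward pass (sketch)."""

    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Transmit the gradient to the front layer unchanged, sidestepping the
        # zero gradient of sign(.) almost everywhere.
        return grad_output

# Usage: b = GreedySign.apply(h), where h is the real-valued network output.
```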
Gradient Attention deep Hashing (GAH) [72]. This work points out a dilemma in learning deep hashing models through gradient descent: the loss remains unchanged if the paired hash codes flip their signs together. As a result, GAH generates attention on the derivative of each hash bit of each image by maximizing the decrease of the loss during optimization. It leverages a gradient attention network with two fully-connected layers to produce normalized weights and then applies them to the derivatives in the last layer. In conclusion, this model accelerates the training process by adopting a gradient attention network.

4 Deep Unsupervised Hashing

4.1 Overview

Recently, unsupervised hashing methods have received widespread attention due to their full exploitation of unlabeled data, which facilitates practical applications in the real world. Since deep unsupervised methods cannot acquire label information, semantic information is instead obtained in the deep feature space of pre-trained networks. With such semantic information, the problem can be converted into a supervised one. However, how to infer semantic information and how to utilize it for learning hash codes are the two key problems here. According to the semantics learning manner, unsupervised methods can be mainly classified into three categories, i.e., similarity reconstruction-based methods, pseudo-label-based methods, and prediction-free self-supervised learning-based methods. Similarity reconstruction-based methods usually generate pairwise semantic information and then leverage the pairwise semantic preserving techniques in Section 3.2 for hash code learning. Pseudo-label-based methods usually produce pointwise pseudo-labels for the inputs and then leverage the pointwise semantic preserving techniques in Section 3.4 for hash code learning. Lastly, prediction-free self-supervised learning-based methods leverage the data itself for training without generating explicit semantic information, i.e., similarity signals or pseudo-labels. Specifically, they usually utilize regularization terms, auto-encoder models, generative models, and contrastive learning to produce high-quality hash codes. The regularization terms include a bit balance loss term, a bit independence loss term, and a transformation-invariant regularization term. Several approaches combine different kinds of semantics learning manners. The optimization of binarization remains an important problem for deep unsupervised hashing; most methods use \(tanh(\cdot)\) to approximate \(sign(\cdot)\) and generate approximate hash codes with the hashing network for optimization. A summary of these algorithms is shown in Table 3. We elaborate on these classes below.
| Approach | Similarity Information | Pseudo-Label | Binarization | Other Skills |
|---|---|---|---|---|
| SSDH [177] | Local Dist. | - | Tanh | - |
| DistillHash [179] | Local Dist. + Neighbour Information | - | Tanh | - |
| SADH [141] | Network Output + Adjacent Matrix | - | Drop + Alternation | - |
| MLS\(^3\)RUDH [159] | Local Dist. + Manifold Dist. | - | Tanh | - |
| TBH [145] | Hash Codes | - | Bottleneck Reg. | AE |
| GLC [120] | Local Dist. + K-Means | - | Tanh | - |
| MBE [104] | Local Dist. | - | - | Bit Bal. with Bi-Half Layer |
| CIMON [122] | Local Dist. + Spectral Clu. + Conf. | - | Tanh | Contrastive Learning |
| DATE [121] | Local Dist. + Distribution Dist. + Conf. | - | Tanh | Contrastive Learning |
| PLUDDH [67] | - | Kmeans + Cla. Layer | Tanh + Quan. Loss | - |
| DAVR [69] | - | Deep Clu. + Triplet Loss | Drop | - |
| CUDH [51] | - | Deep Embedding Clu. | Tanh + Quan. Loss | - |
| DVB [143] | Adjacent Matrix | Clu. | Quan. Loss | VAE + Bit Indep. |
| UDHPL [186] | - | Kmeans + PCA + MI | - | - |
| DU3H [191] | Local Dist. + Conf. | Kmeans + Hash Center | Tanh | GCN |
| UDMSH [132] | Local Dist. + Conf. | - | Quan. Loss | - |
| DSAH [108] | Updated Local Dist. + Conf. | - | Quan. Loss | - |
| UDKH [37] | - | Hash Code + Deep Clu. | Alternative | - |
| BDNN [36] | - | - | Drop | AE |
| DH [40] | - | - | Quan. Loss | Bit Bal. + Ind. |
| DeepBit [105] | - | - | Quan. Loss | Bit Bal. + Trans. Reg. |
| UTH [105] | - | - | Quan. Loss | Bit Bal. + Triplet Trans. Reg. |
| BGAN [152] | Local Dist. | - | Tanh + Continuation | GAN |
| BinGAN [201] | - | - | Quan. Loss | GAN + Bit Indep. |
| HashGAN [45] | - | - | Quan. Loss | Bit Bal. + Ind. + Trans. Reg. + GAN |
| SGH [31] | - | - | - | VAE |
| CIBHash [134] | - | - | Drop | Contrastive Learning |
| SPQ [77] | - | - | Quantization | Cross Contrastive Learning |
| HashSIM [119] | Local Dist. | - | Tanh | Bit Contrastive Learning |

Table 3. A Summary of Deep Unsupervised Hashing Approaches w.r.t. the Manner of Generating Similarity Information, Generating and Handling the Pseudo-Label, Binarization, as well as Other Skills
Drop = Drop the sign operator in the neural network and treat the binary code as an approximation of the network output, Reg. = Regression, Quan. = Quantization, Dist. = Distance, Conf. = Confidence, Trans. = Transformation, Ind. = Independence, Bal. = Balance, Cla. = Classification, and Clu. = Clustering.

4.2 Similarity Reconstruction-based Methods

Similarity reconstruction-based methods aim at leveraging pairwise methods to solve the problem. However, the similarity information is unavailable without label annotations. Hence, these methods utilize a two-step framework as shown in Figure 3. First, they extract deep representations \(\mathbf {z}_i\) using a pre-trained neural network and then infer the similarity information \(\lbrace s_{ij}^o\rbrace _{(i,j)\in \mathcal {E}}\) using distance metrics in the deep feature space. Second, a hashing network is trained to create similarity-preserving binary codes under the guidance of the reconstructed similarity structure. With this similarity information, the problem can be solved with pairwise supervised methods. The key to this kind of method is how to generate accurate similarity information. Early methods usually truncate pairwise distances in the deep feature space [177]. Further studies utilize neighborhood information [120, 122, 179], confidence degrees [122], and other similarity metrics [121, 159] to obtain a precise similarity structure for reliable guidance of the subsequent optimization. Recently, a few researchers have argued that a static similarity structure from the pre-trained network is not optimal and have proposed to update it based on the obtained hash codes [108, 141, 145]. Next, we review these methods in detail.
Fig. 3. Basic Framework of Deep Unsupervised Hashing with Similarity Reconstruction. The deep features are extracted by a pre-trained network for similarity reconstruction. More details will be discussed in Section 4.2.
Semantic Structure-based Unsupervised Deep Hashing (SSDH) [177]. SSDH is the first study along this line; it applies the VGG-F model to extract deep features and perform hash code learning. It studies the cosine distance for each pair in the deep feature space and finds that the distribution of cosine distances can be approximated by two half Gaussian distributions. Hence, through parameter estimation, SSDH sets two distance thresholds \(d_l\) and \(d_r\) and constructs a similarity structure as follows:
\begin{equation} s_{ij}^o=\left\lbrace \begin{array}{ll} 1, & \text{if}\quad d(\mathbf {z}_i, \mathbf {z}_j) \le d_{l} \\ 0, & \text{if}\quad d_{l}\lt d(\mathbf {z}_i, \mathbf {z}_j)\lt d_{r} \\ -1, & \text{if}\quad d(\mathbf {z}_i, \mathbf {z}_j) \ge d_{r} \end{array}\right.\!\!\!, \end{equation}
(58)
where \(d(\cdot , \cdot)\) denotes the cosine distance of two vectors. From Equation (58), SSDH considers sample pairs with distances smaller than \(d_l\) as semantically similar and sample pairs with distances larger than \(d_r\) as semantically dissimilar. Similar to SH-BDNN, a similarity difference loss is adopted as follows:
\begin{equation} \mathcal {L}_{SSDH}=\sum _{i=1}^{N} \sum _{j=1}^{N}\left|s_{ij}^o\right|\left(s_{ij}^h -s_{ij}^o\right)^{2}\!\!, \end{equation}
(59)
where \(s_{ij}^h = \mathbf {h}_i^T\mathbf {h}_j/L\) and \(\mathbf {h}_i\) denotes the output of the deep network with the activation function \(tanh(\cdot)\); the activation function \(sign(\cdot)\) is utilized instead during evaluation. However, the performance of SSDH is limited due to two issues. On one hand, its similarity structure is typically unreliable since it relies on two coarse thresholds. On the other hand, it discards a large number of pairs whose distances fall between the two thresholds.
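The construction of Equation (58) and the masked loss of Equation (59) can be sketched as below; the thresholds `d_l` and `d_r` are assumed to have been estimated beforehand (e.g., by fitting the two half Gaussian distributions), and the mini-batch handling of the real method is omitted.

```python
import torch
import torch.nn.functional as F

def build_similarity_structure(Z, d_l, d_r):
    """Equation (58): pseudo similarity labels from cosine distances of features Z (N, D)."""
    Zn = F.normalize(Z, dim=1)
    dist = 1.0 - Zn @ Zn.t()          # cosine distance in [0, 2]
    S = torch.zeros_like(dist)
    S[dist <= d_l] = 1.0              # confidently similar pairs
    S[dist >= d_r] = -1.0             # confidently dissimilar pairs
    return S                          # zero entries are masked out by the loss

def ssdh_loss(h, S):
    """Equation (59): masked similarity difference loss on tanh outputs h (N, L)."""
    L = h.size(1)
    sim_h = h @ h.t() / L
    return (S.abs() * (sim_h - S) ** 2).mean()
```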
DistillHash [179]. DistillHash leverages local structures to distill confident similarity signals. Specifically, for each pair of images, it examines the similarities of their neighbors and removes the similarity signal if it varies greatly within the local structures. The distillation process can be implemented with a Bayes optimal classifier. Finally, DistillHash trains the hashing network via likelihood loss minimization with the distilled similarity structure
\begin{equation} \mathcal {L}_{DistillHash}=- \sum _{i=1}^{N} \sum _{j=1}^{N} \left(\mathbf {1}_{s_{ij}^o=1} \sigma \left(s_{ij}^b\right)+ \mathbf {1}_{s_{ij}^o=-1} \left(1-\sigma \left(s_{ij}^b\right)\right)\right). \end{equation}
(60)
The improvement of DistillHash over SSDH mainly lies in the introduction of local structures to distill confident signals, which alleviates the first issue mentioned above.
Similarity Adaptive Deep Hashing (SADH) [141]. SADH trains the model alternately over three parts. In part one, it trains the hashing network under the guidance of the binary codes. In part two, it leverages the network outputs to update the similarity structure. In part three, the hash codes are optimized with the network outputs following the ADMM process. The alternating optimization improves the robustness of the model and helps achieve better hash codes for image retrieval.
Deep Unsupervised Hashing via Manifold based Local Semantic Similarity Structure Reconstructing (MLS\(^3\)RUDH) [159]. MLS\(^3\)RUDH incorporates the manifold structure in deep feature space to generate an accurate similarity structure. Specifically, it leverages a random walk on the nearest neighbor graph to measure the manifold similarity. The final similarity structure is denoted as follows:
\begin{equation} s_{ij}^o=\left\lbrace \begin{array}{ll}1, & \mathbf {x}_{j} \in N^{c}\left(\mathbf {x}_{i}\right) \wedge \mathbf {x}_{j} \in N^{m}\left(\mathbf {x}_{i}\right) \\ -1, & \mathbf {x}_{j} \in N^{c}\left(\mathbf {x}_{i}\right) \wedge \mathbf {x}_{j} \notin N^{m}\left(\mathbf {x}_{i}\right) \\ 0, & \text{ otherwise} \end{array}\right.\!\!\!, \end{equation}
(61)
where \(N^c(\cdot)\) and \(N^m(\cdot)\) denote the sets of neighbor samples in terms of cosine similarity and manifold similarity, respectively. Then, the hashing network is optimized through difference loss minimization as
\begin{equation} \mathcal {L}_{MLS^3RUDH}= \sum _{i=1}^{N} \sum _{j=1}^{N} \log \left(\cosh \left(s_{ij}^h -s_{ij}^o\right)\right)\!. \end{equation}
(62)
MLS\(^3\)RUDH leverages the manifold similarity to generate a more accurate similarity structure, which guides the optimization of the hashing network effectively.
Auto-Encoding Twin-Bottleneck Hashing (TBH) [145]. TBH introduces an adaptive code-driven graph to guide hash code learning. It contains a binary bottleneck to construct a code-driven similarity graph and a continuous bottleneck for reconstruction. To be specific, the similarity structure is defined by the hash codes
\begin{equation} s_{ij}^o = 1- d_{ij}^h/L, \end{equation}
(63)
where \(d_{ij}^h\) is the Hamming distance between \(\mathbf {b}_i\) and \(\mathbf {b}_j\). The outputs of the continuous bottleneck are fed into graph neural networks with the similarity structure as the adjacency matrix for the final reconstruction. Moreover, TBH involves adversarial learning to regularize the network for high-quality hash codes. TBH utilizes a dynamic graph guided by the reconstruction loss to obtain accurate similarity structures, which helps the hash codes preserve better similarity for reliable retrieval.
Deep Unsupervised Hashing by Global and Local Consistency (GLC) [120]. GLC extracts semantic information from both local and global views. For the local view, it builds reliable graphs and penalty graphs based on the cosine distances of image pairs. For the global view, it utilizes global clustering to derive cluster centers for different classes. During the optimization of the hashing network, GLC preserves the local similarity using a product loss and minimizes the Hamming distances between the hash codes in the same cluster. Compared with previous methods, GLC preserves the similarity from different views in a unified manner, resulting in effective retrieval performance.
CIMON [122]. CIMON first sets a threshold \(d_t\) to partition the local similarity signals based on the cosine metric into two groups. Inspired by the fact that the representations of samples with similar semantic information ought to lie on the same high-dimensional manifold, CIMON adopts the results of spectral clustering to remove contradictory signals and refine the semantic similarities. Moreover, it estimates the confidence of the similarity signals. The semantic information thus includes the similarity signals \(\lbrace s_{ij}^o\rbrace _{(i,j)\in \mathcal {E}}\) and their confidences \(\lbrace w_{ij}\rbrace _{(i,j)\in \mathcal {E}}\). In formulation,
\begin{equation} s_{ij}^o= {\left\lbrace \begin{array}{ll}1 & c_{i}=c_{j} \& d(\mathbf {z}_i,\mathbf {z}_j)\lt d_{t} \\ -1 & c_{i} \ne c_{j} \& d(\mathbf {z}_i,\mathbf {z}_j)\lt d_{t} \\ 0 & \text{ otherwise} \end{array}\right.}\!\!, \end{equation}
(64)
where \(c_i\) is the cluster label of \(\mathbf {x}_i\) obtained from spectral clustering. The confidence is built based on the cumulative distribution function
\begin{equation} w_{i j}= {\left\lbrace \begin{array}{ll}\frac{\Phi _{1}(d_t)-\Phi _{1}\left(d(\mathbf {z}_i,\mathbf {z}_j)\right)}{\Phi _{1}(d_t)-\Phi _{1}(0)} & d(\mathbf {z}_i,\mathbf {z}_j) \le d_t \& s_{ij}^o \ne 0 \\ \frac{\Phi _{2}\left(d(\mathbf {z}_i,\mathbf {z}_j)\right)-\Phi _{2}(d_t)}{\Phi _{2}(2)-\Phi _{2}(d_t)} & d_t\lt d(\mathbf {z}_i,\mathbf {z}_j) \& s_{ij}^o \ne 0 \\ 0 & s_{ij}^o=0 \end{array}\right.}\!, \end{equation}
(65)
where \(\Phi _\cdot (\cdot)\) is the cumulative distribution function of the corresponding estimated Gaussian distribution. CIMON generates two groups of semantic information by data augmentation and matches the hash code similarity with the similarity information in both a parallel and a cross manner. Moreover, contrastive learning is introduced to improve the quality of the hash codes. To our knowledge, CIMON is the first method using contrastive learning for hash code learning, and it achieves impressive performance due to both reliable similarity information and contrastive learning.
Maximizing Bit Entropy (MBE) [104]. MBE utilizes the continuous cosine similarity signals to guide hash code learning. More importantly, it introduces a bi-half layer for better quantization. Specifically, for the continuous network outputs, MBE sorts the elements of each dimension over all the minibatch samples, and then assigns the top half elements to 1 and the remaining elements to \(-1\). In this manner, MBE can achieve absolute bit balance. The optimization of the bi-half layer is based on a straight-through estimator similar to [156].
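A hedged sketch of the bi-half layer is shown below: within a minibatch, each bit dimension is sorted and the top half is set to \(+1\), the rest to \(-1\); the backward rule, which mixes a straight-through gradient with a pull toward the assigned codes, and the coefficient `gamma` are our assumptions for illustration rather than the released MBE code.

```python
import torch

class BiHalf(torch.autograd.Function):
    """Bi-half layer sketch: every bit is exactly balanced within the minibatch."""

    @staticmethod
    def forward(ctx, u, gamma=6.0):
        B = u.size(0)
        order = u.argsort(dim=0, descending=True)   # rank the batch entries of each bit
        b = torch.full_like(u, -1.0)
        b.scatter_(0, order[: B // 2], 1.0)         # top half of each bit -> +1
        ctx.save_for_backward(u, b)
        ctx.gamma = gamma
        return b

    @staticmethod
    def backward(ctx, grad_output):
        u, b = ctx.saved_tensors
        # Straight-through style surrogate plus a term pulling u toward its code.
        return grad_output + ctx.gamma * (u - b) / u.numel(), None
```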
DATE [121]. DATE characterizes each image by a set of its augmented views, which can be considered as samples from its latent distribution. Then, it calculates the semantic distance between a sample pair by computing the distribution divergence in a non-parametric way. Specifically, the smoothed ball divergence statistic is written as
\begin{equation} \begin{aligned}B D\left(\left\lbrace \mathbf {z}_{i}^{r}\right\rbrace _{r=1}^{R},\left\lbrace \mathbf {z}_{j}^{r}\right\rbrace _{r=1}^{R}\right) &=\frac{1}{R} \sum _{m=1}^{R}\left(\left(\frac{1}{R} \sum _{r=1}^{R} \left(d\left(\mathbf {z}_{i}^m, \mathbf {z}_{j}^r\right)-d\left(\mathbf {z}_{i}^m, \mathbf {z}_{i}^r\right)\right)\right)^{2}\right.\\ &\quad \left. +\left(\frac{1}{R} \sum _{r=1}^{R} \left(d\left(\mathbf {z}_{j}^m, \mathbf {z}_{j}^r\right)-d\left(\mathbf {z}_{j}^m, \mathbf {z}_{i}^r\right)\right)\right)^{2}\right), \end{aligned} \end{equation}
(66)
where \(\lbrace \mathbf {z}_{i}^{r}\rbrace _{r=1}^{R}\) and \(\lbrace \mathbf {z}_{j}^{r}\rbrace _{r=1}^{R}\) denote the features of the augmented views of images \(\mathbf {x}_i\) and \(\mathbf {x}_j\) obtained through a pre-trained network. The distribution distance is then combined with the cosine distance to generate reliable semantic information. Contrastive learning is also utilized for high-quality hash codes. Through accurate semantic information enhanced by augmentations, DATE achieves promising performance for image retrieval.
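A direct implementation of the statistic in Equation (66) is given below as a sketch; the choice of cosine distance for \(d(\cdot ,\cdot)\) follows the surrounding text, while the function name and batching are our own assumptions.

```python
import torch
import torch.nn.functional as F

def ball_divergence(Zi, Zj):
    """Smoothed ball divergence of Equation (66) between two view sets (sketch).

    Zi, Zj: (R, D) features of R augmented views of two images.
    """
    def d(a, b):
        return 1.0 - F.normalize(a, dim=1) @ F.normalize(b, dim=1).t()  # (R, R)

    d_ij, d_ii = d(Zi, Zj), d(Zi, Zi)
    d_ji, d_jj = d(Zj, Zi), d(Zj, Zj)
    # For each anchor view m, average the distance gaps over the R reference views.
    term_i = (d_ij - d_ii).mean(dim=1) ** 2
    term_j = (d_jj - d_ji).mean(dim=1) ** 2
    return (term_i + term_j).mean()
```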

4.3 Pseudo-label-based Methods

The second class of deep unsupervised methods generates pseudo-labels. These methods treat pseudo-labels as semantic information and convert this problem into supervised hashing. Most of them first leverage clustering (e.g., K-means and spectral clustering) to generate pseudo-labels [67, 69, 143, 186, 191]. Then, these pseudo-labels guide hash code learning with deep supervised hashing methods. Further studies utilize a deep clustering framework to combine clustering with the hashing network to adaptively update pseudo-labels [37, 51].
Pseudo Label-based Unsupervised Deep Discriminative Hashing (PLUDDH) [67]. PLUDDH utilizes a pre-trained network to extract deep features and then generates pseudo-labels via clustering. The hashing network is then supervised by the pseudo-labels. It has the same neural network architecture as DBH and is trained with a classification loss and a quantization loss. PLUDDH explores the deep feature space with coarse clustering, which may generate false pseudo-labels. Hence, its retrieval performance is limited when the dataset is complex.
Unsupervised Learning of Discriminative Attributes and Visual Representations (DAVR) [69]. DAVR adopts a two-step framework. In the first stage, a CNN is trained coupled with unsupervised discriminative clustering [150] to generate the cluster membership. In the second stage, cluster membership is utilized as supervision to uncover common cluster properties while optimizing their separability using a triplet objective. In general, the unsupervised hashing is converted into a supervised problem by the obtained pseudo labels.
Unsupervised Deep Hashing with Pseudo Labels (UDHPL) [186]. UDHPL first extracts features and reduces their dimension with Principal Component Analysis (PCA) to reduce noise. Then it generates the pseudo-labels through Bayes' rule. UDHPL maximizes the correlation between the projection vectors of the pseudo-labels and the deep features, so that the features can be projected into the Hamming space. With a rotation matrix, the hash codes can be generated, which then guide the optimization of the hashing network. UDHPL improves the pseudo-labels through PCA and guides the network training with mutual information maximization, which helps to preserve similarity information for effective retrieval.
Clustering-driven Unsupervised Deep Hashing (CUDH) [51]. CUDH first extracts deep features from a pre-trained network. Inspired by the deep clustering model DEC [174], which performs clustering in the embedding space, it modifies the model to iteratively learn discriminative clusters in the Hamming space with an extra quantization loss. CUDH is capable of generating discriminative hash codes by virtue of the deep clustering model.
Deep Unsupervised Hybrid-similarity Hadamard Hashing (DU3H) [191]. DU3H first generates pseudo-labels through K-means clustering. Instead of adding a classification layer, DU3H utilizes a Hadamard matrix to project the pseudo-labels into the Hamming space. This strategy is similar to CSQ [185] but in an unsupervised scenario. Moreover, it generates a similarity structure for preserving pairwise similarity, which considers the confidence of different signals; this consideration of confidence can also be seen in UDMSH [132] and DSAH [108]. Lastly, a two-layer GCN is introduced to amplify the discrepancy of similarity signals to further guide the hash code learning. DU3H combines the pointwise and pairwise methods of recent deep supervised hashing, which helps it achieve significant improvement.
Unsupervised Deep K-means Hashing (UDKH) [37]. UDKH is a joint framework which combines deep clustering with traditional clustering, i.e., K-means. It first uses the K-means clustering results to initialize the cluster labels. UDKH then learns the hash codes and cluster labels in an alternating manner. Specifically, it first fixes the clustering results and optimizes the hash codes as well as the hashing network under this supervision. Then, it fixes the hash codes and leverages Discrete Proximal Linearized Minimization [142] to derive the updated pseudo-labels. UDKH repeats the above steps until convergence. UDKH improves the quality of the hash codes along with the pseudo-labels through progressive learning, achieving better performance than unsupervised methods using fixed pseudo-labels.

4.4 Prediction-Free Self-Supervised Learning-based Methods

The last class of deep unsupervised methods is prediction-free self-supervised learning-based methods. Early methods often impose several constraints on hash codes by minimizing regularization terms (i.e., the bit balance loss, the bit independence loss, the quantization loss, and the transformation-invariant loss) [40, 105]. To extract more information through deep neural networks, several researchers introduce popular self-supervised techniques into deep unsupervised hashing, such as auto-encoders [31, 36, 145] and generative adversarial networks [45, 152, 201]. Recently, contrastive learning has shown promising performance in producing discriminative representations in various domains. Since hash codes are a specific form of representation, several recent unsupervised hashing methods incorporate contrastive learning to obtain high-quality hash codes [77, 121, 122, 134]. Following the contrastive learning scheme in [60], these methods usually first transform each input \(\mathbf {x}_i\) into two views \({\mathbf {x}}_i^{(1)}\) and \({\mathbf {x}}_i^{(2)}\). Then, the hashing network projects them into two hash codes \({\mathbf {b}}_i^{(1)}\) and \({\mathbf {b}}_i^{(2)}\). Letting \(\mathbf {\alpha } \star \mathbf {\beta }\) denote the cosine similarity of two vectors \(\mathbf {\alpha }\) and \(\mathbf {\beta }\), the network is trained by minimizing the following loss for each batch:
\begin{equation} \mathcal {L}_{CL}=-\frac{1}{2 N_B} \sum _{i=1}^{N_B}\left(\log \frac{e^{\mathbf {b}_{i}^{(1)} \star \mathbf {b}_{i}^{(2)} / \tau }}{Z_{i}^{(1)}}+\log \frac{e^{\mathbf {b}_{i}^{(1)} \star \mathbf {b}_{i}^{(2)} / \tau }}{Z_{i}^{(2)}}\right), \end{equation}
(67)
where \(\tau\) is a temperature parameter, \(N_B\) is the batch size, and \(Z_{i}^{(r)}=\sum _{j \ne i}(e^{\mathbf {b}_i^{(r)} \star \mathbf {b}_{j}^{(1)} / \tau }+e^{\mathbf {b}_i^{(r)} \star \mathbf {b}_j^{(2)} / \tau })\), \(r=1\) or 2. This term can also be interpreted in terms of mutual information [134]. Minimizing Equation (67) has three potential benefits. First, since the numerator penalizes differences between the binary codes of the same sample under different views, it assists in producing transformation-invariant binary codes. Second, since the denominator enlarges the distances between binary codes of different examples, which pushes the binary codes toward a uniform distribution in the Hamming space [166], it helps optimize the capacity of hash bits [141] and preserve the most semantic information. Third, because contrastive learning demonstrates promising performance in various tasks including linear classification, clustering, and transfer learning [60, 102], it aids in developing high-quality binary codes for effective retrieval [121].
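For illustration, the following is a minimal PyTorch sketch of such a contrastive objective over relaxed hash codes; it follows the common formulation in which the denominator also contains the positive pair, and the function name and default temperature are illustrative:

import torch
import torch.nn.functional as F

def contrastive_hash_loss(b1, b2, tau=0.5):
    """b1, b2: (N_B, L) relaxed hash codes of two views of the same batch."""
    n = b1.size(0)
    z = F.normalize(torch.cat([b1, b2], dim=0), dim=1)   # cosine similarity via dot products
    logits = z @ z.t() / tau                             # (2N_B, 2N_B) similarity logits
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(mask, float("-inf"))     # drop self-similarity terms
    # the positive of view 1 of sample i is view 2 of sample i, and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(logits, targets)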
Deep Hashing (DH) [40]. DH utilizes a deep hashing network and optimizes its parameters with three criteria on the hash codes. First, it minimizes a quantization loss, i.e., the gap between the network outputs and the learned hash codes. Second, it minimizes the bit balance loss in Equation (14) so that the generated binary codes distribute evenly on each bit. Third, it regularizes the weights of the hashing network to encourage independent hash codes. The parameters of the hashing network are updated by back-propagation on the composite objective function. DH only imposes several constraints on hash codes without inferring similarity information from the training data, which results in limited performance.
DeepBit [105]. DeepBit utilizes a deep convolutional neural network as the backbone. It also minimizes the quantization loss as well as the bit balance loss. In addition, DeepBit enforces the hash codes to be invariant to image rotation. The rotation-invariant loss is formulated as
\begin{equation} \mathcal {L}_{RI}= \sum _{i=1}^N\sum _{\theta =-R}^{R} \exp \left(-\frac{\theta ^2}{2}\right)\left\Vert \mathbf {h}_i- \mathbf {h}_{i,\theta }\right\Vert ^{2}, \end{equation}
(68)
where \(\theta\) is the rotation angle and \(\mathbf {h}_{i,\theta }\) denotes the network output for \(\mathbf {x}_i\) rotated by \(\theta\). This loss acts as a regularization term that enforces the hash codes to be invariant to certain transformations, which improves the performance compared with DH.
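A minimal PyTorch sketch of such a rotation-invariance regularizer is given below, assuming net maps an image batch to \(L\)-dimensional relaxed codes; the rotation range, step size, and use of torchvision's rotate are illustrative rather than DeepBit's exact configuration:

import math
import torch
import torchvision.transforms.functional as TF

def rotation_invariant_loss(net, x, R=10, step=5):
    """x: (N, C, H, W) image batch; net returns (N, L) relaxed codes."""
    h = net(x)                                    # codes of the original images
    loss = torch.zeros((), device=h.device)
    for theta in range(-R, R + 1, step):
        w = math.exp(-theta ** 2 / 2.0)           # Gaussian weight on the rotation angle
        h_rot = net(TF.rotate(x, float(theta)))   # codes of the rotated images
        loss = loss + w * ((h - h_rot) ** 2).sum(dim=1).mean()
    return loss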
Unsupervised Triplet Hashing (UTH) [73]. UTH builds triplets from the dataset, each of which contains an anchor example, a rotated example, and a random example. Afterward, the hashing network is optimized using the triplet inputs. The quantization loss and the bit balance loss are also adopted for high-quality hash codes. The triplet loss compares the hash codes of different samples, which helps generate more discriminative hash codes than the regularization losses in DeepBit. Hence, UTH performs better than DeepBit in various experiments.
HashGAN [45]. HashGAN contains three networks, i.e., a generator, a discriminator, and a hashing network. The hashing network utilizes \(L\) sigmoid functions as the final activation. Its objective for real data contains four losses. It first minimizes the entropy of each bit, which is equivalent to a quantization loss. The other three terms enforce bit balance, invariance to different transformations, and bit independence. Similar to DSH-GAN, the discriminator is trained in an adversarial manner. HashGAN also leverages the synthesized images by minimizing the distances between the outputs of the hashing network and the inputs of the generator, which acts like an auto-encoder. Moreover, it encourages the generator to produce synthetic samples with statistics similar to those of real samples via an \(L_2\)-norm loss. With GAN, HashGAN achieves better performance on both retrieval and clustering tasks.
Stochastic Generative Hashing (SGH) [31]. SGH trains the hashing network in a generative manner via the Minimum Description Length principle, so that the obtained binary codes compress the whole dataset as much as possible. Specifically, it contains a generative network and an encoding network that build the mapping between inputs and binary codes from opposite directions. During optimization, it trains a variational auto-encoder to reconstruct the input using the least information in the binary codes. SGH is a general framework that reduces to ITQ [48] and the Binary Autoencoder [21] as special cases.
Unsupervised Hashing with Contrastive Information Bottleneck (CIBHash) [134]. CIBHash adapts contrastive learning to deep unsupervised hashing. It treats the outputs of the hashing network as a form of representation and minimizes the contrastive loss on these outputs. Specifically, CIBHash generates two views for each input and minimizes the contrastive learning objective, i.e., Equation (67). To estimate the gradient of the hashing network with discrete stochastic variables, CIBHash leverages the straight-through gradient estimator [2], in which the gradients are transmitted unchanged to the preceding layer, similar to [156]. CIBHash further interprets the objective from the perspective of the information bottleneck theory and derives an improved model variant. Since then, contrastive learning has proven to be an effective tool for deep unsupervised hashing.
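A minimal PyTorch sketch of this straight-through sign binarization is shown below (the class name is illustrative): the forward pass emits discrete codes, while the backward pass copies the gradient to the preceding layer unchanged:

import torch

class SignSTE(torch.autograd.Function):
    """sign() in the forward pass, identity gradient in the backward pass."""

    @staticmethod
    def forward(ctx, z):
        return torch.sign(z)      # discrete codes (0 only when z is exactly 0)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output        # pass the gradient through unchanged

def binarize(z):
    return SignSTE.apply(z)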
Self-supervised Product Quantization (SPQ) [77]. SPQ combines contrastive learning with deep quantization. The codewords and the deep continuous representations are simultaneously optimized by contrasting individually augmented views in a cross manner. Specifically, for the two views \({\mathbf {x}}_i^{(1)}\) and \({\mathbf {x}}_i^{(2)}\) of each sample, SPQ generates deep features \({\mathbf {z}}_i^{(1)}\) and \({\mathbf {z}}_i^{(2)}\) and employs codebooks in the quantization head to generate quantized features \({\hat{\mathbf {z}}}_i^{(1)}\) and \({\hat{\mathbf {z}}}_i^{(2)}\). Instead of comparing the similarity between two visual descriptors or two quantized features, SPQ maximizes the cross-similarity between the continuous representation from one view and the product-quantized feature from the other view. In formulation,
\begin{equation} \mathcal {L}_{SPQ}=-\frac{1}{2 N_B} \sum _{i=1}^{N_B}\left(\log \frac{e^{{\mathbf {z}}_i^{(1)} \star {\hat{\mathbf {z}}}_i^{(2)} / \tau }}{Z_{i}^{(1)}}+\log \frac{e^{{\hat{\mathbf {z}}}_i^{(2)} \star {{\mathbf {z}}}_i^{(1)} / \tau }}{Z_{i}^{(2)}}\right), \end{equation}
(69)
where \(Z_i^{(1)}=\sum _{j\ne i} e^{{\mathbf {z}}_i^{(1)} \star {\hat{\mathbf {z}}}_j^{(2)} / \tau }\) and \(Z_i^{(2)}=\sum _{j\ne i} e^{\hat{\mathbf {z}}_i^{(2)} \star {{\mathbf {z}}}_j^{(1)} / \tau }\). With the cross contrastive learning strategy, both codewords and continuous representations are concurrently optimized to produce high-quality outputs for effective image retrieval.
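To give an idea of how such a differentiable quantization head can be built, the following rough PyTorch sketch (the class name and hyper-parameters are illustrative and not SPQ's exact design) splits a feature into sub-vectors and softly assigns each one to learnable codewords:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftPQ(nn.Module):
    """Split a feature into M sub-vectors and softly quantize each against
    K learnable codewords (a differentiable product-quantization head)."""

    def __init__(self, dim, M=4, K=256, scale=5.0):
        super().__init__()
        assert dim % M == 0
        self.M, self.d, self.scale = M, dim // M, scale
        self.codebooks = nn.Parameter(torch.randn(M, K, self.d))

    def forward(self, z):                                   # z: (N, dim)
        parts = z.view(z.size(0), self.M, self.d)           # (N, M, d)
        quantized = []
        for m in range(self.M):
            sim = F.normalize(parts[:, m], dim=1) @ F.normalize(self.codebooks[m], dim=1).t()
            attn = F.softmax(self.scale * sim, dim=1)       # soft assignment over K codewords
            quantized.append(attn @ self.codebooks[m])      # (N, d) soft-quantized sub-vector
        return torch.cat(quantized, dim=1)                  # (N, dim) soft-quantized feature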
Hashing via Structural and Intrinsic Similarity Learning (HashSIM) [119]. HashSIM utilizes contrastive learning for deep unsupervised hashing from a different view. For each batch, it stacks the two views of the binary codes into two matrices \(\mathbf {B}^{(1)}\) and \(\mathbf {B}^{(2)} \in \mathbb {R}^{N_B\times L}\), and takes their column vectors as bit vectors \(\lbrace \mathbf {c}_l^{(r)}\rbrace _{l=1}^L\), \(r=1\) or 2. Then, HashSIM develops an intrinsic similarity learning objective as follows:
\begin{equation} \mathcal {L}_{HashSIM}=-\frac{1}{2L} \sum _{l=1}^{L}\left(\log \frac{e^{\mathbf {c}_{l}^{(1)} \star \mathbf {c}_{l}^{(2)} / \tau }}{Z_{l}^{(1)}}+\log \frac{e^{\mathbf {c}_{l}^{(1)} \star \mathbf {c}_{l}^{(2)} / \tau }}{Z_{l}^{(2)}}\right), \end{equation}
(70)
where \(Z_{l}^{(r)}=\sum _{l^{\prime } \ne l}(e^{\mathbf {c}_l^{(r)} \star \mathbf {c}_{l^{\prime }}^{(1)} / \tau }+e^{\mathbf {c}_{l}^{(r)} \star \mathbf {c}_{l^{\prime }}^{(2)} / \tau })\). Since the numerator attempts to reduce the gap between each hash bit under distinct augmentations and the denominator attempts to enlarge the distances between distinct bits, minimizing this self-supervised objective helps produce robust and independent hash codes for effective image retrieval.

5 Related Important Topics

5.1 Semi-Supervised Deep Hashing

Semi-supervised deep hashing simultaneously leverages the semantic information from both labeled and unlabeled samples, and a range of semi-supervised deep hashing models have been developed recently. Compared with supervised and unsupervised methods, these methods can typically overcome label scarcity in practice with limited performance degradation. They usually incorporate semi-supervised techniques (e.g., pairwise pseudo-labeling [148, 176, 187], GANs [161, 162], and transductive learning [147]) into deep hashing, so that the retrieval performance can benefit from the abundant unlabeled images in the real world. Generally, semi-supervised deep hashing provides a cost-effective solution for practical applications with promising performance, and deserves further study in large-scale scenarios. We review these methods in detail below.
Semi-Supervised Deep Hashing (SSDH) [187]. SSDH minimizes a semi-supervised loss function containing three terms, i.e., a ranking term, an embedding term, and a pseudo-label term. The supervised ranking term leverages a triplet loss on labeled data. Then, SSDH builds an online k-NN graph over all data, which guides the pairwise similarity preserving of the hashing network. Moreover, in the semi-supervised setting, it generates pseudo-labels that further guide the similarity preserving process. SSDH is the first method to perform deep hashing in a semi-supervised fashion.
Deep Hashing with a Bipartite Graph (BGDH) [176]. BGDH builds a bipartite graph to uncover the latent semantic structure for unlabeled data. Different from the similarity graph in unsupervised hashing, its similarity structure is based on the relationships between labeled examples and unlabeled examples, resulting in a bipartite graph. Then, BGDH utilizes the bipartite graph to guide hash code learning by pairwise similarity preserving. It also adopts the loss term in DPSH [101] for supervised learning. Through mining the relationship in deep feature space, BGDH utilizes unlabeled data in an appropriate manner and improves the performance.
Semi-Supervised Generative Adversarial Hashing (SSGAH) [161]. SSGAH combines a Generative Adversarial Network with deep semi-supervised hashing. It contains a generative network, a discriminator, and a deep hashing network. The generative network produces two synthetic images \(\mathbf {x}_{syn}^p\) and \(\mathbf {x}_{syn}^n\) for each real image \(\mathbf {x}\) such that the similarity between \(\mathbf {x}\) and \(\mathbf {x}_{syn}^p\) is larger than the similarity between \(\mathbf {x}\) and \(\mathbf {x}_{syn}^n\). In this way, SSGAH learns the distribution of triplet-wise semantic information from both labeled and unlabeled samples. The discriminator estimates the likelihood that each input is synthetic. The hashing network is optimized using a triplet loss that incorporates the synthetic positive and negative images. By training the framework in an adversarial manner, SSGAH produces hash codes that sufficiently capture the semantics of the dataset.
Semi-supervised Deep Pairwise Hashing (SSDPH) [148]. SSDPH chooses a set of labeled anchors from the training set, and then uses a pairwise objective to preserve similarities between labeled samples. More importantly, it leverages temporal ensembling from semi-supervised learning to learn similarity information from unlabeled data. Specifically, it contains a teacher model and a student model; the teacher model provides supervised information to guide similarity learning and is updated in an ensemble manner. By combining deep hashing with classic semi-supervised learning techniques such as temporal ensembling, SSDPH improves the retrieval performance in real-world applications.
Transductive Semi-supervised Deep Hashing (TSDH) [147]. TSDH extends the traditional transductive learning principle to deep semi-supervised hashing: it treats the pseudo-labels of unlabeled data as variables and optimizes them alternately with the hashing network. To accomplish this, it adds a classification layer on top of the hash codes. Moreover, it involves a pairwise loss for similarity preservation. Lastly, TSDH estimates the confidence of the pseudo-labels by the proximity distance
\begin{equation} v_{i}=\sum _{\mathbf {z}_{j} \in \mathcal {N}\left(\mathbf {z}_{i}\right)}\Vert \mathbf {z}_{i}-\mathbf {z}_{j}\Vert _{2}\!, \end{equation}
(71)
\begin{equation} r_{i}=1-\frac{v_{i}}{v_{\max }}, \quad v_{\max }=\max \left\lbrace v_{1}, \ldots , v_{N}\right\rbrace \!, \end{equation}
(72)
where \(\mathbf {z}_i\) is the extracted feature of \(\mathbf {x}_i\) and \(\mathcal {N}(\mathbf {z}_i)\) denotes the k-nearest neighbor set of \(\mathbf {z}_i\). In this manner, samples that reside in densely populated regions are assigned a high confidence level. In summary, TSDH utilizes the popular transductive learning technique to improve the retrieval performance of semi-supervised hashing.
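A small NumPy sketch of this confidence estimate is given below, assuming feats is an \(N\times d\) matrix of extracted features (the function name and neighborhood size are illustrative):

import numpy as np

def pseudo_label_confidence(feats, k=5):
    """feats: (N, d) array of extracted features; returns confidences in [0, 1]."""
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)              # exclude each point itself
    knn = np.sort(dists, axis=1)[:, :k]          # distances to the k nearest neighbors
    v = knn.sum(axis=1)                          # Equation (71): proximity distance
    return 1.0 - v / v.max()                     # Equation (72): confidence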
Adversarial Binary Mutual Learning (ACML) [162]. ACML also integrates a Generative Adversarial Network into semi-supervised deep hashing. Specifically, it contains a discriminative network and a generative network that model the relationships between inputs and binary codes from opposite directions. Then, an adversarial network is trained to differentiate between real and fake pairs of samples and their hash codes. In this way, ACML can leverage unlabeled data to make the discriminative network and the generative network learn from each other. Moreover, it introduces a Weibull distribution for better similarity preservation. ACML combines a Generative Adversarial Network with deep semi-supervised hashing and shows promising retrieval performance.

5.2 Domain Adaptation Deep Hashing

In practice, the data in the domain of interest is often insufficient, while labeled samples from a separate but correlated domain are usually accessible. To sufficiently utilize the labeled samples from source domains, several domain adaptive hashing methods have been developed in recent years. These methods usually combine similarity preserving techniques from deep supervised hashing (e.g., pairwise [160, 199] and ranking-based similarity preserving [118]) with domain adaptation techniques (e.g., discrepancy minimization [70, 188], adversarial learning [62, 118], and centroid alignment [62, 160]). Hence, these methods are quite flexible. However, the cross-domain retrieval performance of current hashing methods is still not satisfactory, and deserves further exploration in the future. We review these methods as follows.
Domain Adaptive Hashing (DAH) [160]. DAH contains three parts, i.e., a supervised hashing module for source data, an unsupervised hashing module for target data, and a domain disparity reduction module. For source data, it minimizes the likelihood loss along with the quantization loss. For target data, it leverages the source outputs to generate label distributions and then minimizes the entropy so that the target outputs approximate the source outputs of each category. Furthermore, DAH reduces the domain difference between the source and target representations by minimizing the multi-kernel Maximum Mean Discrepancy. This work is the first to combine unsupervised domain adaptation with deep hashing and improves the efficiency of cross-domain image retrieval.
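As a concrete illustration of the discrepancy term, the following is a minimal PyTorch sketch of a single-kernel Gaussian MMD estimate between source and target feature batches; DAH itself minimizes a multi-kernel variant, and the function name and bandwidth are illustrative:

import torch

def gaussian_mmd(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two feature batches."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return kernel(source, source).mean() + kernel(target, target).mean() \
        - 2 * kernel(source, target).mean()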
Domain Adaptive Hashing with Intersectant Generative Adversarial Networks (IGAN) [62]. Different from DAH, IGAN generates pseudo-labels for the target domain and then aligns the semantic centroids of all categories. Moreover, it leverages two generators to reconstruct images in the two domains, and the generators and discriminators are updated using a GAN objective. IGAN improves the retrieval performance using GANs as well as centroid alignment, which are two common techniques in domain adaptation.
Deep Domain Adaptation Hashing with Adversarial Learning (DeDAHA) [118]. DeDAHA contains two different CNNs for learning image representations. An adversarial loss is enforced to explore the knowledge robust to different domains. Then DeDAHA utilizes a standard triplet loss to learn the hashing encoder. When the label annotations in target data are unavailable, DeDAHA leverages a multi-stage framework for unsupervised domain adaptation hashing.
Deep Transfer Hashing (DPH) [199]. DPH first uses a neural network to extract deep features and then incorporates a deep transformation mapping network for domain adaptation. Then, for effective transfer learning, DPH generates similarity information based on the cosine similarity of the deep features as well as the hash codes in the source domain to guide hash code learning. DPH shows great generality by utilizing the powerful representation capacity of deep learning.
Optimal Projection Guided Transfer Hashing (GTH) [188]. GTH seeks the maximum likelihood solution that minimizes the error matrix between the hash projections of the target and source domains. In this way, GTH produces domain-invariant hash projections for effective cross-domain image retrieval. However, GTH assumes that similar domains have small discrepancies between their hash projections, which may not hold in many scenarios.
Domain Adaptation Preconceived Hashing (DAPH) [70]. DAPH first reduces the distribution discrepancy across the two domains by learning a transformation matrix that projects samples from different domains into a common space. Moreover, it involves a reconstruction constraint to reduce the information loss caused by the transformation. For effective hash code learning, it adds a quantization loss to project features into hash codes. The whole learning process alternately updates the transformation matrix, the projection, and the binary codes. DAPH improves the performance on challenging cross-domain retrieval.

5.3 Multi-Modal Deep Hashing

Multimedia data have exploded in multiple modalities, including text, audio, image, and video, since the dawn of the information era and the rapid expansion of the Internet. Multi-modal deep hashing has therefore attracted much interest in the field of deep hashing recently. These methods typically project multiple modalities of data into a shared Hamming space using deep neural networks for effective cross-modal retrieval. The framework of multi-modal deep hashing methods is similar to that of general deep hashing methods, except that the similarity information includes both intra-modal and inter-modal forms; each loss term characterizing the similarity information is similar to those in the deep supervised hashing methods discussed above. Existing methods [170] can also be categorized into supervised methods [13, 83, 178] and unsupervised methods [66, 168, 184]. Cao et al. [11] give a detailed review of multi-modal hashing methods, covering [12, 13, 28, 35, 50, 65, 68, 81, 83, 95, 97, 178, 183, 192].

6 Evaluation Protocols

6.1 Evaluation Metrics

For deep hashing algorithms, the space cost only depends on the length of the hash codes, so the code length is usually kept the same when comparing the performance of different algorithms. The search efficiency is measured by the average search time per query, which mainly depends on the architecture of the neural networks. Besides, if a weighted Hamming distance is used, we cannot take advantage of bit operations for efficiency.
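The efficiency gap stems from how plain Hamming distances can be computed with bit operations, as in the small NumPy sketch below (the function name is illustrative): codes are packed into bytes and compared with XOR plus a bit count, which a weighted Hamming distance cannot exploit as directly:

import numpy as np

def hamming_distances(query_bits, db_bits):
    """query_bits: (L,), db_bits: (N, L), entries in {0, 1}."""
    q = np.packbits(query_bits.astype(np.uint8))          # pack 8 bits per byte
    db = np.packbits(db_bits.astype(np.uint8), axis=1)    # (N, ceil(L/8)) packed codes
    xor = np.bitwise_xor(db, q)                           # differing bits
    return np.unpackbits(xor, axis=1).sum(axis=1)         # popcount per database item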
As discussed above, search accuracy is usually used to measure performance. The most popular metrics include Mean Average Precision (MAP), Recall, Precision, as well as the precision-recall curve. Precision: Precision is defined as the proportion of returned samples that share a common label with the query, i.e.,
\begin{equation} precision=\frac{TP}{TP+FP}, \end{equation}
(73)
where \(TP\) denotes the number of returned samples that share a common label with the query and \(FP\) denotes the number of returned samples that do not. \(precision\)@\(k\) means that the total number of returned samples is \(k\), i.e., \(TP+FP=k\).
Recall: Recall is defined as the proportion of the database samples sharing a common label with the query that are actually retrieved, i.e.,
\begin{equation} recall=\frac{TP}{TP+FN}, \end{equation}
(74)
where \(FN\) denotes the number of samples in the database that share a common label with the query but are not retrieved, so that \(TP+FN\) is the total number of relevant samples in the database. \(recall\)@\(k\) means that the number of returned examples is \(k\).
Precision-recall curve: Both the precision rate and the recall rate in image retrieval are influenced by \(k\), and the two typically trade off against each other. As a result, we can create the precision-recall curve by varying \(k\) and plotting the precision rate against the recall rate.
Mean average precision (MAP): Average precision (AP) summarizes the precision over recall levels ranging from 0 to 1. In practical applications, it is computed discretely by summation:
\begin{equation} AP=\frac{1}{F}\sum _{k=1}^N precision\text{@}k\Delta \lbrace T\text{@}k\rbrace , \end{equation}
(75)
in which \(\Delta \lbrace T\text{@}k\rbrace\) denotes the change in the number of true positives from position \(k-1\) to \(k\), i.e., it equals 1 if the \(k\)-th returned sample is relevant and 0 otherwise, and \(F\) is the sum of \(\Delta \lbrace T\text{@}k\rbrace\) over all positions. The core idea of AP is to evaluate a ranked list by averaging the precision at every relevant position. Afterward, MAP is derived by taking the mean of the average precision over all queries. In several works, MAP is calculated on the top-\(K\) ranked retrieval results. Some researchers also calculate MAP within Hamming radius \(r\), where only samples with Hamming distances no larger than \(r\) are considered.
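The following small NumPy sketch (function names are illustrative, and conventions for top-\(K\) truncation vary across papers) computes precision@\(k\), recall@\(k\), and MAP from the 0/1 relevance lists of ranked retrieval results:

import numpy as np

def precision_recall_at_k(relevance, k):
    """relevance: 0/1 array over the ranked list of one query."""
    tp = relevance[:k].sum()
    return tp / k, tp / max(relevance.sum(), 1)

def mean_average_precision(relevance_lists, topk=None):
    """relevance_lists: one 0/1 array per query, ordered by ascending distance."""
    aps = []
    for rel in relevance_lists:
        rel = np.asarray(rel[:topk] if topk is not None else rel)
        hits = np.cumsum(rel)                          # true positives up to each rank
        precisions = hits / (np.arange(len(rel)) + 1)  # precision@k at every position
        denom = max(rel.sum(), 1)                      # F: number of relevant items
        aps.append((precisions * rel).sum() / denom)   # Equation (75)
    return float(np.mean(aps))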
Sablayrolles et al. [135] show that the above popular evaluation protocols for supervised hashing are not satisfactory, because a trivial solution that encodes the output of a classifier significantly outperforms existing methods. They further propose a novel evaluation protocol based on the retrieval of unseen classes and transfer learning. However, if the design of a hashing method avoids directly encoding classifier outputs, the above popular evaluation protocols generally remain effective.

6.2 Datasets

The scales of commonly used evaluation datasets range from small to extremely large, and the datasets can be divided into single-label and multi-label ones.
MNIST [94] comprises 60,000 training samples and 10,000 testing samples. It is a single-label dataset whose 10 classes correspond to the 10 digits. Each image is represented by 784-dimensional raw features.
CIFAR-10 [89] comprises 60,000 real-world images in 10 distinct categories: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. It is a single-label dataset, and the semantic labels are used to assess the performance of various hashing approaches.
ImageNet [34] is a large-scale dataset consisting of over 1.2 million images hand-annotated with the objects they contain. It is a single-label dataset with 1,000 categories such as “balloon” or “strawberry”.
NUS-WIDE [29] is a well-known multi-label image dataset collected by a team from the National University of Singapore. It consists of 269,648 examples with 5,018 unique tags, and each sample is manually associated with some of 81 concepts. Because images typically have more than one label, two samples are treated as semantically similar if they share at least one common semantic label.
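In practice, this multi-label ground truth is often encoded as a binary similarity matrix computed from the multi-hot label matrices, as in the following small NumPy sketch (the function and variable names are illustrative):

import numpy as np

def ground_truth_similarity(labels_query, labels_db):
    """labels_*: multi-hot label matrices of shape (N_q, C) and (N_d, C)."""
    # two samples are similar iff they share at least one semantic label
    return (labels_query @ labels_db.T > 0).astype(np.int8)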
MS COCO [109] is a popular multi-label dataset consisting of 82,783 training examples and 40,504 validation examples, each associated with some of the 80 categories. After removing examples without any class information, 122,218 samples are obtained for evaluating the performance of hashing methods.

6.3 Performance Analysis

6.3.1 Performance Comparison of Deep Supervised Methods.

We present the results of some representative deep supervised hashing and quantization algorithms over CIFAR-10, NUS-WIDE, ImageNet, and MS COCO. For CIFAR-10, 100 images are selected randomly per class (resulting in 1,000 images totally) as queries and the rest of samples are adopted as the database. A total of 500 samples per class (resulting in 5,000 samples totally) make up the training set. For NUS-WIDE, a subset of 195,834 samples that correspond to the 21 most frequent labels are picked. Afterward, 100 samples per class (resulting in 2,100 samples totally) are picked as queries and the remaining samples make up the retrieval set. A total of 500 images per class (resulting in 10,500 images totally) are sampled as the training set. For ImageNet, 100 categories are randomly selected as in [19]. The samples associated with the chosen categories in the training set make up the database, and the samples in the validation set are utilized as queries. A total of 100 examples of each category are selected from the database for training. For MS COCO, 5,000 samples are used as queries and the rest are used as the database. A total of 10,000 samples from the database are selected for training.
Note that, due to the varying experimental settings, most experimental results are not presented in detail in this survey. Representative comparison results of the hashing methods are shown in Tables 4 and 5. From the results, we make the following observations:
Table 4. MAP for Different Hashing Methods on CIFAR-10 and NUS-WIDE

Method | CIFAR-10 (12 / 24 / 32 / 48 bits) | NUS-WIDE (12 / 24 / 32 / 48 bits)
CNNH [173] | 0.439 / 0.511 / 0.509 / 0.522 | 0.611 / 0.618 / 0.625 / 0.608
DNNH [92] | 0.552 / 0.566 / 0.558 / 0.581 | 0.674 / 0.697 / 0.713 / 0.715
DHN [200] | 0.555 / 0.594 / 0.603 / 0.621 | 0.708 / 0.735 / 0.748 / 0.758
DRSCH [189] | 0.614 / 0.621 / 0.628 / 0.630 | 0.618 / 0.622 / 0.622 / 0.627
DSCH [189] | 0.608 / 0.613 / 0.617 / 0.619 | 0.591 / 0.597 / 0.610 / 0.608
DSRH [197] | 0.608 / 0.610 / 0.617 / 0.617 | 0.609 / 0.617 / 0.621 / 0.630
DSH-GAN [133] | 0.735 / 0.781 / 0.787 / 0.802 | 0.838 / 0.856 / 0.861 / 0.863
DTSH [169] | 0.710 / 0.750 / 0.765 / 0.774 | 0.773 / 0.808 / 0.812 / 0.824
DPSH [101] | 0.713 / 0.727 / 0.744 / 0.757 | 0.752 / 0.790 / 0.794 / 0.812
DSDH [100] | 0.740 / 0.786 / 0.801 / 0.820 | 0.776 / 0.808 / 0.820 / 0.829
DQN [17] | 0.554 / 0.558 / 0.564 / 0.580 | 0.768 / 0.776 / 0.783 / 0.792
DSH [111] | 0.644 / 0.742 / 0.770 / 0.799 | 0.712 / 0.731 / 0.740 / 0.748
DCEH [172] | 0.745 / 0.788 / 0.802 / 0.806 | 0.781 / 0.816 / 0.827 / 0.839
DDSH [82] | 0.753 / 0.776 / 0.803 / 0.811 | 0.776 / 0.803 / 0.810 / 0.817
DFH [103] | 0.803 / 0.825 / 0.831 / 0.844 | 0.795 / 0.823 / 0.833 / 0.842
Greedy Hash [156] | 0.774 / 0.795 / 0.810 / 0.822 | - / - / - / -
MIHash [8] | 0.738 / 0.775 / 0.791 / 0.816 | 0.773 / 0.820 / 0.831 / 0.843
HBMP [9] | 0.799 / 0.804 / 0.830 / 0.831 | 0.757 / 0.805 / 0.822 / 0.840
VDSH [195] | 0.538 / 0.541 / 0.545 / 0.548 | 0.769 / 0.796 / 0.803 / 0.807
NMLayer [43] | 0.786 / 0.813 / 0.821 / 0.828 | 0.801 / 0.824 / 0.832 / 0.840
HashNet [19] | 0.685 / 0.707 / 0.705 / 0.705 | 0.770 / 0.802 / 0.806 / 0.816
AnDSH [198] | 0.754 / 0.780 / 0.786 / 0.795 | 0.780 / 0.808 / 0.815 / 0.823
DISH [193] | 0.738 / 0.792 / 0.822 / 0.841 | 0.781 / 0.823 / 0.837 / 0.840
SRE [194] | 0.771 / 0.817 / 0.839 / 0.858 | 0.801 / 0.833 / 0.849 / 0.861
MLDH [124] | 0.805 / 0.825 / 0.829 / 0.832 | 0.800 / 0.828 / 0.832 / 0.835
SDH [140] | 0.285 / 0.329 / 0.341 / 0.356 | 0.568 / 0.600 / 0.608 / 0.638
KSH [113] | 0.303 / 0.337 / 0.346 / 0.356 | 0.556 / 0.572 / 0.581 / 0.588
ITQ [48] | 0.127 / 0.128 / 0.126 / 0.129 | 0.454 / 0.406 / 0.405 / 0.400
Table 5. MAP for Different Hashing Methods on ImageNet and MS COCO

Method | ImageNet (16 / 32 / 64 bits) | MS COCO (16 / 32 / 64 bits)
DBH [106] | 0.350 / 0.379 / 0.406 | 0.602 / 0.639 / 0.658
DHN [200] | 0.311 / 0.472 / 0.573 | 0.677 / 0.701 / 0.694
CNNH [173] | 0.281 / 0.449 / 0.553 | 0.564 / 0.574 / 0.567
DNNH [92] | 0.290 / 0.460 / 0.565 | 0.593 / 0.603 / 0.609
DTSH [169] | 0.442 / 0.528 / 0.581 | 0.699 / 0.732 / 0.753
HashNet [19] | 0.505 / 0.630 / 0.683 | 0.687 / 0.718 / 0.736
SDH [40] | 0.584 / 0.649 / 0.664 | 0.671 / 0.710 / 0.733
DPSH [101] | 0.326 / 0.546 / 0.654 | 0.634 / 0.676 / 0.726
DSH [111] | 0.348 / 0.550 / 0.665 | - / - / -
HashGAN [14] | - / - / - | 0.687 / 0.718 / 0.736
DCH [15] | 0.717 / 0.763 / 0.787 | 0.700 / 0.691 / 0.680
HashMI [8] | 0.569 / 0.661 / 0.694 | - / - / -
Greedy Hash [156] | 0.570 / 0.639 / 0.659 | 0.677 / 0.722 / 0.740
JMLH [144] | 0.517 / 0.621 / 0.662 | 0.689 / 0.733 / 0.758
DPN [41] | 0.592 / 0.670 / 0.703 | 0.668 / 0.721 / 0.752
CSQ [185] | 0.717 / 0.763 / 0.804 | 0.742 / 0.806 / 0.829
OrthHash [64] | 0.614 / 0.681 / 0.709 | 0.708 / 0.762 / 0.785
DSEH [99] | 0.715 / 0.753 / 0.760 | 0.735 / 0.773 / 0.781
PSLDH [158] | 0.734 / 0.792 / 0.817 | 0.782 / 0.835 / 0.853
SDH [140] | 0.298 / 0.455 / 0.585 | 0.554 / 0.564 / 0.579
KSH [113] | 0.159 / 0.297 / 0.394 | 0.521 / 0.534 / 0.536
ITQ-CCA [48] | 0.265 / 0.436 / 0.576 | 0.565 / 0.562 / 0.501
ITQ [48] | 0.325 / 0.462 / 0.552 | 0.581 / 0.624 / 0.657
BRE [91] | 0.062 / 0.252 / 0.357 | 0.592 / 0.622 / 0.633
SH [171] | 0.206 / 0.328 / 0.419 | 0.495 / 0.509 / 0.510
LSH [46] | 0.100 / 0.235 / 0.359 | 0.459 / 0.485 / 0.584
Deep supervised hashing greatly outperforms traditional hashing methods (SDH and KSH) overall, validating the strong representation-learning capacity of deep learning.
Similarity information is necessary for deep hashing. For deep supervised hashing methods in the early period (i.e., before 2016), hash codes are mostly obtained by transferring classification models without supervised similarity information while the methods with pairwise and ranking information outperform them.
Label information helps to increase the performance of deep hashing. This can be seen from the fact that DSDH evidently outperforms DPSH, as well as from the superiority of LabNet. Moreover, several pointwise methods (CSQ, OrthHash, and PSLDH) show comparable performance by mapping the labels into the Hamming space, achieving impressive results on large-scale datasets.
Several techniques, including regularization terms, bit balance, ensemble learning, and bit independence, help to obtain accurate and robust performance, as shown by the ablation studies in some papers [36].
Although supervised hashing methods have achieved remarkable performance, they are difficult to apply in practice since large-scale data annotation is unaffordable. To address this problem, deep learning-based unsupervised methods provide a cost-effective solution for more practical applications.

6.3.2 Performance Comparison of Deep Unsupervised Methods.

This part presents the results of representative deep unsupervised hashing approaches on CIFAR-10, NUS-WIDE, and MS COCO. We follow the settings in prior works [121, 134, 145]. The dataset splits for training, testing, and the database are the same as in Section 6.3.1. Part of the records are quoted from [134, 145, 146].
The comparison results are shown in Table 6. From the results, we have the following observations:
Table 6. MAP for Different Unsupervised Methods on CIFAR-10, NUS-WIDE, and MS COCO

Method | CIFAR-10 (16 / 32 / 64 bits) | NUS-WIDE (16 / 32 / 64 bits) | MS COCO (16 / 32 / 64 bits)
DeepBit [105] | 0.194 / 0.249 / 0.277 | 0.392 / 0.403 / 0.429 | 0.399 / 0.410 / 0.475
SGH [31] | 0.435 / 0.437 / 0.433 | 0.593 / 0.590 / 0.607 | 0.594 / 0.610 / 0.618
BGAN [152] | 0.525 / 0.531 / 0.562 | 0.684 / 0.714 / 0.730 | 0.645 / 0.682 / 0.707
BinGAN [201] | 0.476 / 0.512 / 0.520 | 0.654 / 0.709 / 0.713 | 0.651 / 0.673 / 0.696
Greedy Hash [156] | 0.448 / 0.473 / 0.501 | 0.633 / 0.691 / 0.731 | 0.582 / 0.668 / 0.710
HashGAN [45] | 0.447 / 0.463 / 0.481 | - / - / - | - / - / -
UH-BDNN [36] | 0.301 / 0.309 / 0.312 | - / - / - | - / - / -
UTH [105] | 0.287 / 0.307 / 0.324 | 0.450 / 0.495 / 0.549 | 0.438 / 0.465 / 0.508
SSDH [177] | - / - / - | 0.580 / 0.593 / 0.610 | 0.540 / 0.566 / 0.593
DistillHash [179] | 0.285 / 0.294 / 0.308 | 0.627 / 0.656 / 0.671 | 0.546 / 0.566 / 0.593
MLS\(^3\)RUDH [159] | - / - / - | 0.713 / 0.727 / 0.750 | 0.607 / 0.622 / 0.641
TBH [145] | 0.532 / 0.573 / 0.578 | 0.717 / 0.725 / 0.735 | 0.706 / 0.735 / 0.722
GLC [120] | - / - / - | 0.759 / 0.772 / 0.783 | 0.715 / 0.723 / 0.731
DVB [143] | 0.403 / 0.422 / 0.446 | 0.604 / 0.632 / 0.665 | 0.570 / 0.629 / 0.623
CIBHash [134] | 0.590 / 0.622 / 0.641 | 0.790 / 0.807 / 0.815 | 0.737 / 0.760 / 0.775
DATE [121] | 0.577 / 0.629 / 0.647 | 0.793 / 0.809 / 0.815 | - / - / -
ITQ [48] | 0.305 / 0.325 / 0.349 | 0.627 / 0.645 / 0.664 | 0.598 / 0.624 / 0.648
AGH [114] | 0.333 / 0.357 / 0.358 | 0.592 / 0.615 / 0.616 | 0.596 / 0.625 / 0.631
DGH [112] | 0.335 / 0.353 / 0.361 | 0.572 / 0.607 / 0.627 | 0.613 / 0.631 / 0.638
Deep unsupervised hashing methods generally perform better than the traditional approaches (ITQ, BRE, SDH, KSH, and LSH), which suggests that the powerful representation learning capacity of deep learning is beneficial to the retrieval performance of generated binary codes in most cases.
The methods that only adopt regularization terms (DeepBit and UTH) obtain poor results among the compared methods, demonstrating that the exploration of semantic information is indispensable for discriminative hash codes.
The methods that explore more accurate similarity structures (DATE and TBH) outperform early approaches that obtain similarity structure in a coarse manner (SSDH and DistillHash). The potential reason is that false similarity signals will result in error propagation during subsequent hash code learning, implying suboptimal performance.
The methods utilizing contrastive learning (CIBHash and DATE) achieve superb performance among the compared methods, which implies that contrastive learning is an effective tool for discriminative hash code learning. As research progresses, deep unsupervised hashing methods can even outperform some deep supervised methods, which is encouraging.

6.4 Training Time Cost

In this part, we investigate the training efficiency of different deep hashing methods. A total of 10 representative methods are selected. These methods were originally parameterized by different network backbones (e.g., AlexNet and VGG-F), and the backbones can cause larger differences in training and inference cost than the core hashing techniques themselves. Hence, for a fair comparison of efficiency, all the hashing networks are parameterized by VGG-F and optimized on a single NVIDIA GeForce GTX TITAN X GPU. In Figure 4, we report the running time of each epoch during the training phase of the compared approaches. From the results, we have the following observations. First, the efficiency difference between these methods is limited. The potential reason is that the computational cost of hashing methods mainly depends on the forward and backward propagation of the network backbone, while the specific optimization schemes have limited impact on the computational cost. Second, OrthHash is the most efficient among the compared methods, because it only leverages one brief objective during optimization.
Fig. 4. Computational time cost of different hashing methods.

7 Conclusion

In this survey, we present a comprehensive review of the articles on deep hashing, including deep supervised hashing, deep unsupervised hashing, and other related topics. Based on how the similarities of hash codes are measured, we divide deep supervised hashing methods into four categories: pairwise methods, ranking-based methods, pointwise methods, and quantization. In addition, we categorize deep unsupervised hashing into three classes based on their semantic learning manners, i.e., reconstruction-based methods, pseudo-label-based methods, and prediction-free self-supervised learning-based methods. We also explore three important topics including semi-supervised deep hashing, domain adaptation deep hashing, and multi-modal deep hashing. We observe that existing deep hashing methods mainly focus on public datasets designed for classification and detection, which do not fully address the nearest neighbor search problem. Future work could combine downstream approximate nearest neighbor search algorithms to design specific deep hashing methods, leading to more practical deep hashing methods for real-world applications. Furthermore, cutting-edge deep neural network and representation learning techniques will be integrated into deep hashing and promote the development of large-scale image retrieval.

Acknowledgments

We thank Zeyu Ma, Huasong Zhong, and Xiaokang Chen who discussed with us and provided instructive suggestions.

Footnotes

1
In our survey, \(\lbrace \lambda _1, \lambda _2, \lambda _3, \ldots \rbrace\) always denote the balance coefficients.
2
They can be also called target codes.

References

[1]
Alexandr Andoni and Piotr Indyk. 2006. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Proceedings of the Annual IEEE Symposium on Foundations of Computer Science. 459–468.
[2]
Yoshua Bengio, Nicholas Léonard, and Aaron Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv:1308.3432. Retrieved from https://arxiv.org/abs/1308.3432.
[3]
Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. 1999. When is “nearest neighbor” meaningful?. In Proceedings of the International Conference on Database Theory. 217–235.
[4]
Christian Böhm, Stefan Berchtold, and Daniel A. Keim. 2001. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. Computing Surveys 33, 3 (2001), 322–373.
[5]
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3, 1 (2011), 1–122.
[6]
Andrei Z. Broder. 1997. On the resemblance and containment of documents. In Proceedings of the Compression and Complexity of Sequence. 21–29.
[7]
Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. 1997. Syntactic clustering of the web. Computer Networks and ISDN Systems 29, 8–13 (1997), 1157–1166.
[8]
Fatih Cakir, Kun He, Sarah Adel Bargal, and Stan Sclaroff. 2019. Hashing with mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 10 (2019), 2424–2437.
[9]
Fatih Cakir, Kun He, and Stan Sclaroff. 2018. Hashing with binary matrix pursuit. In Proceedings of the European Conference on Computer Vision.332–348.
[10]
Riccardo Cantini, Fabrizio Marozzo, Giovanni Bruno, and Paolo Trunfio. 2021. Learning sentence-to-hashtags semantic mapping for hashtag recommendation on microblogs. ACM Transactions on Knowledge Discovery from Data 16, 2 (2021), 1–26.
[11]
Wenming Cao, Wenshuo Feng, Qiubin Lin, Guitao Cao, and Zhihai He. 2020. A review of hashing methods for multimodal retrieval. IEEE Access 8, (2020), 15377–15391.
[12]
Wenming Cao, Qiubin Lin, Zhihai He, and Zhiquan He. 2019. Hybrid representation learning for cross-modal retrieval. Neurocomputing 345 (2019), 45–57.
[13]
Yue Cao, Bin Liu, Mingsheng Long, and Jianmin Wang. 2018. Cross-modal hamming hashing. In Proceedings of the European Conference on Computer Vision.202–218.
[14]
Yue Cao, Bin Liu, Mingsheng Long, and Jianmin Wang. 2018. Hashgan: Deep learning to hash with pair conditional wasserstein gan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1287–1296.
[15]
Yue Cao, Mingsheng Long, Bin Liu, and Jianmin Wang. 2018. Deep cauchy hashing for hamming space retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1229–1237.
[16]
Yue Cao, Mingsheng Long, Jianmin Wang, and Shichen Liu. 2017. Deep visual-semantic quantization for efficient image retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1328–1337.
[17]
Yue Cao, Mingsheng Long, Jianmin Wang, Han Zhu, and Qingfu Wen. 2016. Deep quantization network for efficient image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence.
[18]
Yuan Cao, Heng Qi, Wenrui Zhou, Jien Kato, Keqiu Li, Xiulong Liu, and Jie Gui. 2017. Binary hashing for approximate nearest neighbor search on big data: A survey. IEEE Access 6 (2017), 2039–2054.
[19]
Zhangjie Cao, Mingsheng Long, Jianmin Wang, and Philip S. Yu. 2017. Hashnet: Deep learning to hash by continuation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5608–5617.
[20]
Zhangjie Cao, Ziping Sun, Mingsheng Long, Jianmin Wang, and Philip S. Yu. 2018. Deep priority hashing. In Proceedings of the ACM International Conference on Multimedia. 1653–1661.
[21]
Miguel A. Carreira-Perpinán and Ramin Raziperchikolaei. 2015. Hashing with binary autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 557–566.
[22]
Moses S. Charikar. 2002. Similarity estimation techniques from rounding algorithms. In Proceedings of the Annual ACM Symposium on Theory of Computing. 380–388.
[23]
Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Return of the devil in the details: Delving deep into convolutional nets. In Proceedings of the British Machine Vision Conference.
[24]
Junjie Chen and William K. Cheung. 2019. Similarity preserving deep asymmetric quantization for image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 8183–8190.
[25]
Shen Chen, Liujuan Cao, Mingbao Lin, Yan Wang, Xiaoshuai Sun, Chenglin Wu, Jingfei Qiu, and Rongrong Ji. 2019. Hadamard codebook-based deep hashing. arXiv:1910.09182. Retrieved from https://arxiv.org/abs/1910.09182.
[26]
Yudong Chen, Zhihui Lai, Yujuan Ding, Kaiyi Lin, and Wai Keung Wong. 2019. Deep supervised hashing with anchor graph. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9796–9804.
[27]
Yaxiong Chen and Xiaoqiang Lu. 2019. Deep discrete hashing with pairwise correlation learning. Neurocomputing 385, 2020 (2019), 111–121.
[28]
Zhikui Chen, Fangming Zhong, Geyong Min, Yonglin Leng, and Yiming Ying. 2018. Supervised intra-and inter-modality similarity preserving hashing for cross-modal retrieval. IEEE Access 6 (2018), 27796–27808.
[29]
Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yantao Zheng. 2009. NUS-WIDE: A real-world web image database from national university of singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval. 48.
[30]
Hui Cui, Lei Zhu, and Wentao Tan. 2021. Efficient inter-image relation graph neural network hashing for scalable image retrieval. In Proceedings of the ACM International Conference on Multimedia in Asia. 1–8.
[31]
Bo Dai, Ruiqi Guo, Sanjiv Kumar, Niao He, and Le Song. 2017. Stochastic generative hashing. In Proceedings of the International Conference on Machine Learning. 913–922.
[32]
Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós. 2011. Fast locality-sensitive hashing. In Proceedings of the International ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 1073–1081.
[33]
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Annual Symposium on Computational Geometry. 253–262.
[34]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 248–255.
[35]
Kun Ding, Bin Fan, Chunlei Huo, Shiming Xiang, and Chunhong Pan. 2016. Cross-modal hashing via rank-order preserving. IEEE Transactions on Multimedia 19, 3 (2016), 571–585.
[36]
Thanh-Toan Do, Anh-Dzung Doan, and Ngai-Man Cheung. 2016. Learning to hash with binary deep neural network. In Proceedings of the European Conference on Computer Vision.219–234.
[37]
Xiao Dong, Li Liu, Lei Zhu, Zhiyong Cheng, and Huaxiang Zhang. 2020. Unsupervised deep k-means hashing for efficient image retrieval and clustering. IEEE Transactions on Circuits and Systems for Video Technology 31, 8 (2020), 3266–3277.
[38]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations.
[39]
Sepehr Eghbali and Ladan Tahvildari. 2019. Deep spherical quantization for image search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11690–11699.
[40]
Venice Erin Liong, Jiwen Lu, Gang Wang, Pierre Moulin, and Jie Zhou. 2015. Deep hashing for compact binary codes learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2475–2483.
[41]
Lixin Fan, Kam Woh Ng, Ce Ju, Tianyu Zhang, and Chee Seng Chan. 2020. Deep polarized network for supervised learning of accurate binary hashing codes. In Proceedings of the International Joint Conference on Artificial Intelligence. 825–831.
[42]
Jerome H. Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software 3, 3 (1977), 209–226.
[43]
Chaoyou Fu, Liangchen Song, Xiang Wu, Guoli Wang, and Ran He. 2019. Neurons merging layer: Towards progressive redundancy reduction for deep supervised hashing. In Proceedings of the International Joint Conference on Artificial Intelligence. 2322–2328.
[44]
Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2013. Optimized product quantization. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 4 (2013), 744–755.
[45]
Kamran Ghasedi Dizaji, Feng Zheng, Najmeh Sadoughi, Yanhua Yang, Cheng Deng, and Heng Huang. 2018. Unsupervised deep generative adversarial hashing network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3664–3673.
[46]
Aristides Gionis, Piotr Indyk, and Rajeev Motwani. 1999. Similarity search in high dimensions via hashing. In Proceedings of the International Conference on Very Large Data Bases, Vol. 99. 518–529.
[47]
Yunchao Gong, Sanjiv Kumar, Vishal Verma, and Svetlana Lazebnik. 2012. Angular quantization-based binary codes for fast similarity search. In Proceedings of the Conference on Neural Information Processing Systems, Vol. 25.
[48]
Yunchao Gong, Svetlana Lazebnik, Albert Gordo, and Florent Perronnin. 2012. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 12 (2012), 2916–2929.
[49]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the Conference on Neural Information Processing Systems. 2672–2680.
[50]
Wen Gu, Xiaoyan Gu, Jingzi Gu, Bo Li, Zhi Xiong, and Weiping Wang. 2019. Adversary guided asymmetric hashing for cross-modal retrieval. In Proceedings of the International Conference on Multimedia Retrieval. 159–167.
[51]
Yifan Gu, Shidong Wang, Haofeng Zhang, Yazhou Yao, Wankou Yang, and Li Liu. 2019. Clustering-driven unsupervised deep hashing for image retrieval. Neurocomputing 368 (2019), 114–123.
[52]
Chaoyu Guan, Xiting Wang, Quanshi Zhang, Runjin Chen, Di He, and Xing Xie. 2019. Towards a deep and unified understanding of deep neural models in nlp. In Proceedings of the International Conference on Machine Learning. 2454–2463.
[53]
Jie Gui, Tongliang Liu, Zhenan Sun, Dacheng Tao, and Tieniu Tan. 2017. Fast supervised discrete hashing. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 2 (2017), 490–496.
[54]
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. 2017. Improved training of wasserstein gans. In Proceedings of the Conference on Neural Information Processing Systems. 5767–5777.
[55]
Yuchen Guo, Xin Zhao, Guiguang Ding, and Jungong Han. 2018. On trivial solution and high correlation problems in deep supervised hashing. In Proceedings of the AAAI Conference on Artificial Intelligence.
[56]
Kiana Hajebi, Yasin Abbasi-Yadkori, Hossein Shahbazi, and Hong Zhang. 2011. Fast approximate nearest-neighbor search with k-nearest neighbor graph. In Proceedings of the International Joint Conference on Artificial Intelligence.
[57]
Junwei Han, Dingwen Zhang, Gong Cheng, Nian Liu, and Dong Xu. 2018. Advanced deep-learning techniques for salient and category-specific object detection: A survey. IEEE Signal Processing Magazine 35, 1 (2018), 84–100.
[58]
Junfeng He, Wei Liu, and Shih-Fu Chang. 2010. Scalable similarity search with optimized kernel hashing. In Proceedings of the International ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 1129–1138.
[59]
Kun He, Fatih Cakir, Sarah Adel Bargal, and Stan Sclaroff. 2018. Hashing as tie-aware learning to rank. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4023–4032.
[60]
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9729–9738.
[61]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 770–778.
[62]
Tao He, Yuan-Fang Li, Lianli Gao, Dongxiang Zhang, and Jingkuan Song. 2019. One network for multi-domains: Domain adaptive hashing with intersectant generative adversarial networks. In Proceedings of the International Joint Conference on Artificial Intelligence. 2477–2483.
[63]
Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets. Neural Computation 18, 7 (2006), 1527–1554.
[64]
Jiun Tian Hoe, Kam Woh Ng, Tianyu Zhang, Chee Seng Chan, Yi-Zhe Song, and Tao Xiang. 2021. One loss for all: Deep hashing with a single cosine similarity based learning objective. In Proceedings of the Conference on Neural Information Processing Systems.
[65]
Di Hu, Feiping Nie, and Xuelong Li. 2018. Deep binary reconstruction for cross-modal hashing. IEEE Transactions on Multimedia 21, 4 (2018), 973–985.
[66]
Hengtong Hu, Lingxi Xie, Richang Hong, and Qi Tian. 2020. Creating something from nothing: Unsupervised knowledge distillation for cross-modal hashing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3123–3132.
[67]
Qinghao Hu, Jiaxiang Wu, Jian Cheng, Lifang Wu, and Hanqing Lu. 2017. Pseudo label based unsupervised deep discriminative hashing for image retrieval. In Proceedings of the ACM International Conference on Multimedia. 1584–1590.
[68]
Yupeng Hu, Meng Liu, Xiaobin Su, Zan Gao, and Liqiang Nie. 2021. Video moment localization via deep cross-modal hashing. IEEE Transactions on Image Processing 30 (2021), 4667–4677.
[69]
Chen Huang, Chen Change Loy, and Xiaoou Tang. 2016. Unsupervised learning of discriminative attributes and visual representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5175–5184.
[70]
Fuxiang Huang, Lei Zhang, and Xinbo Gao. 2021. Domain adaptation preconceived hashing for unconstrained visual retrieval. IEEE Transactions on Neural Networks and Learning Systems (2021), 1–15. DOI:
[71]
Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4700–4708.
[72]
Long-Kai Huang, Jianda Chen, and Sinno Jialin Pan. 2019. Accelerate learning of deep hashing with gradient attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5271–5280.
[73]
Shanshan Huang, Yichao Xiong, Ya Zhang, and Jia Wang. 2017. Unsupervised triplet hashing for fast image retrieval. In Proceedings of the on Thematic Workshops of ACM Multimedia. 84–92.
[74]
Aapo Hyvärinen, Jarmo Hurri, and Patrick O. Hoyer. 2009. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, Vol. 39. Springer Science & Business Media.
[75]
Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Annual ACM Symposium on Theory of Computing. 604–613.
[76] Himalaya Jain, Joaquin Zepeda, Patrick Pérez, and Rémi Gribonval. 2017. Subic: A supervised, structured binary code for image search. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 833–842.
[77] Young Kyun Jang and Nam Ik Cho. 2021. Self-supervised product quantization for deep unsupervised image retrieval. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 12085–12094.
[78] Kalervo Järvelin and Jaana Kekäläinen. 2017. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the International ACM SIGIR Conference on Research & Development in Information Retrieval, Vol. 51. 243–250.
[79] Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1 (2010), 117–128.
[80] Tianxu Ji, Xianglong Liu, Cheng Deng, Lei Huang, and Bo Lang. 2014. Query-adaptive hash code ranking for fast nearest neighbor search. In Proceedings of the ACM International Conference on Multimedia. 1005–1008.
[81] Zhenyan Ji, Weina Yao, Wei Wei, Houbing Song, and Huaiyu Pi. 2019. Deep multi-level semantic hashing for cross-modal retrieval. IEEE Access 7 (2019), 23667–23674.
[82] Qing-Yuan Jiang, Xue Cui, and Wu-Jun Li. 2018. Deep discrete supervised hashing. IEEE Transactions on Image Processing 27, 12 (2018), 5996–6009.
[83] Qing-Yuan Jiang and Wu-Jun Li. 2017. Deep cross-modal hashing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3232–3240.
[84] Qing-Yuan Jiang and Wu-Jun Li. 2018. Asymmetric deep supervised hashing. In Proceedings of the AAAI Conference on Artificial Intelligence.
[85] Lu Jin, Xiangbo Shu, Kai Li, Zechao Li, Guo-Jun Qi, and Jinhui Tang. 2018. Deep ordinal hashing with spatial attention. IEEE Transactions on Image Processing 28, 5 (2018), 2173–2186.
[86] Yannis Kalantidis and Yannis Avrithis. 2014. Locally optimized product quantization for approximate nearest neighbor search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2321–2328.
[87] Rong Kang, Yue Cao, Mingsheng Long, Jianmin Wang, and Philip S. Yu. 2019. Maximum-margin hamming hashing. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 8252–8261.
[88] Benjamin Klein and Lior Wolf. 2019. End-to-end supervised product quantization for image search and retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5041–5050.
[89] Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Citeseer (2009).
[90] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Proceedings of the Conference on Neural Information Processing Systems. 1097–1105.
[91] Brian Kulis and Trevor Darrell. 2009. Learning to hash with binary reconstructive embeddings. In Proceedings of the Conference on Neural Information Processing Systems. 1042–1050.
[92] Hanjiang Lai, Yan Pan, Ye Liu, and Shuicheng Yan. 2015. Simultaneous feature learning and hash coding with deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3270–3278.
[93] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[94] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11 (1998), 2278–2324.
[95] Chao Li, Cheng Deng, Ning Li, Wei Liu, Xinbo Gao, and Dacheng Tao. 2018. Self-supervised adversarial hashing networks for cross-modal retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4242–4251.
[96] Dagang Li, Junmao Li, and Zheng Du. 2016. Deterministic and efficient hash table lookup using discriminated vectors. In Proceedings of the IEEE Global Communications Conference. 1–6.
[97] Fengling Li, Tong Wang, Lei Zhu, Zheng Zhang, and Xinhua Wang. 2021. Task-adaptive asymmetric deep cross-modal hashing. Knowledge-Based Systems 219 (2021), 106851.
[98] Jiayong Li, Wing W. Y. Ng, Xing Tian, Sam Kwong, and Hui Wang. 2019. Weighted multi-deep ranking supervised hashing for efficient image retrieval. International Journal of Machine Learning and Cybernetics 11, 4 (2019), 883–897.
[99] Ning Li, Chao Li, Cheng Deng, Xianglong Liu, and Xinbo Gao. 2018. Deep joint semantic-embedding hashing. In Proceedings of the International Joint Conference on Artificial Intelligence. 2397–2403.
[100] Qi Li, Zhenan Sun, Ran He, and Tieniu Tan. 2017. Deep supervised discrete hashing. In Proceedings of the Conference on Neural Information Processing Systems. 2482–2491.
[101] Wu-Jun Li, Sheng Wang, and Wang-Cheng Kang. 2016. Feature learning based deep supervised hashing with pairwise labels. In Proceedings of the AAAI Conference on Artificial Intelligence. 1711–1717.
[102] Yunfan Li, Peng Hu, Zitao Liu, Dezhong Peng, Joey Tianyi Zhou, and Xi Peng. 2021. Contrastive clustering. In Proceedings of the AAAI Conference on Artificial Intelligence.
[103] Yunqiang Li, Wenjie Pei, and Jan van Gemert. 2020. Push for quantization: Deep fisher hashing. In Proceedings of the British Machine Vision Conference.
[104] Yunqiang Li and Jan van Gemert. 2021. Deep unsupervised image hashing by maximizing bit entropy. In Proceedings of the AAAI Conference on Artificial Intelligence. 2002–2010.
[105] Kevin Lin, Jiwen Lu, Chu-Song Chen, and Jie Zhou. 2016. Learning compact binary descriptors with unsupervised deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1183–1192.
[106] Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, and Chu-Song Chen. 2015. Deep learning of binary hash codes for fast image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 27–35.
[107] Min Lin, Qiang Chen, and Shuicheng Yan. 2014. Network in network. In Proceedings of the International Conference on Learning Representations.
[108] Qinghong Lin, Xiaojun Chen, Qin Zhang, Shangxuan Tian, and Yudong Chen. 2021. Deep self-adaptive hashing for image retrieval. In Proceedings of the ACM International Conference on Information & Knowledge Management. 1028–1037.
[109] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision. 740–755.
[110] Bin Liu, Yue Cao, Mingsheng Long, Jianmin Wang, and Jingdong Wang. 2018. Deep triplet quantization. In Proceedings of the ACM International Conference on Multimedia. 755–763.
[111] Haomiao Liu, Ruiping Wang, Shiguang Shan, and Xilin Chen. 2016. Deep supervised hashing for fast image retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2064–2072.
[112] Wei Liu, Cun Mu, Sanjiv Kumar, and Shih-Fu Chang. 2014. Discrete graph hashing. In Proceedings of the Conference on Neural Information Processing Systems, Vol. 27.
[113] Wei Liu, Jun Wang, Rongrong Ji, Yu-Gang Jiang, and Shih-Fu Chang. 2012. Supervised hashing with kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2074–2081.
[114] Wei Liu, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. 2011. Hashing with graphs. In Proceedings of the International Conference on Machine Learning.
[115] Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. 2017. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 212–220.
[116] Xianglong Liu, Lei Huang, Cheng Deng, Bo Lang, and Dacheng Tao. 2016. Query-adaptive hash code ranking for large-scale multi-view visual search. IEEE Transactions on Image Processing 25, 10 (2016), 4514–4524.
[117] Xingbo Liu, Xiushan Nie, and Yilong Yin. 2019. Mutual linear regression-based discrete hashing. arXiv:1904.00744. Retrieved from https://arxiv.org/abs/1904.00744.
[118] Fuchen Long, Ting Yao, Qi Dai, Xinmei Tian, Jiebo Luo, and Tao Mei. 2018. Deep domain adaptation hashing with adversarial learning. In Proceedings of the International ACM SIGIR Conference on Research & Development in Information Retrieval. 725–734.
[119] Xiao Luo, Zeyu Ma, Wei Cheng, and Minghua Deng. 2022. Improve deep unsupervised hashing via structural and intrinsic similarity learning. IEEE Signal Processing Letters 29 (2022), 602–606.
[120] Xiao Luo, Daqing Wu, Chong Chen, Jinwen Ma, and Minghua Deng. 2021. Deep unsupervised hashing by global and local consistency. In Proceedings of the IEEE International Conference on Multimedia and Expo. 1–6.
[121] Xiao Luo, Daqing Wu, Zeyu Ma, Chong Chen, Minghua Deng, Jianqiang Huang, and Xian-Sheng Hua. 2021. A statistical approach to mining semantic similarity for deep unsupervised hashing. In Proceedings of the ACM International Conference on Multimedia. 4306–4314.
[122] Xiao Luo, Daqing Wu, Zeyu Ma, Chong Chen, Huasong Zhong, Minghua Deng, Jianqiang Huang, and Xian-sheng Hua. 2021. CIMON: Towards high-quality hash codes. In Proceedings of the International Joint Conference on Artificial Intelligence.
[123] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. 2007. Multi-probe LSH: Efficient indexing for high-dimensional similarity search. In Proceedings of the International Conference on Very Large Data Bases. 950–961.
[124] Lei Ma, Hongliang Li, Qingbo Wu, Chao Shang, and King Ngi Ngan. 2018. Multi-task learning for deep semantic hashing. In Proceedings of the IEEE Visual Communications and Image Processing. 1–4.
[125] Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. 2014. Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems 45 (2014), 61–68.
[126] Yury A. Malkov and Dmitry A. Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824–836.
[127] Rajeev Motwani, Assaf Naor, and Rina Panigrahi. 2006. Lower bounds on locality sensitive hashing. In Proceedings of the Annual Symposium on Computational Geometry. 154–157.
[128] Marius Muja and David G. Lowe. 2009. Fast approximate nearest neighbors with automatic algorithm configuration. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP). 331–340.
[129] Andrew Ng. 2011. Sparse autoencoder. CS294A Lecture Notes 72, 2011 (2011), 1–19.
[130] Mohammad Norouzi, David J. Fleet, and Russ R. Salakhutdinov. 2012. Hamming distance metric learning. In Proceedings of the Conference on Neural Information Processing Systems. 1061–1069.
[131] Ryan O’Donnell, Yi Wu, and Yuan Zhou. 2014. Optimal lower bounds for locality-sensitive hashing (except when q is tiny). ACM Transactions on Computation Theory 6, 1 (2014), 1–13.
[132] Qibing Qin, Lei Huang, Zhiqiang Wei, Kezhen Xie, and Wenfeng Zhang. 2021. Unsupervised deep multi-similarity hashing with semantic structure for image retrieval. IEEE Transactions on Circuits and Systems for Video Technology 31, 7 (2021), 2852–2865.
[133] Zhaofan Qiu, Yingwei Pan, Ting Yao, and Tao Mei. 2017. Deep semantic hashing with generative adversarial networks. In Proceedings of the International ACM SIGIR Conference on Research & Development in Information Retrieval. 225–234.
[134] Zexuan Qiu, Qinliang Su, Zijing Ou, Jianxing Yu, and Changyou Chen. 2021. Unsupervised hashing with contrastive information bottleneck. In Proceedings of the International Joint Conference on Artificial Intelligence.
[135] Alexandre Sablayrolles, Matthijs Douze, Nicolas Usunier, and Hervé Jégou. 2017. How should we evaluate supervised hashing? In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. 1732–1736.
[136] Ruslan Salakhutdinov and Geoffrey Hinton. 2009. Semantic hashing. International Journal of Approximate Reasoning 50, 7 (2009), 969–978.
[137] Wojciech Samek, Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, and Klaus-Robert Müller. 2016. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems 28, 11 (2016), 2660–2673.
[138] Robert E. Schapire. 2013. Explaining AdaBoost. In Proceedings of the Empirical Inference. 37–52.
[139] Fumin Shen, Xin Gao, Li Liu, Yang Yang, and Heng Tao Shen. 2017. Deep asymmetric pairwise hashing. In Proceedings of the ACM International Conference on Multimedia. 1522–1530.
[140] Fumin Shen, Chunhua Shen, Wei Liu, and Heng Tao Shen. 2015. Supervised discrete hashing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 37–45.
[141] Fumin Shen, Yan Xu, Li Liu, Yang Yang, Zi Huang, and Heng Tao Shen. 2018. Unsupervised deep hashing with similarity-adaptive and discrete optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 12 (2018), 3034–3044.
[142] Fumin Shen, Xiang Zhou, Yang Yang, Jingkuan Song, Heng Tao Shen, and Dacheng Tao. 2016. A fast optimization method for general binary code learning. IEEE Transactions on Image Processing 25, 12 (2016), 5610–5621.
[143] Yuming Shen, Li Liu, and Ling Shao. 2019. Unsupervised binary representation learning with deep variational networks. International Journal of Computer Vision 127, 11 (2019), 1614–1628.
[144] Yuming Shen, Jie Qin, Jiaxin Chen, Li Liu, Fan Zhu, and Ziyi Shen. 2019. Embarrassingly simple binary representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops.
[145] Yuming Shen, Jie Qin, Jiaxin Chen, Mengyang Yu, Li Liu, Fan Zhu, Fumin Shen, and Ling Shao. 2020. Auto-encoding twin-bottleneck hashing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2818–2827.
[146] Yuming Shen, Jiaguo Yu, Haofeng Zhang, Philip H. S. Torr, and Menghan Wang. 2022. Learning to hash naturally sorts. In Proceedings of the International Joint Conference on Artificial Intelligence. arXiv:2201.13322. Retrieved from https://arxiv.org/abs/2201.13322.
[147] Weiwei Shi, Yihong Gong, Badong Chen, and Xinhong Hei. 2021. Transductive semisupervised deep hashing. IEEE Transactions on Neural Networks and Learning Systems (2021), 1–14.
[148] Xiaoshuang Shi, Zhenhua Guo, Fuyong Xing, Yun Liang, and Lin Yang. 2020. Anchor-based self-ensembling for semi-supervised deep pairwise hashing. International Journal of Computer Vision 128, 8 (2020), 2307–2324.
[149] Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations.
[150] Saurabh Singh, Abhinav Gupta, and Alexei A. Efros. 2012. Unsupervised discovery of mid-level discriminative patches. In Proceedings of the European Conference on Computer Vision. 73–86.
[151] Haoyu Song, Sarang Dharmapurikar, Jonathan Turner, and John Lockwood. 2005. Fast hash table lookup using extended bloom filter: An aid to network processing. ACM SIGCOMM Computer Communication Review 35, 4 (2005), 181–192.
[152] Jingkuan Song, Tao He, Lianli Gao, Xing Xu, Alan Hanjalic, and Heng Tao Shen. 2018. Binary generative adversarial networks for image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence.
[153] Christoph Strecha, Alex Bronstein, Michael Bronstein, and Pascal Fua. 2011. LDAHash: Improved matching with smaller descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 1 (2011), 66–78.
[154] Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. Energy and policy considerations for deep learning in NLP. In Proceedings of the Annual Meeting of the Association for Computational Linguistics. 3645–3650.
[155] Robin Strudel, Ricardo Garcia, Ivan Laptev, and Cordelia Schmid. 2021. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7262–7272.
[156] Shupeng Su, Chao Zhang, Kai Han, and Yonghong Tian. 2018. Greedy hash: Towards fast optimization for accurate hash coding in CNN. In Proceedings of the Conference on Neural Information Processing Systems. 798–807.
[157] Qiaoyu Tan, Ninghao Liu, Xing Zhao, Hongxia Yang, Jingren Zhou, and Xia Hu. 2020. Learning to hash with graph neural networks for recommender systems. In Proceedings of the Web Conference. 1988–1998.
[158] Rong-Cheng Tu, Xian-Ling Mao, Jia-Nan Guo, Wei Wei, and Heyan Huang. 2021. Partial-softmax loss based deep hashing. In Proceedings of the Web Conference. 2869–2878.
[159] Rong-Cheng Tu, Xian-Ling Mao, and Wei Wei. 2020. MLS3RDUH: Deep unsupervised hashing via manifold based local semantic similarity structure reconstructing. In Proceedings of the International Joint Conference on Artificial Intelligence. 3466–3472.
[160] Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, and Sethuraman Panchanathan. 2017. Deep hashing network for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5018–5027.
[161] Guan’an Wang, Qinghao Hu, Jian Cheng, and Zengguang Hou. 2018. Semi-supervised generative adversarial hashing for image retrieval. In Proceedings of the European Conference on Computer Vision. 469–485.
[162] Guan’an Wang, Qinghao Hu, Yang Yang, Jian Cheng, and Zeng-Guang Hou. 2021. Adversarial binary mutual learning for semi-supervised deep hashing. IEEE Transactions on Neural Networks and Learning Systems (2021), 1–15.
[163] Jun Wang, Wei Liu, Sanjiv Kumar, and Shih-Fu Chang. 2015. Learning to hash for indexing big data—A survey. Proceedings of the IEEE 104, 1 (2015), 34–57.
[164] Jingdong Wang, Heng Tao Shen, Jingkuan Song, and Jianqiu Ji. 2014. Hashing for similarity search: A survey. arXiv:1408.2927. Retrieved from https://arxiv.org/abs/1408.2927.
[165] Jingdong Wang, Ting Zhang, Nicu Sebe, and Heng Tao Shen. 2017. A survey on learning to hash. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 4 (2017), 769–790.
[166] Tongzhou Wang and Phillip Isola. 2020. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In Proceedings of the International Conference on Machine Learning.
[167] Wei Wang, Junyu Gao, Xiaoshan Yang, and Changsheng Xu. 2020. Learning coarse-to-fine graph neural networks for video-text retrieval. IEEE Transactions on Multimedia 23 (2020), 2386–2397.
[168] Weiwei Wang, Yuming Shen, Haofeng Zhang, Yazhou Yao, and Li Liu. 2021. Set and rebase: Determining the semantic graph connectivity for unsupervised cross-modal hashing. In Proceedings of the International Joint Conference on Artificial Intelligence. 853–859.
[169] Xiaofang Wang, Yi Shi, and Kris M. Kitani. 2016. Deep supervised hashing with triplet labels. In Proceedings of the Asian Conference on Computer Vision. 70–84.
[170] Yang Wang. 2021. Survey on deep multi-modal data analytics: Collaboration, rivalry, and fusion. ACM Transactions on Multimedia Computing, Communications, and Applications 17, 1s (2021), 1–25.
[171] Yair Weiss, Antonio Torralba, and Rob Fergus. 2009. Spectral hashing. In Proceedings of the Conference on Neural Information Processing Systems. 1753–1760.
[172] Dayan Wu, Qi Dai, Jing Liu, Bo Li, and Weiping Wang. 2019. Deep incremental hashing network for efficient image retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9069–9077.
[173] Rongkai Xia, Yan Pan, Hanjiang Lai, Cong Liu, and Shuicheng Yan. 2014. Supervised hashing for image retrieval via image representation learning. In Proceedings of the AAAI Conference on Artificial Intelligence.
[174] Junyuan Xie, Ross Girshick, and Ali Farhadi. 2016. Unsupervised deep embedding for clustering analysis. In Proceedings of the International Conference on Machine Learning. 478–487.
[175] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2017. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1492–1500.
[176] Xinyu Yan, Lijun Zhang, and Wu-Jun Li. 2017. Semi-supervised deep hashing with a bipartite graph. In Proceedings of the International Joint Conference on Artificial Intelligence. 3238–3244.
[177] Erkun Yang, Cheng Deng, Tongliang Liu, Wei Liu, and Dacheng Tao. 2018. Semantic structure-based unsupervised deep hashing. In Proceedings of the International Joint Conference on Artificial Intelligence. 1064–1070.
[178] Erkun Yang, Cheng Deng, Wei Liu, Xianglong Liu, Dacheng Tao, and Xinbo Gao. 2017. Pairwise relationship guided deep hashing for cross-modal retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence.
[179] Erkun Yang, Tongliang Liu, Cheng Deng, Wei Liu, and Dacheng Tao. 2019. DistillHash: Unsupervised deep hashing by distilling data pairs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2946–2955.
[180] Huei-Fang Yang, Kevin Lin, and Chu-Song Chen. 2017. Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 2 (2017), 437–451.
[181] Zhan Yang, Osolo Ian Raymond, Wuqing Sun, and Jun Long. 2019. Asymmetric deep semantic quantization for image retrieval. IEEE Access 7 (2019), 72684–72695.
[182] Zhan Yang, Osolo Ian Raymond, Wuqing Sun, and Jun Long. 2019. Deep attention-guided hashing. IEEE Access 7 (2019), 11209–11221.
[183] Tao Yao, Zhiwang Zhang, Lianshan Yan, Jun Yue, and Qi Tian. 2019. Discrete robust supervised hashing for cross-modal retrieval. IEEE Access 7 (2019), 39806–39814.
[184] Jun Yu, Hao Zhou, Yibing Zhan, and Dacheng Tao. 2021. Deep graph-neighbor coherence preserving network for unsupervised cross-modal hashing. In Proceedings of the AAAI Conference on Artificial Intelligence. 4626–4634.
[185] Li Yuan, Tao Wang, Xiaopeng Zhang, Francis E. H. Tay, Zequn Jie, Wei Liu, and Jiashi Feng. 2020. Central similarity quantization for efficient image and video retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[186] Haofeng Zhang, Li Liu, Yang Long, and Ling Shao. 2017. Unsupervised deep hashing with pseudo labels for scalable image retrieval. IEEE Transactions on Image Processing 27, 4 (2017), 1626–1638.
[187] Jian Zhang and Yuxin Peng. 2017. SSDH: Semi-supervised deep hashing for large scale image retrieval. IEEE Transactions on Circuits and Systems for Video Technology 29, 1 (2017), 212–225.
[188] Lei Zhang, Ji Liu, Yang Yang, Fuxiang Huang, Feiping Nie, and David Zhang. 2019. Optimal projection guided transfer hashing for image retrieval. IEEE Transactions on Circuits and Systems for Video Technology 30, 10 (2019), 3788–3802.
[189] Ruimao Zhang, Liang Lin, Rui Zhang, Wangmeng Zuo, and Lei Zhang. 2015. Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification. IEEE Transactions on Image Processing 24, 12 (2015), 4766–4779.
[190] Ting Zhang, Chao Du, and Jingdong Wang. 2014. Composite quantization for approximate nearest neighbor search. In Proceedings of the IEEE International Conference on Multimedia and Expo.
[191] Wanqian Zhang, Dayan Wu, Yu Zhou, Bo Li, Weiping Wang, and Dan Meng. 2020. Deep unsupervised hybrid-similarity Hadamard hashing. In Proceedings of the ACM International Conference on Multimedia. 3274–3282.
[192] Xi Zhang, Hanjiang Lai, and Jiashi Feng. 2018. Attention-aware deep adversarial hashing for cross-modal retrieval. In Proceedings of the European Conference on Computer Vision. 591–606.
[193] Xueni Zhang, Lei Zhou, Xiao Bai, and Edwin Hancock. 2018. Deep supervised hashing with information loss. In Proceedings of the Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition and Structural and Syntactic Pattern Recognition. 395–405.
[194] Xueni Zhang, Lei Zhou, Xiao Bai, Xiushu Luan, Jie Luo, and Edwin R. Hancock. 2019. Deep supervised hashing using symmetric relative entropy. Pattern Recognition Letters 125 (2019), 677–683.
[195] Ziming Zhang, Yuting Chen, and Venkatesh Saligrama. 2016. Efficient training of very deep neural networks for supervised hashing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1487–1495.
[196] Zheng Zhang, Xiaofeng Zhu, Guangming Lu, and Yudong Zhang. 2021. Probability ordinal-preserving semantic hashing for large-scale image retrieval. ACM Transactions on Knowledge Discovery from Data 15, 3 (2021), 1–22.
[197] Fang Zhao, Yongzhen Huang, Liang Wang, and Tieniu Tan. 2015. Deep semantic ranking based hashing for multi-label image retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1556–1564.
[198] Chang Zhou, Lai-Man Po, Wilson Y. F. Yuen, Kwok Wai Cheung, Xuyuan Xu, Kin Wai Lau, Yuzhi Zhao, Mengyang Liu, and Peter H. W. Wong. 2019. Angular deep supervised hashing for image retrieval. IEEE Access 7 (2019), 127521–127532.
[199] Joey Tianyi Zhou, Heng Zhao, Xi Peng, Meng Fang, Zheng Qin, and Rick Siow Mong Goh. 2018. Transfer hashing: From shallow to deep. IEEE Transactions on Neural Networks and Learning Systems 29, 12 (2018), 6191–6201.
[200] Han Zhu, Mingsheng Long, Jianmin Wang, and Yue Cao. 2016. Deep hashing network for efficient similarity retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence.
[201] Maciej Zieba, Piotr Semberecki, Tarek El-Gaaly, and Tomasz Trzcinski. 2018. BinGAN: Learning compact binary descriptors with a regularized GAN. In Proceedings of the Conference on Neural Information Processing Systems.
