924
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 12, NO. 8, AUGUST 2003
Relevance Feedback in Content-Based Image
Retrieval: Bayesian Framework, Feature Subspaces,
and Progressive Learning
Zhong Su, Hongjiang Zhang, Stan Li, and Shaoping Ma
Abstract—Research has been devoted in the past few years to relevance feedback as an effective solution to improve performance of
content-based image retrieval (CBIR). In this paper, we propose a
new feedback approach with progressive learning capability combined with a novel method for the feature subspace extraction. The
proposed approach is based on a Bayesian classifier and treats positive and negative feedback examples with different strategies. Positive examples are used to estimate a Gaussian distribution that
represents the desired images for a given query; while the negative examples are used to modify the ranking of the retrieved candidates. In addition, feature subspace is extracted and updated
during the feedback process using a Principal Component Analysis
(PCA) technique and based on user’s feedback. That is, in addition
to reducing the dimensionality of feature spaces, a proper subspace
for each type of features is obtained in the feedback process to further improve the retrieval accuracy. Experiments demonstrate that
the proposed method increases the retrieval speed, reduces the required memory and improves the retrieval accuracy significantly.
Index Terms—Bayesian estimation, content-based image retrieval, principal component analysis (PCA), relevance feedback
(RF).
I. INTRODUCTION
to balance the relative importance of different feature type and
there is no universal formula for all queries. The relevance feedback technique can be used to bridge the gap [3], [12], [13], [17],
[24]–[26].
Relevance feedback, originally developed for information
retrieval [16], is a supervised learning technique used to improve
the effectiveness of information retrieval systems. The main idea
of relevance feedback is using positive and negative examples
provided by the user to improve the system’s performance.
For a given query, the system first retrieves a list of ranked
images according to predefined similarity metrics, which are
often defined as the distance between feature vectors of images.
Then, the user selects a set of positive and/or negative examples
from the retrieved images, and the system subsequently refines
the query and retrieves a new list of images. The key issue is
how to incorporate positive and negative examples to refine
the query and how to adjust the similarity measure according
to the feedback.
The original relevance feedback method, in which the vector
model [1], [20], [21] is used for document retrieval, can be illustrated by the Rocchio’s formula [16] as
C
ONTENT-BASED image retrieval (CBIR) is a process to
find images similar in visual content to a given query from
an image database. It is usually performed based on a comparison of low level features, such as color, texture or shape
features, extracted from the images themselves. While there is
much research effort addressing content-based image retrieval
issues [1], [11], [19], the performance of content-based image
retrieval methods are still limited, especially in the two aspects
of retrieval accuracy and response time.
The limited retrieval accuracy is because of the big gap between semantic concepts and low-level image features, which
is the biggest problem in content-based image retrieval. For example, for different queries, different types of features have different significance; an issue is how to derive a weighting scheme
Manuscript received July 2, 2001; revised March 26, 2003. The associate editor coordinating the review of this manuscript and approving it for publication
was Dr. Christine Guillemot.
Z. Su is with the State Key Lab of Intelligent Tech. and Systems, Tsinghua
University, Beijing 100084, China (e-mail: suzhong_bj@hotmail.com).
H. Zhang is with the Microsoft Research Asia, Beijing 100080, China (e-mail:
hjzhang@microsoft.com).
S. Li is with the Microsoft Research Asia, 5F, Beijing Sigma Center, Beijing
100080, China (e-mail: szli@microsoft.com).
S. Ma is with the State Key Lab of Intelligent Tech. and Systems, Tsinghua
University, Beijing 100084, China (e-mail: msp@tsinghua.edu.cn).
Digital Object Identifier 10.1109/TIP.2003.815254
(1)
, and are suitable constants and
and
are
where
and
, respectively. That is,
the number of documents in
for a given initial query , and a set of relevant documents
and nonrelevant documents
given by the user, the optimal
new query, , is the one that is moved toward positive example
points and away from negative example points. This technique
is also implemented in many content-based image retrieval systems [9], [12]. Experiments show that the retrieval performance
can be improved considerably by using this approach. Generally speaking, previous relevance feedback methods can be
classified into two approaches: “the weighing approach” and
“the probability approach.” Most existing work in content-based
image retrieval uses the former approach.
The weighting method [9], [17], [19] associates larger
weights with more important dimensions and smaller weights
with less important ones. For example, [19] generalizes a
relevance feedback framework of the low-level feature-based
relevance feedback methods. An ideal query vector for each
1057-7149/03$17.00 © 2003 IEEE
SU et al.: RELEVANCE FEEDBACK IN CONTENT-BASED IMAGE RETRIEVAL
feature is described by the weighted sum of all positive
feedback images as
(2)
is the
( is the length of feature ) training
where
sample matrix for feature obtained by stacking the posiinto a matrix. The element
tive feedback training vectors
represents the degree of relevance (to the
vector
query) of each of the positive feedback images, which can be
determined by the user at each feedback iteration. The system
then uses as the optimal query to evaluate the relevance of the
images in database. This strategy is widely used by many other
image retrieval and relevance feedback systems [9], [17], [19].
Bayesian estimation methods have been used in the probabilistic approaches to relevance feedback. Cox et al. [3], Vasconcelos and Lippman [26], Meilhac and Nastar [13] all used
Bayesian learning to incorporate user feedbacks to update the
probability distribution of all the images in the database. They
consider the feedback examples as a sequence of independent
queries and try to minimize the retrieval error by Bayesian rules.
That is, given a sequence of queries, they try to minimize the
probability of retrieval error as
(3)
is a sequence of queries (feedback examwhere
is a prior belief about the ability
ples) and
of the th image class to explain the queries.
Efforts have also been made to address the problem of slow
response time in content-based image retrieval, the problem
being caused mainly by the high dimensionality of the feature
space, typically hundreds to thousands. Ng and Sedighian [14]
made direct use of eigenimages, a method from face recognition
[10], to carry out the dimension reduction. Faloutsos and Lin
[6], Chandrasekaren et al. [2] and Brunelli and Mich [15] used
principal component analysis (PCA) to perform the dimension
reduction in feature spaces. Experimental results in these works
show that most real image feature sets can be considerably
reduced in dimension without significant degradation in retrieval quality. However, there are two problems with the use
of PCA in these works. Firstly, they adopted a fixed number
for the dimension size. This strategy is questionable because
for images of different complexity, the intrinsic dimensions
are usually different. Secondly, the subspaces are fixed once
the PCA is performed the first time and do not adapt to users’
subjectivity. Generally, this kind of blind dimension reduction
can be dangerous, since information can be lost if the reduction
is below the embedded dimension.
In this paper, we propose a new relevance feedback approach
[24], [25], which is based on a Bayesian classifier and treats
positive and negative feedback examples with different strategies. Not only can the retrieval performance be improved for
the current user, but the improvements can also help subsequent
users. Moreover, by applying the Principal Component Analysis
(PCA) technique, the feature subspace is extracted and updated
925
during the feedback process, so as to reduce the dimensionality
of feature spaces, reduce noise contained in the original feature
representation, and hence to define a proper subspace for each
type of feature as implied in the feedback. These are performed
according to positive feedbacks and hence consistently with the
subjective image content.
The proposed new feedback algorithm uses a Gaussian estimator to incorporate positive examples in refining retrieval
results. By assuming that all of the positive examples in one retrieval iteration belong to the same semantic class with common
semantic object(s) or meaning(s) and the features from that semantic class follow a Gaussian distribution, we use features
of all positive examples in a query iteration to calculate and
update the parameters of its corresponding semantic Gaussian
class. That is, the image retrieval becomes a process to estimate
Gaussian parameters of a semantic class and the query refinement becomes a process of updating the Gaussian parameters.
Then we use a Bayesian classifier to re-rank the images in the
database. This process is progressive so that every feedback can
have an impact on later retrieval processes. Experimental results
show that such a feedback approach improves the retrieval accuracy significantly compared with previous methods.
Another objective of the work presented in this paper is to
extract more effective, lower-dimensional features from the
originally given ones, by constructing proper feature subspaces
from the original spaces, to improve the retrieval performance
in terms of speed, storage and accuracy. In this paper, we
propose to use the PCA technique to derive a more effective,
efficient and compact feature representation supervised by the
relevance feedback process.
PCA is a statistical tool for data analysis [8]. It decorrelates
second order moments corresponding to low frequencies, and
identifies directions of principal variations in the data. We
incorporate PCA into the relevance feedback framework to
extract feature subspaces in order to represent the subjective
class implied in the positive feedback examples. This leads to
the following benefits: 1) whitening feature distributions so that
distance metrics can be defined more rationally; 2) reducing
possible noise contained in the original feature representation; 3) reducing dimensionality of feature spaces, and hence
4) defining a proper subspace for each type of feature, as
implied in the feedback. As mentioned earlier, different types
of features have different significance for different queries,
and their subspaces should have varying dimensionalities. A
procedure is provided to adjust the dimensionalities based on
the evidences provided by the feedback.
Benefits brought about by the PCA-based extraction of feature subspaces are much lower storage, much faster speed and
higher accuracy. Our analysis shows that the computational time
and memory requirements in online retrieval are linear with the
total feature dimension. When only about 30% of the original
total dimensions are used, which is the case in our system, only
30% of the memory is needed and the proposed method is three
times faster than of those methods without the PCA dimension
reduction. In other words, by applying PCA dimension reduction, the retrieval system can afford nine times more images.
Experiments on more than 10 000 Corel images show that such a
speed-up is achieved without sacrifice in the retrieval accuracy.
926
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 12, NO. 8, AUGUST 2003
TABLE I
LOW LEVEL FEATURES USED IN THE EVALUATION
This paper is organized as follows. Section II describes our
method of relevance feedback based on Bayesian estimation.
Section III first introduces the PCA process, followed by a detailed presentation on the Bayesian relevance feedback algorithm in PCA feature subspaces. Also, a complexity analysis of
the algorithm is given in this section. The experimental results
are shown in Section IV. The concluding remarks will be given
in the final section.
II. RELEVENCE FEEDBACK BASED ON BAYESIAN ESTIMATION
In the proposed relevance feedback approach, positive and
negative feedback examples are incorporated in the query refinement process with different strategies. To incorporate positive feedback in refining image retrieval, we assume that all
of the positive examples in a feedback iteration belong to the
same semantic class whose features follow a Gaussian distribution. Features of all positive examples are used to calculate and
update the parameters of its corresponding semantic Gaussian
class and we use a Bayesian classifier to re-rank the images in
the database. To incorporate negative feedback examples, we
apply a penalty function in calculating the final ranking of an
image to the query image. That is, if an image is similar to a
negative example, its rank will be lowed depending on the degree of the similarity to the negative example. The details of
these two strategies are described in detail in this section.
A. Positive Feedbacks and Progressive Learning Process
The Gaussian density is often used for characterizing probability because of its computational tractability and the fact that
it adequately models a large number of cases. Consider a vector
that obeys a Gaussian distribution; then, the probability density function is
(4)
is the covariance matrix of feature and
where
is used to normalize the probability density function.
Let us define some notations before going further. Denote the
image database by and let be the total number of positive
feedback examples for query image . Suppose that
types
of features are used in the retrieval and so an image is repre, where
is the feature
sented by
vector in the -dimensional feature space for type features.
We assume that each type of feature is Gaussian distributed,
, where
(more generally
in (4)) is the
dimension covariance matrix and (more generally,
in (4)) is the dimensional mean vector. That is, we calculate
a Gaussian distribution for each type of feature used to represent images in the image database. We use a diagonal matrix
to represent the intra-component correlation, for simplicity. This is to assume that the feature components we use to
represent visual content of an image are orthogonal and independent to each other. The practical reason of this simplification is that the intra-component correlation cannot be estimated
practically and reliably, especially when there are not enough
feedback examples. We understand this is a strong assumption.
However, for the features we used in our system as listed in
Table I, this assumption is appropriate.
Our idea about relevance feedback is the following: It is
reasonable to assume that all the positive examples belong to
the class of images containing the desired object or semantic
meaning and the features of images belonging to the semantic
classes obey the Gaussian distribution. The parameters for a
semantic Gaussian class can be estimated using the feature
vectors of all the positive examples. Hence, the image retrieval
becomes a process of estimating the probability of belonging
to a semantic class and the query refinement by relevance
feedback becomes a process of updating the Gaussian distribution parameters. The log posterior probability that the
feature vector x belongs to the semantic class implied in the
positive examples is estimated using the following Bayesian
formulation:
(5)
SU et al.: RELEVANCE FEEDBACK IN CONTENT-BASED IMAGE RETRIEVAL
where const are fixed in one feedback iteration. Assuming that
different feature types are independent of each other, the log
posterior probability is calculated as the sum of the individual
’s. Experiments show that the use of this posterior probability as the ranking metric improves the performance of relevance feedback in content-based image retrieval [17], [19], [26].
There are three parameters in the Bayesian estimator in (5):
and the probability of the semantic class
. They need
to be updated when more positive examples are provided by the
user through relevance feedback. Actually such a process could
be considered as the combination of two Gaussian classes. So
it is easy to get the updating process. Denote the current set
of positive examples by , in which each example is denoted
positive examples
by . In other words, there are totally
in the current feedback iteration. The updating of the Gaussian
parameters is performed as follows:
(6)
(7)
(8)
where and are the total number of positive examples accumulated before and after the current feedback iteration, respecrepresents the mean of the positive examples in
tively;
the current iteration.
In the current implementation, the probability of a semantic
is assumed equal for all semantic classes and reclass
mains constant in the relevance feedback process.
The following describes how positive feedback is performed.
1) Initialization of System:
1) Feature Normalization: This allows equal emphasis on all the feature components. For
the normalized vector is
,
where
and
.
satisfies the Gaussian distribution, it is easy to
If
being in the range of
prove that the probability of
is 99%.
(identity matrix) and
2) Initialization: Initialize
.
2) Retrieval and Feedback:
, and according to
1) Update the retrieval parameters
(6)–(8) using the information provided by the current set
of positive examples .
, its dis2) Distance Calculation: For each image
tance
is calculated using (5) in the retrieval after the
. That is, the similarity of each
feedback,
image in the database to the refined query is determined
by (5) based on the positive examples.
3) Sorting by distances and provide the new ranking list to
the user.
927
Fig. 1. The left figure shows the contours of the Bayesian classifier function
in 2-D space, where “c” is the class distribution center. Right-hand side shows
the effect of considering the negative examples as belonging to the same
distribution. The contours of the classifier turn out to be a set of hyperbola in
2-D space, where “n” is the center of the negative examples.
Our feedback algorithm is incremental. This has the following advantages: Not only can the retrieval performance be
improved for the current user, but the improvements can also
help the subsequent users. The parameters of the Gaussian
semantic class corresponding to each query will be refined by
its positive examples according to (6)–(8), and could be used
for other users. The updating process is an online process, so
the computational complexity is quite important. However, the
updating process only needs to deal with the current positive
examples, so the computation cost can be neglected compared
to the whole retrieval process.
B. Negative Feedbacks: “Dibbling” Process
Most methods previously described in the literature apply the
same methodology to negative and positive examples, based on
the assumption that the negative examples have the same feature distribution as the positive ones; or otherwise ignore the
negative examples completely in the feedback process. In our
opinion, negative examples should be treated differently. Positive examples are usually considered to belong to the same
semantic class. However, according to many experiment observations, negative examples are often not semantically related.
Moreover, as we have discussed above, the features of images
from a certain semantic class can be considered as Gaussian distribution. If we consider the negative examples to be a Gaussian
distribution as well and use the difference between the distance
to the positive center and that to the negative one as the final
distance measure[17], the contours of the classifier will change
from closed ellipses to open hyperboloids in feature space, and
this will induce a conflict between the original assumptions of
the data distribution. Fig. 1 shows such a change in 2-D space.
Negative examples are often isolated and independent. Thus
they need to be treated differently from the positive examples.
We apply the following method to deal with the negative examples. We only “punish” those images in the database that
are very near to the negative samples and do not let the negative samples influence the other images. Under this strategy,
we penalize images near the negative examples by increasing
their distance to the query. Such a penalty function seems like
a ‘dibbling’ process in feature space. By extensive simulation,
we have found that the penalty function can be approximated by
928
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 12, NO. 8, AUGUST 2003
Fig. 2. Flow chart of the whole feedback process.
a Gaussian function for each negative example. Denote the current set of negative examples by for each image
as before, the distance punish function is defined as
(9)
is the Gaussian function whose parameters are dewhere
is the distance between image
termined experimentally; and
and a negative example,
. can be calculated using
the Euclidian distance between the feature vectors of image
and the negative example.
sums up penalty contributions
from all negative examples to the image . That is, if an image
in the database is close to all negative examples, the penalty is
high; in contrast, if the image is far away from all negative examples, the penalty function will decreased to zero, according
to the Gaussian distribution.
So the distance after negative feedback is defined as
(10)
III. BAYESIAN RELEVENCE FEEDBACK
FEATURE SUBSPACES
IN THE
PCA
The other major contribution in the proposed relevance feedback approach to content-based image retrieval is to apply the
principal component analysis technique to select and updated a
proper feature subspace during the feedback process. This algorithm extracts more effective, lower-dimensional features from
the originally given ones, by constructing proper feature subspaces from the original spaces, to improve the retrieval performance in terms of speed, storage requirement and accuracy. In
this section, we first present the PCA algorithm, followed by a
detailed description of how we apply PCA in relevance feedback in content-based image retrieval.
A. Principal Component Analysis
-dimensional vectors
ensemble of
whose distribution is centered at
. The covariance between each pair of
the origin
variable is
, where E
can be arranged
is the expectation operator. The parameters
covariance matrix
to form the
Consider
an
(11)
That is, if an image in the database is close to negative examas defined
ples, its distance to the query is increased by
in (10).
Fig. 2 shows the whole feedback process of using both the
positive and negative ones.
Assuming
composition,
matrices
, then by applying eigenvector decan be decomposed into the product of three
(12)
SU et al.: RELEVANCE FEEDBACK IN CONTENT-BASED IMAGE RETRIEVAL
929
Fig. 3. Retrieval and feedback interface of MiAlbum. Example image is shown
on the bottom left.
Fig. 6.
Retrieval accuracy for top 100 results in original feature space.
onto the eigenvector basis (without dimension reduction), obtaining the coordinates , which is equivalent to rotating the feature basis; then rescale the coordinates by the factor of
to obtain the whitened feature vector and
. After the
whitening, we are able to calculate the Mahalanobis distance
and
in the original feature space by the simple
between
Euclidean distance between the corresponding and , in the
whitened feature space, i.e.,
(13)
Fig. 4. Retrieval accuracy for top ten results in original feature space.
If we only select the first eigenvectors as the orthonormal
basis vectors to form a subspace
, then any
vector in the original space can be linearly transformed to
with the new representation
(14)
An approximation to the original
the projection as
reconstruction error is
can be reconstructed from
. The mean squared
(15)
Fig. 5.
where
Retrieval accuracy for top 20 results in original feature space.
are the eigenvalues and
are the corresponding eigenvectors. W is orthogonal in that
. So the columns of
form a new orthogonal basis that is a linear transformation from of the original
basis.
The eigenvector decomposition can be used to whiten the feature distributions as follows. Project the original feature vectors
We can choose the set of eigenvectors used for the reconstruction to minimize this: Sort eigen values in descending order so
, where
; this also sorts the corresponding
that
eigenvectors in the descending order of their significance. The
can thus be minimized.
mean square reconstruction error
There are two advantages of using PCA: 1) dimension reand is represented by the
duction is achieved when
projected coefficients; 2) noise reduction is achieved because as
indicated in our experiments that high frequency components
corresponding the smallest eigenvalues often correspond noises
in image retrieval applications; thus, removing these high frequency components in PCA process will reduce the noise in
retrieval.
An issue should be mentioned here: Addition of new images
may require re-learning of the PCA basis vectors. For retrieval
systems that entirely rely on low-level image features, adding
new images simply involves extracting various feature vectors
930
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 12, NO. 8, AUGUST 2003
TABLE II
RETRIEVAL ACCURACY AFTER NEGATIVE FEEDBACK
Fig. 7. Accumulate eigenvalues of individual low-level features after PCA process.
for the set of new images. However, the insertion of new images
may need some processing in the approach proposed here. This
is because PCA subspaces need to be re-learned if new images
significantly change the covariance matrix.
SU et al.: RELEVANCE FEEDBACK IN CONTENT-BASED IMAGE RETRIEVAL
931
TABLE II
RETRIEVAL ACCURACY AFTER NEGATIVE FEEDBACK
B. Relevance Feedback in PCA Feature Subspaces
As described in the last subsection, PCA can be used to reduce
the dimensionality of the feature space, and to extract the principal lower-dimensional subspace of the original feature space.
A reasonable deduction in dimensionality causes little decrease
in performance. This is especially true in content-based image
retrieval since the components removed from the original image
feature space often correspond to noise. According to our experimental results on a large amount of data, dropping 80% of the
feature dimensions leads to only about 5% reconstruction error;
dropping 90% dimensions gives only about 10% reconstruction
error. Yet the retrieval speed has been improved significantly as
a result of such dimension reductions.
A key issue is how to determine the intrinsic dimensionality
of the feature subspace for each feature, given that the rate of
dimension reduction should vary for different types of features.
We propose to use the following idea to perform the PCA in the
feedback framework.
In the first round of retrieval, we use only a few significant
components for each feature. That is, for each feature type ,
most significant eigenvectors for the type
select the first
feature, where
is quite small comparing with its original feature dimension . The initial value of
of the type of feature
where is the dimension reis calculated by:
. In other words, for simplicity, the
duction rate; e.g.,
PCA process in our implementation is performed for each feature type, instead of all features together. It will be a much more
complicated process to perform the PCA process for all types
of features concatenated together. Also, as the Gaussian class
discussed in Section 2 is also defined for each type of feature, it
is much easier to embed the PCA process in the relevance feedback and retrieval process. Therefore, instead of doing PCA for
all features together, we introduce the following scheme for the
determination of the dimensionality of feature subspaces for the
subsequent iterations after initialization.
Let the dimension reduction rate initially set to be the same,
, for all features before the relevance feedback bee.g.,
gins. As iterative relevance feedback continues, the dimensions
to be retained for good feature types are increased, while the
number of dimensions for bad feature types is decreased. This
according to a goodness measure for a
is achieved by adjust
feature type as follows.
be an image in the
As before, let be the query,
image database and the current set of positive examples be ,
then the evaluation measurement can be defined as follows:
(16)
where
refers to the distance between two features in
the PCA subspace. We can see from the above equation that
the good feature will have small distances from its positive
examples to the query class center compared to other images
in databases.
for each feature type , we sort the
in
After calculating
. This rank can reflect
descending order to get the rank of
the effectiveness of the feature in the current retrieval session
relative to the others. It will be high for a good feature type and
low for a bad one. So we can update the number of dimensions,
of feature type based on the calculated rank value as
(17)
where is a constant factor, is the floor operation, and is
the total number of feature types and
. From the above
equation we can see that for a good feature, its corresponding
will be increased so that more dimensionality could be availed
of in the next retrieval iteration. The choice of depends on the
number of feature types, the original dimension of each feature
type, and the desired increment scale. In our experiment, we set
to 1 and
. That is, if the feature type is ranked higher
than 3, then its dimension number will be increased; otherwise,
it will be lowered.
The following describes the PCA embedded algorithm.
1) Initialization of System: For each feature type
, do the following:
1) Perform PCA on all the images in the original feature
space S , obtaining the eigenvalues and the corresponding
eigenvectors calculated by (11) and (12). Sort the eigenvectors in the order of descending eigenvalues.
2) Whiten the distribution of the feature vectors by first rotating the feature space and then dividing the coordinate
by the square root of the corresponding eigenvalues (refer
to Section 3.1). Such pre-processing does not lose any information because the dimensionalities are not reduced in
the process.
and as before.
3) Initialize the parameters of
The following describes how to update the retrieval and feedback process.
2) Retrieval and Feedback:
1) Update the parameter of the Gaussian class according to
(6)–(8).
according to (16), where
,
2) Calculate
in decreasing order to get the rank of
sorting the
.
according (17).
3) Update the
932
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 12, NO. 8, AUGUST 2003
Fig. 7. Accumulate eigenvalues of individual low-level features after PCA process.
4) Calculate the distance between all images in the database
and the current query in the
dimensional feature
subspace.
5) Sort by distances and provide the new ranking list to the
user.
The differences between the current retrieval and relevance
feedback method and that described in Section II are the
following.
1) The retrieval and feedback are performed based on the
features in the extracted feature subspaces. For feature
dimensions to perform
type , we only use the first
the retrieval and feedback process.
2) In addition to updating the parameters of Gaussian class
as before (this costs little because the number of positive
are also updated so that
examples is usually small), the
better features will have more impact than bad ones.
In this way, the system learns from the user’s feedback in
a feature-based manner. The dimensionalities of the individual
feature subspaces are adjusted dynamically according to the retrieval performance of the corresponding features in the current
feedback session.
SU et al.: RELEVANCE FEEDBACK IN CONTENT-BASED IMAGE RETRIEVAL
Fig. 8.
933
Comparing the retrieval accuracy of top 20 results in PCA feature space between the dimension updating and no dimension updating method.
C. Complexity Analysis of the PCA Subspace Relevance
Feedback Algorithm
In this subsection, we briefly analyze the complexity of the
proposed PCA subspace relevance feedback method and show
that the proposed method can save memory and speed up the
computation significantly.
The requirements for memory consist of two parts: the basis
vectors and the feature coordinates in the feature subspaces. The
size of the former part is fixed regardless of the database size
and hence can be ignored when the database is large. Therefore,
the memory requirement depends on the total number of images
in the database, which in turn depends, linearly, on the total dimension of the feature vectors, as we need to load the feature
vector of each image in the database, at least one at a time, when
computing the similarities. The computation process of covariin The PCA learning is very time consuming,
ance matrix of
, where
is the
which is O
feature dimensions of feature . Fortunately, this process is an
offline procedure.
The biggest bottleneck of online retrieval is the distance calculation. The computation complexity of distance measure is
where is the number of the vector dimension. For exmultiplication and
ample, the Euclidean distance needs
addition operation, which is the same as our method.
Therefore, the total computation complexity in the online retrieval is
Therefore, the time and memory costs the relevance feedback
process are linear with the total feature dimension. That is, if we
% of the original total dimensions, only
use only about
10% of the memory is required and the speed of a PCA method
will be nine times faster than methods without PCA dimension
reduction. This clearly shows the advantage of using PCA in the
relevance feedback process.
IV. EXPERIMENTAL RESULTS
A. Test Data and Features
The image set we used in our evaluation is the Corel Image
Gallery. 10 000 images of 79 semantic categories are selected
to calculate the performance statistics, of which 80% images
are used for learning PCA basis vectors and 20% for testing.
Whether a retrieved image is correct or incorrect is judged
according to the ground truth class.
Three types of color features and three types of texture features are used in our system, as shown in Table I. The total
dimension is 435.
B. Retrieval Interface
The MiAlbum image retrieval system implements the framework discussed in this paper. It is an image retrieval system
for PC users. In this paper, we only focus on retrieval based
on low-level features. So search by example is the interaction
mode we focus on here. However, the MiAlbum system also supports other two modes of interaction: keyword-based search and
browsing the entire image database.
934
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 12, NO. 8, AUGUST 2003
Fig. 9. Comparing the retrieval accuracy in top ten results between the original feature space and PCA feature space.
Fig. 10.
Comparing the retrieval accuracy in top 20 results between the original feature space and PCA feature space.
The main user interface is shown in Fig. 3. It has a simple
interface to search images by examples. Users can select the example image from the database or from the file system. Results
are returned as a ranked list show in the right image view. The
ranking sequence is from left to right and from top to bottom.
During the retrieval process user can provide relevance feedback by clicking the “Yes” or “No” icon below the results. As
we can see from Fig. 3, there is an Option Dialog in the main
interface. The user can select the features and methods, as well
as the choice between original features space and PCA feature
space.
C. Experimental Results
The feedback process in the evaluation is carried out as follows. Given a query example from the test set, one different
test image of the same category as the query is used in each
round of feedback iteration as the positive example for updating
the Gaussian parameters. For the negative feedback process, the
first two irrelevant images are assigned as negative examples.
We only pick one positive example at each iteration because
we want to illustrate the worst case scenarios in which the semantic representation power of image features is low such that
the positive examples at each iteration is rare. Also, we only
SU et al.: RELEVANCE FEEDBACK IN CONTENT-BASED IMAGE RETRIEVAL
Fig. 11.
935
Comparing the retrieval accuracy in top 100 results between the original feature space and PCA feature space.
consider two negative examples at each iteration because we assume users of image retrieval systems are impatient in general
and not willing to pick up more negative examples.
The accuracy is defined as
(18)
A number of experiments have been performed as follows.
For all the experiments, the accuracy numbers are averaged over
all test queries, which is totally 20% of the image database of
20 000 images.
Firstly, our Bayesian feedback scheme is compared with
previous feedback approaches presented by Nuno [26] and
Rui [17], [19]. This comparison is done in the original feature space. Figs. 4, 5 and 6 show that the accuracy of our
Bayesian feedback method becomes higher than the other
two methods after two feedback iterations. This demonstrates
that the incorporated Bayesian estimation with the Gaussian
parameter-updating scheme can improve the performance of
image retrieval with relevance feedback.
Table II shows the experimental results after one iteration of
negative feedback. Clearly, our strategy on negative feedback
increases the retrieval accuracy much more than the previous
method [17].
Fig. 7 demonstrates the fidelity of the reconstruction, which
(see Section III-A), as a function of
can be defined as
retained dimension , for the six types of features. It is shown
that significant dimension reductions cause little reconstruction
errors, for all the six cases: Dropping 80% of the dimensions
causes only about 5% reconstruction error; cutting off 90% of
the dimensions causes about 10% of loss. The discarded information can be due to noise because it corresponds to the least
significant or high frequency components. However, keeping
the reconstruction fidelity fixed, the minimum dimension for
an individual subspace can vary when the type of the features
is different, as can be seen from the differences of the six plots.
Also, the dimensions should be adjusted dynamically according
to the evidence gathered thus far. Therefore, the PCA dimension
reduction must be completed by an appropriate method for the
dynamic adjustment of subspace dimensions in order to achieve
significant reduction with little loss of accuracy.
Next, we compare the results obtained with and without the
dynamic adjustment of the dimensionalities of feature subspaces. As we can see from Fig. 8, the retrieval accuracy can be
improved by the dynamic adjustment scheme, especially when
the feature dimensions are significantly reduced, e.g., lower
than 30% of the original dimensions. As the final dimension
resulted from each query image after the dynamic dimension
updating varies according to (17), the final dimension numbers
listed in Fig. 8 are for references.
Finally, the accuracy after PCA reduction is compared
with that obtained in the original feature space. The results in
Figs. 9–11 show that the accuracy can be maintained when
the retained total dimension is 30% of the original or above;
the benefit brought about by this reduction is that the retrieval
speed is tripled and that the memory requirement is one third
of the original.
V. CONCLUSION
In this paper, we have presented a new relevance feedback approach to content-based image retrieval by integrating a feature
subspace extraction process into a Bayesian feedback process.
Principal Component Analysis is used to reduce feature subspace dimensionalities. When multiple types of features (e.g.,
color and texture) are used, a method is proposed to adjust the
936
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 12, NO. 8, AUGUST 2003
subspace dimensionality for each type of feature, based on evidence obtained, to account for differences between individual
feature subspaces as reflected in the recent feedback. This dynamic dimension adjusting method is especially effective when
the feature dimensions are significantly reduced, e.g., lower than
30% of the original dimensions. The feedback process plays two
roles: providing information for updating the Gaussian parameters in the Bayesian feedback, and providing evidence for the
adjustment of feature subspace dimensionalities. Experimental
results show that the proposed method can significantly improve
the retrieval performance in speed, memory and the accuracy. In
principle, the proposed feature subspace extraction method can
be incorporated in any other content-based retrieval methods to
save memory and to speed-up computation.
[16] J. J. Rocchio Jr., “Relevance feedback in information retrieval,” in The
SMART Retrieval System: Experiments in Automatic Document Processing, G. Salton, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1971, pp.
313–323.
[17] Y. Rui and T. S. Huang, “Relevance feedback: A power tool for interactive content-based image retrieval,” IEEE Circuits Syst. Video Technol.,
vol. 8, no. 5, 1999.
[18] Y. Rui, T. S. Huang, and S. Mehrotra, “Content-based image retrieval
with relevance feedback in MARS,” Proc. IEEE Int. Conf. Image Processing, 1997.
[19] Y. Rui and T. S. Huang, “A novel relevance feedback technique in image
retrieval,” ACM Multimedia, 1999.
[20] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.
[21] W. M. Shaw, “Term-relevance computation and perfect retrieval performance,” Inform. Process. Manage, vol. 31, pp. 491–498, 1995.
[22] G. Sheikholeslami, W. Chang, and A. Zhang, “Semantic clustering and
querying on heterogeneous features for visual data,” in Proc. 6th ACM
Int. Multimedia Conf. (ACM MULTIMEDIA’98), Bristol, U.K., Sept.
1998.
[23] H. S. Stone and C. S. Li, “Image matching by means of intensity and
texture matching in the Fourier domain,” Proc. IEEE Int. Conf. Image
Processing, Oct. 1997.
[24] Z. Su, H. Zhang, and S. Ma, “Using Bayesian classifier in relevant feedback of image retrieval,” in IEEE Int. Conf. Tools with Artificial Intelligence, Vancouver, BC, Canada, Nov. 2000.
, “Relevant feedback using a Bayesian classifier in content-based
[25]
image retrieval,” in Proc. SPIE Electronic Imaging 2001, San Jose, CA,
Jan. 2001.
[26] N. Vasconcelos and A. Lippman, “Learning from user feedback in image
retrieval systems,” in Proc. NIPS’99, Denver, CO, 1999.
[27] L. Zhu and A. Zhang, “Supporting multi-example image queries in
image databases,” in IEEE Int. Conf. Multimedia and Expo, New York,
July.
ACKNOWLEDGMENT
The work presented in this paper was performed at Microsoft
Research Asia. The authors thank Y. Sun and C. Hu of MSRC
Software Engineer Group for their great help in system work.
They also thank J. Shen, M. Li, and W.-Y. Liu, who have gave
them valuable comments and suggestions for this paper.
REFERENCES
[1] C. Buckley and G. Salton, “Optimization of relevance feedback
weights,” in Proc. SIGIR’95.
[2] S. Chandrasekaran et al., “An eigenspace update algorithm for image
analysis,” CVGIP: Graph. Models Image Process. J., 1997.
[3] I. J. Cox, T. P. Minka, T. V. Papathomas, and P. N. Yianilos, “The
Bayesian image retrieval system, pichunter: Theory, implementation,
and psychophysical experiments,” IEEE Trans. Image Processing—Special Issue on Digital Libraries, 2000.
[4] I. Diamantaras and S. Y. Kung, Principal Component Neural Networks,
Theory and Applications. New York: Wiley, 1996.
[5] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York, NY: Wiley, 1973.
[6] C. Faloutsos and K. Lin, “Fastmap: A fast algorithm for indexing,
data-mining and visualization of traditional and multimedia,” in Proc.
SIGMOD, 1995, pp. 163–174.
[7] M. Flickner et al., “Query by image and video content: The QBIC
system,” Computer, vol. 28, pp. 23–32, 1995.
[8] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd
ed. New York: Academic, 1990.
[9] Y. Ishikawa, R. Subramanya, and C. Faloutsos, “Mindreader: Query
databases through multiple examples,” in Proc. 24th VLDB Conf., New
York, 1998.
[10] M. Kirby and L. Sirovich, “Application of the Karhunen-Loeve procedure for the characterization of human faces,” IEEE Trans. Pattern Anal.
Machine Intell., vol. 12, pp. 103–108, Jan. 1990.
[11] C. Lee, W. Y. Ma, and H. J. Zhang, “Information Embedding Based on
User’s Relevance Feedback for Image Retrieval,” HP Labs, Tech. Rep.,
1998.
[12] Y. Lu, C. Hu, X. Zhu, H. Zhang, and Q. Yang, “A unified framework
for semantics and feature based relevance feedback in image retrieval
systems,” in Proc. 8th ACM Multimedia Int. Conf., Los Angeles, CA,
Nov. 2000.
[13] C. Meilhac and C. Nastar, “Relevance feedback and category search in
image databases,” in IEEE Int. Conf. Multimedia Computing and Systems, 1999.
[14] R. Ng and A. Sedighian, “Evaluating multi-dimensional indexing structures for images transformed by principal component analysis,” in Proc.
SPIE Storage and Retrieval for Image and Video Databases, 1996.
[15] B. Roberto and M. Ornella, “Image retrieval by examples,” IEEE Trans.
Multimedia, vol. 2, Sept. 2000.
Zhong Su received the B.E. and Ph.D. degrees
from the Department of Computer Science and
Technology, Tsinghua University, China, in 1998
and 2002, respectively.
During 1999–2001, he was a Research Intern at
the Media Computing Group, Microsoft Research
Asia, Beijing, China. He is now a Research Staff at
IBM China Research Lab. His currently research interests include knowledge management, information
retrieval, and pattern recognition.
Hongjiang Zhang received the Ph.D. degree from
the Technical University of Denmark and the B.S. degree from Zhengzhou University, China, both in electrical engineering, in 1982 and 1991, respectively.
From 1992 to 1995, he was with the Institute of
Systems Science, National University of Singapore,
where he led several projects in video and image content analysis and retrieval and computer vision. He
also worked at MIT Media Laboratory in 1994 as a
Visiting Researcher. From 1995 to 1999, he was a
Research Manager at Hewlett-Packard Labs, where
he was responsible for research and technology transfers in the areas of multimedia management; intelligent image processing, and Internet media. In 1999,
he joined Microsoft Research Asia, Beijing, China, where he is currently a Senior Researcher and Assistant Managing Director in charge of media computing
and information processing research. His current research interests are in the
area of image and video processing, content-based media retrieval, and computer vision. He has published three books and over 240 refereed papers, and
has also filed over 40 patents.
Dr. Zhang is a a member of ACM. He currently serves on the editorial
boards of five IEEE and ACM journals and committees of a dozen international
conferences.
SU et al.: RELEVANCE FEEDBACK IN CONTENT-BASED IMAGE RETRIEVAL
Stan Li received the B.Eng. degree from Hunan University, the M.Eng. degree from National University
of Defense Technology, and the Ph.D. degree from
Surrey University where he also worked as a research
fellow. All the degrees are in electrical and electronic
engineering.
He is a Researcher at Microsoft Research Asia,
Beijing, China. He joint Microsoft in May 2000
from his post as an Associate Professor of Nanyang
Technological University Singapore. His current
research interest is in pattern recognition and
machine learning, image analysis, and face technologies. He has published two
books, including Markov Random Field Modeling in Image Analysis (Berlin,
Germany: Springer-Verlag, 2nd ed., 2001), and over 100 refereed papers and
book chapters in these areas. He currently serves on the editorial board of
Pattern Recognition.
Dr. Li is on the program committees of various international conferences.
937
Shaoping Ma received the B.E., M.E., and Ph.D. degrees from the Department of Computer Science and
Technology, Tsinghua University, Beijing, China, in
1982, 1984, and 1997, respectively.
He is a Professor at the Department of Computer
Science and Technology, Tsinghua University.
His current research interests include knowledge
engineering, information retrieval, and pattern
recognition. He has published two books and more
than 50 refereed papers in these areas. Currently,
he is the Deputy Director of the State Key Lab of
Intelligent Technology and Systems, Director of Machine Learning Institute of
China, and Assistant Director of Beijing Computer Institute.