
Indexing based on scale invariant interest points

Krystian Mikolajczyk Cordelia Schmid

INRIA Rhône-Alpes GRAVIR-CNRS



655 av. de l’Europe, 38330 Montbonnot, France
{Krystian.Mikolajczyk, Cordelia.Schmid}@inrialpes.fr


Abstract

This paper presents a new method for detecting scale invariant interest points. The method is based on two recent results on scale space: 1) Interest points can be adapted to scale and give repeatable results (geometrically stable). 2) Local extrema over scale of normalized derivatives indicate the presence of characteristic local structures. Our method first computes a multi-scale representation for the Harris interest point detector. We then select points at which a local measure (the Laplacian) is maximal over scales. This allows a selection of distinctive points for which the characteristic scale is known. These points are invariant to scale, rotation and translation, as well as robust to illumination changes and limited changes of viewpoint.

For indexing, the image is characterized by a set of scale invariant points; the scale associated with each point allows the computation of a scale invariant descriptor. Our descriptors are, in addition, invariant to image rotation and affine illumination changes, and robust to small perspective deformations. Experimental results for indexing show an excellent performance up to a scale factor of 4 for a database with more than 5000 images.

1. Introduction

The difficulty in object indexing is to determine the identity of an object under arbitrary viewing conditions in the presence of cluttered real-world scenes or occlusions. Local characterization has been shown to be well adapted to this problem. The small size of the characteristic regions makes them robust against occlusion and background changes. To obtain robustness to changes of viewing conditions, they should also be invariant to image transformations. Recent methods for indexing differ in the type of invariants used. Rotation invariants have been presented by [10], rotation and scale invariants by [8], and affine invariants by [13].

Schmid and Mohr [10] extract a set of interest points and characterize each of the points by rotationally invariant descriptors which are combinations of Gaussian derivatives. Robustness to scale changes is obtained by computing Gaussian derivatives at several scales. Lowe [8] extends these ideas to scale invariance by maximizing the output of difference-of-Gaussian filters in scale-space. Tuytelaars et al. [13] have developed affine invariant descriptors by searching for affine invariant regions and describing them by color invariants. To find these regions they simultaneously use interest points and contours. Instead of using an initial set of features, Chomat et al. [2] select the appropriate scale for every point in the image and compute descriptors at these scales. An object is represented by the set of these descriptors. All of the above methods are limited to a scale factor of 2.

Similar approaches exist for wide-baseline matching [1, 3, 5, 9, 12]. The problem is, however, more restricted: additional constraints can be imposed and the search complexity is less prohibitive. For example, Pritchett and Zisserman [9] first match regions bounded by four line segments. They then use the corresponding regions to compute the homography and grow the regions. Such an approach is clearly difficult to extend to the problem of indexing. Two of the papers on wide-baseline matching have specifically addressed the problem of scale. Hansen et al. [5] present a method that uses correlation of scale traces through multi-resolution images to find correspondences between images. A scale trace is the set of values computed for a pixel at different scales. Dufournaud et al. [3] use a robust multi-scale framework to match images: interest points and descriptors are computed at different scale levels, and a robust homography-based matching algorithm selects the correct scale. These two approaches are not usable in the context of indexing, as image-to-image comparison is necessary. In the context of indexing we need discriminant features which can be accessed directly. Storage of several levels of scale is prohibitive, as it gives rise to additional mismatches and increases the necessary storage space.

In this paper we propose an approach which allows indexing in the presence of scale changes up to a factor of 4.


This work was supported by the French project RNRT AGIR.


The success of this method is based on a repeatable and discriminant point detector. The detector is based on two results on scale space: 1) Interest points can be adapted to scale and give repeatable results [3]. 2) Local extrema over scale of normalized derivatives indicate the presence of characteristic local structures [7]. The first step of our approach is to compute interest points at several scale levels. We then select points at which a local measure (the Laplacian) is maximal over scales. This allows us to select a subset of the points computed in scale space. For these points we know their scale of computation, that is, their characteristic scale; moreover, it allows us to select the most distinctive points. The points are invariant to scale, rotation and translation, as well as robust to illumination changes and limited changes of viewpoint. This detector is the main contribution of this paper. We show that its repeatability is better than that of other approaches proposed in the literature and therefore allows us to obtain better indexing results. The second contribution is the quality of our indexing and matching results.

Overview. This paper is organized as follows. In section 2 we introduce scale selection. In section 3 our scale invariant interest point detector is described, and section 4 presents algorithms for matching and indexing. Experimental results are given in section 5.

2. Scale selection

In the following we briefly introduce the concept of scale-space and show how to select the characteristic scale. We then present experimental results for scale selection.

Scale-space. The scale-space representation is a set of images represented at different levels of resolution [14]. Different levels of resolution are in general created by convolution with the Gaussian kernel:

    L(x, σ) = G(σ) * I(x)

with I the image and G(σ) the Gaussian of standard deviation σ. We can represent a feature (i.e. edges, corners) at different resolutions by applying the appropriate function (a combination of derivatives) at different scales. The amplitude of spatial derivatives, in general, decreases with scale. In the case of scale invariant forms, like a step-edge, the derivatives should be constant over scales. In order to maintain the property of scale invariance, the derivative function must be normalized with respect to the scale of observation. The scale normalized derivative D of order m is defined by:

    D_{i1...im}(x, σ) = σ^m L_{i1...im}(x, σ) = σ^m ( G_{i1...im}(σ) * I )(x)

Normalized derivatives behave nicely under scaling of the intensity pattern. Consider two images I and I' imaged at different scales. The relation between the two images is then defined by I(x) = I'(x') with x' = s·x. Image derivatives are then related by:

    L_{i1...im}(x, σ) = s^m L'_{i1...im}(x', s·σ)

Thus, for normalized derivatives we obtain:

    D_{i1...im}(x, σ) = D'_{i1...im}(x', s·σ)

We can see that the same values are obtained at corresponding relative scales.

To maintain a uniform change of information between successive levels of resolution, the scale factor must be distributed exponentially. Let F be a function used to build the scale-space, normalized with respect to scale. The set of responses for a point x is then F(x, σ_n) with σ_n = k^n σ_0, where σ_0 is the initial scale factor at the finest level of resolution, σ_n denotes successive levels of the scale-space representation, and k is the factor of scale change between successive levels.

Characteristic scale. The properties of local characteristic scales were extensively studied in [7]. The idea is to select a characteristic scale by searching for a local extremum over scales. Given a point in an image, we compute the function responses for several scale factors σ_n, see Figure 1. The characteristic scale is the local maximum of the function. Note that there might be several maxima, and therefore several characteristic scales. The characteristic scale is relatively independent of the image scale: the ratio of the scales at which the extrema were found for corresponding points in two rescaled images is equal to the scale factor between the images. Instead of detecting extrema, we can also look for other easily recognizable signal shapes, such as zero-crossings of the second derivative.

Figure 1: The top row shows two images taken with different focal lengths. The bottom row shows the response F(x, σ_n) over scales, where F is the normalized Laplacian (cf. eq. 2). The characteristic scales are at 10.1 and 3.89 for the left and right image, respectively. The ratio corresponds to the scale factor (2.5) between the two images.

Several derivative-based functions F can be used to compute a scale representation of an image. These functions should be rotation invariant. Illumination invariance is less critical because we are looking for extrema. In the following we present the differential expressions used for our experiments. Note that all expressions are scale normalized.

    Square gradient:         σ² ( L_x²(x, σ) + L_y²(x, σ) )        (1)
    Laplacian:               σ² | L_xx(x, σ) + L_yy(x, σ) |        (2)
    Difference-of-Gaussian:  | L(x, σ_n) − L(x, σ_{n−1}) |         (3)
    Harris function:         det(μ) − α trace²(μ)                  (4)

with

    μ = σ² G(σ̃) * [ L_x²(x, σ)      L_x L_y(x, σ) ]
                   [ L_x L_y(x, σ)   L_y²(x, σ)    ]

Experimental results. The scale selection technique based on local maxima has been evaluated for functions (1), (2), (3) and (4). The evaluation was conducted on several sequences with scale changes. The characteristic scale was selected for every point in the image. Figure 2 displays image points for which scale selection is possible (white and grey). Black points are points for which the function (the Laplacian) has no maximum. Note that these points lie in homogeneous regions and have no maximum in the range of considered scales.

The selected scale for a point is correct if the ratio between the characteristic scales at corresponding points is equal to the scale factor between the images. Corresponding points are determined by projection with the estimated transformation matrix. In the case of multiple scale maxima, the point is considered correct if one of the maxima corresponds to the correct ratio. Points with correctly selected scales are displayed in white (cf. Figure 2).

Figure 2: Characteristic scale of points, shown for the original image (scale 1.2, 80% correct) and two rescaled images (scale 2.5, 35% correct; scale 4.3, 16% correct). Black: no characteristic scale is detected. Grey: a characteristic scale is detected. White: a characteristic scale is detected and is correct. The scale of each image and the percentage of correct points are given above the images. The scaled images were enlarged to increase visibility.

We can observe that only a small percentage of the selected scales are correct for large scale factors. In table 1 we compare the results for the different functions F in the presence of a scale factor of 4.3. Results are averaged over several sequences. The first row shows the function used. The second shows the percentage of points for which a characteristic scale is detected; most points are detected by the Laplacian. The percentage of correct points with respect to detected points is given in row three; the Laplacian and the DOG obtain the highest percentages. The last row shows the overall percentage of correct detections. Most correct points are detected by the Laplacian: the percentage is twice as high as for the gradient, and four times higher than for the Harris function. The results are similar to those of the DOG, which is not surprising as this function is very similar to the Laplacian.

                        Laplacian   DOG     gradient   Harris
    detected              46%       38%       30%       16%
    correct/detected      29%       28%       22%       23%
    correct               13.3%     10.6%     6.6%      3.4%

Table 1: Row 2: percentage of points for which a characteristic scale is detected. Row 3: percentage of points for which a correct scale is detected, with respect to detected points. Row 4: percentage of correct detections over all points.

We have observed that the performance degrades in the presence of large scale changes. This can be explained by the fixed search range of scales, which must be the same for all images if we have no a priori knowledge about the scale factor between the images. If the characteristic scale found in a coarse resolution image is near the upper limit of the scale range, the corresponding point at a finer scale is likely to be too far from significant signal changes to be detected within our scale limits. Our experiments show that characteristic scales found by searching for extrema only in the scale direction are sensitive to this effect. Furthermore, we cannot apply too large a range of scales, as we lose the local character and the effect of image borders becomes too important.

3. Scale invariant interest points

The previous section shows that using all points gives unstable results. Feature points permit stabilizing the results.

Existing methods search for maxima in the 3D representation of an image (the spatial coordinates x and the scale σ). A feature point represents a local maximum in the surrounding 3D cube, and its value has to be higher than a certain threshold: in Figure 3 the point x is a feature point if F(x, σ_n) > F(x_W, σ_W) for all (x_W, σ_W) in the 3D neighbourhood of (x, σ_n), and F(x, σ_n) > t. Lindeberg [7] searches for 3D maxima of the Laplacian as well as of the magnitude of the gradient, and Lowe [8] uses the difference-of-Gaussian.
Figure 3: Searching for maxima in scale-space.

Our approach does not use a single function to search in 3D, but uses the Harris function (cf. eq. 4) to localize points in 2D and then selects the points for which the Laplacian attains a maximum over scales. In the following it is referred to as the Harris-Laplacian detector.

The Harris detector is used for 2D localization, as it has been shown to be the most reliable in the presence of image rotation, illumination transformations and perspective deformations in a comparative evaluation [11]. However, the repeatability of this detector fails when the resolution of the images changes significantly. In order to deal with such changes, the Harris detector has to be adapted to the scale factor [3]. Repeatability results for such an adapted version are excellent. The remaining problem is scale selection. During our experiments we noticed that the adapted Harris function rarely attains maxima in 3D space, and if too few points are detected, the image representation is not robust. Therefore, we propose to use a different function, the Laplacian, for the detection of scale maxima. We have seen in the previous section that this function finds the highest percentage of correct maxima.

Our detection algorithm works as follows. We first build a scale-space representation for the Harris function. At each level of the scale-space we detect interest points as local maxima in the image plane:

    H(x, σ_n) > H(x_W, σ_n)  for all x_W in W(x),    H(x, σ_n) > t_H

where W(x) denotes the 8-neighbourhood of the point x. In order to obtain a more compact representation, we verify for each of the candidate points found on the different levels whether it forms a maximum in the scale direction. The Laplacian F is used for this selection:

    F(x, σ_n) > F(x, σ_{n−1}),    F(x, σ_n) > F(x, σ_{n+1}),    F(x, σ_n) > t_F

Figure 5 shows the scale-space representation of two real images with the points detected by the Harris-Laplacian method. For these two images of the same object, imaged at different scales, we present the selected points at each scale level. There are many point-to-point correspondences between the levels for which the scale ratio corresponds to the real scale change between the images (indicated by pointers). Additionally, very few points are detected at the same location but on different levels. Our points are therefore characteristic of the image plane and the scale dimension.

A comparative evaluation of different scale invariant interest point detectors is presented in the following. We compare the approaches of Lindeberg (Laplacian and gradient) and Lowe, as well as our Harris-Laplacian detector. To show the gain compared to a non-scale invariant method, we also present the results of the standard Harris detector. The stability of the detectors is evaluated using the repeatability criterion introduced in [11]. The repeatability score is computed as the ratio between the number of point-to-point correspondences that can be established for detected points and the mean number of points detected in the two images:

    r = C(I_1, I_2) / mean(n_1, n_2)

where C(I_1, I_2) denotes the number of corresponding couples and n_1, n_2 the numbers of points detected in the two images. Two points correspond if the error in relative location does not exceed 1.5 pixels in the coarse resolution image and the ratio of detected scales for these points does not differ from the real scale ratio by more than 20%. Figure 4 presents the repeatability score for the compared methods. The experiments were done on 10 sequences of real images. Each sequence consists of scaled and rotated images for which the scale factor varies from 1.2 up to 4.5. The best results are obtained for the Harris-Laplacian method; they are 10% better than those of the second best detector, the Laplacian.

Figure 4: Repeatability of interest point detectors with respect to scale changes.

4. Robust matching and indexing

In the following we briefly describe our robust matching and indexing algorithms. The two algorithms are based on the same initial steps:
1. Extraction of Harris-Laplacian interest points (cf. section 3).
2. Computation of a descriptor for each point at its characteristic scale. Descriptors are invariant to image rotation and affine illumination changes. They are robust to small perspective deformations.
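The Harris-Laplacian extraction of step 1 can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the derivative and integration scales of eq. 4 are tied together, the number of levels is reduced, and the thresholds are toy values rather than the tuned ones given in section 4.

```python
# Simplified Harris-Laplacian sketch: 2D Harris maxima at each level,
# kept only where the scale-normalized Laplacian peaks over scales.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris(img, sigma, alpha=0.06):
    # Scale-normalized first derivatives (sigma * L_x, sigma * L_y).
    lx = sigma * gaussian_filter(img, sigma, order=(0, 1))
    ly = sigma * gaussian_filter(img, sigma, order=(1, 0))
    # Second-moment matrix, smoothed with a Gaussian window
    # (simplification: integration scale tied to derivative scale).
    a = gaussian_filter(lx * lx, sigma)
    b = gaussian_filter(lx * ly, sigma)
    c = gaussian_filter(ly * ly, sigma)
    return (a * c - b * b) - alpha * (a + c) ** 2  # det - alpha * trace^2

def norm_laplacian(img, sigma):
    return sigma**2 * np.abs(gaussian_filter(img, sigma, order=(2, 0))
                             + gaussian_filter(img, sigma, order=(0, 2)))

def harris_laplacian(img, sigma0=1.5, k=1.2, levels=5, t_h=1e-8, t_f=1e-8):
    sigmas = [sigma0 * k**n for n in range(levels)]
    H = [harris(img, s) for s in sigmas]
    F = [norm_laplacian(img, s) for s in sigmas]
    points = []
    for n in range(1, levels - 1):
        # 2D localization: maxima of the Harris measure (8-neighbourhood).
        is_max = (H[n] == maximum_filter(H[n], size=3)) & (H[n] > t_h)
        for y, x in zip(*np.nonzero(is_max)):
            # Scale selection: the Laplacian must peak at this level.
            if (F[n][y, x] > F[n - 1][y, x] and F[n][y, x] > F[n + 1][y, x]
                    and F[n][y, x] > t_f):
                points.append((x, y, sigmas[n]))
    return points

img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
print(harris_laplacian(img))  # candidate (x, y, scale) triples
```

Each returned point carries its characteristic scale, which is what makes a scale invariant descriptor possible in step 2.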
Figure 5: Points detected on the different resolution levels (s = 1.2, 2.4, 4.8, 9.6 for each image) with the Harris-Laplacian method.

3. Comparison of descriptors based on the Mahalanobis distance.

Interest points. To extract interest points we use a scale representation with 17 resolution levels. The initial scale σ_0 is 1.5 and the factor k between two levels of resolution is 1.2. The parameter α is set to 0.06, and the thresholds t_H and t_F are set to 1500 and 10, respectively.

Descriptors. Our descriptors are Gaussian derivatives computed at the characteristic scale. Invariance to rotation is obtained by "steering" the derivatives in the direction of the gradient [4]. To obtain a stable estimate of the gradient direction, we use the peak in a histogram of local gradient orientations. Invariance to affine intensity changes is obtained by dividing the derivatives by the steered first derivative. Using derivatives up to 4th order, we obtain descriptors of dimension 12.

Comparison of descriptors. The similarity of descriptors is measured by the Mahalanobis distance. This distance requires the estimation of the covariance matrix Λ, which encapsulates signal noise, variations in photometry, inaccuracy of the interest point location, and so forth. Λ is estimated statistically over a large set of image samples.

Robust matching. To robustly match two images, we first determine point-to-point correspondences. We select for each descriptor in the first image the most similar descriptor in the second image based on the Mahalanobis distance. If the distance is below a threshold, the match is kept. This allows us to obtain a set of initial matches. A robust estimation of the transformation between the two images based on RANdom SAmple Consensus (RANSAC) allows us to reject inconsistent matches. For our experimental results the transformation is either a homography or a fundamental matrix. A model selection algorithm [6] can of course be used to automatically decide which transformation is the most appropriate one.

Indexing. A voting algorithm is used to select the most similar images in the database; this makes retrieval robust to mismatches as well as outliers. For each point of a query image, its descriptor is compared to the descriptors in the database. If the distance is less than a fixed threshold, a vote is added for the corresponding database image. Note that a point cannot vote several times for the same database image. The database image with the highest number of votes is the most similar one.

5. Experimental results

In the following, we validate our detection algorithm by matching and indexing results. Figure 6 illustrates the different steps of our matching algorithm. In this example the two images are taken from the same viewpoint, but with a change in focal length and image orientation. The top row shows the detected interest points: there are 190 and 213 points detected in the left and right images, respectively. The number of detected points is about equivalent to the results obtained by a standard interest point detector; this clearly shows the selectivity of our point detection method. If no scale peak selection had been used, more than 2000 points would have been detected. The middle row shows the 58 matches obtained during the initial matching phase. The bottom row displays the 32 inliers to the estimated homography, all of which are correct. The estimated scale factor between the two images is 4.9 and the estimated rotation angle is 19 degrees.

Figure 6: Robust matching: there are 190 and 213 points detected in the left and right images, respectively (top). 58 points are initially matched (middle). There are 32 inliers to the estimated homography (bottom), all of which are correct. The estimated scale factor is 4.9 and the estimated rotation angle is 19 degrees.

Figure 7 shows an example for a 3D scene where the fundamental matrix is used for verification. There are 180 and 176 points detected in the left and right images. The number of initial matches is 23, and there are 14 inliers to the robustly estimated fundamental matrix, all of them correct. Note that the images are taken from different viewpoints; the transformation includes a scale change and an image rotation as well as a change in the viewing angle. The building in the middle is almost half occluded.

Figure 7: Example of images taken from different viewpoints. There are 14 inliers to a robustly estimated fundamental matrix, all of them correct.

In the following we show the results for retrieval from a database with more than 5000 images. The images in the database are extracted from 16 hours of video sequences, which include movies, sport events and news reports. Similar images are excluded by taking one image per 300 frames. Furthermore, the database contains one image from each of our 10 test sequences. The total number of descriptors in our database is 2,539,342.

The second row of figure 8 shows five images of the test sequences which are contained in the database. The top row displays images for which the corresponding image in the database (second row) was correctly retrieved, that is, it was the most similar one. The approximate scale factor is given in row three: (a) 1/5.8, (b) 3.7, (c) 1/4.4, (d) 1/4.1, (e) 5.7. The changes between the image pairs (first and second row) include important changes in focal length, for example 5.8 for image pair (a). They also include important changes in viewpoint, for example for pair (b), as well as important illumination changes (image pair (e)).

Figure 8: The first row shows some of the query images. The second row shows the most similar images in the database, all of them correct. The approximate scale factor between query image and database image is given in row three.

The test sequences were used to systematically evaluate the performance of retrieval. Results are shown in table 2. For each of the 10 test sequences, we have evaluated the performance at different scale factors (1.4 to 4.4). For each scale factor, we have evaluated the percentage of queries for which the corresponding image is the most similar one, or among the five or ten most similar images. We can see that up to a scale factor of 4.4 the performance is very good. At the scale factor of 4.4, 30% of the images are correctly retrieved, 50% are among the 5 best matches and 70% are among the 10 best matches. These results were obtained with 12-dimensional descriptors. If we use derivatives only up to order 3, that is 7-dimensional descriptors, the results degrade significantly. This justifies using the fourth order derivatives.

    # retrieved                scale factor
                   1.4    1.8    2.4    2.8    3.4    4.4
    1               60     60     60     50     30     30
    5              100     90     60     80     50     50
    10             100    100     90     90     80     70

Table 2: Indexing results for our test sequences at different scale factors. The first row of the table gives the percentage of correct retrievals, that is, the corresponding image is retrieved as the most similar one. The second and third rows give the percentages for which the corresponding image is among the 5 and 10 most similar images, respectively.

6. Conclusions and perspectives

We have presented an algorithm for interest point detection that is invariant to important scale changes. A comparison with existing detectors shows that our interest point detector gives better results. Experimental validation for matching and indexing was carried out on a significant amount of data. Matching and indexing results are very good up to a scale factor of 4. To our knowledge, none of the existing approaches can deal with such scale factors in the context of indexing. Furthermore, our approach is invariant to image rotation and translation, as well as robust to illumination changes and limited changes in viewpoint. Performance could be further improved by using more robust point descriptors. In our future research, we intend to focus on the problem of affine invariance of point descriptors.

References

[1] A. Baumberg. Reliable feature matching across widely separated views. In CVPR, pages 774-781, 2000.
[2] O. Chomat, V. C. de Verdière, D. Hall, and J. Crowley. Local scale selection for Gaussian based description techniques. In ECCV, pages 117-133, 2000.
[3] Y. Dufournaud, C. Schmid, and R. Horaud. Matching images with different resolutions. In CVPR, pages 612-618, 2000.
[4] W. Freeman and E. Adelson. The design and use of steerable filters. PAMI, 13(9):891-906, 1991.
[5] B. B. Hansen and B. S. Morse. Multiscale image registration using scale trace correlation. In CVPR, pages 202-208, 1999.
[6] K. Kanatani. Geometric information criterion for model selection. IJCV, 26(3):171-189, 1998.
[7] T. Lindeberg. Feature detection with automatic scale selection. IJCV, 30(2):79-116, 1998.
[8] D. G. Lowe. Object recognition from local scale-invariant features. In ICCV, pages 1150-1157, 1999.
[9] P. Pritchett and A. Zisserman. Wide baseline stereo matching. In ICCV, pages 754-760, 1998.
[10] C. Schmid and R. Mohr. Local grayvalue invariants for image retrieval. PAMI, 19(5):530-534, 1997.
[11] C. Schmid, R. Mohr, and C. Bauckhage. Comparing and evaluating interest points. In ICCV, pages 230-235, 1998.
[12] D. Tell and S. Carlsson. Wide baseline point matching using affine invariants computed from intensity profiles. In ECCV, pages 814-828, 2000.
[13] T. Tuytelaars and L. V. Gool. Content-based image retrieval based on local affinely invariant regions. In Visual99, pages 493-500, 1999.
[14] A. Witkin. Scale-space filtering. In IJCAI, pages 1019-1023, 1983.
