Keywords

1 Introduction

Recently, several models have been proposed to identify specific categories of objects in images, such as face identification in digital cameras [13]. The difficulty of these approaches is that distinct models may be needed to identify different object classes. One way to cope with this scenario is to devise a strategy capable of identifying image regions with a high probability of containing objects, i.e., object-related regions. According to [12], an approach to object detection should group pixels into uniform and homogeneous regions with respect to some characteristic(s), in a way that leads to the separation between foreground objects and background, making this task closely related to figure-ground segmentation and/or to semantic image segmentation [13].

Segmentation methods divide images into regions (or disjoint segments) that, in an ideal situation, correspond to (parts of) the real objects portrayed by them. State-of-the-art methods to detect object-related regions generally depend on image segmentation [2, 3, 5]. In this work, we hypothesize that an accurate segmentation method is essential for improving object detection rates. A hierarchical segmentation method generates multiple disjoint segments at different levels of detail (see Fig. 1 for an example). A coarse level of detail can be generated by simple merges of segments belonging to finer detail levels [9]. According to [8], a hierarchical method must respect two principles of multi-scale analysis: (i) the causality principle, which states that a contour of a region present at a given scale (or level of detail) \(k_1\) must be present at any other scale \(k_2 < k_1\); and (ii) the location principle, which states that the contours of the regions should neither move nor deform when the scale changes. This work evaluates the adoption of the Hierarchical Graph-Based segmentation method (hGB) [9, 10] to compute the Superpixel Straddling (SS) measure used in the state-of-the-art object detection approach proposed in [2]. We also evaluate the use of SLIC [1] (a state-of-the-art method to generate superpixels), since the SS measure is intended to quantify the relationship between an object-related window and the superpixels intersected by it. The main contributions of this work are two-fold: (i) it shows that hierarchical segmentation outperforms its non-hierarchical counterpart; and (ii) it gives evidence that using a state-of-the-art superpixel method does not yield the same improvement.

Fig. 1.
figure 1

Example of a result obtained by a hierarchical image segmentation method.

The paper is organized as follows. Section 2 presents related work, while Sect. 3 describes the objectness measurement in detail. Section 4 presents concepts about hierarchical graph-based segmentation used in this work. Section 5 presents experimental results of our approach together with a comparative analysis. Finally, we draw some conclusions in Sect. 6.

2 Related Work

In [3], the authors use foreground and background seeds, which are groups of pixels of regular size. The input image is mapped onto a weighted graph, and graph cuts with different parameters and seeds are used to obtain regions. Each generated segment is then ranked based on a learned scoring function. Similarly, in [5], the method generates background and foreground segments which are then ranked. In [11], multiple segmentations are computed using the Graph-Based method (GB) [7] on different colour spaces. Afterwards, these segments are grouped and reported as object-related regions. Finally, in [2] the authors proposed a method which creates a random collection of windows over the image. Then, the windows are scored and ranked based on an objectness measurement. Since this measurement is used in this work, it is described in more detail in Sect. 3.

3 Objectness Measurement

In [2], the authors proposed a window score, named objectness measurement, which combines several measures in order to positively identify an object inside a given window. Several measures were investigated in [2], and the authors demonstrated that the three most important ones are the following.

Multiscale Saliency (MS): this measure is based on saliency and analyzes the spectral residual of the image's FFT. The saliency map \(I\) of image \(f\) is calculated (for each pixel \(p\)) using Eq. 1.

$$\begin{aligned} I(p) = g(p) * \mathcal{F}^{-1}[\exp (\mathcal{R}(f) + \mathcal{P}(f))]^2 \end{aligned}$$
(1)

in which \(\mathcal F\) is the FFT, \(\mathcal{R}(f)\) and \(\mathcal{P}(f)\) are the spectral residual and the phase spectrum of image \(f\), respectively, and \(g\) is a Gaussian filter. This calculation is done at different scales (see Fig. 2a) and for each color channel. For each scale \(s\), Eq. 1 is used to calculate the saliency \(I_{MS}^s(p)\). Then, the saliency of a window \(w\) at scale \(s\) is calculated using Eq. 2.

$$\begin{aligned} MS(w, \theta _{MS}^s) = \sum _{\{p\in w \ | \ I_{MS}^s(p)\ge \theta _{MS}^s\}} \frac{I_{MS}^s(p) \times |\{p \in w \ | \ I_{MS}^s(p) \ge \theta _{MS}^s\}|}{|w|} \end{aligned}$$
(2)
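The two equations above can be sketched as follows. This is a minimal, dependency-free Python/NumPy illustration, not the implementation of [2]: the Gaussian filter \(g\) is approximated by a naive box blur, and the function names and window format (y0, x0, y1, x1) are our own conventions.

```python
import numpy as np

def box_blur(a, k):
    """Naive k x k mean filter (a stand-in for the Gaussian g)."""
    pad = k // 2
    p = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)

def spectral_residual_saliency(channel, k=3):
    """Saliency map via the spectral residual of the FFT (sketch of Eq. 1).

    `channel` is a 2-D float array (one colour channel at one scale).
    """
    F = np.fft.fft2(channel)
    log_amp = np.log(np.abs(F) + 1e-8)   # log amplitude spectrum
    phase = np.angle(F)                  # phase spectrum P(f)
    residual = log_amp - box_blur(log_amp, k)   # spectral residual R(f)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return box_blur(sal, k)              # smoothing by g

def ms_score(sal, window, theta):
    """MS(w, theta) from Eq. 2: density of salient pixels inside w."""
    y0, x0, y1, x1 = window
    patch = sal[y0:y1, x0:x1]
    salient = patch[patch >= theta]      # pixels above the threshold
    return salient.sum() * salient.size / patch.size
```

In the full approach this computation is repeated per scale and per color channel, with one learned threshold \(\theta_{MS}^s\) per scale.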

Color Contrast (CC): this measure calculates the dissimilarity between a window \(w\) and a surrounding window \(Surr(w, \theta _{CC})\) which is bigger than \(w\) by a factor of \(\theta _{CC}\) – see Eq. 3.

$$\begin{aligned} \frac{|Surr(w, \theta _{CC})|}{|w|} = \theta _{CC}^2 - 1 \end{aligned}$$
(3)

The CC score is the \(\chi ^2\) distance between their LAB histograms \(h\), as stated by Eq. 4.

$$\begin{aligned} CC(w,\theta _{CC}) = \chi ^2(h(w), h(Surr(w, \theta _{CC}))) \end{aligned}$$
(4)
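Equations 3 and 4 can be sketched in Python/NumPy as follows. This is an illustrative sketch, not the implementation of [2]: the LAB image is assumed to be normalized to [0, 1] per channel, and the helper names and window format (y0, x0, y1, x1) are our own conventions.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two normalised histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def surround(window, theta_cc, shape):
    """Rectangle enlarged around w by factor theta_cc, clipped to the image."""
    y0, x0, y1, x1 = window
    cy, cx = (y0 + y1) / 2, (x0 + x1) / 2
    hh, hw = (y1 - y0) * theta_cc / 2, (x1 - x0) * theta_cc / 2
    return (max(0, int(cy - hh)), max(0, int(cx - hw)),
            min(shape[0], int(cy + hh)), min(shape[1], int(cx + hw)))

def cc_score(lab_image, window, theta_cc, bins=8):
    """CC(w, theta_cc) from Eq. 4: chi-square distance between the LAB
    histogram of w and that of the surrounding ring Surr(w) minus w."""
    y0, x0, y1, x1 = window
    Y0, X0, Y1, X1 = surround(window, theta_cc, lab_image.shape[:2])
    inside = np.zeros(lab_image.shape[:2], bool)
    inside[y0:y1, x0:x1] = True
    outer = np.zeros_like(inside)
    outer[Y0:Y1, X0:X1] = True
    ring = outer & ~inside               # Surr(w) \ w

    def hist(mask):
        px = lab_image[mask].reshape(-1, 3)
        h, _ = np.histogramdd(px, bins=bins, range=[(0, 1)] * 3)
        return h.ravel() / max(h.sum(), 1)

    return chi2_distance(hist(inside), hist(ring))
```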

Superpixel Straddling (SS): according to [2], a superpixel \(s_p\) (defined as a small region of uniform color or texture that ideally preserves object boundaries) is "bad" for a window \(w\) when it straddles (crosses) the borders of \(w\). Thus, a window with fewer straddling superpixels should have a higher score. This score is calculated using Eq. 5.

$$\begin{aligned} SS(w,\theta _{SS}) = 1 - \sum _{s_p\in S(\theta _{SS})} \frac{min(|s_p \backslash w|, |s_p \cap w|)}{|w|} \end{aligned}$$
(5)

in which \(S(\theta _{SS})\) is the set of superpixels obtained using a segmentation method (the authors in [2] used GB method described in [7]) with a “scale” parameter \(\theta _{SS}\), \(|s_p \backslash w|\) represents the area of \(s_p\) outside \(w\), while \(|s_p \cap w|\) stands for the area of \(s_p\) inside \(w\).
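Given a superpixel label map produced by any of the segmentation methods (GB, hGB or SLIC), Eq. 5 reduces to a short loop. A minimal sketch, where the window format (y0, x0, y1, x1) and the function name are our own conventions:

```python
import numpy as np

def ss_score(labels, window):
    """SS(w, theta_SS) from Eq. 5.

    `labels` is an integer superpixel map; theta_SS is implicit in how
    `labels` was generated (scale for GB/hGB, superpixel count for SLIC).
    """
    y0, x0, y1, x1 = window
    area_w = (y1 - y0) * (x1 - x0)
    inside = np.zeros(labels.shape, bool)
    inside[y0:y1, x0:x1] = True
    penalty = 0.0
    for sp in np.unique(labels):
        mask = labels == sp
        a_in = np.count_nonzero(mask & inside)   # |s_p ∩ w|
        a_out = np.count_nonzero(mask) - a_in    # |s_p \ w|
        penalty += min(a_in, a_out) / area_w     # straddling penalty
    return 1.0 - penalty
```

A superpixel coinciding exactly with the window contributes no penalty (SS = 1), while a window strictly inside one large superpixel is maximally penalized.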

Fig. 2.
figure 2

Example of the three measures used: (a) MS (Multiscale Saliency); (b) CC (Color Contrast); and (c) SS (Superpixel Straddling). (Color figure online)

Learning the Parameters: in [2], using supervised training, the best parameter values \(\theta _{MS}^s,\theta _{CC},\theta _{SS}\) of each measure were computed in order to maximize the object detection rate. Finally, several combinations of measures were tested, and the combination of all three measures described above (MS+CC+SS) obtained the best object detection results.

4 Hierarchical Graph-Based Segmentation

According to [8], a hierarchical segmentation method must respect the two principles of multi-scale analysis. Thus, hierarchical segmentation is able to maintain spatial and neighborhood information between segments even when the scale changes [9]. In fact, in [9], the authors proposed a strategy for transforming a non-hierarchical method into a completely hierarchical one; they applied that approach to GB in order to obtain a new method called hGB, which is able to compute a hierarchy of partitions for all scales, as shown in Fig. 1. A comparison between GB and hGB segmentation results is shown in Fig. 3. One can easily notice that, for the GB method [7], the number of regions does not decrease when the scale parameter increases, nor are the contours of the regions stable (see Fig. 3b–d). In contrast, the hGB method [9, 10] respects both the causality and location principles (see Fig. 3e–g).

Fig. 3.
figure 3

Example of GB and hGB segmentation results.

In both methods, an image is transformed into an undirected graph \(G = (V,E)\), in which \(V\) is a finite set (representing pixels) and \(E\) is a subset of \(\{\{x,y\} \subseteq V \ |\ x \ne y\}\) (representing pixel neighborhood). To every tree \(T\) spanning the set \(V\) of the image pixels, to every map \(w:E\mapsto \mathbb {N}\) that weights the edges of \(T\) and to every threshold \(\lambda \in \mathbb {N}\), one may associate the partition \(\mathcal{P}_{\lambda }^{w}\) of \(V\) induced by the connected components of the graph made from \(V\) and the edges with weight below \(\lambda \). It is well known [4] that for any two values \(\lambda _1\) and \(\lambda _2\) such that \(\lambda _1\ge \lambda _2\), the partitions \(\mathcal{P}_{\lambda _1}^{w}\) and \(\mathcal{P}_{\lambda _2}^{w}\) are nested and \(\mathcal{P}_{\lambda _1}^{w}\) is coarser than \(\mathcal{P}_{\lambda _2}^{w}\). Hence, the set \(\mathcal{H}^{w}=\{\mathcal{P}_{\lambda }^{w}~|~\lambda \in \mathbb {N}\}\) is a hierarchy of partitions induced by the weight map \(w\).
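The construction of a partition \(\mathcal{P}_{\lambda }^{w}\) as the connected components of the sub-graph keeping only edges with weight below \(\lambda \) can be sketched with a union-find structure. This is an illustrative Python sketch under our own naming conventions, not the implementation of [9]:

```python
class UnionFind:
    """Disjoint-set forest with path compression."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def partition_at(n_vertices, weighted_edges, lam):
    """Partition P_lambda^w: connected components of the graph that
    keeps only the edges with weight below lam.

    `weighted_edges` is a list of ((u, v), w) pairs; the function
    returns one representative label per vertex.
    """
    uf = UnionFind(n_vertices)
    for (u, v), w in weighted_edges:
        if w < lam:
            uf.union(u, v)
    return [uf.find(v) for v in range(n_vertices)]
```

Running this for two thresholds \(\lambda _1 \ge \lambda _2\) illustrates the nesting property: any two vertices merged at \(\lambda _2\) are also merged at \(\lambda _1\).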

The method hGB [9] does not explicitly produce a hierarchy of partitions, but instead produces a weight map \({{L}}\) (scales of observation) from which the desired hierarchy \(\mathcal{H}^{{{L}}}\) can be inferred on a given \(T\). It starts from a minimum spanning tree \(T\) of the edge-weighted graph built from the image. In order to compute the scale \({{L}({e})}\) associated with each edge of \(T\), the hGB method iteratively considers the edges of \(T\) in non-decreasing order of their original weights \(w\). The new weight map \({{L}}\) is initialized to \(\infty \) for every edge; then, for each edge \(e\) linking two vertices \(x\) and \(y\), the following steps are performed:

  (i) Find the region \(X\) of \(\mathcal{P}_{w(e)}^{w}\) that contains \(x\);

  (ii) Find the region \(Y\) of \(\mathcal{P}_{w(e)}^{w}\) that contains \(y\);

  (iii) Compute the hierarchical observation scale \({L}(e)\).

At step (iii), the hierarchical scale \({S}_{Y}({X})\) of \(X\) relative to \(Y\) is needed to obtain the value \({L}(e)\). Intuitively, \({S}_{Y}({X})\) is the lowest observation scale at which some sub-region of \(X\), namely \(X^{*}\), will be merged to \(Y\). Then, the hierarchical scale \({{L}}(e)\) is simply set to:

$$\begin{aligned} {{L}}(e) = \max \{{S}_{Y}({X}),{S}_{X}({Y})\}, \end{aligned}$$
(6)

in which,

$$\begin{aligned} S_{Y}(X) = [Dif(X,Y) - Int(X)]\times |X| \end{aligned}$$
(7)

with \(Dif(X,Y)\) and \(Int(X)\) defined analogously to [7]. Thus, the internal difference \(Int(X)\) of a region \(X\) is the highest edge weight among all edges linking two vertices of \(X\) in the MST; and the difference \(Dif(X,Y)\) between two neighboring regions \(X\) and \(Y\) is the smallest edge weight among all edges that link \(X\) to \(Y\). Due to lack of space, additional details are omitted; the reader should refer to [9].
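A toy sketch of Eqs. 6 and 7 in Python, where regions are represented directly by the lists of their internal edge weights rather than extracted from the MST (an illustrative simplification under our own naming, not the hGB implementation of [9]):

```python
def internal_difference(region_edges):
    """Int(X): largest MST edge weight inside region X (0 if singleton)."""
    return max(region_edges, default=0)

def difference(cross_edges):
    """Dif(X, Y): smallest weight among the edges linking X to Y."""
    return min(cross_edges)

def hierarchical_scale(region_x_edges, region_y_edges, cross_edges,
                       size_x, size_y):
    """L(e) from Eq. 6, as the max of the two relative scales of Eq. 7:
    S_Y(X) = [Dif(X, Y) - Int(X)] * |X| and symmetrically for S_X(Y)."""
    dif = difference(cross_edges)
    s_y_of_x = (dif - internal_difference(region_x_edges)) * size_x
    s_x_of_y = (dif - internal_difference(region_y_edges)) * size_y
    return max(s_y_of_x, s_x_of_y)
```

For instance, with \(Int(X)=2\), \(|X|=2\), \(Int(Y)=4\), \(|Y|=3\) and \(Dif(X,Y)=5\), the two relative scales are 6 and 3, so \(L(e)=6\).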

5 Experimental Results

In order to evaluate the adoption of the hGB method [9] as a tool for solving the object detection problem, we use the well-known PASCAL VOC 2007 dataset [6] with 20 object categories, which is also used in [2]. As in [2], during the supervised training for obtaining parameter values, the train+val subsets were used and only images belonging to 6 categories (bird, car, cat, cow, dog, and sheep) were considered, ignoring images marked as difficult. Therefore, 1,183 images from a total of 1,587 were selected for training. The SS measure was trained three times, i.e., one distinct training for each segmentation approach (GB, hGB, and SLIC) in order to obtain the best \(\theta _{SS}\) value for each one (for GB and hGB, \(\theta _{SS}\) represents the scale value used to obtain a partition, while for SLIC it is the number of superpixels).

During testing, the test subset of the dataset was used, selecting only images belonging to categories that were not used in training. According to [2], this is a way to show the effectiveness of the approach on unseen classes. In addition, images marked as difficult were also ignored during testing. Thus, 2,941 images from a total of 7,610 were selected. Since windows are chosen randomly for each new image, during the tests we use the same list of random windows to evaluate all combinations of measures for a given image.

Following the criterion adopted in the PASCAL VOC 2007 challenge [6], if \(W\) and \(O\) represent the window area to be tested and the object area in the ground-truth image, respectively, an object appearing in the ground-truth image is considered covered if \(|W \cap O|/|W \cup O|\) is greater than 0.5. To assess the results as in [2], we adopt the same test methodology: the evaluation was executed varying the number of random windows up to a limit of 1,000, and the detection rate DR (i.e., the percentage of objects covered by the random windows) was calculated. The evaluation was executed 10 times; average results are shown in Fig. 4, where the average DR values are presented for distinct combinations of measures as a function of the number of random windows (#WIN). The area under the curve (AUC) for each combination is also shown.
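The coverage criterion and the detection rate can be sketched as follows (Python, with boxes given as (y0, x0, y1, x1) tuples; the function names are our own conventions):

```python
def iou(win, obj):
    """PASCAL-style intersection-over-union for two axis-aligned boxes
    (y0, x0, y1, x1); an object is considered covered when iou > 0.5."""
    y0 = max(win[0], obj[0]); x0 = max(win[1], obj[1])
    y1 = min(win[2], obj[2]); x1 = min(win[3], obj[3])
    inter = max(0, y1 - y0) * max(0, x1 - x0)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(win) + area(obj) - inter)

def detection_rate(objects, windows, thr=0.5):
    """DR: fraction of ground-truth objects covered by at least one window."""
    covered = sum(any(iou(w, o) > thr for w in windows) for o in objects)
    return covered / len(objects)
```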

Fig. 4.
figure 4

Results – average DR \(\times \) #WIN.

As one can see in Fig. 4, the hGB method helps improve the object detection rate when 100 or more windows are used. Moreover, a large difference can be noticed at 1,000 windows, which may indicate that the segments obtained by hGB (which respects the causality and location principles) have an increasingly positive impact on the result as the number of windows grows. In contrast, the use of the SLIC method does not seem to have a significant impact on the detection rate when compared with GB for 100 or more windows.

As shown by the AUC values, hGB outperforms GB and SLIC by 11% and 6.25%, respectively. However, hGB (and also SLIC) presents a higher average execution time than GB – see Table 1.

Finally, a qualitative assessment indicates that hGB seems to be better for detecting more than one object in the same image. This can be seen in Fig. 5, where heatmaps are used to show the object probability distribution (warmer regions represent stronger evidence of object presence in the original image).

Table 1. Average execution time per image of each combination (in seconds).

6 Conclusion and Future Work

This paper proposed the use of a hierarchical graph-based segmentation to improve object detection. Experimental results have shown that the modified approach outperforms the previous state-of-the-art version by 11%, albeit with a higher average execution time. When compared to a state-of-the-art method for generating superpixels, the modified approach consumes almost the same amount of time per image while still exhibiting an improvement of 6.25% in detection rate.

Fig. 5.
figure 5

Comparison between results: (a) original image; (b) ground-truth; (c) random windows over image; (d) heatmap for MS+CC+SS_GB; (e) heatmap for MS+CC+SS_SLIC; and (f) heatmap for MS+CC+SS_HGB.

As future work, we plan to investigate the use of segments from different detail levels to produce the best score for a given window. This could help deal with partially occluded objects and with cluttered scenes.