Identification of Orchid Species Using Content-Based Flower Image Retrieval
L.T. Handoko, Indonesian Institute of Sciences (one of three authors)
Uploaded by L.T. Handoko on 18 March 2015.
Abstract—In this paper, we developed a system for recognizing orchid species from flower images. We used the MSRM (Maximal Similarity based on Region Merging) method to segment the flower object from the background, and we extracted shape features (the distance from the edge to the centroid of the flower, aspect ratio, roundness, moment invariants, and fractal dimension) together with a color feature: HSV color with the V value ignored. To retrieve images, we used a Support Vector Machine (SVM). The orchid is a unique flower: it has a part called the lip (labellum) that distinguishes it from other flowers and even from other types of orchids. We therefore proposed extracting features not only from the flower region but also from the lip (labellum) region. The results show that the proposed method increases the accuracy of content-based flower image retrieval for orchid species by up to approximately 14%. The most dominant features are Centroid Contour Distance, Moment Invariant, and HSV color. The system accuracy is 85.33% in the validation phase and 79.33% in the testing phase.

I. INTRODUCTION
Manual identification of plants requires skill, information, thoroughness, and time. We depend heavily on experts to determine a plant's name, yet Indonesia currently has only a limited number of taxonomists [1].

Orchids form one of the largest families of flowering plants. Besides the leaf, stem, and root, the flower can generally also be used as a parameter for identifying orchid species. Indonesia is thought to have a high diversity of orchids, as many orchid species have not yet been described by science [2]. Identifying orchids differs slightly from previous research on plant identification [3,4,5,6,7,8] because the orchid flower has a unique part called the lip (labellum) that distinguishes it from other flowers and even from other orchids [9,10], as shown in Fig. 1.

Fig. 1 Orchid flower

In this research, we propose a semi-automatic Content-Based Image Retrieval (CBIR) system for orchid species that uses the MSRM (Maximal Similarity based on Region Merging) method for segmentation, shape and color features for feature extraction, and the SVM (Support Vector Machine) method for retrieving images. We chose MSRM for segmentation because it is easy to use and its segmentation is more accurate than other high-level segmentation methods such as Graph Cut [11]. Shape and color features were chosen because taxonomists usually rely on color and shape to distinguish one flower from another manually. We used SVM because it has proven very effective in earlier implementations [12,13,14].

We analyzed the influence of the lip on the CBIR orchid species system by comparing the system with and without lip features. We also analyzed feature performance in order to find the features that most significantly influence system performance, and we analyzed system performance in both the validation phase and the testing phase.
II. MATERIAL AND METHOD
This section describes the data and the proposed method, including pre-processing, feature extraction, the image retrieval method, and feature analysis.

A. Data
We use 300 orchid images from 10 genera. Each genus consists of 3 species, and each species consists of 10 flower images taken from the front. The data come from various sources, such as personal photo collections, colleagues' photos, and photos from the internet, and they differ in size, resolution, shooting distance, and light intensity. In the training phase, we used 5-fold cross-validation to divide the 300 images into two sets: training sets and validation sets. We also use 30 orchid images for testing. These 30 images are new images that were never used in the training/validation phase.

B. Pre-Processing
Flower images are no larger than 600x500 pixels, to match the MSRM segmentation software [11], and the image resolution is 96 dpi (the default result of the imresize() function in Matlab). We slightly modified the output of the MSRM source code: whereas the original MSRM output is a white segmented object on a black background, our output is the segmented object in color on a black background. Segmentation was conducted twice, once on the flower region and once on the lip region. After segmentation, we applied morphological operations such as erosion, dilation, hole filling, and 8-connected component labeling to refine the image. The pre-processed images were then ready for feature extraction.

C. Feature Extraction
We extracted features from both the flower region and the lip region. Because the data are highly varied, we used color and shape features that are invariant to scale, rotation, translation, light intensity, etc. The shape features are the distance from the center to the edge of the flower/lip region, aspect ratio, roundness, moment invariants, and fractal dimension, while the color feature is HSV color with the V value ignored. Three features describe the distance from the center to the edge: CCD (Centroid Contour Distance), SF1, and SF2.

We chose CCD because it represents the flower shape as a curve and is invariant to scale and rotation [4]. For rotation invariance, the 0° direction starts at the edge point farthest from the center of the flower/lip region. In this research, we used the edge points at multiples of 10° [15]. The usual CCD algorithm cannot handle non-convex shapes whose center point lies outside the object, so we modified it slightly: if no edge point lies at a given multiple of 10°, we use 0 as the distance, and if the ray at that angle cuts two edge points, we use the farther one.

According to [8], SF1 and SF2 recognize flowers well. We used SF1 to capture the sharpness of the sepal and petal shapes, and SF2 to capture the pattern/shape of the flower based on the average normalized distance. SF2 is invariant to scale. SF1 and SF2 can be computed by:

$$\mathrm{SF1} = \frac{R_{10}}{R_{90}} \qquad (1)$$

$$\mathrm{SF2} = \frac{1}{N}\sum_{i=1}^{N} D_i \qquad (2)$$

where $R_{10}$ and $R_{90}$ are, respectively, the average of all $d_i$ smaller than the 10th percentile and the average of all $d_i$ larger than the 90th percentile of all $d_i$:

$$d_i = \sqrt{(x_i - \bar{x})^{2} + (y_i - \bar{y})^{2}} \qquad (3)$$

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (4)$$

$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i \qquad (5)$$

$N$ is the number of pixels on the flower boundary, and $x_i$ and $y_i$ are, respectively, the x and y coordinates of the i-th boundary pixel. $D_i$ in (2) is the normalized distance defined as follows:

$$D_i = \frac{d_i}{\max_{1 \le j \le N} d_j} \qquad (6)$$

Aspect ratio is the ratio of physiological width to physiological length [16]. It is one of the measurements taxonomists use in manual identification, and it is also invariant to scale, rotation, and translation.

Roundness is used as a feature because flower shapes vary widely, so each orchid species may have a different roundness value. Roundness can be computed by [8,16]

$$\mathrm{Roundness} = \frac{4\pi A}{P^{2}} \qquad (7)$$

where $A$ is the flower/lip area and $P$ is the perimeter of the flower/lip edge.

Moment Invariant (MI) was chosen because it is a reliable shape feature; it is invariant to rotation, translation, and scaling (dilation). We used the seven moment invariants derived from the second- and third-order moments [17,18].

Fractal Dimension also performs well for object recognition. It can be computed as follows:

$$D(s) = \frac{\log N(s)}{\log(1/s)} \qquad (8)$$

where $N(s)$ is the number of boxes of size $s$ that contain object information (pixels), and $D(s)$ is the fractal dimension of the object at box size $s$. We used the 4th, 5th, 6th, and 7th dimension values and the mean of these four values [18].
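The refinement step in B. Pre-Processing can be illustrated with a minimal pure-Python sketch. It assumes the segmented image has already been thresholded into a binary mask (a list of rows of 0/1, with 1 marking the flower or lip) and covers only two of the listed operations: keeping the largest 8-connected component and filling holes. The function names are hypothetical, and erosion and dilation are omitted for brevity.

```python
from collections import deque


def largest_component(mask):
    """Keep only the largest 8-connected foreground component of a 0/1 mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            comp, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:  # breadth-first flood fill over the 8 neighbours
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out


def fill_holes(mask):
    """Set to 1 every background pixel that is not 4-connected to the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y][x])
    for y, x in queue:
        outside[y][x] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    # Background reached from the border stays 0; everything else becomes object.
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]
```

Applied to a hollow square plus a stray pixel, `fill_holes(largest_component(mask))` drops the stray pixel and fills the square's interior.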
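The centroid-based distance features (CCD, SF1, SF2) can be sketched the same way. This is an illustration under stated assumptions, not the authors' code: a boundary pixel is a foreground pixel with a background 4-neighbour; each boundary pixel is assigned to its nearest 10° direction measured from the farthest edge point (an empty direction gets 0 and a crowded one keeps its farthest point, mirroring the modified CCD described above); CCD values are divided by the maximum distance for scale invariance; SF1 is taken as R10/R90 with each 10% tail defined by count; and SF2 is the mean of d_i divided by the largest d_i. All function names are hypothetical.

```python
import math


def boundary_pixels(mask):
    """(x, y) of foreground pixels with a background 4-neighbour or touching the image edge."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    pts.append((x, y))
                    break
    return pts


def region_centroid(mask):
    """Centroid of all foreground pixels (the center of the flower/lip region)."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    return sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts)


def ccd(mask, step_deg=10):
    """Centroid Contour Distance sampled every step_deg degrees, normalised to [0, 1]."""
    cx, cy = region_centroid(mask)
    polar = []
    for x, y in boundary_pixels(mask):
        # Image rows grow downward, so flip y to get a counter-clockwise angle.
        ang = math.degrees(math.atan2(cy - y, x - cx)) % 360.0
        polar.append((ang, math.hypot(x - cx, y - cy)))
    start_ang, d_max = max(polar, key=lambda p: p[1])  # farthest edge point = 0 degrees
    n_dirs = 360 // step_deg
    dist = [0.0] * n_dirs  # stays 0 where no edge point lies in that direction
    for ang, d in polar:
        k = int(round(((ang - start_ang) % 360.0) / step_deg)) % n_dirs
        dist[k] = max(dist[k], d)  # several edge points: keep the farthest
    return [d / d_max for d in dist]


def sf1_sf2(mask):
    """SF1 = R10 / R90 and SF2 = mean normalised distance (assumed definitions)."""
    pts = boundary_pixels(mask)
    n = len(pts)
    xc = sum(x for x, _ in pts) / n  # centroid of the boundary pixels
    yc = sum(y for _, y in pts) / n
    d = sorted(math.hypot(x - xc, y - yc) for x, y in pts)
    k = max(1, round(0.1 * n))  # size of each 10 % tail, taken by count
    r10 = sum(d[:k]) / k
    r90 = sum(d[-k:]) / k
    sf1 = r10 / r90
    sf2 = sum(di / d[-1] for di in d) / n
    return sf1, sf2
```

For a filled disk every CCD value is close to 1 and SF1 is close to 1, while a plus-shaped mask with thin arms gives a much smaller SF1, which is the "sharpness" behaviour SF1 is meant to capture. A C-shaped mask, whose centroid falls in the gap, produces some zero CCD entries.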
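Roundness, 4πA/P², and the aspect ratio can be sketched as follows. Two choices here are assumptions: the perimeter P is approximated by counting exposed pixel edges (which overestimates the perimeter of curved outlines, so a digital square scores exactly π/4), and, because the text does not define physiological width and length for flowers, the aspect ratio is approximated as the ratio of the minor to the major principal-axis spread of the region, which keeps it invariant to rotation. The helper names are hypothetical.

```python
import math


def area_perimeter(mask):
    """Pixel area and a simple perimeter estimate (number of exposed pixel edges)."""
    h, w = len(mask), len(mask[0])
    area, perim = 0, 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            area += 1
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    perim += 1  # this side of the pixel faces the background
    return area, perim


def roundness(mask):
    a, p = area_perimeter(mask)
    return 4 * math.pi * a / p ** 2


def aspect_ratio(mask):
    """sqrt(minor / major eigenvalue) of the pixel-coordinate covariance matrix."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return math.sqrt((half_trace - root) / (half_trace + root))
```

For a 10x20 rectangle the aspect ratio is sqrt(8.25/33.25), about 0.50, matching the variances of 10 and 20 evenly spaced pixel rows and columns.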
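The seven invariants derived from second- and third-order moments correspond to the classical Hu set. The sketch below computes them from a binary mask using the normalized central moments η_pq = μ_pq / μ_00^(1+(p+q)/2), which provide the translation and scale invariance; the function name is hypothetical and this is plain Python rather than any particular library.

```python
def hu_moments(mask):
    """Seven Hu moment invariants of a 0/1 mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = len(pts)
    xc = sum(x for x, _ in pts) / m00
    yc = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Central moment (translation invariant), then scale normalisation.
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    phi3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    phi4 = a ** 2 + b ** 2
    phi5 = ((n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
            + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2))
    phi6 = (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b
    phi7 = ((3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
            - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2))
    return [phi1, phi2, phi3, phi4, phi5, phi6, phi7]
```

Rotating a mask by exactly 90° (for example with `[list(r) for r in zip(*mask[::-1])]`) leaves all seven values unchanged up to floating-point rounding.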
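Equation (8) is a box-counting estimate. The minimal sketch below assumes the mask is conceptually padded with background to a square whose side is a power of two, and that box sizes are halved at each level (s = 1/2^k relative to that side), returning one D(s) per level. Which of these levels correspond to the paper's 4th to 7th values is not stated, so that indexing is left open; the function name is hypothetical.

```python
import math


def box_counting_dimensions(mask):
    """Return D(s) = log N(s) / log(1/s) for s = 1/2, 1/4, ..., 1/side (assumed scales)."""
    h, w = len(mask), len(mask[0])
    side = 1
    while side < max(h, w):  # pad (conceptually) to a power-of-two square
        side *= 2
    levels = side.bit_length() - 1  # side == 2 ** levels
    dims = []
    for k in range(1, levels + 1):
        box = side // 2 ** k  # box edge in pixels; relative size s = 1 / 2**k
        occupied = {(y // box, x // box)
                    for y in range(h) for x in range(w) if mask[y][x]}
        n = len(occupied)  # N(s): boxes containing at least one object pixel
        dims.append(math.log(n) / (k * math.log(2)) if n else 0.0)
    return dims
```

As a sanity check, a completely filled square gives D = 2 at every level and a single full-width line gives D = 1; for a 600x500 image the padded side is 1024, giving ten levels.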
We chose HSV color because, in this case, it performs better than RGB color [19]. Because variation in illumination greatly affects the recognition result, we used HSV color with the illumination component (V) discarded and then divided the HS color space into 12x6 colors [8], as shown in Fig. 2, represented by $C_i$, $1 \le i \le 72$.

E. Feature Analysis
To identify the significant features, we used feature selection with the Weka tool and also analyzed the features manually by searching for features that perform better than the others. We then combined several dominant features as the selected features to determine system performance.
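The 72-bin hue-saturation histogram described in the HSV paragraph can be sketched with Python's standard `colorsys` module. The sketch assumes 12 bins for hue and 6 for saturation (the text gives only "12x6"), treats pure black pixels as the background left by segmentation, and normalizes the counts so that images of different sizes are comparable. The function name is hypothetical, and Python index i corresponds to $C_{i+1}$.

```python
import colorsys

H_BINS, S_BINS = 12, 6  # assumed split of the 12x6 HS grid


def hs_histogram(pixels):
    """Normalised 72-bin H-S histogram of (r, g, b) tuples in 0..255; V is ignored."""
    hist = [0.0] * (H_BINS * S_BINS)
    count = 0
    for r, g, b in pixels:
        if r == g == b == 0:
            continue  # black = background left by the segmentation step
        h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hi = min(int(h * H_BINS), H_BINS - 1)  # clamp h == 1.0 into the last bin
        si = min(int(s * S_BINS), S_BINS - 1)
        hist[hi * S_BINS + si] += 1
        count += 1
    return [c / count for c in hist] if count else hist
```

Dropping V before binning is what makes the feature insensitive to brightness changes: a pixel and a darker copy with the same hue and saturation land in the same bin.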
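The 5-fold cross-validation described under A. Data can be sketched as a stratified split. Stratifying by species, so that each fold holds 2 of the 10 images of every species, is an assumption, as is the function name; the text states only that 5-fold cross-validation divides the 300 images into training and validation sets.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Return k (train_indices, val_indices) pairs with classes spread evenly over folds."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    fold_of = [0] * len(labels)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            fold_of[idx] = j % k  # deal each class's images round-robin
    folds = []
    for f in range(k):
        val = [i for i in range(len(labels)) if fold_of[i] == f]
        train = [i for i in range(len(labels)) if fold_of[i] != f]
        folds.append((train, val))
    return folds
```

With 30 species of 10 images each (`labels = [s for s in range(30) for _ in range(10)]`), every fold has 240 training and 60 validation images, and each image is validated exactly once.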