Object Detection and Recognition For A Pick and Place Robot

Rahul Kumar
The University of the South Pacific
Suva, Fiji
Email: rahul.kumar@usp.ac.fj

Sanjesh Kumar
The University of the South Pacific
Suva, Fiji
Email: s11065712@student.usp.ac.fj
Abstract—Controlling a robotic arm for applications such as object sorting with the use of vision sensors requires a robust image processing algorithm to recognize and detect the target object. This paper is directed towards the development of the image processing algorithm which is a pre-requisite for the full operation of a pick and place robotic arm intended for an object sorting task. For this type of task, first the objects are detected, and this is accomplished by a feature extraction algorithm. Next, the extracted image (parameters in compliance with the classifier) is sent to the classifier to recognize what object it is and, once this is finalized, the output is the type of the object along with its coordinates, ready for the robotic arm to execute the pick and place task. The major challenge faced in developing this image processing algorithm was that, upon making the test subjects comply with the classifier parameters, resizing of the images resulted in the loss of pixel data. Therefore, a centered image approach was taken. The accuracy of the classifier developed in this paper was 99.33%, and for the feature extraction algorithm the accuracy was 83.6443%. Finally, the overall system performance of the image processing algorithm developed after experimentation was 82.7162%.

Keywords – Object Detection, Object Recognition, Feature Extraction, Classifier.

I. INTRODUCTION

Vision based control of a robotic system is the use of visual sensors as feedback information to control the operation of the robot. Integration of vision based algorithms can enhance the performance and the efficiency of the system. Vision based configurations have been implemented to mimic human visual sensors. Orienting towards robotic arms, object recognition is vital for the operation of arms in navigation and grasping tasks. Often it has been the case that image processing (IP) algorithms require huge processing time for the successful implementation of object recognition.

The work presented in [1] critically explains the basic algorithms to be addressed before applying image processing techniques. These techniques include image enhancement, noise reduction and a visual loop algorithm (based on a trial and error approach). Moreover, the works of [2], [3] and [4] present IP algorithms and approaches to reduce response time and increase the efficiency of object recognition tasks. In [8], the discussion is based on the reduction of computation time using the Trainarp algorithm (derived from ANN). It also presents the method to migrate from the statistical approach to Artificial Neural Networks (ANN). The author has stated the efficiency as 95% and a response time of 94 ms. Likewise, [3] has conversed on employing a parallel programming approach called the object surface reconstruction method. Upon comparison with the serial approach, the parallel programming method is ten times faster. To reduce cost and improve performance, [4] has presented the communication of a vision system via USB. The vision system used was a webcam through which, via MATLAB, the system is enabled to perceive the environment through artificial vision via IP algorithms.

Along with the classification part, the concept of Feature Extraction (FE) is also studied. FE mostly acts as a pre-processing algorithm to furnish the dataset for the classifier to make important decisions/classifications. The work of [5] elaborated on the usage of a multi-stereo vision technique for the detection of 3D objects. By eliminating the background, i.e. the objects of least interest, using opening and closing morphological techniques, 3D detection of a particular object was achieved. Similarly, [6] conversed about one of the robust object detection algorithms. This algorithm is known as the Viola-Jones IP method, a state of the art face detector. The robustness of this algorithm is due to a cascaded architecture of strong classifiers arranged in order of complexity. This approach was incorporated to reduce the processing time. Lastly, feature extraction via contour matching is also one of the best methods to detect objects [7]. The trained shape is matched according to a probabilistically motivated distance measure which enhances the shape comparisons within the framework. The work in [7] also presented noise reduction and other image optimization via segmentation and other IP techniques.
The goal of this paper is to develop an IP technique which will involve the FE and classification algorithms suitable for an object sorting task. Additionally, the system to be developed needs to be robust as it will be tested on a real time basis. It is planned to use the developed IP technique on the SCORBOT ER-4U (robotic arm platform) [9], which will be refurbished and utilized to sort electronic components such as resistors and capacitors for laboratory technicians. The remainder of this paper covers the algorithms of feature extraction and classification, further discusses the determination of object location, and portrays all the results obtained during the development of the algorithms. Finally, concluding remarks are made on the results, with further recommendations on how to improve the developed model.

II. CONCEPTUAL FRAMEWORK OF THE ENTIRE SYSTEM

The above figure shows how the image processing algorithm will work. The constant variable in this case is the x-y dimensions of the workspace. The image taken is first standardized according to the workspace dimensions. This is achieved by resizing the taken image according to the dimensions of the workspace:

width of image (pixels) = 37.795275591 × w_s   (1)

height of image (pixels) = 37.795275591 × h_s   (2)

Note: w_s and h_s are the width and height of the workspace (where the components are) in centimeters.

III. THE FEATURE EXTRACTION ALGORITHM

The feature extraction part in the development of this model plays a vital role as it furnishes the raw image and complies it according to the classifier's specifications.

Figure 2: Feature Extraction before Classification

The above framework represents the images in the cluttered scene (test subjects) to be tested for correct classification. For accurate classification of the objects, the feature extraction algorithm needs to be considered prudently.

A. Algorithm

1. The image is read and converted to grayscale. The grayscale conversion is achieved by eliminating hue and saturation information while preserving the luminance. The colored image (RGB image) is 3-dimensional. To convert to a 2-dimensional grayscale image, the following equation represents the correct proportion of RED, GREEN and BLUE pixels to be taken into account:

0.2989 R + 0.5870 G + 0.1140 B   (3)
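The workspace standardization of equations (1)–(2) and the grayscale conversion of equation (3) can be sketched as follows. The paper implements these steps in MATLAB; this is an illustrative NumPy version, where the workspace dimensions and the tiny image array are hypothetical stand-ins.

```python
import numpy as np

# Pixels per centimeter, the inverse of the paper's 0.0264583333 cm-per-pixel factor.
PX_PER_CM = 37.795275591

def workspace_size_px(w_cm, h_cm):
    """Equations (1)-(2): convert workspace dimensions in cm to an image size in pixels."""
    return round(PX_PER_CM * w_cm), round(PX_PER_CM * h_cm)

def rgb_to_gray(rgb):
    """Equation (3): luminance-preserving grayscale conversion of an (H, W, 3) RGB array."""
    return 0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1] + 0.1140 * rgb[..., 2]

# Hypothetical 30 cm x 20 cm workspace and a 2x2 RGB image.
w_px, h_px = workspace_size_px(30, 20)
gray = rgb_to_gray(np.array([[[255, 0, 0], [0, 255, 0]],
                             [[0, 0, 255], [255, 255, 255]]], dtype=float))
```

The three weights sum to 0.9999, so a pure white pixel maps to just under 255, matching the usual luminance convention.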
G = G_x + G_y   (5)

J(θ) = (1/m) ∑_{i=1}^{m} ∑_{k=1}^{K} [ −y_k^{(i)} log(h_θ(x^{(i)}))_k − (1 − y_k^{(i)}) log(1 − h_θ(x^{(i)}))_k ] + (λ/2m) [ ∑_{j=1}^{H} ∑_{k=1}^{400} (θ_{j,k}^{(1)})² + ∑_{j=1}^{2} ∑_{k=1}^{H} (θ_{j,k}^{(2)})² ]   (8)

Figure 8: Object Detection

Figure 9: Conceptual Framework for the training phase (block diagram: Images from the Feature Extraction Algorithm → Batch grayscale conversion and resizing to 20 by 20 pixels, image data written to a .txt file → Initialize random weights and perform forward propagation → Run back-propagation to optimize the weights → Perform 5-fold cross-validation and select the best model)

h_θ(x^{(i)}) is the hypothesis function, whereby:

h_θ(x) = 1 / (1 + e^{−θ^T x})   (9)
The hypothesis function is a sigmoid function which has 1 as its upper bound and 0 as its lower bound. In addition, the sigmoid function is differentiable at all points.

The block diagram in Figure 9 shows the process by which the training data is manipulated and trained by the classifier. The JPEG formatted images are the training data, which consists of the images of resistors and capacitors equally weighted (same number). These images are first converted to grayscale.

6. The above process computes the cost of the feed-forward propagation. To determine the weights of the model, a random run must first be carried out to start off the optimization via back-propagation. Therefore, in this initial stage, the theta values (weights) are randomized for the input and hidden layers.
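The hypothesis of equation (9) and the regularized cost of equation (8) can be sketched in NumPy as follows. The paper's implementation is in MATLAB; the layer sizes and data here are hypothetical placeholders.

```python
import numpy as np

def sigmoid(z):
    """Equation (9): 1 / (1 + e^(-z)), bounded in (0, 1) and differentiable everywhere."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta1, theta2, X, Y, lam):
    """Equation (8): cross-entropy cost of the feed-forward pass plus an L2 penalty.

    theta1: (H, n+1) input->hidden weights; theta2: (K, H+1) hidden->output weights.
    X: (m, n) inputs; Y: (m, K) one-hot targets; lam: regularization strength.
    """
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])          # input layer with bias unit
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ theta1.T)])
    h = sigmoid(a2 @ theta2.T)                    # hypothesis, shape (m, K)
    ce = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # Bias columns are excluded from the regularization term, as in equation (8).
    reg = lam / (2 * m) * (np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2))
    return ce + reg
```

With all weights zero, every output is sigmoid(0) = 0.5, so the unregularized cost for one sample with K = 2 outputs is 2·log 2, a convenient sanity check.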
7. Implement back-propagation for the optimization of the theta values: once the hypothesis/prediction is made according to the initialized random weights, back-propagation starts off with the output layer; it measures the difference between the network's activation value and the true target value, and then proceeds towards the input layer, assigning the error to each neuron.
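Step 7 can be sketched as a single back-propagation pass for one hidden layer, again as an illustrative NumPy version of what the paper does in MATLAB (the shapes and the learning-rate default of 0.1 mirror the text; the data is hypothetical).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(theta1, theta2, X, Y, lr=0.1):
    """One forward pass followed by back-propagation of the output-layer error
    towards the input layer, returning gradient-descent-updated weights."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])          # input layer with bias
    z2 = a1 @ theta1.T
    a2 = np.hstack([np.ones((m, 1)), sigmoid(z2)])
    h = sigmoid(a2 @ theta2.T)                    # network activation (prediction)
    d3 = h - Y                                    # error at the output layer
    d2 = (d3 @ theta2)[:, 1:] * sigmoid(z2) * (1 - sigmoid(z2))  # error pushed back
    grad2 = d3.T @ a2 / m
    grad1 = d2.T @ a1 / m
    return theta1 - lr * grad1, theta2 - lr * grad2
```

Each call nudges both weight matrices downhill on the cost; repeating it for the paper's 20000 iterations is the training loop.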
Figure 12: Model 2 Block Diagram

8. For higher accuracies, the Neural Network is trained using a higher number of iterations. For the development of this model, 20000 iterations were run to obtain the best values of the weights (theta values).

VI. MODELS

For the object sorting task, in terms of image processing, two models of the classifier were developed. The differences between the models are specified below:

MODEL 1

Figure 11: Model 1 Block Diagram (Cluttered Scene → Feature Extraction and Object location filed → Object Cropped → Convert to Grayscale and Resize image to 20 by 20 pixels → Send to the classifier for Testing)

Model   No. of Neurons   % Accuracy
1       25               99.33333
1       30               99.66667
1       40               99.00000
1       50               98.66667
1       70               99.00000
2       25               99.00000
2       30               98.00000
2       40               98.66667
2       50               98.66667
2       70               98.66667

Table 1: Cross Validation Results

The bold numerics in the table represent the best model.

Best Model: Model 1
• 400 input neurons
• 25 hidden neurons
• 2 output neurons
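The 5-fold cross-validation used to populate Table 1 and select the best model (the final stage of Figure 9) can be sketched as follows; this is an illustrative Python version with a hypothetical scoring function standing in for "train the network and report validation accuracy", not the paper's MATLAB code.

```python
import numpy as np

def five_fold_indices(n_samples, k=5, seed=0):
    """Split shuffled sample indices into k folds for cross-validation."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)

def cross_validate(score_fn, n_samples, k=5):
    """Average validation score over k folds; score_fn(train_idx, val_idx) -> accuracy."""
    folds = five_fold_indices(n_samples, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(score_fn(train, val))
    return float(np.mean(scores))

# Model selection: keep the (model, hidden-neuron) candidate with the best mean score.
# The constant score_fn below is a hypothetical stand-in for actual training runs.
best = max([(1, 25), (1, 30), (2, 25)],
           key=lambda cand: cross_validate(lambda tr, va: 0.99, 312))
```

With equal scores, `max` keeps the first candidate, which mirrors the paper's tie-breaking towards fewer neurons.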
The Model 1 technique converts the cropped image to grayscale and resizes it to comply with the testing standards (20 by 20 pixel images are used for testing). However, in Model 2, considering that resizing an image deteriorates its quality, the cropped image is placed at the center of a white background.

C. Iterations and Learning Rate Used for Training

For the training of the dataset, 20000 iterations were first run with randomized values of the weights; once new weights (theta) are determined via back-propagation, the iterations (20000) were rerun to get optimized values of the weights (theta). Moreover, to avoid large deviations in the gradient descent algorithm, the value of the learning rate was kept at 0.1.

VIII. LOCATION OF THE EXTRACTED IMAGE (COORDINATES)

The location of the object is a very important parameter because without the coordinates of the classified object, the object sorting task would be impossible. Before applying feature extraction, the whole image was resized as per the workspace dimensions (converting the metric dimensions to pixels, which becomes the size of the image). From the scene, during the feature extraction, the locations of the objects were filed, and upon testing (classification) the program gave out the center coordinates of each object, also assuming the robotic arm is confined within the boundary of the scene.

The locations of the objects were obtained in the form of a bounding box. In MATLAB this was in the form [xmin, ymin, width, height]. The (xmin, ymin) is the coordinate of the top-left edge of the rectangular bounding box, and the (width, height) are the width in the x direction and the height in the y direction.
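The bounding-box handling above, together with the center and unit conversions of equations (10)–(12), can be sketched as follows; this is illustrative Python with a hypothetical box, rather than the paper's MATLAB code.

```python
def box_center_px(xmin, ymin, width, height):
    """Equations (10)-(11): center of a MATLAB-style [xmin, ymin, width, height] box."""
    return xmin + width / 2.0, ymin + height / 2.0

def px_to_cm(xc, yc, cm_per_px=0.0264583333):
    """Equation (12): convert pixel coordinates to centimeters for the robotic arm."""
    return xc * cm_per_px, yc * cm_per_px

# Hypothetical bounding box as returned by the feature extraction stage.
xc, yc = box_center_px(100, 40, 60, 20)   # center in pixels
xn, yn = px_to_cm(xc, yc)                 # center in centimeters
```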
Figure 13: Location for the bounding box

The center of the object is given by:

x_c = x_min + (1/2) × width   (10)

y_c = y_min + (1/2) × height   (11)

whereby x_c and y_c are in pixels.

[x_n, y_n] = 0.0264583333 × (x_c, y_c)   (12)

where 0.0264583333 is the conversion factor to obtain cm values from pixels.

IX. EXPERIMENTATION

A. Training Dataset Description

The training data consisted of 312 images, comprising 156 individual capacitor images and 156 individual non-capacitor images. The capacitor images were taken on a white background and the positioning of the objects was varied: the positioning was not only centered but also cornered in the training images. The training data was converted to grayscale, resized to 20 by 20 pixels and then sent in for testing.

B. Test Dataset Description

The test data had 32 images with cluttered scenes. The test data was converted to grayscale, complied with the classifier specifications and then sent for testing.

X. RESULTS

B. Classifier Accuracy

Classifier Accuracy = ((No. True Positives + No. True Negatives) / Total Samples) × 100   (13)

The classifier accuracy was determined using equation (13).

C. Final Results

Below is the result table of 32 scenes which altogether have a total of 448 objects, inclusive of both capacitor and non-capacitor images. The operations regarding the testing run from the feature extraction through to the classification task, and these test images are a separate set of data from the training examples.

Scene   Capacitors   Non-Capacitors   FE Accuracy   TP   TN   Overall Accuracy
2       3            0                1             2    1    1
3       0            6                1             0    6    1
4       12           13               0.36          6    3    1
5       25           3                0.57          16   0    1
6       7            8                0.8           4    7    0.916667
7       9            0                1             1    3    0.444444
8       13           5                0.83          11   0    0.733333
9       1            6                0.57          1    3    1
10      6            12               0.5           3    5    0.888889
11      8            1                1             4    1    0.555556
12      5            0                1             3    2    1
13      10           0                0.9           8    0    0.888889
14      7            1                1             5    3    1
15      6            0                1             3    0    0.5
16      58           0                1             40   0    0.689655
17      0            13               0.38          0    5    1
18      0            6                0.5           0    2    0.666667
19      10           7                0.59          6    4    1
20      4            3                0.86          3    2    0.833333
21      12           0                0.83          10   0    1
22      24           0                0.67          14   2    1
23      2            1                1             1    2    1
24      18           17               0.6           10   11   1
25      4            0                1             3    0    0.75
26      8            9                1             6    5    0.647059
27      1            0                1             1    0    1
28      6            7                1             4    6    0.769231
29      6            5                1             1    5    0.545455

The accuracies of Model 1 ranged towards 100%. The choice of the best model was not made according to accuracy alone; the selection was made upon going over the following criteria: least cost to attain favorable results (i.e. fewer neurons), processing and execution time, and simplicity of the model. Once the object is classified, the MATLAB program also outputs the coordinates of the classified object, ready for the robotic arm to execute the sorting task.

XII. RECOMMENDATION

The major challenge when developing the models was that upon resizing of the images there is a loss of pixel data, as the images were in raster formats. The potential solution to this predicament would be to invoke the concept of Scalable Vector Graphics (SVG) formatting of the images. However, since MATLAB is not able to process vector graphics files, the first proposition would be to write a function file which enables MATLAB to read and modify .svg or other vector files. Having formed this foundation would solve many problems in terms of scaling the images.

REFERENCES