Abstract
Obtaining the spatial coordinates of kiwifruit is one of the key techniques for a kiwifruit harvesting robot. Based on the growth characteristics of kiwifruit and its scaffolding cultivation pattern, this paper proposes a method for acquiring the spatial coordinates of kiwifruit feature points from below the target fruit, using a Microsoft camera together with a Kinect sensor. The paper also presents the coordinate conversion between the images from the Microsoft camera and those from the Kinect sensor, followed by an analysis of the precision of the kiwifruit spatial coordinates obtained by the two devices. The procedure is as follows: first, images of the target fruit are captured from below with the Microsoft camera, and the coordinates of the target fruits' feature points are extracted to determine the corresponding feature point coordinates in the Kinect sensor image; second, the correspondence between the Microsoft camera image coordinate system and the Kinect sensor image coordinate system is analyzed and a mathematical model for the image coordinate conversion is established; finally, the spatial coordinates of the target feature points are captured with the Kinect sensor and verification tests are conducted. The results show that the precision of the coordinate conversion model and of the kiwifruit spatial coordinates can meet the requirements of harvesting robots.
1 Introduction
The acreage and production of China's kiwifruit rank first in the world. However, at present, kiwifruit is mainly harvested manually, which is highly labor-intensive. With the progress of urbanization and industrialization, more and more young and middle-aged people are drawn to work in cities. As a result, the loss of labor in agriculture is becoming serious, which in turn raises the cost of agricultural production and lowers the market competitiveness of agricultural products. Therefore, the development of a kiwifruit picking robot is of great significance to the development of China's kiwifruit industry.
The key techniques of a kiwifruit picking robot involve three parts: fruit identification, location and nondestructive picking. The widespread adoption of a standardized scaffolding pattern in kiwifruit production makes robotic fruit picking feasible. However, several factors still hinder the development of the kiwifruit picking robot. Firstly, kiwifruit grows in clusters, each usually composed of 3–5 fruits, and the fruits often grow very close to one another and even overlap. Moreover, foliage sheltering and the similar color between fruits and the background make it difficult for the harvesting robot to perform precise fruit identification and separation as well as feature extraction. Secondly, kiwifruit positioning and spatial coordinate acquisition also remain to be solved for the development of the harvesting robot. The positioning systems of existing fruit and vegetable harvesting robots are low in precision, time-consuming, complex in structure and high in cost, so it is imperative to develop a new, efficient positioning system.
Among the existing fruit and vegetable harvesting robots at home and abroad, some can harvest fruits whose colors differ greatly from the background, such as strawberry picking robots [1], tomato picking robots [2–4] and citrus harvesting robots [5, 6], while others can harvest target fruits whose colors are similar to the background, such as cucumber picking robots [7]. The latter usually adopt near-infrared spectroscopy or laser technology for detection. In terms of detection and identification of kiwifruit, Zhan et al. [8] used the Adaboost algorithm, and Ding et al. [9] used an RB color component method to separate kiwifruit. These two methods can only separate the regions with fruits from those without fruits; they fail to identify fruits individually. Cui et al. [10] used the 0.9R-G color feature for fruit image segmentation, but under a complex background this method involves a large amount of calculation and is time-consuming. In terms of kiwifruit positioning and coordinate acquisition, other methods have been used, such as monocular vision, binocular vision, multi-camera vision, hyperspectral imaging and laser scanning. However, these methods have drawbacks such as complex computation, low accuracy, high cost and poor reliability. Meanwhile, the conversion among pixel coordinates, spatial coordinates and the mechanical arm remains a problem to be solved.
On-site investigation found that the space below the fruits is open, with little sheltering and a simple background, so the authors propose that the fruits be identified, positioned and picked from the bottom of the canopy. The principle is as follows: determine the sequence of fruit identification, feature point extraction and fruit picking by using the elliptic Hough transform; acquire the feature point image coordinates with the Kinect sensor made by Microsoft, drawing on foreign research on the Microsoft Kinect sensor in robot navigation [11, 12] and feature recognition [13–15]; finally, conduct the coordinate conversion between the camera and the sensor and construct a mathematical model of the conversion to obtain the 3D coordinates of the feature points.
2 Information Perception
2.1 Feature Extraction and Image Acquisition
The kiwifruit pictures were taken in October 2014, during harvest time, at the Kiwifruit Experimental Station of Northwest Agriculture and Forestry University; the cultivar is “Hayward”. The camera used is a Microsoft LifeCam Studio with a CMOS sensor and auto-focus. Images were acquired at 640 × 360 pixels in JPG format. Each picture contained 2–5 fruits and was taken about 20 cm from the fruits. Images were acquired mainly from the side and from the bottom, as shown in Fig. 1.
As can be seen in Fig. 1, because of the greater scene depth and complicated background, the images taken from the side contain not only the leaves of nearby plant branches but also distant non-target fruits, and there is serious mutual occlusion between the target fruits. All of these affect the accuracy of target fruit segmentation and recognition. In contrast, the picture shot from the bottom has less mutual occlusion between fruits and no interference from distant non-target fruits, which is favorable for extracting the target fruits. In the side view, because of the mutual occlusion between fruits, identification can only proceed from the outside inward, picking one fruit at a time, which results in low harvest efficiency. In the image shot from the bottom, the shadowed area between fruits is smaller, so all the fruits can be identified at once; as a result, the picking sequence can be determined and efficiency improved.
In order to improve the fruit recognition success rate under a complex background, Cui et al. [16] of Northwest Agriculture and Forestry University presented a comprehensive method that identifies fruits and extracts fruit features by combining kiwifruit shape characteristics, color features and the elliptical Hough transform. This method minimizes the impact of different complex backgrounds and illumination on identification and feature extraction. The specific steps are shown in Fig. 2.
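As an illustration of this recognition pipeline, the following Python sketch segments candidate fruit regions by color and fits an ellipse to each region, taking the ellipse center as the feature point. It is only a minimal stand-in for the method of [16]: the HSV thresholds, the area filter and the use of OpenCV's fitEllipse in place of the elliptical Hough transform are assumptions for illustration.

```python
import cv2
import numpy as np

def detect_fruit_feature_points(bgr_image):
    """Sketch of bottom-view fruit detection: color segmentation followed by
    ellipse fitting. Thresholds and the fitting step are illustrative only."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Placeholder color range for brownish kiwifruit skin against sky/canopy.
    mask = cv2.inRange(hsv, (10, 40, 40), (35, 255, 255))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    feature_points = []
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 500 or len(c) < 5:   # skip small blobs
            continue
        # Fit an ellipse to each candidate fruit; its center is taken as the
        # feature point (a stand-in for the elliptical Hough transform of [16]).
        (cx, cy), _axes, _angle = cv2.fitEllipse(c)
        feature_points.append((cx, cy))
    return feature_points
```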
2.2 Picking Order
Figure 3 shows the pixel coordinates of the feature points after each fruit is identified. The X axis is within the range 0–360 and the Y axis within 0–640.
The picking sequence is determined according to the values of the feature point coordinates. In Fig. 3, the numbers 1–5 represent the picking sequence, which is obtained by sorting the Y coordinate values of the feature points in ascending order, so that the picking arm of the robot travels the minimum stroke and achieves maximum efficiency over the whole picking process.
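The ordering rule itself can be stated in a few lines of Python; the sketch below sorts the extracted feature points by their Y pixel value (the coordinates shown in Fig. 3) to produce the picking sequence. The point values are illustrative only.

```python
def picking_order(feature_points):
    """Return feature points sorted by ascending Y pixel coordinate,
    which defines the picking sequence 1, 2, 3, ..."""
    return sorted(feature_points, key=lambda p: p[1])

# Example: five feature points (x, y) from a bottom-view image
# (X in 0-360, Y in 0-640, as in Fig. 3; values are made up).
points = [(120, 510), (300, 95), (210, 180), (50, 240), (330, 600)]
for rank, (x, y) in enumerate(picking_order(points), start=1):
    print(f"pick {rank}: feature point at ({x}, {y})")
```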
3 Coordinate Conversion
The principle of kiwifruit coordinate acquisition is shown in Fig. 4. On the left is a front view and on the right a top view, where 1 stands for the kiwifruit, 2 for the Microsoft camera and 3 for the Kinect sensor. The camera is used to identify fruits, extract features and determine the picking sequence; the Kinect sensor is used to obtain the spatial coordinates of the kiwifruit feature points. The intersection of the infrared camera's optical axis with its outer surface is taken as the origin of the Kinect coordinate system, and the intersection of the Microsoft camera's optical axis with its lens surface is taken as the origin of its coordinate system, shown as points ‘O’ and ‘E’ in Fig. 4.
As shown in Fig. 4, the distance from the feature point to the Microsoft camera along the optical axis is ‘h’, and the distance from the feature point to the Kinect sensor is ‘H’; the distance between the Microsoft camera and the Kinect sensor is therefore ‘H − h’. When the relative position of the Microsoft camera and the Kinect sensor remains unchanged, let the spatial offset between the camera and the Kinect sensor be (X, Y, Z); we then obtain Fig. 5.
The coordinate system of the Microsoft camera image is shown in Fig. 5(a), with a pixel area of 640 × 360. The center coordinates are (320, 180), which is also the projection position of the camera's optical center in the image. Let the pixel coordinates of feature point A recognized by the camera in the image be (x, y); then its pixel coordinates relative to the image center are (\( \Delta {\text{x}},\Delta {\text{y}} \)):

\( \Delta {\text{x}} = {\text{x}} - 320 \), \( \Delta {\text{y}} = {\text{y}} - 180 \)
Depending on the signs of \( \Delta {\text{x}} \) and \( \Delta {\text{y}} \), the image can be divided into four regions, ‘1’, ‘2’, ‘3’ and ‘4’. When \( \Delta {\text{x}} < 0 \) and \( \Delta {\text{y}} < 0 \), the point falls in region ‘1’, meaning the feature point is to the upper left of the projection point. When \( \Delta {\text{x}} < 0 \) and \( \Delta {\text{y}} > 0 \), it falls in region ‘2’, meaning the feature point is to the lower left of the projection point. When \( \Delta {\text{x}} > 0 \) and \( \Delta {\text{y}} > 0 \), it falls in region ‘3’, meaning the feature point is to the lower right of the projection point. When \( \Delta {\text{x}} > 0 \) and \( \Delta {\text{y}} < 0 \), it falls in region ‘4’, meaning the feature point is to the upper right of the projection point.
When the distance between the camera and the feature point is h, let the actual length represented by one image pixel be a (mm); then the 3D coordinates \( X_{w} \), \( Y_{w} \), \( Z_{w} \) of the recognized feature point relative to the camera origin are, respectively:

\( X_{w} = {\text{a}} \cdot \Delta {\text{x}} \), \( Y_{w} = {\text{a}} \cdot \Delta {\text{y}} \), \( Z_{w} = {\text{h}} \)
In addition, the signs of \( X_{w} \) and \( Y_{w} \) determine the region in which the feature point lies.
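A minimal Python sketch of this step, assuming the relations just described (offsets taken from the 640 × 360 image center and scaled by a millimeters per pixel at distance h), is given below.

```python
def camera_pixel_to_3d(x, y, a_mm_per_px, h_mm, cx=320, cy=180):
    """Convert a feature-point pixel (x, y) in the 640 x 360 camera image into
    coordinates relative to the Microsoft camera origin."""
    dx = x - cx              # offset from the optical-center projection
    dy = y - cy
    Xw = a_mm_per_px * dx    # lateral offsets scale with the pixel size a
    Yw = a_mm_per_px * dy
    Zw = h_mm                # depth along the optical axis
    # The signs of dx and dy select one of the four image regions 1-4.
    region = {(True, True): 1, (True, False): 2,
              (False, False): 3, (False, True): 4}[(dx < 0, dy < 0)]
    return Xw, Yw, Zw, region
```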
The Kinect sensor is installed below the Microsoft camera, at a sufficient distance from it that the image acquired by the Kinect sensor includes the shooting area of the Microsoft camera. As shown in Fig. 5(b), ‘XOY’ is the Kinect image screen coordinate system and ‘\( {\text{xoy}} \)’ is the Microsoft camera image screen coordinate system. When the pixel resolution of the image captured by the Kinect sensor is 640 × 480, the projection of the optical center of the infrared camera in the image is the image center point, with pixel coordinates (320, 240).
When the spatial position of the feature point recognized by the Microsoft camera remains unchanged, let the coordinates of feature point A in the screen coordinate system of the Kinect sensor be (\( {\text{x}}^{\prime} \), \( {\text{y}}^{\prime} \)); then its pixel coordinates relative to the center point of the image are:

\( \Delta {\text{x}}^{\prime} = {\text{x}}^{\prime} - 320 \), \( \Delta {\text{y}}^{\prime} = {\text{y}}^{\prime} - 240 \)
Supposing that one pixel of the image on plane H of the Kinect sensor represents the actual length b (mm), the 3D coordinates \( X_{k} \), \( Y_{k} \), \( Z_{k} \) of feature point A relative to the origin of the Kinect sensor are, respectively:

\( X_{k} = {\text{b}} \cdot \Delta {\text{x}}^{\prime} \), \( Y_{k} = {\text{b}} \cdot \Delta {\text{y}}^{\prime} \), \( Z_{k} = {\text{H}} \)
Similarly, the signs of \( X_{k} \) and \( Y_{k} \) correspond to the four image regions ①–④ and to the actual location of the feature point. Since the spatial positions of the Microsoft camera and the Kinect sensor remain unchanged, the pixel coordinates of feature point ‘A’ in the image captured by the Kinect sensor can be derived from its pixel coordinates in the image shot by the Microsoft camera. That is:
From Eqs. (11) and (12), we can get the following formulas:
When the values of the distances H and h remain unchanged, the values of X, Y, a and b are constant, and so are the values of \( \frac{\text{a}}{b} \), \( \frac{X}{b} \), \( \frac{Y}{b} \). When the Microsoft camera recognizes and extracts the pixel coordinates (x, y) of a kiwifruit feature point, the corresponding pixel on the Kinect sensor screen is, in theory, (\( {\text{x}}^{\prime} \), \( {\text{y}}^{\prime} \)). In practice, however, the image acquired by the RGB camera on the Kinect sensor is mirrored along the left-right direction, that is, reversed along the X axis. In this case, the coordinates (x″, y″) of the point corresponding to (x, y) are (\( 640 - {\text{x}}^{\prime} \), \( {\text{y}}^{\prime} \)).
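The following Python sketch illustrates the camera-to-Kinect pixel conversion, including the left-right mirroring. The linear form used here is an assumed reading of formulas (13) and (14), built from the constants a/b, X/b and Y/b described above; it is not reproduced verbatim from the paper.

```python
def camera_to_kinect_pixel(x, y, a, b, X, Y,
                           cam_center=(320, 180), kin_center=(320, 240),
                           kin_width=640):
    """Map a feature-point pixel from the Microsoft camera image (640 x 360)
    to the Kinect RGB image (640 x 480). Assumed linear relation: the same
    physical offset expressed in the two pixel scales, plus the fixed offset
    (X, Y) between the two optical axes."""
    x_prime = (a / b) * (x - cam_center[0]) + X / b + kin_center[0]
    y_prime = (a / b) * (y - cam_center[1]) + Y / b + kin_center[1]
    # The Kinect RGB image is mirrored left-right, so flip along the X axis.
    return kin_width - x_prime, y_prime
```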
4 Obtaining Spatial Coordinates
As far as the existing fruit and vegetable harvesting robots at home and abroad are concerned, a variety of methods are adopted for target positioning and coordinate extraction, such as monocular vision, binocular vision, multi-camera vision, near-infrared spectroscopy and laser scanners, but each of them has problems to be solved.
In this study, Microsoft's Kinect sensor is used to obtain the spatial coordinates of the fruit feature points, and the development platform is Kinect for Windows SDK. The sensing device is shown in Fig. 6(a). It consists mainly of three parts, which are, from left to right, an infrared projector, an RGB camera and an infrared camera. The infrared projector actively projects a near-infrared pattern. When the infrared light falls on objects with rough surfaces or on ground glass, the pattern is distorted and random points of reflected light (also called speckles) are generated. The speckles are read by the infrared camera, which analyzes the near-infrared pattern to create depth images of the objects within the field of view. The RGB camera shoots color images within the field of view. The measurement range is shown in Fig. 6(b); it is centered on the infrared camera, with an upper angle of 43° and a lower angle of 43°, 400–4000 mm in front of the camera. The precision of the depth images captured within this area can reach millimeter level.
The flow chart of kiwifruit spatial coordinate acquisition is shown in Fig. 7. First, the color image data stream and the depth image data stream are registered to obtain the color image and the depth image, and the RGB image is mapped onto the depth image. The mapped image is then loaded into the Kinect built-in spatial coordinate system, followed by calling the MapDepthPointToSkeletonPoint function. Afterwards, it is judged whether the coordinate values are within the high-accuracy range: if so, the spatial coordinates are output with the infrared camera as the origin; if not, the output indicates that the distance is out of range.
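Conceptually, the SDK call back-projects a depth pixel to 3D coordinates in the infrared-camera frame. A hedged Python illustration of that idea is shown below, using a simple pinhole model with placeholder intrinsics and the 400–4000 mm validity check of Fig. 6(b); the real SDK handles the calibration internally.

```python
def depth_pixel_to_camera_xyz(u, v, depth_mm,
                              fx=580.0, fy=580.0, cx=320.0, cy=240.0):
    """Illustrative back-projection of a depth pixel (u, v, depth) to 3D
    coordinates in the infrared-camera frame. The intrinsics fx, fy, cx, cy
    are placeholder values, not SDK constants."""
    if not 400 <= depth_mm <= 4000:
        return None               # outside the high-accuracy range of Fig. 6(b)
    X = (u - cx) * depth_mm / fx  # pinhole model: lateral offset grows with depth
    Y = (v - cy) * depth_mm / fy
    Z = depth_mm
    return X, Y, Z
```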
5 Test and Analysis
5.1 Test Method
In order to verify the conversion relationship between the feature point coordinates acquired by the Microsoft camera and those of the Kinect sensor, as well as the accuracy of the mathematical model, a verification test was carried out in the laboratory. Firstly, graph paper was used to calibrate the actual length represented by one pixel (i.e., the values of ‘a’ and ‘b’) in the images captured by the Microsoft camera and the Kinect sensor at fixed positions. Then the feature point images and 3D coordinates were acquired with the Microsoft camera and the Kinect sensor. The specific steps are as follows:
(1) In order to facilitate the verification test, the whole coordinate conversion system was inverted: the kiwifruit was placed at the bottom, the Microsoft camera above it, and the Kinect sensor at the top. The testing platform constructed is shown in Fig. 8(a). The Kinect sensor and the Microsoft camera were fixed onto a bracket, parallel to the desktop supporting the kiwifruit. To facilitate verification, a point was randomly marked on the kiwifruit surface as the recognition feature point. Figure 8(b) is an image acquired by the Kinect sensor and Fig. 8(c) is the image captured by the Microsoft camera.
(2) In this test, the vertical distance from the Microsoft camera to the feature point is 200 mm, and the vertical distance from the Kinect sensor to the feature point is 928 mm. Graph paper was used to calibrate the actual length represented by one pixel in the image plane 200 mm away from the Microsoft camera, as well as the actual length represented by one pixel 928 mm away from the Kinect sensor (i.e., the values of a and b). The calibration setup is shown in Fig. 9, where the Microsoft camera is fastened to a height gauge parallel to the coordinate plane. The Kinect sensor was calibrated in the same way as the Microsoft camera.
(3) The Microsoft camera was used to obtain the feature point image and the pixel coordinates of the feature point. The Kinect sensor was used to obtain the feature point image, its pixel coordinates and its spatial coordinates.
(4) Equations (13) and (14) were verified using the pixel coordinates acquired by the Microsoft camera and the Kinect sensor together with the values of a and b. The coordinate conversion error was derived from the difference between the actual pixel coordinates of the feature point in the image acquired by the Kinect sensor and the pixel coordinates calculated from the equations, together with the actual length represented by that difference, as sketched after this list.
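A short Python sketch of the error computation in step (4) is given below: it takes the Kinect pixel coordinates actually measured for a feature point and the coordinates calculated from the conversion model, and reports the difference in pixels and in millimeters (using the calibrated pixel length b).

```python
import math

def conversion_error(measured_px, calculated_px, b_mm_per_px):
    """Difference between measured and calculated Kinect pixel coordinates of
    a feature point, expressed in pixels and in millimeters."""
    du = measured_px[0] - calculated_px[0]
    dv = measured_px[1] - calculated_px[1]
    err_px = math.hypot(du, dv)      # Euclidean error in pixels
    err_mm = err_px * b_mm_per_px    # actual length represented by the error
    return err_px, err_mm
```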
5.2 Result and Analysis
Through calibration, it was found that the actual length represented by one pixel 200 mm away from the Microsoft camera is 0.445 mm, that is, a = 0.445, and that the actual length represented by one pixel 928 mm away from the Kinect sensor is 1.32 mm, that is, b = 1.778. In this experiment, we obtained the pixel coordinates of 24 groups of feature points at different positions in the images captured by the Microsoft camera and in the images captured by the Kinect sensor. In addition, with the help of the Kinect sensor, we obtained the spatial coordinates of the feature points at these positions.
By substituting into formulas (13) and (14) the pixel coordinates of point 1 and point 24 in the images acquired by the Microsoft camera and in the mirrored image acquired by the Kinect sensor, we obtain the following results:
The value of a/b coincides with the previous calibration. By substituting the values of a/b, X/b and Y/b, together with the coordinates of the remaining 22 points in the images acquired by the Microsoft camera, into formulas (13) and (14), we can calculate the coordinates of these 22 points in the mirrored image acquired by the Kinect sensor.
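The way two reference points can fix the conversion constants may be sketched as follows in Python; the linear relation is the same assumed form used earlier, with a/b, X/b and Y/b solved from the pixel coordinates of points such as 1 and 24 in the two images.

```python
def solve_conversion_constants(cam_pts, kinect_pts,
                               cam_center=(320, 180), kin_center=(320, 240)):
    """Estimate a/b, X/b and Y/b from two point correspondences, assuming the
    linear camera-to-Kinect pixel relation sketched in Sect. 3.
    cam_pts / kinect_pts: [(x, y), (x, y)] in the camera image and in the
    mirrored Kinect image respectively; the two points must differ in x."""
    (x1, y1), (x2, y2) = cam_pts
    (u1, v1), (u2, v2) = kinect_pts
    # Along X: u - 320 = (a/b) * (x - 320) + X/b, written for both points.
    a_over_b = (u2 - u1) / (x2 - x1)
    X_over_b = (u1 - kin_center[0]) - a_over_b * (x1 - cam_center[0])
    Y_over_b = (v1 - kin_center[1]) - a_over_b * (y1 - cam_center[1])
    return a_over_b, X_over_b, Y_over_b
```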
Figure 10 is a diagram, drawn with MATLAB, comparing the calculated coordinate values with the actual coordinate values.
In the diagram, the red curves represent the distribution of the actual points and the blue curves the distribution of the calculated points; the two curves coincide with each other. Table 1 indicates that the error between the calculated point coordinates and the actual coordinates is less than 3 pixels, or 5 mm, so formulas (13) and (14) accurately reflect the correspondence of the same point between the image acquired by the Microsoft camera and the image acquired by the Kinect sensor.
6 Conclusions
(1) In light of the kiwifruit growth characteristics, an automatic identification method is studied, in which fruit images are acquired from the bottom and fruit shape and color features are applied jointly in recognition.
(2) Considering the drawbacks of existing fruit and vegetable harvesting robots, a new method based on the Kinect sensor is proposed to acquire the spatial coordinates of the target kiwifruits.
(3) This paper discusses the coordinate conversion between the Microsoft camera and the Kinect sensor, and the mathematical model constructed performs the coordinate conversion accurately.
References
Zhang, K., Yang, L., Wang, L., et al.: Design and experiment of elevated substrate culture strawberry picking robot. Trans. Chin. Soc. Agric. Mach. 43(9), 165–172 (2012)
Monta, M., Kondo, N., Shibano, Y.: Agricultural robot in grape production system. In: IEEE International Conference on Robotics and Automation, pp. 2504–2509 (1995)
Arima, S., Kondo, N.: Cucumber harvesting robot and plant training system. J. Robot. Mechatron. 11(3), 208–212 (1999)
Kondo, N., Monta, M., Fujiura, T.: Fruit harvesting robot in Japan. Adv. Space Res. 18(1–2), 181–184 (1996)
Cai, J., Zhou, X., Wang, F., et al.: Obstacle identification of citrus harvesting robot. Trans. Chin. Soc. Agric. Mach. 40(11), 171–175 (2009)
Lu, W., Song, A., Cai, J., et al.: Structural design and kinematics algorithm research for orange harvesting robot. J. Southeast Univ. (Nat. Sci. Ed.) 41(1), 95–100 (2011)
van Henten, E.J., Hemming, J., van Tuijl, B.A.J., et al.: An autonomous robot for harvesting cucumbers in greenhouses. Auton. Robots 13(3), 241–258 (2002)
Zhan, W., He, D., Shi, S., et al.: Recognition of kiwifruit in field based on Adaboost algorithm. Trans. Chin. Soc. Agric. Eng. 23, 140–146 (2013)
Ding, Y., Geng, N., Zhou, Q.: Research on the object extraction of kiwifruit based on images. Microcomput. Inf. 25(18), 294–295 (2009)
Cui, Y., Su, S., Lv, Z., et al.: A method for separation of kiwifruit adjacent fruits based on Hough transformation. J. Agric. Mechanization Res. 34(12), 166–169 (2012)
Zainuddin, N.A., Mustafah, Y.M., Shawgi, Y.A.M., Rashid, N.K.A.M.: Autonomous navigation of mobile robot using Kinect sensor. In: 2014 International Conference on Computer and Communication Engineering (ICCCE), pp. 28–31. IEEE (2014)
Ruiz, E., Acuña, R., Certad, N., Terrones, A., Cabrera, M.E.: Development of a control platform for the mobile robot Roomba using ROS and a Kinect sensor. In: Robotics Symposium and Competition (LARS/LARC), 2013 Latin American, pp. 55–60. IEEE (2013)
Clark, M., Feldpausch, D., Tewolde, G.S.: Microsoft Kinect sensor for real-time color tracking robot. In: 2014 IEEE International Conference on Electro/Information Technology (EIT), pp. 416–421. IEEE (2014)
Dutta, T.: Evaluation of the Kinect™ sensor for 3-D kinematic measurement in the workplace. Appl. Ergon. 43, 645–649 (2012)
Sgorbissa, A., Verda, D.: Structure-based object representation and classification in mobile robotics through a microsoft Kinect. Robot. Auton. Syst. 61, 1665–1679 (2013)
Cui, Y., Su, S., Wang, X., et al.: Recognition and feature extraction of kiwifruit in natural environment based on machine vision. Trans. Chin. Soc. Agric. Mach. 44(5), 247–252 (2013)
Acknowledgment
This research was supported by grants from Natural Science Foundation of China (No. 61175099) and Sci-Tech Co-Innovation engineering plan projects of Shaanxi Province (2015KTCQ02-12).