
CN118229772B - Tray pose detection method, system, equipment and medium based on image processing - Google Patents

Tray pose detection method, system, equipment and medium based on image processing

Info

Publication number
CN118229772B
CN118229772B (application CN202410656965.XA)
Authority
CN
China
Prior art keywords
tray
point cloud
plane
cloud data
forked
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410656965.XA
Other languages
Chinese (zh)
Other versions
CN118229772A (en)
Inventor
王蓉
毛刚挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shiteng Technology Co ltd
Original Assignee
Hangzhou Shiteng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Shiteng Technology Co ltd filed Critical Hangzhou Shiteng Technology Co ltd
Priority to CN202410656965.XA priority Critical patent/CN118229772B/en
Publication of CN118229772A publication Critical patent/CN118229772A/en
Application granted granted Critical
Publication of CN118229772B publication Critical patent/CN118229772B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/60: Analysis of geometric attributes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10028: Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The embodiments of the invention provide a tray pose detection method, system, equipment and medium based on image processing. The method comprises the following steps: obtaining a depth image of a tray using a depth camera and generating original point cloud data; preprocessing the original point cloud data to obtain optimized point cloud data; determining the plane to be forked according to a normal vector orientation constraint on the front face of the tray; constructing a point cloud template library of planes to be forked based on preset tray size information; and determining the center position and the rotation angle of the tray by matching the optimized point cloud data with the templates. The invention improves both the accuracy and the efficiency of tray pose detection.

Description

Tray pose detection method, system, equipment and medium based on image processing
Technical Field
The embodiment of the disclosure relates to the technical field of image data processing, in particular to a tray pose detection method, system, equipment and medium based on image processing.
Background
In the logistics and warehousing fields, pallets are widely used in the handling, storage and transportation of goods. In order to realize an automated operation, it is important to accurately detect the position and posture of the tray. However, the inventors have found that the existing tray pose detection methods have some problems.
Conventional tray pose detection methods typically rely on two-dimensional image recognition. An image of the tray is captured by a camera, and edge detection and feature extraction are performed by an image processing algorithm to determine the position and posture of the tray. However, two-dimensional image recognition is affected by factors such as illumination conditions, viewing-angle changes and occlusions, which limits the accuracy and stability of pose detection. In addition, conventional tray pose detection methods have difficulty handling trays of different sizes and shapes: because tray sizes and shapes vary, conventional approaches often require parameter adjustment or algorithm retraining for each tray size, which increases the complexity and maintenance cost of the system. Conventional methods also face environmental interference. In practical applications, trays may be placed in complex environments, for example among stacked goods or under uneven illumination, which may cause the conventional method to fail to identify the tray or to produce false identifications.
The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and, therefore, may contain information that does not constitute prior art already known to a person of ordinary skill in the art in this country.
Disclosure of Invention
This summary is provided to introduce concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a tray pose detection method, system, device and medium based on image processing to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a tray pose detection method based on image processing, the method including:
a depth camera is used for obtaining a depth image of the tray, and tray original point cloud data are generated according to the depth image;
preprocessing the tray original point cloud data, wherein the preprocessing comprises outlier removal, so as to obtain optimized tray point cloud data;
determining a forking plane of the tray according to normal vector orientation constraint of the front surface of the tray, and constructing a tray plane point cloud template library to be forked based on preset tray size information;
and matching the optimized tray point cloud data with the tray plane point cloud template to be forked so as to determine the center position and the rotation angle of the tray.
As a further improvement of the present application, the preprocessing the tray original point cloud data further includes:
determining the average distance from each point in the tray original point cloud data to the adjacent point;
calculating the mean value and standard deviation of the neighborhood average distances over all points in the tray original point cloud data;
comparing the average distance from each point to its adjacent points with the mean of the neighborhood average distances to obtain a difference value, and if the difference value exceeds a preset multiple of the standard deviation of the neighborhood average distances, determining the point corresponding to the difference value to be an outlier;
And removing the outliers from the tray original point cloud data to obtain optimized point cloud data.
As a further improvement of the present application, the matching the optimized tray point cloud data with the tray plane point cloud template to determine the center position and the rotation angle of the tray further includes:
Performing preliminary matching on the optimized tray point cloud data and the tray plane point cloud template to be forked through a global search algorithm, and positioning an initial matching position;
based on the initial matching position, adjusting the initial matching position through a local optimization algorithm to obtain a matching result;
and calculating the center position and the rotation angle of the tray according to the matching result.
As a further improvement of the present application, the obtaining a depth image of the tray using the depth camera, and generating tray original point cloud data according to the depth image further includes:
Performing internal reference calibration on the depth camera by using a calibration plate with a preset geometric shape to obtain an internal reference calibration result;
determining corner coordinates based on the internal reference calibration result;
generating internal parameters of the depth camera through a reprojection error algorithm based on the corner coordinates, wherein the internal parameters comprise focal length, principal point coordinates and distortion coefficients;
and constructing a depth camera model based on the internal parameters, inputting the acquired depth image into the depth camera model, and generating the tray original point cloud data.
As a further improvement of the present application, the method further comprises:
carrying out iterative processing on the tray original point cloud data through a RANSAC algorithm, and fitting a ground plane model based on an iterative result, wherein the ground plane model comprises a fitting plane;
setting a fitting plane threshold value through the fitted ground plane model;
Determining that points in the tray original point cloud data, of which the distance from the fitting plane is smaller than the fitting plane threshold value, are ground point clouds;
and removing the ground point cloud from the tray original point cloud data.
As a further improvement of the present application, the method further comprises:
generating a pallet plane three-dimensional model to be forked according to the structural characteristics of the pallet and preset pallet size information;
constructing a pallet to-be-forked plane point cloud template based on the pallet to-be-forked plane three-dimensional model, wherein the template comprises key geometric features and a space structure of the pallet;
And storing the constructed pallet plane point cloud templates to be forked in the pallet plane point cloud template library.
As a further improvement of the present application, the method further comprises:
detecting a change in the size or shape of the tray in real time;
When a tray which needs to be adapted to a new size is detected, receiving and processing the new tray size information, regenerating a tray to-be-forked plane point cloud template corresponding to the new size, and adding the newly generated template into the tray to-be-forked plane point cloud template library;
And if the historical tray is confirmed to be eliminated or not used, removing the tray to-be-forked plane point cloud template corresponding to the size from the tray to-be-forked plane point cloud template library.
In a second aspect, some embodiments of the present disclosure provide an image processing-based tray pose detection system, the system comprising:
the data generation module is used for acquiring a depth image of the tray by using the depth camera and generating tray original point cloud data according to the depth image;
the preprocessing module is used for preprocessing the tray original point cloud data, wherein the preprocessing comprises outlier removal, so as to obtain optimized tray point cloud data;
The template library construction module is used for determining a forking plane of the tray according to the normal vector orientation constraint of the front surface of the tray and constructing a tray plane point cloud template library to be forked based on preset tray size information;
and the determining module is used for matching the optimized tray point cloud data with the tray plane point cloud template to be forked so as to determine the center position and the rotation angle of the tray.
In a third aspect, some embodiments of the present invention provide an electronic device, comprising: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect above.
In a fourth aspect, some embodiments of the invention provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.
The above embodiments of the present disclosure have the following advantageous effects: the three-dimensional point cloud data acquired by the depth camera capture the spatial structure and shape of the tray more accurately and effectively resist interference from environmental factors such as illumination and viewing angle; the dynamically updated tray template library allows the system to flexibly adapt to trays of different sizes and shapes without complex adjustment; and the matching algorithm further improves the operating efficiency of the logistics system.
Drawings
The above and other features, advantages and aspects of embodiments of the present invention will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic flow chart of steps of an embodiment of a tray pose detection method based on image processing according to the present application;
FIG. 2 is a schematic diagram of functional modules of an embodiment of an image processing-based tray pose detection system of the present application;
FIG. 3 is a schematic diagram of an embodiment of an electronic device of the present application;
FIG. 4 is a schematic diagram illustrating the structure of an embodiment of a storage medium according to the present application.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates a flow 100 of some embodiments of an image processing-based tray pose detection method according to the present disclosure. The tray pose detection method based on image processing comprises the following steps:
Step 101, a depth camera is used for obtaining a depth image of a tray, and tray original point cloud data are generated according to the depth image;
It should be noted that a depth camera can obtain the distance between each point in the scene and the camera: each pixel value in the output image represents the distance from the camera to the object surface corresponding to that pixel. A depth image is therefore a special image in which each pixel value is not a color but the distance from the corresponding object surface to the camera. A point cloud is a data structure containing a large number of points in three-dimensional space, each typically consisting of X, Y, Z coordinates and possibly additional information such as color, normal or intensity.
Preferably, a depth camera such as Microsoft's Kinect V2 or Intel's RealSense D435 may be used to capture depth images of the tray. When the depth camera photographs the tray, it generates a depth image in which the value of each pixel represents the distance from the camera to that point on the tray (or on the object behind the tray). Such an image differs from a conventional color image in that it provides spatial distance information instead of color information.
Preferably, in the present invention, the point cloud data is a set of points in a three-dimensional space converted from the depth image. Each point has its X, Y, Z coordinates in three dimensions. Based on the principle of parallax or other depth perception techniques, a depth camera can measure the distance from the object surface to the camera corresponding to each pixel. Using these distance information and camera parameters (e.g., focal length, principal point, etc.), we can convert two-dimensional pixel coordinates into three-dimensional spatial coordinates by a certain algorithm (e.g., pinhole camera model), thereby generating point cloud data. The specific steps of applying the algorithm may be: 1. reading a depth image and obtaining a depth value of each pixel; 2. converting the two-dimensional coordinates and the depth value of each pixel into three-dimensional space coordinates according to the internal parameters of the camera; 3. combining all the three-dimensional points obtained by conversion to form point cloud data; 4. optionally, the point cloud data is further filtered and optimized to remove noise and outliers. Through the steps, the original point cloud data of the tray can be generated from the depth image, and a basis is provided for subsequent processing and analysis.
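As a hedged illustration of this back-projection (a sketch only, not the patented implementation; the intrinsic parameters fx, fy, cx, cy and the depth scale below are assumed example values), the conversion from a depth image to point cloud data can be written as follows:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project an H x W depth image into an N x 3 point cloud using the
    pinhole camera model; depth_scale converts raw depth units to meters."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel grid
    z = depth.astype(np.float64) * depth_scale        # depth in meters
    x = (u - cx) * z / fx                              # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy                              # Y = (v - cy) * Z / fy
    points = np.stack((x, y, z), axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                    # drop pixels with no depth

# assumed intrinsics for a 640x480 sensor
# cloud = depth_to_point_cloud(depth_image, fx=800, fy=800, cx=320, cy=240)
```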
Step 102, preprocessing the tray original point cloud data, wherein the preprocessing comprises outlier removal, and optimized tray point cloud data are obtained;
preferably, the preprocessing step is to improve the quality of the point cloud data and reduce the influence of noise and abnormal points on subsequent processing. Statistical outlier removal methods are used. For each point, its average distance to its k nearest neighbors is calculated. A distance threshold is set and if the average distance from a point to its k nearest neighbors exceeds this threshold, then the point is considered an outlier and is removed from the point cloud data. By this method, abnormal points due to measurement errors, reflection, or other causes can be effectively removed.
Step 103, determining a forking plane of the tray according to the normal vector orientation constraint of the front surface of the tray, and constructing a tray plane point cloud template library to be forked based on preset tray size information;
Preferably, a virtual model corresponding to a real tray may be created in the three-dimensional modeling software according to known tray sizes (e.g., length, width, height, etc.) based on preset tray size information. And extracting point cloud data of the forking plane from the virtual model to serve as a template. For different types of trays (e.g., different sizes, shapes), it is necessary to create corresponding templates. The templates are stored in a library for subsequent matching operations.
Step 104, matching the optimized tray point cloud data with a tray plane point cloud template to be forked so as to determine the center position and the rotation angle of the tray.
Preferably, the matching operation may be performed using the ICP (Iterative Closest Point) algorithm or another point cloud matching algorithm. These algorithms find the best matching position and orientation by minimizing the distance between the two point clouds.
Preferably, once the best matching position and orientation is found, the center position of the tray and the rotation angle relative to the template can be calculated. This information is critical to subsequent handling or positioning operations. For example, a forklift can accurately fork pallets based on such information. Through the specific implementation mode, the position and the posture of the tray can be accurately detected, so that the automation level and the efficiency of the logistics and warehousing system are improved.
The following description is given in connection with a specific example: a standard 1200 mm x 1000 mm tray is selected, the depth camera resolution is 640x480, and the shooting distance is 2 meters.
The tray was photographed at a distance of 2 meters using the depth camera, and a 640x480 depth image was acquired. Through a depth image conversion algorithm, tray original point cloud data containing tens of thousands of points can be obtained. These point cloud data are preprocessed, for example by computing neighborhood distance statistics for each point, to identify and remove outliers that deviate significantly from the average distance.
The pallet's plane of bifurcation is determined based on the structural characteristics (e.g., planarity) and normal vector orientation constraints of the pallet.
Using the pallet size information (1200 mm x 1000 mm), we constructed a corresponding pallet to-be-forked planar point cloud template and added it to the template library.
And matching the preprocessed tray point cloud data with the template, and finding the optimal matching position through an iterative nearest point algorithm.
From the matching results, the center position of the tray was calculated to be at (X=0, Y=0, Z=2000 mm) relative to the camera coordinate system, and the rotation angle relative to the standard template was 0 degrees (i.e., the front of the tray was facing the camera). In this way, the spatial position and posture information of the tray can be accurately acquired.
Further, preprocessing the tray original point cloud data further includes:
determining the average distance from each point in the tray original point cloud data to the adjacent point;
Preferably, a suitable k value is selected based on the density of the point cloud data and the expected noise level. This k value represents the number of nearest neighbors to consider in computing the neighborhood of each point. For example, if the point cloud data is denser, the k value may be set slightly larger to better reflect the local neighborhood structure of the point. For each point in the point cloud, a spatial search algorithm (e.g., a k-d tree) is used to find its k nearest neighbors. These nearest neighbors are the k points nearest to the current point. The distances between each point and its k nearest neighbors are averaged to obtain the average distance of that point to its neighbors. This average distance reflects the positional relationship of the point within its local vicinity.
calculating the mean value and standard deviation of the neighborhood average distances over all points in the tray original point cloud data;
Preferably, the entire point cloud data is traversed and the neighborhood average distances of all points are accumulated. Dividing the accumulated distances by the total number of points gives the mean of the neighborhood average distances. This mean represents a "typical" value of the average distance from a point to its neighbors throughout the point cloud. The standard deviation is an index that measures the degree of dispersion of a data distribution. To calculate it, the square of the difference between each point's neighborhood average distance and the mean is first computed, these squared differences are accumulated and divided by the total number of points, and finally the square root is taken. The standard deviation reflects the fluctuation of the average distance from a point to its neighbors.
comparing the average distance from each point to its adjacent points with the mean of the neighborhood average distances to obtain a difference value, and if the difference value exceeds a preset multiple of the standard deviation of the neighborhood average distances, determining the point corresponding to the difference value to be an outlier;
Preferably, after outliers are identified, the points are marked or stored in a separate list. Finally, these data marked as outliers are deleted from the original point cloud data. This can be achieved by creating a new point cloud data set, which contains only data that is not marked as outliers. After the outliers are deleted, visual inspection can be performed on the optimized point cloud data to ensure that the outliers have been effectively removed and the overall structure of the data is improved.
And removing the outliers from the tray original point cloud data to obtain optimized point cloud data.
In point cloud data, outliers are points that deviate significantly from the other points; they may be caused by measurement errors, noise or other interference factors. For each point in the point cloud data, its neighborhood average distance is the average of the distances between that point and its k nearest neighbors. The standard deviation is a statistic that measures the degree of dispersion of a data distribution; the larger the standard deviation, the more dispersed the data.
In connection with a specific example, consider tray original point cloud data containing 10,000 points. To remove outliers, k=10 can be set, i.e., the average distance of each point to its nearest 10 neighbors is considered. For each point, the average distance to its nearest 10 neighbors is first calculated. Next, the mean of the neighborhood average distances over all points is calculated to be 0.5 cm, with a standard deviation of 0.1 cm. Each point is then traversed and its neighborhood average distance is compared with the mean of 0.5 cm. A point is considered an outlier if the difference between its neighborhood average distance and the mean exceeds 3 times the standard deviation (i.e., 0.3 cm).
Finally, all points identified as outliers are deleted from the original point cloud data, resulting in a cleaner, optimized point cloud dataset containing 9,800 points (assuming 200 points are identified as outliers and deleted). Through these steps, outliers in the tray original point cloud data can be effectively removed, providing a more accurate data basis for subsequent point cloud processing and analysis.
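The statistics in this example (k = 10 neighbors, a mean neighborhood distance of about 0.5 cm, a standard deviation of about 0.1 cm, and a 3-sigma cutoff) can be reproduced with a short sketch; the function below is an illustrative k-d-tree-based implementation under those assumptions, not the exact code of the embodiment:

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_statistical_outliers(points, k=10, std_ratio=3.0):
    """Drop points whose mean distance to their k nearest neighbors exceeds the
    global mean of that statistic by more than std_ratio standard deviations."""
    tree = cKDTree(points)
    # query k+1 neighbors because each point's nearest neighbor is itself
    dists, _ = tree.query(points, k=k + 1)
    mean_dist = dists[:, 1:].mean(axis=1)        # neighborhood average distance
    mu, sigma = mean_dist.mean(), mean_dist.std()
    keep = mean_dist <= mu + std_ratio * sigma   # e.g. 0.5 cm + 3 * 0.1 cm
    return points[keep]

# e.g. 10,000 raw points in, roughly 9,800 optimized points out if ~200 are outliers
```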
Further, take the collection of tray point cloud data in an actual factory environment as an example. Because some noise is often present in the point cloud data, the subsequent plane segmentation would be affected, and because of the limited memory of the industrial computer, filtering is performed to avoid wasting memory: voxel downsampling is used to speed up processing, and pass-through filtering is used on distance, limiting the point cloud to the range 1.3 m to 4.5 m in the x direction and -0.2 m to 8.0 m in the z direction, and filtering out points whose distances are not within a reasonable range.
Meanwhile, because the point cloud generated by the camera hardware is unevenly distributed, sparsely distributed outliers need to be removed. The distance from a typical point in the point cloud to the other points in its neighborhood approximately follows a Gaussian distribution, and the probability density function of the neighborhood average distance can be written as

f(d) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(d - \mu)^2}{2\sigma^2} \right)

where d denotes the distance between any two points, \mu denotes the mean of the neighborhood average distances over all points, and \sigma denotes the standard deviation of the neighborhood average distances. The average distance from each point to its K neighbors in the neighborhood is calculated, giving the mean \mu and standard deviation \sigma of the neighborhood average distances; if a point's neighborhood average distance exceeds the mean by more than a preset multiple of \sigma, the point is considered an outlier and is removed.
Further, the matching the optimized tray point cloud data with the tray plane point cloud template to be forked to determine the center position and the rotation angle of the tray further includes:
Performing preliminary matching on the optimized tray point cloud data and the tray plane point cloud template to be forked through a global search algorithm, and positioning an initial matching position;
it should be noted that the global search algorithm is a search strategy for finding the best matching position in the entire search space. In point cloud matching, the global search algorithm can quickly locate a rough matching region. And performing preliminary matching on the optimized tray point cloud data and the tray plane point cloud template to be forked by using a global search algorithm. This is accomplished by comparing the characteristics of the point cloud data in order to quickly find an initial matching location. After the global search is completed, the algorithm determines an initial matching location, which is the starting point for the subsequent local optimization algorithm.
Preferably, before matching, it is first necessary to ensure that the optimized pallet point cloud data and pallet to-be-forked planar point cloud templates are ready and format compatible. Key features, such as corner points, edges, etc., are extracted from the tray point cloud data and these features are used to match the templates. Using a global search algorithm (e.g., a coarse registration stage of ICP or other global registration method), the portion most similar to the pallet to-be-forked planar point cloud template is found in the entire search space. In this process, the algorithm will try different alignment to find the best preliminary matching location. After the global search is completed, all possible matching locations are evaluated according to the degree of matching (e.g., distance error, degree of overlap, etc.), and the best one is selected as the initial matching location. Parameters of this initial matching location, including translation vectors and rotation matrices, are recorded, which describe the preliminary alignment state between the tray point cloud data and the templates.
Based on the initial matching position, adjusting the initial matching position through a local optimization algorithm to obtain a matching result;
It should be noted that, the local optimization algorithm may be understood as performing fine adjustment and optimization in a rough area determined by the global search to find a more accurate matching position. This typically involves iterative computations and error minimization. And based on the position obtained by the preliminary matching, fine adjustment is carried out on the matching position by using a local optimization algorithm. This process typically involves complex mathematical calculations and iterative optimization to find the best match. After local optimization, the algorithm outputs a more accurate matching result describing the best alignment between the tray point cloud data and the templates.
Preferably, the initial matching position obtained by global search can be used as a starting point to set parameters of local optimization, such as iteration times, convergence threshold value and the like, by adjusting the initial matching position through a local optimization algorithm. Iterative calculations are performed using a local optimization algorithm, such as the fine registration stage of ICP or other local optimization method. At each iteration, the algorithm will fine tune the alignment between the tray point cloud data and the templates to minimize the distance error or other matching cost function between them.
It is checked whether a convergence condition is met (e.g. a maximum number of iterations is reached or the matching cost is less than a certain threshold). If the condition is met, stopping iteration; otherwise, continuing the iterative optimization process. When the local optimization algorithm converges, the final matching position and related parameters (translation vector and rotation matrix) are recorded. These parameters describe the best alignment between the tray point cloud data and the templates. To evaluate the quality of the match by calculating the cost of the match (e.g., distance error). If the matching cost is lower, the matching effect is better.
And calculating the center position and the rotation angle of the tray according to the matching result.
Preferably, the central position of the tray refers to the coordinates of the tray in three-dimensional space, and the rotation angle indicates the rotation state of the tray relative to a certain reference, and the two positions together define the pose of the tray. From the final matching result, the center position and rotation angle of the tray can be calculated, which is critical for subsequent tray processing and operation. Furthermore, the rotation angles of the trays, which describe the orientation of the trays in space, can be extracted by the final rotation matrix.
Further, in connection with a specific illustration, the ICP (Iterative Closest Point) algorithm is used for both the global search and the local optimization. In the global search stage, the maximum number of iterations can be set to 100 and the convergence threshold to 0.001 meters, so that an initial matching position can be found quickly. Then, in the local optimization stage, the number of iterations may be increased to 500 and the convergence threshold reduced to 0.0001 meters to obtain a more accurate matching result.
Finally, from the matching result, the center position of the tray was calculated as (x=1.5 meters, y=2.0 meters, z=0.5 meters), and the rotation angle of the tray with respect to the reference plane was 30 degrees around the X axis, 20 degrees around the Y axis, and 10 degrees around the Z axis. This information will be used in subsequent robotic grasping or logistic processing flows.
Further, point cloud matching can be performed using an ICP algorithm with both coarse and fine registration stages. In the coarse registration stage (global search), a maximum number of iterations of 50 may be set, with a distance threshold of 0.1 meters, to quickly find a rough matching location. Then, in the fine registration stage (local optimization), the number of iterations is increased to 200 times, and a smaller distance threshold (e.g., 0.01 meters) is set to obtain a more accurate matching result.
The resulting match may include a translation vector (e.g., [0.5, -0.3, 0.2]) and a rotation matrix (convertible to Euler angles such as [10°, 20°, 30°]). From this information, the center position of the tray (e.g., [1.2, 0.8, 0.5]) and its specific rotation angles can be calculated. These parameters will be used in subsequent robot grasping, path planning or logistics processes.
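A hedged sketch of this coarse-to-fine ICP flow is given below, assuming the Open3D registration API and SciPy for converting the rotation matrix to Euler angles; the correspondence distances and iteration counts are the illustrative values from the example, not fixed requirements of the method:

```python
import numpy as np
import open3d as o3d
from scipy.spatial.transform import Rotation

def match_tray(scene_pcd, template_pcd):
    """Two-stage point-to-point ICP: coarse alignment, then fine refinement."""
    reg = o3d.pipelines.registration
    est = reg.TransformationEstimationPointToPoint()
    coarse = reg.registration_icp(               # coarse stage: loose threshold
        scene_pcd, template_pcd, 0.1, np.eye(4), est,
        reg.ICPConvergenceCriteria(max_iteration=50))
    fine = reg.registration_icp(                 # fine stage: tight threshold
        scene_pcd, template_pcd, 0.01, coarse.transformation, est,
        reg.ICPConvergenceCriteria(max_iteration=200))
    T = np.asarray(fine.transformation)          # 4x4 homogeneous transform
    center = T[:3, 3]                            # translation -> tray center
    angles = Rotation.from_matrix(T[:3, :3]).as_euler("xyz", degrees=True)
    return center, angles, fine.fitness          # fitness indicates match quality
```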
Further, the obtaining a depth image of the tray using the depth camera, and generating tray original point cloud data according to the depth image further includes:
Performing internal reference calibration on the depth camera by using a calibration plate with a preset geometric shape to obtain an internal reference calibration result;
Preferably, in the present invention, the internal parameter calibration may be a process of determining internal parameters of the camera, which describe characteristics of focal length, principal point coordinates, and lens distortion of the camera. The calibration plate may be a planar plate with known geometric features, typically with a checkerboard of alternating black and white, for camera calibration. First, a calibration plate having known geometric characteristics, such as a checkerboard calibration plate, needs to be prepared. The calibration plate is then placed within the field of view of the depth camera and ensures that different portions of the calibration plate are captured by the camera at a plurality of angles and positions. By taking multiple pictures of the calibration plate, enough data can be collected to perform internal reference calibration.
Determining corner coordinates based on the internal reference calibration result;
Preferably, the corner coordinates may be the intersections of the checkerboard on the calibration plate, i.e. the intersections of black and white squares, which are used to calculate the internal parameters of the camera during the internal parameter calibration. After the internal reference calibration is performed, an image processing algorithm (such as Harris corner detection) is used for accurately positioning the corner coordinates on the calibration plate. These corner coordinates will be used in the subsequent re-projection error algorithm. In this step we usually use image processing algorithms to detect corner points on the calibration plate. The Harris corner detection algorithm is a common method that finds corner points by calculating the first and second derivatives of the gray scale of an image. Harris corner detection is based on an autocorrelation function of a local area of an image, and the corner position is determined by searching points with obvious autocorrelation function changes. After the Harris corner detection algorithm is applied, the algorithm outputs a series of candidate corner points. By setting a threshold value and a screening condition (such as the value of the corner response function), the final corner coordinates can be determined. These coordinates are used in the calibration process to match the relationship between the world coordinate system and the image coordinate system.
Generating internal parameters of the depth camera through a reprojection error algorithm based on the corner coordinates, wherein the internal parameters comprise focal length, principal point coordinates and distortion coefficients;
The re-projection error refers to the difference between the projection of a three-dimensional spatial point onto the two-dimensional image by the camera model and the actually observed two-dimensional image point. By minimizing this difference, the internal parameters of the camera can be optimized, making the camera model more accurate. The internal parameters of the camera, including focal length (fx, fy), principal point coordinates (cx, cy) (i.e., the coordinates of the image center point) and distortion coefficients (such as radial and tangential distortion coefficients), can be estimated through an iterative optimization process using calibration algorithms such as Zhang's method, combining the known corner coordinates with the corresponding image coordinates; these parameters describe the imaging model and geometry of the camera.
And constructing a depth camera model based on the internal parameters, inputting the acquired depth image into the depth camera model, and generating the tray original point cloud data.
Preferably, an accurate depth camera model can be constructed by using calibrated internal parameters. This model can convert two-dimensional image coordinates into corresponding three-dimensional space coordinates or project three-dimensional space points onto a two-dimensional image. When the depth camera captures a depth image of the tray, each pixel corresponds to a depth value. These pixel points can be converted into point cloud data in three-dimensional space in combination with internal parameters of the camera. These point cloud data represent three-dimensional shape and position information of the tray surface, providing basic data for subsequent point cloud processing and analysis (e.g., registration, segmentation, identification, etc.). These steps involve extracting corner coordinates from the image, calibrating the camera with these coordinates to obtain accurate internal parameters, and constructing a camera model based on these parameters to generate point cloud data. These processes are important to ensure accuracy of the subsequent tray pose detection.
Further, in connection with the illustration, a depth camera with a resolution of 640x480 is selected for tray pose detection. Before performing the internal calibration, we prepared an 8x6 checkerboard calibration plate with each square measuring 30 mm x 30 mm. We placed the calibration plate at different distances and angles from the camera and captured 20 calibration images.
After processing these images by the internal reference calibration algorithm, we obtain the internal parameters of the camera: focal length [fx, fy] = [800, 800] (unit: pixels), principal point coordinates [cx, cy] = [320, 240] (unit: pixels), and distortion coefficients k1, k2, p1, p2, and the like. These parameters describe the imaging model and geometry of the camera.
Next, a depth camera model can be constructed by using these internal parameters. When the camera captures a depth image of the tray, we input this image into the camera model and combine the internal parameters to generate the raw point cloud data of the tray. The data contains three-dimensional coordinate information of each point on the surface of the tray, and provides a basis for subsequent point cloud processing and analysis.
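A hedged sketch of this calibration workflow using OpenCV is shown below (the 8x6 inner-corner pattern, 30 mm square size and the calib_images/ path are assumptions taken from the example; the actual embodiment may use a different toolchain):

```python
import glob
import cv2
import numpy as np

pattern = (8, 6)          # inner corners of the checkerboard
square = 0.03             # 30 mm squares, in meters

# 3D corner coordinates in the board frame (Z = 0 plane)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.png"):   # ~20 views at varied poses
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        obj_points.append(objp)
        img_points.append(corners)

# Minimize the reprojection error to recover fx, fy, cx, cy and distortion
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, (640, 480), None, None)
print("reprojection RMS:", rms)
print("camera matrix:\n", K)        # [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
print("distortion:", dist.ravel())  # k1, k2, p1, p2, k3
```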
Further, the method further comprises the steps of:
carrying out iterative processing on the tray original point cloud data through a RANSAC algorithm, and fitting a ground plane model based on an iterative result, wherein the ground plane model comprises a fitting plane;
It should be noted that the RANSAC algorithm (Random Sample Consensus) is a robust parameter estimation method, which is used to estimate parameters of a mathematical model from a dataset containing a large number of outliers, and is commonly used in the fields of computer vision and robotics. The ground plane model refers to a plane geometric model representing the ground position, which is obtained by fitting through a mathematical method and is used for identifying and segmenting out the ground part in the point cloud data.
Preferably, the iterative processing of the tray original point cloud data by the RANSAC algorithm may be understood as follows: three non-collinear points are randomly selected from the tray original point cloud data to initialize a plane model. The distance from every point to this initial plane is calculated; a point whose distance is less than a preset threshold is considered an "interior point" (i.e., a point belonging to the ground), while a point whose distance is greater than or equal to the threshold is considered an "exterior point", i.e., it may be noise or may not belong to the ground plane. The plane model is then re-estimated using all points marked as interior points, typically by least squares or another optimization algorithm, to obtain a plane that better fits the interior-point data. The steps of initializing, classifying interior and exterior points, and re-estimating the model are repeated many times, each iteration yielding a different plane model and a different interior-point set. This process continues until a termination condition is met, for example a preset maximum number of iterations is reached or the number of interior points of some model exceeds a set threshold. Among all iterated plane models, the model with the largest number of interior points is selected as the final ground plane model, which is considered the model most representative of the actual ground plane.
Setting a fitting plane threshold value through the fitted ground plane model;
Preferably, once the optimal set of interior points is determined by the RANSAC algorithm, these interior points can be used to fit a more accurate planar model. This is typically done by least squares, which finds a plane equation such that the sum of the perpendicular distances from all interior points to this plane is minimized. And setting a maximum allowable distance from one point to the plane as a fitting plane threshold according to the obtained ground plane model (comprising the normal vector and the position parameter of the plane). This threshold is used for subsequent point cloud classification.
Preferably, for the plane model to be fitted: 3 points can be randomly selected from the point cloud set each time to construct a plane model, then the distances between the remaining points and the plane are calculated; points that satisfy the distance threshold of 0.06 m are set as interior points, and points that do not are set as exterior points. After 100 iterations, the model with the largest number of interior points is selected as the plane model.
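The least-squares refit of the plane over the final interior-point set can be done with an SVD of the centered coordinates (a total-least-squares fit that minimizes squared perpendicular distances); the helper below is an illustrative sketch of that step:

```python
import numpy as np

def refit_plane_least_squares(interior_points):
    """Fit n . p + d = 0 to the interior points; the plane normal is the
    right singular vector associated with the smallest singular value."""
    centroid = interior_points.mean(axis=0)
    _, _, vt = np.linalg.svd(interior_points - centroid)
    normal = vt[-1]
    d = -normal.dot(centroid)
    return normal, d
```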
Determining that points in the tray original point cloud data, of which the distance from the fitting plane is smaller than the fitting plane threshold value, are ground point clouds;
Preferably, for each point in the tray origin point cloud data, its distance from the fitted ground plane is calculated. If this distance is less than the previously set fit plane threshold, then this point is considered to be part of the ground point cloud.
And removing the ground point cloud from the tray original point cloud data.
Preferably, it will be appreciated that all points marked as ground point clouds will be deleted from the tray origin point cloud data. This is done to reduce the complexity of subsequent processing and to increase the accuracy of pose detection, as the ground point cloud typically does not contain useful information about the pose of the tray. This procedure is a typical application of the RANSAC algorithm in point cloud processing, which can effectively extract useful geometry information from data containing a lot of noise and outliers.
Further, the description is provided in connection with a specific example. Assume that there is a set of tray original point cloud data comprising about 100,000 points. The RANSAC algorithm may be set to a maximum of 1000 iterations, with a point-to-plane distance threshold of 0.01 meters (i.e., 1 cm). After iterative processing by the RANSAC algorithm, a ground plane model containing about 80,000 interior points is obtained. Next, the fitting plane threshold is set to 0.005 meters (i.e., 5 millimeters) to determine which points belong to the ground point cloud. Finally, about 20,000 points identified as ground are removed from the original point cloud data, leaving more accurate tray point cloud data for subsequent pose detection.
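Under the numbers quoted in this example (1000 RANSAC iterations, a 1 cm fitting threshold, a 5 mm classification threshold), the ground-removal step can be sketched as follows; this is an illustrative NumPy implementation, not the embodiment's exact code:

```python
import numpy as np

def ransac_ground_plane(points, n_iter=1000, ransac_thresh=0.01):
    """Fit a ground plane n . p + d = 0 to the point cloud with RANSAC."""
    best_count, best_model = 0, None
    rng = np.random.default_rng(0)
    for _ in range(n_iter):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        if np.linalg.norm(normal) < 1e-9:          # skip nearly collinear samples
            continue
        normal /= np.linalg.norm(normal)
        d = -normal.dot(p1)
        count = np.count_nonzero(np.abs(points @ normal + d) < ransac_thresh)
        if count > best_count:
            best_count, best_model = count, (normal, d)
    return best_model

def remove_ground(points, plane, classify_thresh=0.005):
    """Drop points closer than classify_thresh (e.g. 5 mm) to the fitted plane."""
    normal, d = plane
    return points[np.abs(points @ normal + d) >= classify_thresh]
```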
Further, the method further comprises:
generating a pallet plane three-dimensional model to be forked according to the structural characteristics of the pallet and preset pallet size information;
It should be noted that the pallet to-be-forked plane three-dimensional model can be understood as a digital three-dimensional representation, and shows a specific plane area for forklift to insert and take goods on the pallet, and the model is constructed based on the actual size and structural characteristics of the pallet so as to facilitate subsequent point cloud matching and pose detection. The pallet to-be-forked plane point cloud template can be understood as a point cloud data template derived from the pallet to-be-forked plane three-dimensional model. The method comprises the step of matching the information of the key geometric characteristics and the space structure of the tray with the optimized tray point cloud data so as to determine the center position and the rotation angle of the tray.
Preferably, before the three-dimensional model is generated, the accurate size and structural characteristics of the tray need to be acquired to provide basic data for subsequent three-dimensional modeling, for example by accurately measuring the actual tray with measuring tools (such as a tape measure or calipers) and recording the length, width, height and the positions and sizes of key structures (such as the cross beams and longitudinal beams); if a design drawing of the tray exists, this information can be obtained directly from the drawing. The pallet to-be-forked plane three-dimensional model can be created with three-dimensional modeling software (such as SolidWorks or AutoCAD) according to the collected pallet size and structure information. In this model, special attention needs to be paid to the part into which the forklift is inserted, namely the plane to be forked, to ensure the accuracy of its geometric characteristics.
Constructing a pallet to-be-forked plane point cloud template based on the pallet to-be-forked plane three-dimensional model, wherein the template comprises key geometric features and a space structure of the pallet;
Point cloud data are derived from the established three-dimensional model; these data form the point cloud template of the pallet plane to be forked. During the derivation, the density and precision of the point cloud can be set to suit the subsequent point cloud matching requirements. The derived point cloud data are then analyzed to extract the key geometric features (such as edges and corner points) and the spatial structure (such as the relative positions of the cross beams and longitudinal beams) of the pallet. This information is critical to the subsequent matching process, as it helps improve the accuracy and efficiency of matching.
And storing the constructed pallet plane point cloud templates to be forked in the pallet plane point cloud template library.
Preferably, the constructed pallet to-be-forked plane point cloud template is stored in a special template library so as to be called at any time in the subsequent pallet pose detection process. If desired, a related metadata description, such as creation time, version number, etc., may be added to the template file.
Further, in connection with an illustration, consider a standard Euro pallet of size 1200 mm x 800 mm, the bottom of which is made up of three cross beams and two stringers. To generate the pallet to-be-forked plane three-dimensional model, SolidWorks software is used to model from these actual dimensions and structural characteristics. During the modeling process, particular attention is paid to the positions of the cross beams and stringers at the bottom of the pallet and their relative heights.
After the three-dimensional model is completed, a pallet plane point cloud template containing about 100,000 points is generated through the software's export function. In this process, an appropriate point cloud density is set to ensure that the key geometric features of the pallet are captured without increasing the computational burden of subsequent processing through an overly dense point cloud. Then, key geometric features such as edge points and corner points of the cross beams and stringers at the bottom of the pallet can be extracted by analyzing the point cloud template, and their relative positional relationships are recorded. This information is integrated into a data structure to form the final pallet to-be-forked plane point cloud template, which is stored in the pallet to-be-forked plane point cloud template library for use by the subsequent pallet pose detection algorithm.
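A minimal sketch of how a planar point cloud template could be sampled from preset pallet dimensions is shown below (the 1200 mm x 800 mm footprint and the 5 mm sampling step are illustrative assumptions; a production template would also encode the cross-beam and stringer cut-outs as key geometric features):

```python
import numpy as np

def make_plane_template(length=1.2, width=0.8, step=0.005):
    """Sample a regular grid of points on the pallet's plane to be forked (Z = 0)."""
    xs = np.arange(0.0, length + step, step)
    ys = np.arange(0.0, width + step, step)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.stack((gx, gy, np.zeros_like(gx)), axis=-1).reshape(-1, 3)
    return pts - pts.mean(axis=0)      # center the template at its own origin

template_1200x800 = make_plane_template(1.2, 0.8)   # roughly 39,000 points
```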
Further, the method further comprises:
detecting a change in the size or shape of the tray in real time;
Preferably, the pallet plane point cloud template library is understood as a database storing point cloud templates of pallet planes of various sizes and types. When the pose of the tray needs to be detected, the system searches templates matched with the tray to be detected from the library for matching operation.
The morphology of the tray can be monitored in real time through image processing techniques to detect changes in its size or shape. When a new tray enters the field of view or a tray is changed, the system captures these changes and prepares for subsequent size or shape recognition.
When a tray which needs to be adapted to a new size is detected, receiving and processing the new tray size information, regenerating a tray to-be-forked plane point cloud template corresponding to the new size, and adding the newly generated template into the tray to-be-forked plane point cloud template library;
preferably, once the system detects a change in tray size or shape, it initiates a size recognition procedure. This procedure measures the critical dimensions of the tray, such as length, width, etc., through image processing techniques. According to the newly acquired pallet size information, the system automatically generates a three-dimensional model of a pallet to-be-forked plane matched with the new size by using a three-dimensional modeling technology. And then, deriving point cloud data from the model to form a new tray plane point cloud template to be forked, wherein the newly generated point cloud template can be added into the existing template library so as to be quickly matched with a corresponding template in the subsequent use.
And if the historical tray is confirmed to be eliminated or not used, removing the tray to-be-forked plane point cloud template corresponding to the size from the tray to-be-forked plane point cloud template library.
Preferably, to maintain the efficiency and accuracy of the stencil library, if it is confirmed that a tray of a certain size has been no longer used or is obsolete, the system will delete the tray of that size from the stencil library for the flat point cloud stencil to be forked.
By way of example, assume that the system originally stored pallet to-be-forked plane point cloud templates in two sizes, 1200 mm x 800 mm and 1000 mm x 600 mm. Now, through real-time detection, the system finds a new 1500 mm x 1000 mm pallet entering the field of view. The system first captures this new pallet and measures its exact dimensions of 1500 mm x 1000 mm through image processing techniques.
Then, the system automatically generates a three-dimensional model of the 1500 mm x 1000 mm pallet's plane to be forked according to the new size and derives point cloud data from the model to form a new point cloud template. This newly generated 1500 mm x 1000 mm point cloud template is then added to the pallet to-be-forked plane point cloud template library. If at some future point it is confirmed that 1000 mm x 600 mm pallets are no longer in use, the system deletes the pallet to-be-forked plane point cloud template of this size from the template library to keep the library up to date and efficient.
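The dynamic template-library behavior in this example can be pictured as a simple keyed store; the snippet below is only an illustrative sketch (it reuses the hypothetical make_plane_template helper from the earlier sketch, and the size-tuple keys are an assumption):

```python
# template library keyed by (length, width) in meters
template_library = {
    (1.2, 0.8): make_plane_template(1.2, 0.8),
    (1.0, 0.6): make_plane_template(1.0, 0.6),
}

def add_template(size):
    """Generate and register a template for a newly detected pallet size."""
    template_library[size] = make_plane_template(*size)

def retire_template(size):
    """Remove the template of a pallet size confirmed to be obsolete."""
    template_library.pop(size, None)

add_template((1.5, 1.0))       # new 1500 mm x 1000 mm pallet enters service
retire_template((1.0, 0.6))    # 1000 mm x 600 mm pallets are no longer used
```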
Further, because the quality of depth camera hardware varies, the raw depth point cloud data cannot be used directly as the basis for analysis; the point cloud of the corresponding plane, namely the to-be-forked plane point cloud, needs to be extracted through the normal vector. Since the normal vector of the to-be-forked plane fluctuates relatively little, the previously acquired tray point cloud data is segmented a second time, and the plane is obtained using the inverse cosine theorem for space vectors:
\theta = \arccos\left(\frac{\vec{a}\cdot\vec{b}}{\lvert\vec{a}\rvert\,\lvert\vec{b}\rvert}\right)

where \theta represents the radian angle between the two space vectors, \vec{a} represents a point of the point cloud, and \vec{b} is set to (-1, 0), i.e., the normal vector; the set of points satisfying the condition constitutes the to-be-forked plane point cloud. Because of camera hardware, the point cloud data does not exactly match the actual environment, so \theta is restricted to the radian interval (1.518436, 1.623156); the points meeting this matching condition yield the to-be-forked plane point cloud.
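A minimal NumPy sketch of this angular filtering step is given below. The function name is hypothetical; following the description above, each point is treated as a space vector and compared against the reference vector (-1, 0), which is padded here to three dimensions as (-1, 0, 0) purely so the example runs on 3D point clouds — that padding is an assumption.

```python
import numpy as np

def extract_fork_plane(points, ref_vector=(-1.0, 0.0, 0.0),
                       rad_range=(1.518436, 1.623156)):
    """Keep the points whose angle to the reference vector lies in rad_range."""
    pts = np.asarray(points, dtype=float)
    ref = np.asarray(ref_vector, dtype=float)
    # Inverse cosine of the normalised dot product = angle between space vectors.
    dots = pts @ ref
    norms = np.linalg.norm(pts, axis=1) * np.linalg.norm(ref)
    norms = np.where(norms == 0.0, np.finfo(float).eps, norms)  # guard the origin
    angles = np.arccos(np.clip(dots / norms, -1.0, 1.0))
    mask = (angles > rad_range[0]) & (angles < rad_range[1])
    return pts[mask]

cloud = np.random.uniform(-1.0, 1.0, size=(5000, 3))
fork_plane = extract_fork_plane(cloud)
print(fork_plane.shape)
```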
Preferably, in some embodiments, the present invention may also use Euclidean clustering to aggregate points in a sample space that are closer together into one class, and points that are farther apart into another class:
\mathrm{distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}

where distance represents the distance in space between two points (x_1, y_1, z_1) and (x_2, y_2, z_2) of the point cloud. With the threshold set to 0.04 m, when the distance between two points is smaller than 0.04 m the points are judged to belong to the same class, i.e., the tray point cloud is obtained.
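This clustering rule can be sketched as a KD-tree-based region growing, as below; the sketch assumes SciPy and NumPy, and the min_size parameter is an illustrative assumption (the 0.04 m threshold is the value stated above).

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_cluster(points, tol=0.04, min_size=50):
    """Group points whose neighbour-to-neighbour distances stay within tol metres."""
    pts = np.asarray(points, dtype=float)
    tree = cKDTree(pts)
    labels = np.full(len(pts), -1, dtype=int)
    current = 0
    for seed in range(len(pts)):
        if labels[seed] != -1:
            continue
        frontier = [seed]                      # grow a new cluster from this seed
        labels[seed] = current
        while frontier:
            idx = frontier.pop()
            for nb in tree.query_ball_point(pts[idx], r=tol):
                if labels[nb] == -1:
                    labels[nb] = current
                    frontier.append(nb)
        current += 1
    clusters = [pts[labels == c] for c in range(current)]
    return [c for c in clusters if len(c) >= min_size]

cloud = np.vstack([np.random.normal(0.0, 0.01, (200, 3)),
                   np.random.normal(1.0, 0.01, (200, 3))])
print(len(euclidean_cluster(cloud, tol=0.04)))  # two well-separated groups -> 2
```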
According to the above embodiments, an accurate depth image is obtained with the depth camera and point cloud data is generated; noise and outliers are removed by the preprocessing technique, effectively improving data accuracy and quality. Matching against the point cloud template library constructed from the normal vector orientation constraint and the tray size information improves matching efficiency and accuracy, so the center position and rotation angle of the tray can be located precisely. In addition, the method is highly adaptive: it can respond to changes in tray size or shape in real time and dynamically manage the point cloud template library, keeping it up to date and efficient. These characteristics raise the automation level of industries such as logistics and warehousing, reducing manual intervention and improving working efficiency.
With further reference to fig. 2, as an implementation of the method shown in the foregoing figures, the present disclosure provides some embodiments of an image processing-based tray pose detection system, which corresponds to the method embodiment shown in fig. 1, and is particularly applicable to various electronic devices.
As shown in fig. 2, the tray pose detection system 200 based on image processing includes:
the data generation module 201 is configured to acquire a depth image of a tray using a depth camera, and generate tray origin cloud data according to the depth image;
a preprocessing module 202, configured to perform preprocessing on the tray original point cloud data, where the preprocessing includes outlier removal, to obtain optimized tray point cloud data;
The template library construction module 203 is configured to determine a pallet forking plane according to a normal vector orientation constraint of the front surface of the pallet, and construct a pallet plane point cloud template library to be forked based on preset pallet size information;
and the determining module 204 is configured to match the optimized tray point cloud data with a tray plane point cloud template to be forked, so as to determine a center position and a rotation angle of the tray.
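As a rough sketch of how these four modules could be wired together in code (the class and the callable interfaces are assumptions, not the claimed implementation):

```python
class TrayPoseDetectionSystem:
    """Sketch of the module chain of system 200: 201 -> 202 -> 203 -> 204."""

    def __init__(self, data_gen, preprocess, template_lib, matcher):
        self.data_gen = data_gen          # module 201: depth image -> raw point cloud
        self.preprocess = preprocess      # module 202: outlier removal
        self.template_lib = template_lib  # module 203: template library lookup
        self.matcher = matcher            # module 204: template matching

    def detect(self):
        raw_cloud = self.data_gen()
        cloud = self.preprocess(raw_cloud)
        template = self.template_lib(cloud)
        return self.matcher(cloud, template)   # -> (centre position, rotation angle)

# Placeholder callables only, to show the data flow end to end.
system = TrayPoseDetectionSystem(
    data_gen=lambda: "raw cloud",
    preprocess=lambda c: c,
    template_lib=lambda c: "template",
    matcher=lambda c, t: ((0.0, 0.0, 0.0), 0.0),
)
print(system.detect())
```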
It will be appreciated that the modules described in the image processing-based tray pose detection system 200 correspond to the steps of the image processing-based tray pose detection method described with reference to fig. 1. Thus, the operations, features, and advantages described above for the image processing-based tray pose detection method are equally applicable to the image processing-based tray pose detection system 200 and the modules included therein, and are not described again herein.
Referring now to fig. 3, a schematic diagram of an electronic device 300 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic devices in some embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), and the like, as well as stationary terminals such as digital TVs, desktop computers, and the like. The terminal device shown in fig. 3 is only one example and should not impose any limitation on the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from ROM 302. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.
It should be noted that some embodiments of the present disclosure may also include a computer readable medium, and the computer readable storage medium may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: a depth camera is used for obtaining a depth image of the tray, and tray original point cloud data are generated according to the depth image; preprocessing the tray original point cloud data, wherein the preprocessing comprises outlier removal, so as to obtain optimized tray point cloud data; determining a forking plane of the tray according to normal vector orientation constraint of the front surface of the tray, and constructing a tray plane point cloud template library to be forked based on preset tray size information; and matching the optimized tray point cloud data with the tray plane point cloud template to be forked so as to determine the center position and the rotation angle of the tray.
Computer program code for carrying out operations for some embodiments of the present disclosure may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, and the functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example, technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (4)

1. The tray pose detection method based on image processing is characterized by comprising the following steps of:
a depth camera is used for obtaining a depth image of the tray, and tray original point cloud data are generated according to the depth image;
preprocessing the tray original point cloud data, wherein the preprocessing comprises outlier removal, so as to obtain optimized tray point cloud data;
Determining a forking plane of the tray according to normal vector orientation constraint of the front surface of the tray, and constructing a tray plane point cloud template library to be forked based on preset tray size information;
Matching the optimized tray point cloud data with a tray plane point cloud template to be forked so as to determine the center position and the rotation angle of the tray;
the step of matching the optimized tray point cloud data with the tray to-be-forked plane point cloud template to determine the center position and the rotation angle of the tray further comprises:
Performing preliminary matching on the optimized tray point cloud data and the tray plane point cloud template to be forked through a global search algorithm, and positioning an initial matching position;
Based on the initial matching position, adjusting the initial matching position through a local optimization algorithm to obtain a matching result;
calculating the center position and the rotation angle of the tray according to the matching result;
The preprocessing of the tray origin cloud data further comprises:
determining the average distance from each point in the tray original point cloud data to the adjacent point;
Calculating the mean value and standard deviation of the neighborhood average distances of all points in the tray original point cloud data;

Comparing the average distance from each point to its adjacent points with the mean value of the neighborhood average distances to obtain a difference value, and if the difference value exceeds the standard deviation of the neighborhood average distances by a preset multiple, determining the point corresponding to the difference value as an outlier;
Removing the outliers from the tray original point cloud data to obtain optimized point cloud data;
the step of obtaining the depth image of the tray by using the depth camera and generating the tray origin point cloud data according to the depth image further comprises the following steps:
Performing internal reference calibration on the depth camera by using a calibration plate with a preset geometric shape to obtain an internal reference calibration result;
determining corner coordinates based on the internal reference calibration result;
generating internal parameters of the depth camera through a reprojection error algorithm based on the corner coordinates, wherein the internal parameters comprise focal length, principal point coordinates and distortion coefficients;
Constructing a depth camera model based on the internal parameters, inputting the acquired depth image into the depth camera model, and generating the tray original point cloud data;
The method further comprises the steps of:
carrying out iterative processing on the tray original point cloud data through a RANSAC algorithm, and fitting a ground plane model based on an iterative result, wherein the ground plane model comprises a fitting plane;
setting a fitting plane threshold value through the fitted ground plane model;
Determining that points in the tray original point cloud data, of which the distance from the fitting plane is smaller than the fitting plane threshold value, are ground point clouds;
removing the ground point cloud from the tray original point cloud data;
The method further comprises the steps of:
generating a pallet plane three-dimensional model to be forked according to the structural characteristics of the pallet and preset pallet size information;
constructing a pallet to-be-forked plane point cloud template based on the pallet to-be-forked plane three-dimensional model, wherein the template comprises key geometric features and a space structure of the pallet;
storing the constructed pallet plane point cloud templates to be forked in the pallet plane point cloud template library;
The method further comprises the steps of:
detecting a change in the size or shape of the tray in real time;
When a tray which needs to be adapted to a new size is detected, receiving and processing the new tray size information, regenerating a tray to-be-forked plane point cloud template corresponding to the new size, and adding the newly generated template into the tray to-be-forked plane point cloud template library;
And if the historical tray is confirmed to be eliminated or not used, removing the tray to-be-forked plane point cloud template corresponding to the size from the tray to-be-forked plane point cloud template library.
2. A tray pose detection system based on image processing, for implementing the tray pose detection method based on image processing according to claim 1, comprising:
the data generation module is used for acquiring a depth image of the tray by using the depth camera and generating tray original point cloud data according to the depth image;
the pretreatment module is used for carrying out pretreatment on the tray original point cloud data, wherein the pretreatment comprises outlier removal, and optimized tray point cloud data are obtained;
The template library construction module is used for determining a forking plane of the tray according to the normal vector orientation constraint of the front surface of the tray and constructing a tray plane point cloud template library to be forked based on preset tray size information;
and the determining module is used for matching the optimized tray point cloud data with the tray plane point cloud template to be forked so as to determine the center position and the rotation angle of the tray.
3. An electronic device, comprising:
one or more processors;
A storage device having one or more programs stored thereon;
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
4. A computer readable storage medium having stored thereon executable instructions which when executed by a processor cause the processor to implement the method of claim 1.