CN116309882A - Tray detection and positioning method and system for unmanned forklift application - Google Patents
Tray detection and positioning method and system for unmanned forklift application
- Publication number
- CN116309882A (application CN202310366209.9A)
- Authority
- CN
- China
- Prior art keywords
- tray
- support column
- coordinate system
- point cloud
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
- G06T2207/30164—Workpiece; Machine component
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a tray detection and positioning method and system for unmanned forklift application, wherein the method comprises the following steps: acquiring a depth image and an RGB image acquired by an RGB-D camera module; establishing a tray image dataset to train a tray detector to predict the tray regions and support column regions in the RGB image and reject incomplete trays among them; taking the complete tray as the target tray and the support column regions as regions of interest, aligning the RGB image and the depth image, extracting the depth information of the support column regions, and converting the depth information of the regions of interest into three-dimensional point cloud data in the camera coordinate system based on pre-calibrated camera parameters; segmenting the support column surfaces of the target tray and calculating the centroid coordinates of each support column surface; and extracting the support column triplet of the target tray's fork face, calculating the position and steering angle of the tray in the camera coordinate system, and converting the tray pose from the camera coordinate system to the forklift coordinate system, thereby realizing the positioning of the tray.
Description
Technical Field
The invention belongs to the technical field of robot perception, and particularly relates to a tray detection and positioning method and system for unmanned forklift application.
Background
An unmanned forklift, also known as a forklift AGV or fork-type mobile robot, combines forklift technology with AGV technology. Compared with an ordinary AGV, an unmanned forklift can not only complete point-to-point material handling, but also realize logistics transportation that connects multiple production links. Unmanned forklifts can relieve problems such as large material flows and the high labor intensity of manual handling in industrial production and warehouse logistics, promote the transformation and upgrading of industrial manufacturing, and improve the production efficiency and economic benefit of enterprises.
In dynamic, unstructured industrial environments, pallet poses carry large uncertainty due to factors such as operation flow, equipment precision and manual operation; how to detect and position such pallets efficiently and accurately is a problem that unmanned forklift applications need to solve.
In the related art, multi-sensor fusion is the main current research direction. The RGB image obtained by a camera carries rich texture and color information, while a lidar or depth camera can acquire high-precision depth information, so fusing multi-sensor information makes tray detection and pose estimation more accurate.
Patent CN112907666A proposes an RGB-D-based tray pose estimation method, system and device, which calculates a compressed grid size for the sensor image, performs template matching to obtain the tray region of interest, and extracts the coordinates of the tray supports from that region. The method detects the tray by template matching; when the tray is tilted by a certain angle, the matching degree between the tray in the image and the template drops, and occlusion by cargo leads to inaccurate pixel classification, so the tray detection precision decreases.
Patent CN113409397A proposes a warehouse pallet detection and positioning method based on an RGBD camera, which uses a pre-trained YOLOv5 model to detect pallets, frames the pallet region in the RGB image, and calculates the pallet pose in combination with the depth information of that region. Patent CN115272275A proposes a tray and obstacle detection and positioning system and method based on an RGB-D camera and a neural network model, which uses the neural network to detect the tray region in the RGB image and performs point cloud filtering, edge extraction and target spatial pose calculation on the target region. Both methods locate the tray from its edge features; when the tray angle is large or the tray edge is occluded, positioning tends to fail or its error becomes large.
Patent CN114974968A proposes a tray recognition and pose estimation method based on multiple neural networks, which uses a UNet network to segment the target tray in the image, extracts the point cloud of the tray region, and then uses a modified PointNet to output the tray pose. This approach uses deep neural networks to directly segment the tray point cloud and estimate the tray pose: a model trained on generated tray data cannot guarantee positioning precision in real scenes, a large amount of high-precision annotated tray point cloud data is required for training, and the heavy computation of deep networks on point clouds makes real-time performance difficult to guarantee.
In summary, most existing tray detection and positioning methods based on multi-sensor fusion place high requirements on the tray's placement pose and cannot accurately handle trays with large deflection angles; moreover, these methods do not consider problems encountered in practice such as occlusion of the tray by cargo and false detections, so their tray pose estimation suffers from low accuracy and poor robustness.
Disclosure of Invention
The embodiments of the application aim to provide a tray detection and positioning method and system for unmanned forklift application, so as to solve the technical problems in the related art of lacking safeguards against false detections and of low precision and poor robustness in the tray pose estimation stage.
According to a first aspect of an embodiment of the present application, a pallet detection and positioning method for an unmanned forklift application is provided, including:
(1) Image acquisition: acquiring a depth image and an RGB image acquired by an RGB-D camera module;
(2) Tray detection: establishing a tray image dataset to train a tray detector, predicting a tray area and a support column area in the RGB image by using the trained tray detector, and eliminating incomplete trays in the tray area;
(3) Data fusion: taking the complete tray obtained in the step (2) as a target tray, taking a support column region as an interested region, aligning an RGB image and a depth image, extracting depth information of the support column region, and converting the depth information of the interested region into three-dimensional point cloud data under a camera coordinate system based on camera parameters calibrated in advance;
(4) Surface segmentation: performing point cloud filtering, plane segmentation and geometric information extraction on the three-dimensional point cloud data of the support column region, segmenting the support column surface of the target tray, and calculating the centroid coordinates of each support column surface;
(5) Pose calculation: and extracting a support column triplet of a target tray forking surface, calculating the position and steering angle of the tray under a camera coordinate system, and converting the position and the posture of the tray from the camera coordinate system to a forklift coordinate system, thereby realizing the positioning of the tray.
Further, step (2) includes:
(2.1) acquiring a plurality of tray images, manually marking a tray area and a support column area in each tray image by using a rectangular frame, and performing image preprocessing on the marked tray images so as to establish a tray image data set;
(2.2) training a deep learning-based target detection network using the pallet image dataset, thereby obtaining a trained pallet detector;
(2.3) processing the RGB image by using the trained tray detector to obtain a plurality of tray area rectangular frames and support column area rectangular frames;
and (2.4) judging the support column belonging to each tray according to the overlapping condition of the rectangular frames based on the tray area rectangular frames and the support column area rectangular frames, checking the integrity of the tray, and rejecting all incomplete trays.
Further, step (3) includes:
(3.1) taking the complete tray obtained in the step (2) as a target tray, taking a support column region as an interested region, aligning an RGB image acquired by an RGB-D camera module with a depth image to obtain a support column region of the depth image, and extracting pixel coordinates (u, v) and depth D of the support column region in the depth image;
(3.2) calculating the three-dimensional coordinates (X, Y, Z) of the pixel points of the support column area in the camera coordinate system according to pre-calibrated camera parameters by the following formulas, and generating three-dimensional point cloud data:

X = (u - c_x) · D / f_x
Y = (v - c_y) · D / f_y
Z = D

wherein c_x, c_y, f_x, f_y are the pre-calibrated camera parameters.
Further, step (4) includes:
(4.1) preprocessing the three-dimensional point cloud data using straight-through filtering and voxel grid filtering;
(4.2) dividing the surfaces of the support columns of the target tray in the preprocessed point cloud data by using a random sampling consistency method;
(4.3) for the point cloud data of each support column surface obtained by the segmentation, calculating the point cloud centroid coordinates C(p_x, p_y, p_z) as

p_x = (1/n) Σ x_i,  p_y = (1/n) Σ y_i,  p_z = (1/n) Σ z_i

where n is the number of points on the support column surface and (x_i, y_i, z_i) are the spatial coordinates of each point.
Further, step (4.2) includes:
(4.2.1) randomly extracting 3 sample points from the preprocessed point cloud data, fitting a plane equation, and estimating the 4 parameters a, b, c, d of the plane equation aX + bY + cZ + d = 0 to obtain a plane to be determined;
(4.2.2) calculating the distance between each point in the preprocessed point cloud data and the plane to be determined, and counting the number of points whose distance is smaller than a tolerance d, i.e., the inliers;
(4.2.3) if the number of inliers of the current plane to be determined is larger than a threshold T, re-fitting the plane by using all the inliers to obtain the point cloud data of the support column surface;
(4.2.4) if the number of inliers is less than the threshold T, returning to the above step (4.2.1).
Further, step (5) includes:
(5.1) checking all the centroid triples of the support column surface according to the collinearity and symmetry of the support column structure of the tray for the support column surface separated from the point cloud data in the step (3), and extracting the centroid triples of the target tray;
and (5.2) calculating the pose of the tray under the camera coordinate system by using the centroid triplet of the target tray, and converting the pose of the tray under the camera coordinate system into the unmanned forklift coordinate system by utilizing the coordinate conversion calibrated in advance.
Further, calculating the pose of the tray in the camera coordinate system using the centroid triplet of the target tray in step (5.2) includes:
taking the centroid coordinates of the tray's middle support column as the position of the tray, and calculating the rotation angle of the tray in the camera coordinate system from the centroids of the two side support columns as θ = arctan(Δz / Δx), where Δz is the difference of the z coordinates of the two side support column centroids and Δx is the difference of their x coordinates.
According to a second aspect of embodiments of the present application, there is provided a tray detection and positioning system for an unmanned forklift application, including:
the RGB-D camera module is used for acquiring depth images and RGB images;
the tray detection module is used for establishing a tray image data set to train a tray detector, predicting a tray area and a support column area in the RGB image by using the trained tray detector, and eliminating incomplete trays in the tray area;
the data fusion module is used for taking the support column region obtained by the tray detection module as a region of interest, aligning the RGB image with the depth image, extracting the depth information of the support column region, and converting the depth information of the region of interest into three-dimensional point cloud data under a camera coordinate system based on camera parameters calibrated in advance;
the surface segmentation module is used for carrying out point cloud filtering, plane segmentation and geometric information extraction on the three-dimensional point cloud data of the support column region, segmenting the support column surface of the target tray, and calculating the centroid coordinates of each support column surface;
the pose calculation module is used for extracting the support column triplets of the target tray forking surface, calculating the position and the steering angle of the tray under the camera coordinate system, and converting the pose of the tray from the camera coordinate system to the forklift coordinate system, so that the tray positioning is realized.
According to a third aspect of embodiments of the present application, there is provided an electronic device, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
According to a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
according to the embodiment, the RGB image and the depth image are fused, the tray detector based on the deep learning can rapidly detect tray information, the image area of the tray is determined, the subsequent calculation cost is greatly reduced, meanwhile, the target tray with larger rotation angle can be processed by the positioning method based on the tray support column information, the application range of the invention is improved, the tray detection method for simultaneously detecting two types of tray information is firstly provided, the detection result can be checked by using the prior information of the tray model while the calculation cost is hardly increased, and false detection information is removed; according to the invention, the tray position is calculated for the first time by utilizing the center of mass of the tray support column, and the support column center of mass triplet of the target tray is extracted by utilizing the priori information of the tray model, so that the positioning accuracy is ensured while the positionable range of the target tray is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flowchart illustrating a method for tray detection and positioning for an unmanned forklift application, according to an exemplary embodiment;
FIG. 2 is a schematic diagram of an RGB-D camera installation and coordinate system according to one exemplary embodiment, where (a) is a side view and (b) is a top view;
FIG. 3 is a schematic diagram of a tray detection result, according to an example embodiment;
FIG. 4 is a schematic diagram of a pallet support column centroid triplet shown in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating tray pose calculation according to an exemplary embodiment;
FIG. 6 is a schematic diagram of a framework of a pallet detection and positioning system for an unmanned forklift application, according to an exemplary embodiment;
fig. 7 is a schematic diagram of an electronic device shown according to an example embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The term "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
The core of the invention is to fuse different types of sensor data: a deep-learning-based target detector quickly detects tray information in the RGB image, point cloud processing is performed in combination with the depth information, and the prior information of the tray model is taken into account, so as to realize an efficient tray detection and positioning system and method.
The invention provides a tray detection and positioning method for unmanned forklift application, as shown in fig. 1, comprising the following steps:
(1) Image acquisition: acquiring a depth image and an RGB image acquired by an RGB-D camera module;
Specifically, the RGB-D camera module comprises an RGB camera and a depth sensor: the RGB sensor collects the color and texture information of the environment, and the depth sensor collects the depth information of the environment. The camera intrinsic matrix is determined in advance by Zhang Zhengyou's camera calibration method:

K = [ f_x   0    c_x ]
    [  0   f_y   c_y ]
    [  0    0     1  ]

where f_x, f_y are the focal lengths in pixels and (c_x, c_y) is the principal point. The homogeneous transformation matrix from the RGB-D camera coordinate system {C} to the unmanned forklift coordinate system {R} is determined through hand-eye calibration.
The RGB-D camera used in an embodiment is an Intel RealSense D455, which includes an RGB sensor and an infrared ranging sensor and is automatically calibrated at startup by a built-in algorithm. The RGB-D camera is fixedly mounted on the unmanned forklift in the middle of the fork arms, with its orientation consistent with the fork arm direction. The coordinate systems of the unmanned forklift platform and the sensor are set as shown in fig. 2: the unmanned forklift coordinate system {R} has its XOY plane parallel to the ground, its X axis coincident with the forward motion direction of the robot, and its Z axis pointing vertically upward; the camera coordinate system {C} has its Z axis coincident with the optical axis and pointing forward from the camera, its X axis parallel to the photosensitive plane and pointing to the right, and its Y axis pointing vertically downward; the pallet coordinate system {P} has its XOY plane parallel to the ground, its origin coincident with the center of the pallet fork face, its X axis perpendicular to the fork face and pointing forward, and its Z axis pointing vertically upward.
The homogeneous transformation matrix from the RGB-D camera coordinate system {C} to the unmanned forklift coordinate system {R} is determined by the hand-eye calibration method after the camera has been installed.
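For illustration, the two calibration results can be represented as in the sketch below; all numeric values are placeholders rather than calibration results of this embodiment, and the mounting rotation merely reflects the frame conventions of fig. 2.

```python
import numpy as np

# Pinhole intrinsics from Zhang Zhengyou calibration (placeholder values).
fx, fy, cx, cy = 640.0, 640.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Hand-eye result: homogeneous transform from camera frame {C} to forklift frame {R}.
# Assumed mounting: level camera looking along the fork arms (placeholder values).
R_RC = np.array([
    [0.0,  0.0, 1.0],   # forklift X picks up camera Z (both point forward)
    [-1.0, 0.0, 0.0],   # forklift Y (left) is the opposite of camera X (right)
    [0.0, -1.0, 0.0],   # forklift Z (up) is the opposite of camera Y (down)
])
t_RC = np.array([0.30, 0.0, 0.45])  # camera position in the forklift frame, metres
T_RC = np.eye(4)
T_RC[:3, :3], T_RC[:3, 3] = R_RC, t_RC

def camera_to_forklift(p_c: np.ndarray) -> np.ndarray:
    """Map a 3D point from the camera frame {C} to the forklift frame {R}."""
    return (T_RC @ np.append(p_c, 1.0))[:3]
```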
(2) Tray detection: establishing a tray image dataset to train a tray detector, predicting a tray area and a support column area in the RGB image by using the trained tray detector, and eliminating incomplete trays in the tray area; the method specifically comprises the following substeps:
(2.1) acquiring a plurality of tray images, manually marking a tray area and a support column area in each tray image by using a rectangular frame, and performing image preprocessing on the marked tray images so as to establish a tray image data set;
Specifically, tray images are collected from public image datasets and/or on-site sensors, and the two kinds of tray information in the images are manually annotated with rectangular boxes: the tray and the tray support columns, after which the annotation information is saved. Image augmentation methods such as random mosaic, random grayscale and random flipping are applied to the annotated images to improve the diversity of the dataset.
(2.2) training a deep learning-based target detection network using the pallet image dataset, thereby obtaining a trained pallet detector;
Specifically, the tray image dataset is divided into a training set and a test set, the deep-learning-based target detection network is trained offline on a GPU platform, and the network parameters are saved to obtain the trained tray detector. The target detection network may be Faster R-CNN, SSD, the YOLO series, or the like.
The present embodiment uses YOLOv5, whose network structure is divided into four parts: input, backbone, neck and head. To adapt to the tray detection scenario, the output dimension of YOLOv5 is set to 3×(5+2), where 3 represents the three anchor boxes predicted per grid cell, 5 represents the coordinates (x, y, w, h) and confidence c of each predicted box, and 2 represents the 2 class labels of the tray dataset. Using the pre-trained weights provided by YOLOv5 on the COCO dataset as initial weights, the YOLOv5-based target detection network is trained offline on an Nvidia GPU platform, and the network parameters are saved to obtain the trained tray detector.
(2.3) processing the RGB image by using the trained tray detector to obtain a plurality of tray area rectangular frames and support column area rectangular frames, as shown in figure 3;
Specifically, all rectangular boxes are expressed in the form (x, y, w, h), where x and y are the abscissa and ordinate of the center of the rectangular box, and w and h are its width and height.
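For illustration, a minimal inference sketch using the ultralytics/yolov5 torch.hub interface is given below; the weight file name and the class indices (0 = tray, 1 = support column) are assumptions rather than values fixed by this embodiment.

```python
import torch
import cv2

# Load a custom-trained YOLOv5 detector (weight path is an assumed placeholder).
model = torch.hub.load('ultralytics/yolov5', 'custom', path='tray_yolov5.pt')
model.conf = 0.5  # confidence threshold

img = cv2.imread('frame.png')[:, :, ::-1]  # BGR -> RGB
results = model(img)

# results.xywh[0]: one row per detection, (x_center, y_center, w, h, conf, cls).
tray_boxes, column_boxes = [], []
for x, y, w, h, conf, cls in results.xywh[0].tolist():
    if int(cls) == 0:        # assumed class index for "tray"
        tray_boxes.append((x, y, w, h))
    elif int(cls) == 1:      # assumed class index for "support column"
        column_boxes.append((x, y, w, h))
```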
(2.4) judging the support column belonging to each tray according to the overlapping condition of the rectangular frames based on the rectangular frames of the tray area and the rectangular frames of the support column area, checking the integrity of the tray, and rejecting all incomplete trays;
Specifically, the tray information detected in step (2.3) is checked against the prior information of the tray model. A tray detection box R_i(x_i, y_i, w_i, h_i) is selected first; for each support column detection box R_j(x_j, y_j, w_j, h_j), the area S_j of the support column detection box and the overlapping area S_ij of the tray detection box and the support column detection box are calculated:

S_j = w_j × h_j
S_ij = max(0, x_2 - x_1) × max(0, y_2 - y_1)

where x_1 = max(x_i - w_i/2, x_j - w_j/2) and y_1 = max(y_i - h_i/2, y_j - h_j/2) are the coordinates of the upper-left corner of the overlapping region, and x_2 = min(x_i + w_i/2, x_j + w_j/2) and y_2 = min(y_i + h_i/2, y_j + h_j/2) are the coordinates of its lower-right corner.

Finally, the ratio k of the overlapping area S_ij to the support column detection box is calculated:

k = S_ij / S_j

A support column with k ≥ 0.5 is judged to belong to the tray. If a tray owns fewer than 3 support columns, it is considered an incomplete tray and its detection result is rejected. This integrity check is performed on all trays to obtain the complete-tray detection result. The final number of complete trays is 1 or 0; if it is 1, the target tray has been detected successfully and the subsequent steps are carried out.
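The overlap test and integrity check above can be sketched as follows; the box format, the 0.5 ratio and the minimum of 3 columns follow the text, while function and variable names are illustrative.

```python
def overlap_ratio(tray, col):
    """Ratio k of the tray/column overlap area to the column box area."""
    xi, yi, wi, hi = tray
    xj, yj, wj, hj = col
    x1 = max(xi - wi / 2, xj - wj / 2)
    y1 = max(yi - hi / 2, yj - hj / 2)
    x2 = min(xi + wi / 2, xj + wj / 2)
    y2 = min(yi + hi / 2, yj + hj / 2)
    s_ij = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return s_ij / (wj * hj)

def complete_trays(tray_boxes, column_boxes, k_min=0.5, min_columns=3):
    """Keep only trays that own at least `min_columns` support columns."""
    kept = []
    for tray in tray_boxes:
        cols = [c for c in column_boxes if overlap_ratio(tray, c) >= k_min]
        if len(cols) >= min_columns:
            kept.append((tray, cols))
    return kept
```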
(3) Data fusion: taking the complete tray obtained in the step (2) as a target tray, taking a support column region as an interested region, aligning an RGB image and a depth image, extracting depth information of the support column region, and converting the depth information of the interested region into three-dimensional point cloud data under a camera coordinate system based on camera parameters calibrated in advance; the method specifically comprises the following substeps:
(3.1) taking the complete tray obtained in the step (2) as a target tray, taking a support column rectangular frame of the target tray as an interested region, aligning an RGB image and a depth image acquired by an RGB-D camera module, obtaining a support column region of the depth image, and extracting pixel coordinates (u, v) and depth D of the support column region in the depth image;
Specifically, suppose the target tray to be picked has been detected in the RGB image and its tray support column detection boxes are R_k (k ≥ 3). Since the RGB image is aligned with the depth image, the R_k regions are also selected in the depth image as regions of interest, and each pixel in these depth image regions can be represented by its pixel coordinates (u, v) and depth D.
(3.2) According to the pre-calibrated camera parameters, the three-dimensional coordinates (X, Y, Z) of the pixels of the support column region in the camera coordinate system {C} are calculated by the following formulas, generating the three-dimensional point cloud data:

X = (u - c_x) · D / f_x
Y = (v - c_y) · D / f_y
Z = D

where the intrinsic parameters f_x, f_y, c_x, c_y are taken from the camera intrinsic matrix determined in advance by Zhang Zhengyou's calibration method.
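A numpy sketch of this back-projection is given below, assuming `depth` is the aligned depth image in metres and `box` one support-column rectangle in the (x, y, w, h) form of step (2.3).

```python
import numpy as np

def roi_to_point_cloud(depth, box, fx, fy, cx, cy):
    """Back-project the depth pixels of one support-column ROI into the camera frame {C}."""
    x, y, w, h = box
    u0, u1 = int(x - w / 2), int(x + w / 2)
    v0, v1 = int(y - h / 2), int(y + h / 2)
    roi = depth[v0:v1, u0:u1]
    v, u = np.mgrid[v0:v1, u0:u1]      # pixel coordinate grids of the ROI
    d = roi.ravel()
    valid = d > 0                      # drop pixels with no depth reading
    u, v, d = u.ravel()[valid], v.ravel()[valid], d[valid]
    X = (u - cx) * d / fx
    Y = (v - cy) * d / fy
    Z = d
    return np.stack([X, Y, Z], axis=1)  # N x 3 points in the camera frame
```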
(4) Performing point cloud filtering, plane segmentation and geometric information extraction on the three-dimensional point cloud data of the support column region, segmenting the support column surface of the target tray, and calculating the centroid coordinates of each support column surface; the method specifically comprises the following substeps:
(4.1) preprocessing the three-dimensional point cloud data using straight-through filtering and voxel grid filtering;
Specifically, pass-through filtering determines the effective detection range (x_min, x_max, y_min, y_max, z_min, z_max) and preserves only the points within it, where z_min is set according to the length of the forklift's fork arms, z_max is set according to the maximum effective detection depth of the RGB-D camera, y_min and y_max are set according to the camera mounting height and the tray placement height so as to eliminate the ground and points that are too high, and x_min and x_max are set according to the camera's field of view and detection distance and are narrowed somewhat to ensure detection precision.
And performing downsampling operation on the point cloud data in the effective detection range by using voxel grid filtering, and removing noise points and outliers to enable the point clouds to have the same density at different distances.
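A plain numpy sketch of the two filters is given below (the embodiment itself uses the PCL library); the bound values and voxel size are illustrative and must be chosen according to the rules above.

```python
import numpy as np

def pass_through(points, bounds):
    """Keep points inside (x_min, x_max, y_min, y_max, z_min, z_max)."""
    x_min, x_max, y_min, y_max, z_min, z_max = bounds
    m = ((points[:, 0] >= x_min) & (points[:, 0] <= x_max) &
         (points[:, 1] >= y_min) & (points[:, 1] <= y_max) &
         (points[:, 2] >= z_min) & (points[:, 2] <= z_max))
    return points[m]

def voxel_downsample(points, voxel=0.01):
    """Replace all points falling in the same voxel by their centroid."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse).astype(float)
    out = np.zeros((inverse.max() + 1, 3))
    for dim in range(3):
        out[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return out
```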
(4.2) dividing the surfaces of the support columns of the target tray in the preprocessed point cloud data by using a random sampling consistency (Random Sample Consensus, RANSAC) method;
the surface of the tray support column is expressed as ax+by+cx+d=0 bY a plane equation, and the steps of dividing the plane using RANSAC are as follows:
(4.2.1) Since 3 points determine a plane, 3 sample points P_i(x_i, y_i, z_i), i = 1, 2, 3, are randomly extracted from the point cloud data and substituted into the plane equation, and its 4 parameters a, b, c, d are calculated to obtain a plane to be determined;
(4.2.2) The distances from the other points in the preprocessed point cloud data to the plane to be determined are calculated in turn; a point whose distance is smaller than the threshold d is considered to belong to this plane, i.e., an inlier, and the number of inliers is counted;
(4.2.3) If the number of inliers of the current plane to be determined is larger than a threshold T, the plane is re-fitted using all the inliers to obtain the point cloud data of the support column surface;
(4.2.4) If the number of inliers is less than the threshold T, the above steps are repeated.
The steps (4.2.1) to (4.2.4) are standard steps of the RANSAC method, and will not be described here.
(4.3) For the point cloud data of each support column surface obtained by the segmentation, the point cloud centroid coordinates C(p_x, p_y, p_z) are calculated as

p_x = (1/n) Σ x_i,  p_y = (1/n) Σ y_i,  p_z = (1/n) Σ z_i

where n is the number of points on the support column surface and (x_i, y_i, z_i) are the spatial coordinates of each point.
In an embodiment, the point cloud processing in step (4) is implemented by a PCL point cloud library.
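The following is a plain numpy sketch of the same RANSAC segmentation and centroid computation; the distance tolerance, inlier threshold and iteration count are illustrative assumptions, not values fixed by the embodiment.

```python
import numpy as np

def ransac_plane(points, dist_tol=0.01, min_inliers=500, iters=200, rng=None):
    """Fit a dominant plane aX + bY + cZ + d = 0 with RANSAC; return its inlier points."""
    if rng is None:
        rng = np.random.default_rng(0)
    best = None
    for _ in range(iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p2 - p1, p3 - p1)          # plane normal (a, b, c)
        if np.linalg.norm(n) < 1e-9:
            continue                            # degenerate sample: points collinear
        n = n / np.linalg.norm(n)
        d = -n.dot(p1)
        dist = np.abs(points @ n + d)           # point-to-plane distances
        inliers = dist < dist_tol
        if inliers.sum() >= min_inliers and (best is None or inliers.sum() > best.sum()):
            best = inliers
    return None if best is None else points[best]

def centroid(points):
    """Centroid C(p_x, p_y, p_z) of a support-column surface."""
    return points.mean(axis=0)
```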
(5) Pose calculation: the support column triplet of the target tray's fork face (namely, the face that the unmanned forklift's fork arms enter) is extracted, the position p and steering angle θ of the tray in the camera coordinate system {C} are calculated, and the tray pose is converted from the camera coordinate system {C} to the forklift coordinate system {R}, thereby realizing the positioning of the tray; the step specifically comprises the following substeps:
(5.1) For the support column surfaces segmented in step (4) from the point cloud data of step (3), all support column surface centroid triplets are checked according to the collinearity and symmetry of the tray's support column structure, and the centroid triplet of the target tray is extracted;
Specifically, for a candidate centroid triplet (C_1, C_2, C_3), sorted by the positions in which the centroids appear, the slopes k_1 and k_2 of the two line segments C_1C_2 and C_2C_3 are calculated; if k_1 and k_2 are approximately equal, the centroid triplet is judged to be collinear. If the lengths of the segments C_1C_2 and C_2C_3 are approximately equal, the centroid triplet is judged to be symmetric. If a centroid triplet is both collinear and symmetric, the centroids of the front faces of the support columns are taken as the centroid triplet of the target tray, see fig. 4. This step eliminates falsely detected support columns and support columns that do not belong to the tray's fork face, ensuring that the subsequent calculation of the tray pose based on the support column centroids is accurate.
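The collinearity and symmetry test can be sketched as follows; the slopes are taken in the camera X-Z plane (the horizontal plane for a level camera) and the two tolerances are illustrative assumptions, since the text only requires the tests to hold approximately.

```python
import numpy as np

def is_target_triplet(c1, c2, c3, slope_tol=0.1, len_tol=0.15):
    """Check a centroid triplet (sorted by appearance) for collinearity and symmetry."""
    # Slopes of C1C2 and C2C3 in the camera X-Z plane.
    k1 = (c2[2] - c1[2]) / (c2[0] - c1[0] + 1e-9)
    k2 = (c3[2] - c2[2]) / (c3[0] - c2[0] + 1e-9)
    collinear = abs(k1 - k2) < slope_tol
    # Symmetry: |C1C2| and |C2C3| approximately equal.
    l1, l2 = np.linalg.norm(c2 - c1), np.linalg.norm(c3 - c2)
    symmetric = abs(l1 - l2) / max(l1, l2) < len_tol
    return collinear and symmetric
```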
(5.2) Using the centroid triplet (C_1, C_2, C_3) of the target tray, the pose of the tray in the camera coordinate system is calculated and then converted into the unmanned forklift coordinate system by the pre-calibrated coordinate transformation;
Specifically, considering that the tray to be picked is generally placed on the ground or parallel to the ground, the tray state is expressed by spatial coordinates (x, y, z) and a rotation angle θ. The centroid coordinates C_2(p_x, p_y, p_z) of the tray's middle support column represent the position of the tray, i.e., the center of the fork face, and the rotation angle of the tray in the camera coordinate system is calculated from the centroids of the two side support columns as θ = arctan(Δz / Δx), where Δz is the difference of the z coordinates of the two side support column centroids and Δx is the difference of their x coordinates. Fig. 5 shows a schematic diagram of the tray pose calculation. Finally, the homogeneous transformation matrix from the camera coordinate system {C} to the unmanned forklift coordinate system {R} is used to complete the transformation of the tray pose into the forklift coordinate system, which can then be used for the subsequent forking task.
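A sketch of this final pose computation is given below; θ is computed as arctan(Δz/Δx) per the text, while the homogeneous transform T_RC and the sign convention mapping θ to a forklift-frame yaw are assumptions for a level camera mounted along the fork arms.

```python
import math
import numpy as np

def tray_pose_camera(c1, c2, c3):
    """Tray position (middle column centroid) and rotation angle θ in the camera frame {C}."""
    dz = c3[2] - c1[2]                 # Δz between the two side column centroids
    dx = c3[0] - c1[0]                 # Δx between the two side column centroids
    theta = math.atan2(dz, dx)         # θ = arctan(Δz / Δx)
    return np.asarray(c2), theta

def tray_pose_forklift(c1, c2, c3, T_RC):
    """Transform the tray pose from {C} into the forklift frame {R} using the hand-eye result T_RC."""
    p_c, theta = tray_pose_camera(c1, c2, c3)
    p_r = (T_RC @ np.append(p_c, 1.0))[:3]   # position via the homogeneous transform
    yaw_r = -theta                           # assumed sign for a level, fork-aligned camera
    return p_r, yaw_r
```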
According to the embodiment, by fusing deep-learning-based tray detection with a positioning method based on tray support column information, the speed of tray detection and positioning and the detectable range of the target tray are improved; meanwhile, the prior information of the tray model is taken into account to remove false detections, guaranteeing the accuracy and robustness of tray detection and positioning.
The present application also provides a tray detection and positioning system for unmanned forklift applications, see fig. 6, which may include:
the RGB-D camera module is used for acquiring depth images and RGB images;
the tray detection module is used for establishing a tray image data set to train a tray detector, predicting a tray area and a support column area in the RGB image by using the trained tray detector, and eliminating incomplete trays in the tray area;
the data fusion module is used for taking the support column region obtained by the tray detection module as a region of interest, aligning the RGB image with the depth image, extracting the depth information of the support column region, and converting the depth information of the region of interest into three-dimensional point cloud data under a camera coordinate system based on camera parameters calibrated in advance;
the surface segmentation module is used for carrying out point cloud filtering, plane segmentation and geometric information extraction on the three-dimensional point cloud data of the support column region, segmenting the support column surface of the target tray, and calculating the centroid coordinates of each support column surface;
the pose calculation module is used for extracting the support column triplets of the target tray forking surface, calculating the position and the steering angle of the tray under the camera coordinate system, and converting the pose of the tray from the camera coordinate system to the forklift coordinate system, so that the tray positioning is realized.
The specific manner in which the various modules perform the operations in relation to the systems of the above embodiments have been described in detail in relation to the embodiments of the method and will not be described in detail herein.
For system embodiments, since they basically correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant points. The system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solution of the present application. Those of ordinary skill in the art can understand and implement it without undue burden.
Correspondingly, the application also provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the above tray detection and positioning method for unmanned forklift application. Fig. 7 shows a hardware structure diagram of an arbitrary device with data processing capability on which the tray detection and positioning method for unmanned forklift application provided by the embodiment of the present invention is deployed; in addition to the processor, memory and network interface shown in fig. 7, the device generally also includes other hardware according to its actual function, which is not described here again.
Correspondingly, the application also provides a computer readable storage medium, wherein computer instructions are stored on the computer readable storage medium, and the instructions realize the tray detection and positioning method facing the unmanned forklift application when being executed by a processor. The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any of the data processing enabled devices described in any of the previous embodiments. The computer readable storage medium may also be an external storage device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), or the like, provided on the device. Further, the computer readable storage medium may include both internal storage units and external storage devices of any device having data processing capabilities. The computer readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing apparatus, and may also be used for temporarily storing data that has been output or is to be output.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.
Claims (10)
1. The tray detection and positioning method for unmanned forklift application is characterized by comprising the following steps of:
(1) Image acquisition: acquiring a depth image and an RGB image acquired by an RGB-D camera module;
(2) Tray detection: establishing a tray image dataset to train a tray detector, predicting a tray area and a support column area in the RGB image by using the trained tray detector, and eliminating incomplete trays in the tray area;
(3) Data fusion: taking the complete tray obtained in the step (2) as a target tray, taking a support column region as an interested region, aligning an RGB image and a depth image, extracting depth information of the support column region, and converting the depth information of the interested region into three-dimensional point cloud data under a camera coordinate system based on camera parameters calibrated in advance;
(4) Surface segmentation: performing point cloud filtering, plane segmentation and geometric information extraction on the three-dimensional point cloud data of the support column region, segmenting the support column surface of the target tray, and calculating the centroid coordinates of each support column surface;
(5) Pose calculation: and extracting a support column triplet of a target tray forking surface, calculating the position and steering angle of the tray under a camera coordinate system, and converting the position and the posture of the tray from the camera coordinate system to a forklift coordinate system, thereby realizing the positioning of the tray.
2. The method of claim 1, wherein step (2) comprises:
(2.1) acquiring a plurality of tray images, manually marking a tray area and a support column area in each tray image by using a rectangular frame, and performing image preprocessing on the marked tray images so as to establish a tray image data set;
(2.2) training a deep learning-based target detection network using the pallet image dataset, thereby obtaining a trained pallet detector;
(2.3) processing the RGB image by using the trained tray detector to obtain a plurality of tray area rectangular frames and support column area rectangular frames;
and (2.4) judging the support column belonging to each tray according to the overlapping condition of the rectangular frames based on the tray area rectangular frames and the support column area rectangular frames, checking the integrity of the tray, and rejecting all incomplete trays.
3. The method of claim 1, wherein step (3) comprises:
(3.1) taking the complete tray obtained in the step (2) as a target tray, taking a support column region as an interested region, aligning an RGB image acquired by an RGB-D camera module with a depth image to obtain a support column region of the depth image, and extracting pixel coordinates (u, v) and depth D of the support column region in the depth image;
(3.2) calculating the three-dimensional coordinates (X, Y, Z) of the pixel points of the support column area in the camera coordinate system according to pre-calibrated camera parameters by the following formulas, and generating three-dimensional point cloud data:

X = (u - c_x) · D / f_x
Y = (v - c_y) · D / f_y
Z = D

wherein c_x, c_y, f_x, f_y are the pre-calibrated camera parameters.
4. The method of claim 1, wherein step (4) comprises:
(4.1) preprocessing the three-dimensional point cloud data using straight-through filtering and voxel grid filtering;
(4.2) dividing the surfaces of the support columns of the target tray in the preprocessed point cloud data by using a RANSAC method;
(4.3) for the point cloud data of each support column surface obtained by the segmentation, calculating the point cloud centroid coordinates C(p_x, p_y, p_z) as

p_x = (1/n) Σ x_i,  p_y = (1/n) Σ y_i,  p_z = (1/n) Σ z_i

where n is the number of points on the support column surface and (x_i, y_i, z_i) are the spatial coordinates of each point.
5. The method of claim 4, wherein step (4.2) comprises:
(4.2.1) randomly extracting 3 sample points from the preprocessed point cloud data, fitting a plane equation, and estimating the 4 parameters a, b, c, d of the plane equation aX + bY + cZ + d = 0 to obtain a plane to be determined;
(4.2.2) calculating the distance between each point in the preprocessed point cloud data and the plane to be determined, and counting the number of points whose distance is smaller than a tolerance d, i.e., the inliers;
(4.2.3) if the number of inliers of the current plane to be determined is larger than a threshold T, re-fitting the plane by using all the inliers to obtain the point cloud data of the support column surface;
(4.2.4) if the number of inliers is less than the threshold T, returning to the above step (4.2.1).
6. The method of claim 1, wherein step (5) comprises:
(5.1) checking all the centroid triples of the support column surface according to the collinearity and symmetry of the support column structure of the tray for the support column surface separated from the point cloud data in the step (3), and extracting the centroid triples of the target tray;
and (5.2) calculating the pose of the tray under the camera coordinate system by using the centroid triplet of the target tray, and converting the pose of the tray under the camera coordinate system into the unmanned forklift coordinate system by utilizing the coordinate conversion calibrated in advance.
7. The method of claim 6, wherein calculating the pose of the tray in the camera coordinate system using the centroid triplet of the target tray in step (5.2) comprises:
taking the centroid coordinates of the tray's middle support column as the position of the tray, and calculating the rotation angle of the tray in the camera coordinate system from the centroids of the two side support columns as θ = arctan(Δz / Δx), where Δz is the difference of the z coordinates of the two side support column centroids and Δx is the difference of their x coordinates.
8. Tray detection and positioning system towards unmanned fork truck application, characterized in that includes:
the RGB-D camera module is used for acquiring depth images and RGB images;
the tray detection module is used for establishing a tray image data set to train a tray detector, predicting a tray area and a support column area in the RGB image by using the trained tray detector, and eliminating incomplete trays in the tray area;
the data fusion module is used for taking the support column region obtained by the tray detection module as a region of interest, aligning the RGB image with the depth image, extracting the depth information of the support column region, and converting the depth information of the region of interest into three-dimensional point cloud data under a camera coordinate system based on camera parameters calibrated in advance;
the surface segmentation module is used for carrying out point cloud filtering, plane segmentation and geometric information extraction on the three-dimensional point cloud data of the support column region, segmenting the support column surface of the target tray, and calculating the centroid coordinates of each support column surface;
the pose calculation module is used for extracting the support column triplets of the target tray forking surface, calculating the position and the steering angle of the tray under the camera coordinate system, and converting the pose of the tray from the camera coordinate system to the forklift coordinate system, so that the tray positioning is realized.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310366209.9A CN116309882A (en) | 2023-04-07 | 2023-04-07 | Tray detection and positioning method and system for unmanned forklift application |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310366209.9A CN116309882A (en) | 2023-04-07 | 2023-04-07 | Tray detection and positioning method and system for unmanned forklift application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116309882A true CN116309882A (en) | 2023-06-23 |
Family
ID=86787068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310366209.9A Pending CN116309882A (en) | 2023-04-07 | 2023-04-07 | Tray detection and positioning method and system for unmanned forklift application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116309882A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117058218A (en) * | 2023-07-13 | 2023-11-14 | 湖南工商大学 | Image-depth-based online measurement method for filling rate of disc-type pelletizing granule powder |
CN118229772A (en) * | 2024-05-24 | 2024-06-21 | 杭州士腾科技有限公司 | Tray pose detection method, system, equipment and medium based on image processing |
-
2023
- 2023-04-07 CN CN202310366209.9A patent/CN116309882A/en active Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117058218A (en) * | 2023-07-13 | 2023-11-14 | 湖南工商大学 | Image-depth-based online measurement method for filling rate of disc-type pelletizing granule powder |
CN117058218B (en) * | 2023-07-13 | 2024-06-07 | 湖南工商大学 | Image-depth-based online measurement method for filling rate of disc-type pelletizing granule powder |
CN118229772A (en) * | 2024-05-24 | 2024-06-21 | 杭州士腾科技有限公司 | Tray pose detection method, system, equipment and medium based on image processing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112476434B (en) | Visual 3D pick-and-place method and system based on cooperative robot | |
CN107507167B (en) | Cargo tray detection method and system based on point cloud plane contour matching | |
US9327406B1 (en) | Object segmentation based on detected object-specific visual cues | |
CN105021124B (en) | A kind of planar part three-dimensional position and normal vector computational methods based on depth map | |
EP3168812B1 (en) | System and method for scoring clutter for use in 3d point cloud matching in a vision system | |
CN112017240B (en) | Tray identification and positioning method for unmanned forklift | |
CN110648367A (en) | Geometric object positioning method based on multilayer depth and color visual information | |
CN112132523B (en) | Method, system and device for determining quantity of goods | |
CN116309882A (en) | Tray detection and positioning method and system for unmanned forklift application | |
CN110910350B (en) | Nut loosening detection method for wind power tower cylinder | |
CN111260289A (en) | Micro unmanned aerial vehicle warehouse checking system and method based on visual navigation | |
CN107977996B (en) | Space target positioning method based on target calibration positioning model | |
CN113191174B (en) | Article positioning method and device, robot and computer readable storage medium | |
Sansoni et al. | Optoranger: A 3D pattern matching method for bin picking applications | |
US20210371260A1 (en) | Automatic detection and tracking of pallet pockets for automated pickup | |
US20230297068A1 (en) | Information processing device and information processing method | |
CN116309817A (en) | Tray detection and positioning method based on RGB-D camera | |
CN112734844A (en) | Monocular 6D pose estimation method based on octahedron | |
CN114170521B (en) | Forklift pallet butt joint identification positioning method | |
CN114972421A (en) | Workshop material identification tracking and positioning method and system | |
CN114241269A (en) | A collection card vision fuses positioning system for bank bridge automatic control | |
CN115546202A (en) | Tray detection and positioning method for unmanned forklift | |
Gao et al. | Improved binocular localization of kiwifruit in orchard based on fruit and calyx detection using YOLOv5x for robotic picking | |
CN110136193B (en) | Rectangular box three-dimensional size measuring method based on depth image and storage medium | |
Kiddee et al. | A geometry based feature detection method of V-groove weld seams for thick plate welding robots |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||