CN110238840B - Mechanical arm autonomous grabbing method based on vision - Google Patents
Mechanical arm autonomous grabbing method based on vision
- Publication number
- CN110238840B (application CN201910335507.5A)
- Authority
- CN
- China
- Prior art keywords
- grabbing
- image
- label
- function
- capture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
- B25J9/161—Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1694—Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
- B25J9/1697—Vision controlled systems
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Automation & Control Theory (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the technical field of robots, and in particular to a vision-based autonomous grasping method for a mechanical arm. A corrective grasping strategy based on an antipodal grasping rule is provided; trial-and-error grasping with this strategy on a simulation platform yields grasp samples that satisfy the rule. The samples acquired in this way clearly express the grasping mode defined by the antipodal rule, which benefits model learning. The whole data acquisition process requires no manual intervention and no real data, avoiding the problems that real data collection may bring. Only a small amount of simulation data acquired by the method is needed, and the trained model can be applied directly to different real grasping scenes. The whole training process requires neither domain adaptation nor domain randomization, and accuracy and robustness are high.
Description
Technical Field
The invention relates to the technical field of robots, and in particular to a vision-based autonomous grasping method for a mechanical arm.
Background
Robotic grasping research falls mainly into two directions: analytical methods and empirical methods. Analytical methods generally construct force-closure grasps based on rules defined by the four attributes of flexibility, balance, stability and dynamic certainty; such an approach can usually be formulated as a constrained optimization problem. Empirical methods are data-driven: they generally extract a feature representation of the object from data and then make the grasp decision using predefined grasp heuristics.
As deep learning has made tremendous progress in computer vision, it has also begun to receive extensive attention and research in robotics. Pinto et al. ("Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours") collected a dataset of 50,000 grasps through robot trial-and-error grasping and trained a deep neural network to decide the grasp angle; the method achieves 73% accuracy on unseen objects. Levine et al. ("Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection") collected 800,000 grasps over two months using 6 to 14 robots and trained an evaluation model on this dataset; the model evaluates action commands for the current scene to find the optimal one and achieves roughly 80% grasp accuracy.
These methods achieve high grasp success rates, but they require real robots to perform trial-and-error grasping to collect data, which is time- and labor-consuming and poses significant safety hazards.
Mahler et al. ("Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics") sample object grasp points on a simulation platform based on the antipodal grasping rule and then obtain highly robust sample points through force closure. A grasp quality evaluation neural network is trained on the data obtained in this way, and the method reaches a grasp success rate of up to 93% on adversarial objects. Although the method achieves high accuracy, the amount of data required to train the model is very large; one reason is that the collected sample data do not clearly reflect the defined grasping mode.
Disclosure of Invention
To overcome at least one defect in the prior art, the invention provides a vision-based autonomous grasping method for a mechanical arm in which the grasp data collected on the simulation platform are favourable for model learning.
In order to solve the above technical problems, the invention adopts the following technical scheme: a vision-based autonomous grasping method for a mechanical arm comprises the following steps:
S1, building an environment similar to the real scene in a simulation environment, and collecting a global image;
S2, preprocessing the data, wherein the preprocessed data comprise: a global image containing the information of the whole workspace, an object mask, and a label map of the same scale as the global image; the processing comprises the following steps: first generating the object mask from the set of pixel positions occupied by the object in the image, then generating the label mask from the object mask, the grasp pixel position and the grasp label, and generating the label map from the grasp position and the grasp label; and then discretizing the grasp angle according to the grasp problem definition;
S3, training a deep neural network:
(1) normalizing the input RGB images and assembling them into a batch;
(2) feeding the batch of data into a fully convolutional neural network to obtain output values;
(3) calculating the error between the prediction and the label using a cross-entropy loss combined with the label mask, computed by the following loss function:

$$L(Y,M,F)=-\sum_{i=1}^{H}\sum_{j=1}^{W}M_{ij}\sum_{k=1}^{3}Y_{ijk}\,\log\frac{\exp(F_{ijk})}{\sum_{l=1}^{3}\exp(F_{ijl})}$$

where Y ∈ R^{H×W×3} is the label map, M ∈ R^{H×W} is the label mask, H and W are respectively the height and width of the label map, i, j and k are the position indices in the 3-channel map, l is the channel index, F ∈ R^{H×W×3} is the output feature map of the last convolutional layer, and R denotes the real domain with the superscript giving the dimensions of the tensor;
and S4, applying the trained model to a real grabbing environment.
The invention thus provides a vision-based autonomous grasping method for a mechanical arm that trains an end-to-end deep neural network for pixel-level grasp prediction from a small amount of grasp data acquired in simulation; the learned model can be applied directly to real grasping scenes. The whole process requires neither domain adaptation nor domain randomization, and no data collected in the real environment.
Further, the step S1 specifically includes:
S11, placing a background texture, a mechanical arm with a gripper, a camera and an object to be grasped in the workspace of the simulation environment;
S12, placing an object in the workspace, selecting a pixel position where the object is present using the camera, recording the image information, the pixel position corresponding to the grasp point, the mask of the object in the image and the grasp angle, and then randomly selecting an angle for the mechanical arm to perform trial-and-error grasping;
S13, judging whether the grasp succeeds; if the grasp fails, directly storing the image I, the set C of pixel positions occupied by the object in the image, the pixel position p corresponding to the grasp point, the grasp angle ψ and the grasp-failure label l; if the grasp succeeds, recording again the global image I′ and the corresponding set C′ of pixel positions occupied by the object in the image, and then storing the image I′, the set C′, the pixel position p corresponding to the grasp point, the grasp angle ψ and the grasp-success label l.
Further, the grasp problem is defined as follows: a vertical planar grasp is defined as g = (p, ω, η), where p = (x, y, z) denotes the position of the grasp point in Cartesian coordinates, ω ∈ [0, 2π) denotes the rotation angle of the end effector, and η ∈ {0, 1}^3 is a 3-dimensional one-hot code representing the grasp function; the grasp function is divided into three categories: graspable, non-graspable and background. Projected into image space, a grasp in image I is represented as g̃ = (p̃, φ_i, η), where p̃ = (h, w) denotes the grasp position in the image and φ_i denotes the discretized grasp angle. Each pixel in the image can be assigned a grasp function, so the complete set of grasp function maps can be represented as C = {C_1, …, C_N}, where C_i ∈ R^{H×W×3} is the grasp function map of the image at the given i-th angle and its 3 channels represent the graspable, non-graspable and background categories, respectively. The first (graspable) channel C_i^1 is taken from every grasp function map C_i and these channels are combined to form G ∈ R^{N×H×W}, where R denotes the real domain and the superscript gives the dimensions of the tensor.
Further, the most robust grasp point is obtained by solving the following formula:

$$(i^{*},h^{*},w^{*})=\arg\max_{i,h,w}G(i,h,w)$$

where G(i, h, w) denotes the confidence of the graspable function at rotation-angle index i and image position (h, w); (h*, w*) is the position to be reached by the end effector of the mechanical arm in image space, and i* indicates that the end effector rotates to the angle φ_{i*} before grasping is performed.
Further, during training a parameterized function f_θ is defined to realize the pixel-level mapping from the image to the grasp function map, which can be expressed as:

$$C_{\phi_i}=f_{\theta}(I_{\phi_i})$$

where I_{φ_i} is image I rotated by the angle φ_i and C_{φ_i} is the corresponding grasp function map; f_θ is implemented with a deep neural network. Combined with the loss function, the overall training objective can be defined by the following equation:

$$\theta^{*}=\arg\min_{\theta}\sum_{i}L\big(f_{\theta}(I_{\phi_i}),\,Y_{\phi_i}\big)$$

where Y_{φ_i} is the label map corresponding to the rotated image I_{φ_i}.
Further, considering a scene in which only one object is placed in the workspace, c1 and c2 are defined as the contact points of the gripper's two fingers with the object, n1 and n2 are their corresponding surface normal vectors, and g is the grasping direction of the gripper in image space, where c1, c2, n1, n2, g ∈ R^2. From these definitions the following can be obtained:

$$g=\frac{c_2-c_1}{\lVert c_2-c_1\rVert},\qquad \omega_1=\arccos\frac{g\cdot n_1}{\lVert n_1\rVert},\qquad \omega_2=\arccos\frac{g\cdot n_2}{\lVert n_2\rVert}$$

where ||·|| denotes the norm operation;

a grasping operation is defined as an antipodal grasp when it satisfies the following condition:

$$\omega_1\le\theta_1\quad\text{and}\quad\omega_2\ge\theta_2$$

where θ1 and θ2 are non-negative thresholds tending to 0 and π respectively on the angle between the grasping direction and the surface normal vectors at the two contact points, and ω1 and ω2 are the angles between the grasping direction and the surface normal vectors at the two contact points; when the grasping direction of the gripper is parallel to the contact-point normal vectors, the grasp is defined as a stable antipodal grasp.
All data are collected on the simulation platform; no real data are needed, avoiding the problems that collecting data in the real environment may bring. Because the antipodal grasping rule is incorporated into the simulation platform, the acquired data effectively reflect the corresponding grasping mode, so only a very small amount of grasp data is needed and the trained model can be applied directly to real grasping scenes. The invention realizes end-to-end pixel-level grasp function prediction with a fully convolutional neural network; each output pixel captures global information of the input image, which enables the model to learn more efficiently and predict more accurately.
Further, the step S4 includes:
S41, acquiring an RGB image and a depth image of the workspace with a camera;
S42, normalizing the RGB image and rotating it to 16 angles before feeding it into the model to obtain 16 grasp function maps;
S43, according to the grasp problem definition, taking the first channel of each grasp function map, combining them, and finding the position corresponding to the maximum value, thereby obtaining the optimal grasp position and grasp angle in image space;
and S44, mapping the obtained image position into 3-dimensional space, solving the mechanical-arm control command by inverse kinematics, rotating the end effector by the grasp angle once it reaches the position directly above the object, and determining the descending height of the mechanical arm from the collected depth map to avoid collision.
Further, the step S42 specifically includes: the image fed into the fully convolutional neural network model is a global image of the whole workspace; ResNet50 is first used as an encoder to extract features, then a four-layer up-sampling module of bilinear interpolation and convolution is applied, and finally a 5×5 convolution produces a grasp function map of the same scale as the input.
Compared with the prior art, the beneficial effects are:
1. The invention provides a corrective grasping strategy based on the antipodal grasping rule; trial-and-error grasping on the simulation platform with this strategy yields grasp samples that conform to the rule. The samples acquired in this way clearly express the grasping mode of the antipodal rule and benefit model learning. The whole data acquisition process requires no manual intervention and no real data, avoiding the problems that real data collection may bring.
2. Only a small amount of simulation data acquired by the method is needed, and the trained model can be applied directly to different real grasping scenes. The whole training process requires neither domain adaptation nor domain randomization, and accuracy and robustness are high.
3. A fully convolutional deep neural network is designed; the network takes as input an image containing the information of the whole workspace and outputs a predicted grasp function for every pixel. This globally-input, pixel-level-prediction network structure learns the corresponding grasping modes faster and better.
Drawings
FIG. 1 is a diagram illustrating the parameters defined in the antipodal grasping rule in the simulator according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the fully convolutional neural network of the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
Example 1:
Definition of the grasping problem: a vertical planar grasp is defined as g = (p, ω, η), where p = (x, y, z) denotes the position of the grasp point in Cartesian coordinates, ω ∈ [0, 2π) denotes the rotation angle of the end effector, and η ∈ {0, 1}^3 is a 3-dimensional one-hot code representing the grasp function. The grasp function is divided into three categories: graspable, non-graspable and background. Projected into image space, a grasp in image I is represented as g̃ = (p̃, φ_i, η), where p̃ = (h, w) denotes the grasp position in the image and φ_i denotes the discretized grasp angle; discretization reduces the complexity of the learning process. Each pixel in the image can thus be assigned a grasp function, so the complete set of grasp function maps can be represented as C = {C_1, …, C_N}, where C_i ∈ R^{H×W×3} is the grasp function map of the image at the given i-th angle and its 3 channels represent the graspable, non-graspable and background categories, respectively. The first channel C_i^1 (i.e., the graspable channel) is taken from every grasp function map C_i and these channels are combined to form G ∈ R^{N×H×W}. Thus, the most robust grasp point can be obtained by solving the following equation:
$$(i^{*},h^{*},w^{*})=\arg\max_{i,h,w}G(i,h,w)$$

where G(i, h, w) denotes the confidence of the graspable function at rotation-angle index i and image position (h, w); (h*, w*) is the position to be reached by the end effector of the mechanical arm in image space, and i* indicates that the end effector rotates to the angle φ_{i*} before grasping is performed.
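For illustration only, a minimal Python/NumPy sketch of this selection step is given below; the array layout, the assumption that the 16 discrete angles evenly divide 180°, and all function names are illustrative and not part of the original disclosure.

```python
import numpy as np

def select_grasp(grasp_maps, num_angles=16):
    """Pick the most robust grasp from the N grasp function maps.

    grasp_maps: array of shape (N, H, W, 3); channel 0 is the graspable
    confidence channel described above.
    Returns the angle index i*, the image position (h*, w*) and the angle.
    """
    # G in R^{N x H x W}: stack the first (graspable) channel of each map
    G = grasp_maps[..., 0]
    # joint argmax over the angle index and the image position
    i_star, h_star, w_star = np.unravel_index(np.argmax(G), G.shape)
    # map the index back to an angle (assumes 180 degrees evenly discretized)
    phi_star = i_star * np.pi / num_angles
    return i_star, (h_star, w_star), phi_star
```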
During training, a parameterized function f_θ is defined to realize the pixel-level mapping from the image to the grasp function map, which can be expressed as:

$$C_{\phi_i}=f_{\theta}(I_{\phi_i})$$

where I_{φ_i} is image I rotated by the angle φ_i and C_{φ_i} is the corresponding grasp function map.
f_θ may be implemented with a deep neural network. For example, learning proceeds by gradient descent: data are fed into the neural network to obtain a prediction, the prediction is compared with the ground-truth label to obtain an error, the error is back-propagated to obtain the gradient of every parameter in the network, and the parameters are then updated with these gradients so that the network output moves closer to the ground-truth label; in this way a concrete expression of the function is learned.
Combined with the loss function, the overall training objective can be defined by the following equation:

$$\theta^{*}=\arg\min_{\theta}\sum_{i}L\big(f_{\theta}(I_{\phi_i}),\,Y_{\phi_i}\big)$$

where Y_{φ_i} is the label map corresponding to the rotated image I_{φ_i}.
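As an illustration of the gradient-descent procedure described above, the following is a minimal sketch of one training step, assuming a PyTorch model f_theta and a masked loss function such as the one sketched in the loss-function section; all names are illustrative.

```python
import torch

def train_step(f_theta, optimizer, loss_fn, images, labels, masks):
    """One gradient-descent step of the objective
    theta* = argmin_theta sum_i L(f_theta(I_phi_i), Y_phi_i).

    images: batch of rotated workspace images, shape (B, 3, H, W);
    labels/masks: the corresponding label maps and label masks;
    loss_fn: a masked cross-entropy, e.g. the sketch in the loss section.
    """
    optimizer.zero_grad()
    preds = f_theta(images)               # forward pass: predicted grasp function maps
    loss = loss_fn(preds, labels, masks)  # error between prediction and label
    loss.backward()                       # back-propagate to get parameter gradients
    optimizer.step()                      # update the parameters with the gradients
    return loss.item()
```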
Collecting simulation data: the invention defines the antipodal grasping rule for an object in image space. Consider a scene in which only one object is placed in the workspace. Define c1 and c2 as the contact points of the two fingers with the object, n1 and n2 as their corresponding surface normal vectors, and g as the grasping direction of the gripper in image space, with c1, c2, n1, n2, g ∈ R^2, as shown in FIG. 1. From these definitions the following can be obtained:

$$g=\frac{c_2-c_1}{\lVert c_2-c_1\rVert},\qquad \omega_1=\arccos\frac{g\cdot n_1}{\lVert n_1\rVert},\qquad \omega_2=\arccos\frac{g\cdot n_2}{\lVert n_2\rVert}$$

where ||·|| denotes the norm operation. The invention defines a grasping operation as an antipodal grasp when it satisfies the following condition:

$$\omega_1\le\theta_1\quad\text{and}\quad\omega_2\ge\theta_2$$

where θ1 and θ2 are non-negative thresholds tending to 0 and π respectively on the angle between the grasping direction and the surface normal vectors at the two contact points, and ω1 and ω2 are the angles between the grasping direction and the surface normal vectors at the two contact points. In general, when the grasping direction of the gripper is parallel to the contact-point normal vectors, the grasp is defined as a stable antipodal grasp.
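The antipodal condition can be checked with a few lines of NumPy; the sketch below is illustrative, and the threshold values shown are assumptions rather than values from the patent.

```python
import numpy as np

def is_antipodal(c1, c2, n1, n2, theta1=0.1, theta2=np.pi - 0.1):
    """Check the antipodal grasp condition described above.

    c1, c2: 2-D contact points of the two fingers in image space;
    n1, n2: the corresponding surface normal vectors;
    theta1, theta2: thresholds close to 0 and pi (illustrative values).
    """
    g = np.asarray(c2, dtype=float) - np.asarray(c1, dtype=float)
    g /= np.linalg.norm(g)  # grasping direction, unit length

    def angle(u, v):
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(c, -1.0, 1.0))

    w1, w2 = angle(g, n1), angle(g, n2)
    # grasp direction roughly parallel to one normal and anti-parallel to the other
    return w1 <= theta1 and w2 >= theta2
```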
In practice, the invention uses a corrective grasping strategy to collect samples that satisfy the antipodal grasping rule. First, a grasp angle and a pixel position belonging to the object are selected at random, and the camera records the whole workspace. The mechanical arm is then controlled to perform a trial-and-error grasp. If the grasp fails, the workspace image I, the grasp pixel position p, the set C of all pixel positions occupied by the object in the image, the grasp angle ψ and the label l are stored. If the grasp succeeds, contact between the gripper and the object changes the object's pose; this corrective change makes the grasping direction of the gripper approximately parallel to the contact-point normals, satisfying the antipodal grasping rule. The camera then records the corrected image I′ and re-acquires the object pixel positions C′, and the image, the grasp-point pixel position, the set of all pixel positions occupied by the object, the grasp angle and the label are stored. A pseudocode-style sketch of this collection loop is given below.
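The corrective collection loop can be summarized as follows; this is a pseudocode-style Python sketch in which the simulator handle `sim` and its methods are hypothetical placeholders, not a real API.

```python
import random

def collect_sample(sim, num_angles=16):
    """One round of corrective trial-and-error grasping in simulation (sketch).

    `sim` is a hypothetical simulator interface; its method names below are
    placeholders for the camera and arm-control calls of a real simulator.
    """
    I, C = sim.capture_image_and_object_pixels()  # workspace image, object pixel set
    p = random.choice(list(C))                    # random pixel on the object
    psi = random.randrange(num_angles)            # random discrete grasp angle
    success = sim.try_grasp(p, psi)               # trial-and-error grasp
    if not success:
        return dict(image=I, pixels=C, point=p, angle=psi, label=0)
    # closing the gripper re-aligns the object, so the grasp direction becomes
    # roughly parallel to the contact normals; record the corrected scene
    I_corr, C_corr = sim.capture_image_and_object_pixels()
    return dict(image=I_corr, pixels=C_corr, point=p, angle=psi, label=1)
```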
Defining a network structure:
the network structure is shown in fig. 2. The method adopts a full-volume machine neural network, inputs the global image containing the whole working space, firstly uses Resnet50 as an encoder to extract features, then uses an up-sampling module with four layers of bilinear interpolation and convolution, and optimally uses a 5x5 convolution to obtain a grabbing function graph with the input through scale.
Defining a loss function:
Because most pixels in an image belong to the background class and the graspable and non-graspable labels are very sparse, training directly on such data would be very inefficient. The invention therefore computes the loss function in combination with a label mask. For pixels that belong to the object but were not trial-grasped, the value of the label mask at the corresponding position is set to 0; for the other pixels, the value of the label mask at the corresponding position is set to a non-zero weight (given below). Let F ∈ R^{H×W×3} denote the output feature map of the last convolutional layer. The corresponding loss function is therefore:

$$L(Y,M,F)=-\sum_{i=1}^{H}\sum_{j=1}^{W}M_{ij}\sum_{k=1}^{3}Y_{ijk}\,\log\frac{\exp(F_{ijk})}{\sum_{l=1}^{3}\exp(F_{ijl})}$$

where Y ∈ R^{H×W×3} is the label map corresponding to the sample, M ∈ R^{H×W} is the label mask, H and W are respectively the height and width of the label map, i, j and k are the position indices in the 3-channel map, l is the channel index, and R denotes the real domain with the superscript giving the dimensions of the tensor.
To reduce the influence of label sparsity, the invention increases the loss weights of the graspable and non-graspable classes and reduces the loss weight of the background: mask positions carrying graspable or non-graspable labels are multiplied by 120, while the background area is multiplied by 0.1. A sketch of this masked loss is given below.
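For illustration, a minimal PyTorch sketch of the masked, weighted cross-entropy described above; the class-index convention (0 graspable, 1 non-graspable, 2 background) and the normalization by the mask sum are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_loss(pred, label, mask):
    """Masked cross-entropy between the network output and the label map.

    pred:  (B, 3, H, W) output feature map F of the last convolutional layer;
    label: (B, H, W) long tensor of classes (0 graspable, 1 non-graspable, 2 background);
    mask:  (B, H, W) float label-mask weights: 0 for untested object pixels,
           120 for trial-grasped pixels, 0.1 for the background.
    """
    per_pixel = F.cross_entropy(pred, label, reduction="none")  # softmax + log inside
    weighted = per_pixel * mask                                  # apply the label mask
    return weighted.sum() / mask.sum().clamp(min=1.0)
```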
The method comprises the following specific implementation steps:
step 1: in the simulation environment, an environment similar to a real scene is built.
Step 1.1: a background texture, a mechanical arm with a gripper, a camera and an object to be grabbed are placed in a working space of the simulation environment.
Step 1.2: the method comprises the steps of placing an object in a working space, selecting a position where the object exists by using a camera, recording image information, a pixel position corresponding to a grabbing point, a mask of the object in an image and a grabbing angle, and then randomly selecting an angle to enable a mechanical arm to perform trial-and-error grabbing.
Step 1.3: and judging whether the grabbing is successful, if the grabbing is failed, directly storing the image I, the position set C of the pixel where the object is located in the image, the pixel position p corresponding to the grabbing point, the grabbing angle psi and the label l of the grabbing failure. If the grabbing is successful, the global image I 'and the corresponding position set C' of the pixel where the object is located in the image are recorded again, and then the image I ', the position set C' of the pixel where the object is located in the image, the pixel position p corresponding to the grabbing point, the grabbing angle psi and the label l which is successfully grabbed are stored. The acquired global image is the global image defined by the grabbing problem in the invention content, and the grabbing angle and the grabbing position are also defined in the image space.
Step 2: the data is pre-processed.
Step 2.1: generating an object mask according to the position set of the pixel where the object is located in the image, generating a label mask according to the object mask, the pixel grabbing position and the label grabbing position, and generating a label image by using the grabbing position and the label grabbing position. For the label mask, the weights belonging to the grippable and non-grippable regions are increased, and the weight of the background is decreased.
Step 2.2: and discretizing the grabbing angle according to the problem definition. In this step, the image is rotated by 16 degrees, and the corresponding label and label mask are also rotated by 16 degrees, because only horizontal grabbing is considered, only data in which the grabbing direction is parallel to the horizontal direction after rotation is retained.
Step 2.3: the preprocessed data includes: the system comprises a global image containing the information of the whole working space, an object mask and a label graph with the same scale as the global image.
Step 3: train the deep neural network.
Step 3.1: the input RGB maps are normalized and then a batch (batch) is synthesized.
Step 3.2: the batch of data is transmitted to a full convolution neural network defined in the invention content to obtain an output value.
Step 3.3: and calculating the error between the predicted value and the label according to the cross entropy error combined with the label mask, wherein the calculation formula is as follows:
wherein Y is a label image, M is a label mask, H and W are respectively the length and width of the label image, i, j and k are respectively index subscripts of positions in the 3-channel image, l is an index of the number of channels,an output characteristic diagram representing the last convolutional layer;representing the real domain, the corresponding superscript represents the dimension of the tensor.
Step 4: apply the trained model to the real grasping environment.
Step 4.1: acquire an RGB image and a depth map of the workspace with the camera.
Step 4.2: normalize the RGB image and rotate it to 16 angles before feeding it into the fully convolutional neural network model to obtain 16 grasp function maps.
Step 4.3: according to the grasp problem definition, take the first channel of each grasp function map, combine them, and find the position corresponding to the maximum value to obtain the optimal grasp position and grasp angle in image space.
Step 4.4: map the obtained image position into 3-dimensional space, solve the mechanical-arm control command by inverse kinematics, rotate the end effector by the grasp angle once it reaches the position directly above the object, and determine the descending height of the mechanical arm from the acquired depth map to avoid collision. A sketch of Steps 4.2-4.3 is given below.
It should be understood that the above-described embodiments are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (7)
1. A vision-based autonomous grasping method for a mechanical arm, characterized by comprising the following steps:
S1, building an environment similar to the real scene in a simulation environment, and collecting a global image;
S2, preprocessing the data, wherein the preprocessed data comprise: a global image containing the information of the whole workspace, an object mask, and a label map of the same scale as the global image; the processing comprises the following steps: first generating the object mask from the set of pixel positions occupied by the object in the image, then generating the label mask from the object mask, the grasp pixel position and the grasp label, and generating the label map from the grasp position and the grasp label; and then discretizing the grasp angle according to the grasp problem definition;
the grasp problem is defined as follows: a vertical planar grasp is defined as g = (p, ω, η), where p = (x, y, z) denotes the position of the grasp point in Cartesian coordinates, ω ∈ [0, 2π) denotes the rotation angle of the end effector, and η ∈ {0, 1}^3 is a 3-dimensional one-hot code representing the grasp function; the grasp function is divided into three categories: graspable, non-graspable and background; projected into image space, a grasp in image I is represented as g̃ = (p̃, φ_i, η), where p̃ = (h, w) denotes the grasp position in the image and φ_i denotes the discretized grasp angle; each pixel in the image can be assigned a grasp function, so the complete set of grasp function maps can be represented as C = {C_1, …, C_N}, where C_i ∈ R^{H×W×3} is the grasp function map of the image at the given i-th angle and its 3 channels represent the graspable, non-graspable and background categories, respectively; the first (graspable) channel C_i^1 is taken from every grasp function map C_i and these channels are combined to form G ∈ R^{N×H×W};
S3, training a deep neural network:
(1) normalizing the input RGB images and assembling them into a batch;
(2) feeding the batch of data into a fully convolutional neural network to obtain output values;
(3) calculating the error between the prediction and the label using a cross-entropy loss combined with the label mask, computed by the following loss function:

$$L(Y,M,F)=-\sum_{i=1}^{H}\sum_{j=1}^{W}M_{ij}\sum_{k=1}^{3}Y_{ijk}\,\log\frac{\exp(F_{ijk})}{\sum_{l=1}^{3}\exp(F_{ijl})}$$

where Y ∈ R^{H×W×3} is the label map, M ∈ R^{H×W} is the label mask, H and W are respectively the height and width of the label map, i, j and k are the position indices in the 3-channel map, l is the channel index, F ∈ R^{H×W×3} is the output feature map of the last convolutional layer, and R denotes the real domain with the superscript giving the dimensions of the tensor;
and S4, applying the trained model to a real grabbing environment.
2. The vision-based mechanical arm autonomous grasping method according to claim 1, wherein the step S1 specifically includes:
S11, placing a background texture, a mechanical arm with a gripper, a camera and an object to be grasped in the workspace of the simulation environment;
S12, placing an object in the workspace, selecting a pixel position where the object is present using the camera, recording the image information, the pixel position corresponding to the grasp point, the mask of the object in the image and the grasp angle, and then randomly selecting an angle for the mechanical arm to perform trial-and-error grasping;
S13, judging whether the grasp succeeds; if the grasp fails, directly storing the image I, the set C of pixel positions occupied by the object in the image, the pixel position p corresponding to the grasp point, the grasp angle ψ and the grasp-failure label l; if the grasp succeeds, recording again the global image I′ and the corresponding set C′ of pixel positions occupied by the object in the image, and then storing the image I′, the set C′, the pixel position p corresponding to the grasp point, the grasp angle ψ and the grasp-success label l.
3. The vision-based mechanical arm autonomous grasping method according to claim 2, wherein the most robust grasp point is obtained by solving the following formula:

$$(i^{*},h^{*},w^{*})=\arg\max_{i,h,w}G(i,h,w)$$
4. The vision-based mechanical arm autonomous grasping method according to claim 3, wherein during training a parameterized function f_θ is defined to realize the pixel-level mapping from the image to the grasp function map, which can be expressed as:

$$C_{\phi_i}=f_{\theta}(I_{\phi_i})$$

where I_{φ_i} is image I rotated by the angle φ_i and C_{φ_i} is the corresponding grasp function map; f_θ is implemented with a deep neural network; combined with the loss function, the overall training objective can be defined by the following equation:

$$\theta^{*}=\arg\min_{\theta}\sum_{i}L\big(f_{\theta}(I_{\phi_i}),\,Y_{\phi_i}\big)$$
5. The vision-based mechanical arm autonomous grasping method according to claim 4, wherein, considering a scene in which only one object is placed in the workspace, c1 and c2 are defined as the contact points of the gripper's two fingers with the object, n1 and n2 are their corresponding surface normal vectors, and g is the grasping direction of the gripper in image space, where c1, c2, n1, n2, g ∈ R^2; from these definitions the following can be obtained:

$$g=\frac{c_2-c_1}{\lVert c_2-c_1\rVert},\qquad \omega_1=\arccos\frac{g\cdot n_1}{\lVert n_1\rVert},\qquad \omega_2=\arccos\frac{g\cdot n_2}{\lVert n_2\rVert}$$

where ||·|| denotes the norm operation;

a grasping operation is defined as an antipodal grasp when it satisfies the following condition:

$$\omega_1\le\theta_1\quad\text{and}\quad\omega_2\ge\theta_2$$

where θ1 and θ2 are non-negative thresholds tending to 0 and π respectively on the angle between the grasping direction and the surface normal vectors at the two contact points, and ω1 and ω2 are the angles between the grasping direction and the surface normal vectors at the two contact points; when the grasping direction of the gripper is parallel to the contact-point normal vectors, the grasp is defined as a stable antipodal grasp.
6. The vision-based robotic arm autonomous grasping method according to claim 5, wherein the step S4 includes:
S41, acquiring an RGB image and a depth image of the workspace with a camera;
S42, normalizing the RGB image and rotating it to 16 angles before feeding it into the fully convolutional neural network model to obtain 16 grasp function maps;
S43, according to the grasp problem definition, taking the first channel of each grasp function map, combining them, and finding the position corresponding to the maximum value, thereby obtaining the optimal grasp position and grasp angle in image space;
and S44, mapping the obtained image position into 3-dimensional space, solving the mechanical-arm control command by inverse kinematics, rotating the end effector by the grasp angle once it reaches the position directly above the object, and determining the descending height of the mechanical arm from the collected depth map to avoid collision.
7. The vision-based mechanical arm autonomous grasping method according to claim 6, wherein the step S42 specifically includes: the image fed into the fully convolutional neural network model is a global image of the whole workspace; ResNet50 is first used as an encoder to extract features, then a four-layer up-sampling module of bilinear interpolation and convolution is applied, and finally a 5×5 convolution produces a grasp function map of the same scale as the input.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910335507.5A CN110238840B (en) | 2019-04-24 | 2019-04-24 | Mechanical arm autonomous grabbing method based on vision |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910335507.5A CN110238840B (en) | 2019-04-24 | 2019-04-24 | Mechanical arm autonomous grabbing method based on vision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110238840A CN110238840A (en) | 2019-09-17 |
CN110238840B true CN110238840B (en) | 2021-01-29 |
Family
ID=67883271
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910335507.5A Active CN110238840B (en) | 2019-04-24 | 2019-04-24 | Mechanical arm autonomous grabbing method based on vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110238840B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110889460B (en) * | 2019-12-06 | 2023-05-23 | 中山大学 | Mechanical arm specified object grabbing method based on cooperative attention mechanism |
CN111127548B (en) * | 2019-12-25 | 2023-11-24 | 深圳市商汤科技有限公司 | Grabbing position detection model training method, grabbing position detection method and grabbing position detection device |
CN111325795B (en) * | 2020-02-25 | 2023-07-25 | 深圳市商汤科技有限公司 | Image processing method, device, storage medium and robot |
CN111590577B (en) * | 2020-05-19 | 2021-06-15 | 台州中盟联动企业管理合伙企业(有限合伙) | Mechanical arm multi-parameter digital frequency conversion control method and device |
CN112465825A (en) * | 2021-02-02 | 2021-03-09 | 聚时科技(江苏)有限公司 | Method for acquiring spatial position information of part based on image processing |
CN116197887B (en) * | 2021-11-28 | 2024-01-30 | 梅卡曼德(北京)机器人科技有限公司 | Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image |
CN114407011B (en) * | 2022-01-05 | 2023-10-13 | 中科新松有限公司 | Special-shaped workpiece grabbing planning method, planning device and special-shaped workpiece grabbing method |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7076313B2 (en) * | 2003-06-06 | 2006-07-11 | Visteon Global Technologies, Inc. | Method for optimizing configuration of pick-and-place machine |
KR101211601B1 (en) * | 2010-11-05 | 2012-12-12 | 한국과학기술연구원 | Motion Control System and Method for Grasping Object with Dual Arms of Robot |
US8843236B2 (en) * | 2012-03-15 | 2014-09-23 | GM Global Technology Operations LLC | Method and system for training a robot using human-assisted task demonstration |
US20150314439A1 (en) * | 2014-05-02 | 2015-11-05 | Precision Machinery Research & Development Center | End effector controlling method |
US10394327B2 (en) * | 2014-09-12 | 2019-08-27 | University Of Washington | Integration of auxiliary sensors with point cloud-based haptic rendering and virtual fixtures |
KR102487493B1 (en) * | 2016-03-03 | 2023-01-11 | 구글 엘엘씨 | Deep machine learning methods and apparatus for robotic grasping |
JP2018051704A (en) * | 2016-09-29 | 2018-04-05 | セイコーエプソン株式会社 | Robot control device, robot, and robot system |
CN106874914B (en) * | 2017-01-12 | 2019-05-14 | 华南理工大学 | A kind of industrial machinery arm visual spatial attention method based on depth convolutional neural networks |
CN106846463B (en) * | 2017-01-13 | 2020-02-18 | 清华大学 | Microscopic image three-dimensional reconstruction method and system based on deep learning neural network |
CN106914897A (en) * | 2017-03-31 | 2017-07-04 | 长安大学 | Inverse Solution For Manipulator Kinematics method based on RBF neural |
CN109407603B (en) * | 2017-08-16 | 2020-03-06 | 北京猎户星空科技有限公司 | Method and device for controlling mechanical arm to grab object |
CN108161934B (en) * | 2017-12-25 | 2020-06-09 | 清华大学 | Method for realizing robot multi-axis hole assembly by utilizing deep reinforcement learning |
CN108415254B (en) * | 2018-03-12 | 2020-12-11 | 苏州大学 | Waste recycling robot control method based on deep Q network |
CN109483534B (en) * | 2018-11-08 | 2022-08-02 | 腾讯科技(深圳)有限公司 | Object grabbing method, device and system |
Also Published As
Publication number | Publication date |
---|---|
CN110238840A (en) | 2019-09-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110238840B (en) | Mechanical arm autonomous grabbing method based on vision | |
Cao et al. | Suctionnet-1billion: A large-scale benchmark for suction grasping | |
CN108491880B (en) | Object classification and pose estimation method based on neural network | |
CN111079561B (en) | Robot intelligent grabbing method based on virtual training | |
CN108280856B (en) | Unknown object grabbing pose estimation method based on mixed information input network model | |
CN111331607B (en) | Automatic grabbing and stacking method and system based on mechanical arm | |
Zhang et al. | Grasp for stacking via deep reinforcement learning | |
CN110969660B (en) | Robot feeding system based on three-dimensional vision and point cloud deep learning | |
Tang et al. | Learning collaborative pushing and grasping policies in dense clutter | |
CN113172629A (en) | Object grabbing method based on time sequence tactile data processing | |
CN113762159B (en) | Target grabbing detection method and system based on directional arrow model | |
CN113752255A (en) | Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning | |
CN115147488A (en) | Workpiece pose estimation method based on intensive prediction and grasping system | |
CN115861780B (en) | Robot arm detection grabbing method based on YOLO-GGCNN | |
CN113894058A (en) | Quality detection and sorting method and system based on deep learning and storage medium | |
CN110171001A (en) | A kind of intelligent sorting machinery arm system based on CornerNet and crawl control method | |
CN114998573B (en) | Grabbing pose detection method based on RGB-D feature depth fusion | |
Ito et al. | Integrated learning of robot motion and sentences: Real-time prediction of grasping motion and attention based on language instructions | |
CN116214524A (en) | Unmanned aerial vehicle grabbing method and device for oil sample recovery and storage medium | |
Wu et al. | A cascaded CNN-based method for monocular vision robotic grasping | |
CN114131603B (en) | Deep reinforcement learning robot grabbing method based on perception enhancement and scene migration | |
CN113681552B (en) | Five-dimensional grabbing method for robot hybrid object based on cascade neural network | |
CN114211490A (en) | Robot arm gripper pose prediction method based on Transformer model | |
CN110889460B (en) | Mechanical arm specified object grabbing method based on cooperative attention mechanism | |
Li et al. | Grasping Detection Based on YOLOv3 Algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |