CN109934864A - Residual network deep learning method for robotic arm grasping pose estimation - Google Patents
Residual network deep learning method for robotic arm grasping pose estimation
- Publication number
- CN109934864A (application CN201910192296.4A)
- Authority
- CN
- China
- Prior art keywords
- residual
- grasping
- residual module
- image
- robotic arm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Analysis (AREA)
Abstract
The present invention discloses a residual network deep learning method for robotic arm grasping pose estimation. The method comprises: initializing the robotic arm and adjusting it so that the wrist camera sits at a known height above the vertical XOY plane; acquiring a depth image of the object to be grasped; mapping the depth image with a pre-trained improved GG-CNN model, which outputs four 300 × 300-pixel grasp-information images — grasp success rate, cosine of the grasp angle, sine of the grasp angle, and grasp width; reading the grasp angle and width at the position of highest success rate; and converting the grasp information obtained from the success rate image, by coordinate transformation, into the grasp angle and width for the target object in the robotic arm base frame. The improved GG-CNN model in the above method builds a residual network from constructed residual modules, strengthening the fitting capability and learning ability of the convolutional neural network, so that the generated grasp poses have higher grasping precision.
Description
Technical field
The invention belongs to the field of information and control technology, and in particular relates to a residual network deep learning method for robotic arm grasping pose estimation.
Background art
In recent years, vision-based robotic grasping has become a research hotspot. To perform a grasping action, accurate target detection and localization must come first. Conventional target detection is usually static and single-target, and is affected by variations in shape, size, viewpoint and external illumination, so the extracted features generalize poorly and lack robustness. The development of deep learning algorithms has advanced detection and localization tasks. The research community generally agrees that deeper networks outperform shallow ones, but network depth cannot be increased by simply stacking layer upon layer: because of the vanishing-gradient problem, very deep networks are hard to train. In 2015, researchers proposed the idea of the residual network (ResNet) to solve this degradation of accuracy, and extremely deep residual networks achieved excellent results on the ImageNet classification dataset.
Combining robotic visual grasping with deep learning is now the main direction of grasping research. Recently, some researchers proposed studying optimal object grasp poses by building a Generative Grasping Convolutional Neural Network (GG-CNN): each pixel of the input depth image is put in correspondence with a pixel of the output grasp-information images, and the resulting convolutional neural network predicts optimal grasp poses for multiple cluttered objects. Compared with the popular random-sampling methods, this approach is far more efficient and is of definite value for estimating optimal robotic grasp poses. However, because GG-CNN pursues speed of recognition and grasping above all, its recognition accuracy suffers, which limits the applicability of the network model to robotic grasping.
How to improve the recognition accuracy of the GG-CNN used for robotic grasp pose estimation is therefore the problem that currently needs to be solved.
Summary of the invention
The object of the present invention is to provide a residual network deep learning method for robotic arm grasping pose estimation that effectively improves the precision of the optimal grasp poses generated for the robotic arm, giving the GG-CNN model real practicality in high-precision grasping.
To achieve the above object, the main technical scheme adopted by the present invention is as follows.
The present invention provides a residual network deep learning method for robotic arm grasping pose estimation, comprising:
S1, initializing the robotic arm and adjusting it so that the wrist camera is located at a known height above the vertical XOY plane;
S2, acquiring a depth image of the object to be grasped;
S3, cropping the central part of the depth image to obtain a 300 × 300-pixel object depth image;
S4, mapping the object depth image with a pre-trained improved GG-CNN model, which outputs four 300 × 300-pixel grasp-information images: grasp success rate, cosine of the grasp angle, sine of the grasp angle, and grasp width;
S5, selecting the pixel with the highest value in the grasp success rate image, reading the corresponding pixels in the grasp-angle-cosine, grasp-angle-sine and grasp-width images, and thereby obtaining the grasp angle and width at the position of highest success rate as the grasp information;
S6, converting the grasp information obtained from the success rate image, through the coordinate transformations of the wrist camera, the wrist and the base of the robotic arm, into the grasp angle and width for the target object in the robotic arm base frame (a Cartesian frame);
S7, using the grasp information to control the robotic arm to perform the grasp (i.e. outputting the coordinate-transformed grasp position, angle and width of the target object, so as to control the robotic arm to grasp it);
wherein the improved GG-CNN model builds a residual network from constructed residual modules on top of the existing GG-CNN model, strengthening the fitting capability and learning ability of the convolutional neural network, so that the grasp poses generated by the improved GG-CNN model have higher grasping precision, are more sensitive to changes in object position and shape, and have greater practical application value.
Optionally, before step S1, the method comprises:
S0-1, creating from an existing data set a first data set Gtrain for training the inputs and outputs of the improved GG-CNN model; the first data set contains images whose labels are positive grasps and images whose labels are negative grasps, and each image in the first data set carries multiple labelled grasp rectangles;
S0-2, improving the existing GG-CNN model by constructing residual modules and building a residual network from them, so as to obtain the improved GG-CNN model, while keeping the input and output image sizes of the improved GG-CNN model unchanged;
S0-3, training the residual-improved GG-CNN model with the first data set Gtrain to obtain the trained improved GG-CNN model.
Optionally, the residual-improved GG-CNN model comprises:
a convolutional part, a deconvolutional part and an output part.
The convolutional part comprises ten residual modules, wherein:
the first residual module is 1 convolutional residual module with a pooling layer, whose parameters are 4 filters with stride 3 × 3;
the second residual module is 5 identity residual modules, whose parameters are 4 filters with stride 1 × 1;
the third residual module is 1 convolutional residual module with a pooling layer, whose parameters are 8 filters with stride 2 × 2;
the fourth residual module is 5 identity residual modules, whose parameters are 8 filters with stride 1 × 1;
the fifth residual module is 1 convolutional residual module with a pooling layer, whose parameters are 16 filters with stride 2 × 2;
the sixth residual module is 5 identity residual modules, whose parameters are 16 filters with stride 1 × 1;
the seventh residual module is 1 convolutional residual module with a pooling layer, whose parameters are 32 filters with stride 5 × 5;
the eighth residual module is 5 identity residual modules, whose parameters are 32 filters with stride 1 × 1;
the ninth residual module is 1 convolutional residual module with a pooling layer, whose parameters are 64 filters with stride 1 × 1;
the tenth residual module is 5 identity residual modules, whose parameters are 64 filters with stride 1 × 1.
The deconvolutional part comprises 5 deconvolution (transposed convolution) layers with different parameters:
the first deconvolution layer has 64 filters, each of size 3 × 3, with stride 1 × 1;
the second deconvolution layer has 32 filters, each of size 5 × 5, with stride 5 × 5;
the third deconvolution layer has 16 filters, each of size 5 × 5, with stride 2 × 2;
the fourth deconvolution layer has 8 filters, each of size 7 × 7, with stride 2 × 2;
the fifth deconvolution layer has 4 filters, each of size 9 × 9, with stride 3 × 3.
The output part comprises four linearly mapped convolutional layers, each containing 1 filter; the four layers respectively map and output the grasp success rate, the cosine of the grasp angle, the sine of the grasp angle, and the grasp width.
Optionally, in step S0-3, the grasp accuracy of the residual-improved GG-CNN network is measured with the following Intersection-over-Union (IoU) formula:
IoU(C, G) = |C ∩ G| / |C ∪ G|
where C and G represent two known regions; the IoU is the ratio of the intersection to the union of the two regions.
Optionally, the depth image in S2 is I ∈ R^(H×W), where H is the height and W is the width. A grasp in the depth image is described as
g̃ = (o, φ̃, ω̃, q)
and the coordinate transformation of the robotic arm converts a grasp g̃ in image space into a grasp g in world coordinates:
g = T_RC(T_CI(g̃))
where o = (u, v) is the pixel position of maximum grasp success rate, φ̃ is the rotation angle in the camera frame, and ω̃ is the grasp width in image coordinates; T_RC is the coordinate transformation from the camera frame to the robotic arm frame, and T_CI is the calibrated hand-eye transformation between the robotic arm and the camera based on the camera intrinsics.
The output images in S4 are expressed as G = (Φ, W, Q) ∈ R^(3×H×W);
Φ, W and Q each lie in R^(H×W) and represent the grasp angle, grasp width and grasp success rate respectively; the grasp angle Φ is split into its cosine and sine images, and the pixel o of highest success rate picks out the corresponding values φ̃, ω̃ and q in Φ, W and Q.
In step S4, the pre-trained improved GG-CNN model maps the object depth image as G = M(I).
The best grasp pose in image space is determined from G:
g̃* = argmax_Q G
specifically, from the output grasp information G, the pixel with the maximum grasp success rate q in the Q image is selected first, and its coordinate o indexes Φ and W in the output G, giving the position, angle and width of the best grasp pose;
further, the best grasp pose in world coordinates is computed as g_best = T_RC(T_CI(g̃*)).
Optionally, the processing in each residual module of the convolutional part is as follows:
each residual module comprises a main path and a bypass path;
the bypass path takes one of two forms: a path with pooling and a convolution operation, or a shortcut path with no operation;
specifically, the main path comprises:
1) the input data X is first regularized, then passed through an activation layer using the ReLU activation function, and finally through a filter and convolutional layer to the next layer;
2) the output of the previous layer is regularized, passed through an activation layer using the ReLU activation function, and finally through a filter and convolutional layer, yielding F(X);
the bypass path comprises:
1) when the module pooling parameter is true: the input data X first passes through a max-pooling layer, then through a convolutional layer with filter size 5 × 5, a number of filters given by the module parameter filters, and stride 1 × 1, yielding W(X);
2) when the module pooling parameter is false: X is output directly without any operation;
the output of the main path is added to the output of the selected bypass path, giving the overall output H(X) of the residual module function.
The beneficial effects of the present invention are:
Compared with the prior art, the method of the invention improves the precision with which optimal grasp poses are generated for the robotic arm, so that the improved GG-CNN model used in the method is more practical in high-precision grasping.
That is, the present application first builds convolutional residual modules, stacks residual modules in multiple layers to construct a residual network that deepens the convolutional neural network, and uses this as the main body of the improved GG-CNN. The improved GG-CNN model raises the precision of optimal grasp pose generation for the robotic arm, making the network model more practical in high-precision grasping.
Brief description of the drawings
Fig. 1 is the flow chart of the residual network deep learning method for robotic arm grasping pose estimation of the invention;
Fig. 2 is a schematic diagram of the Cartesian-space and image-space descriptions used in the application;
Fig. 3 is a schematic diagram of the prior-art Cornell Grasping Dataset;
Fig. 4 is a schematic diagram of the generation of the training data set in the application;
Fig. 5 is a schematic diagram of the prior-art GG-CNN structure;
Fig. 6 is a schematic diagram of the structures used when building the residual modules in the application;
Fig. 7 is a schematic diagram of the identity residual block built in the application;
Fig. 8 is a schematic diagram of the convolutional residual block in the application;
Fig. 9 is a schematic diagram of the residual module function in the application;
Fig. 10 is the structure chart of the residual-improved GG-CNN model in the application;
Fig. 11 compares the accuracy of the models of Fig. 5 and Fig. 10;
Fig. 12 compares the outputs of the models before and after improvement (Fig. 5 and Fig. 10).
Detailed description of the embodiments
In order to better explain the present invention and facilitate understanding, the invention is described in detail below through specific embodiments with reference to the accompanying drawings.
Autonomous grasping is a major problem in robotics research. For the optimal grasp pose problem, the application equips the robotic arm with vision and combines it with deep learning algorithms to realize intelligent grasping.
The application improves the grasp-generating convolutional neural network (GG-CNN) with the idea of residual networks: convolutional residual modules are built first (as shown in Fig. 9), residual modules are stacked in multiple layers to construct a residual network that deepens the convolutional neural network, and this serves as the main body of the improved GG-CNN. By improving GG-CNN with a deep residual network, the application raises the accuracy of the model that generates optimal grasp poses for the robotic arm. Experimental results show that the accuracy of the GG-CNN model improved with the residual network reaches 88%, far higher than the original model's 72%, greatly increasing the accuracy with which the model predicts optimal robotic grasp poses; this is of definite scientific and practical value in the field of robotic visual grasping.
Fig. 1 shows the method provided by one embodiment of the invention, which may comprise the following steps:
S1, initializing the robotic arm and adjusting it so that the wrist camera is located at a known height above the vertical XOY plane.
This embodiment is illustrated with the wrist camera of the robotic arm; in practical applications, of course, the camera need not be a wrist camera, and the camera of a collaborative arm mounted above the robotic arm may optionally be used.
S2, acquiring a depth image of the object to be grasped.
S3, cropping the central part of the depth image to obtain a 300 × 300-pixel object depth image.
This embodiment does not restrict how the depth image is cropped, but the main part of the target object must be retained.
S4, mapping the object depth image with the pre-trained improved GG-CNN model, which outputs four 300 × 300-pixel grasp-information images: grasp success rate, cosine of the grasp angle, sine of the grasp angle, and grasp width.
S5, selecting the pixel with the highest value in the grasp success rate image, reading the corresponding pixels in the grasp-angle-cosine, grasp-angle-sine and grasp-width images, and obtaining the grasp angle and width at the position of highest success rate as the grasp information.
S6, converting the grasp information obtained from the success rate image, through the coordinate transformations of the wrist camera, the wrist and the base of the robotic arm, into the grasp angle and width for the target object in the robotic arm base frame (a Cartesian frame).
S7, using the grasp information to control the robotic arm to perform the grasp (i.e. outputting the coordinate-transformed grasp position, angle and width of the target object, so as to control the robotic arm to grasp it). A minimal end-to-end sketch of these steps follows.
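For concreteness, the following minimal Python sketch walks through steps S2 to S7 under stated assumptions: model stands for the trained improved GG-CNN with its four output heads, and the camera, robot, T_RC and T_CI objects are hypothetical placeholders for the hardware interfaces and the calibrated coordinate transformations. It illustrates the flow of the method rather than reproducing the patented implementation.

```python
import numpy as np

def run_grasp_pipeline(model, camera, robot, T_RC, T_CI):
    """Steps S2-S7; S1 (placing the wrist camera at a known height
    above the XOY plane) is assumed to have been performed already."""
    depth = camera.get_depth_image()                        # S2: raw depth image
    h, w = depth.shape
    crop = depth[h//2-150:h//2+150, w//2-150:w//2+150]      # S3: central 300x300 crop
    q, cos2p, sin2p, width = [np.squeeze(m) for m in
                              model.predict(crop[None, :, :, None])]  # S4
    v, u = np.unravel_index(np.argmax(q), q.shape)          # S5: best success-rate pixel
    phi = 0.5 * np.arctan2(sin2p[v, u], cos2p[v, u])        # recover angle from sin/cos
    g_image = (u, v, phi, width[v, u])
    g_world = T_RC(T_CI(g_image))                           # S6: into the base frame
    robot.execute_grasp(g_world)                            # S7: perform the grasp
    return g_world
```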
Wherein the improved GG-CNN model builds a residual network from constructed residual modules on top of the existing GG-CNN model, strengthening the fitting capability and learning ability of the convolutional neural network, so that the grasp poses generated by the improved GG-CNN model have higher grasping precision, are more sensitive to changes in object position and shape, and have greater practical application value.
In order to better understand the scheme of the application, it is described below with reference to the accompanying drawings.
1 Grasping method based on GG-CNN
1.1 Definition of grasp parameters and transformations
Given a depth image obtained with a depth camera in a scene, the application studies the problem of detecting and grasping unknown objects perpendicular to a plane, as shown in Fig. 2.
The grasp is performed perpendicular to the XOY plane (i.e. the robotic arm base coordinate system, abbreviated as the robotic arm frame). In this embodiment a grasp is defined as
g = (p, φ, ω, q)
These pose parameters fully determine a grasping action: the position is the gripper centre p = (x, y, z) in Cartesian coordinates; the posture comprises the rotation angle φ of the end effector about the z-axis and the required width ω; and the grasp success rate q expresses the probability of a successful grasp. A small illustration follows.
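As a small illustration, the grasp tuple g = (p, φ, ω, q) can be carried in a plain data structure; the field names below are illustrative, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Grasp:
    """g = (p, phi, omega, q) in the robotic arm base (Cartesian) frame."""
    p: Tuple[float, float, float]  # gripper centre position (x, y, z)
    phi: float                     # end-effector rotation about the z-axis
    omega: float                   # required gripper width
    q: float                       # grasp success rate in [0, 1]
```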
The intrinsic parameters of the camera used in the application are known; with them a depth image I ∈ R^(H×W) of height H and width W is obtained, and grasps are detected in the depth image I. A grasp in image I is described as
g̃ = (o, φ̃, ω̃, q)
where o = (u, v) is the pixel position of maximum grasp success rate, φ̃ is the rotation angle in the frame of the camera (the wrist or arm camera mentioned above), and ω̃ is the grasp width in image coordinates. Through the coordinate transformation of the robotic arm, a grasp in image space can be converted into a grasp g in world coordinates:
g = T_RC(T_CI(g̃))
T_RC is the coordinate transformation from the camera frame to the robotic arm frame; T_CI is the calibrated hand-eye transformation between the robotic arm and the camera based on the camera intrinsics, and maps 2D image coordinates into the 3D camera frame, as sketched below.
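The following is a minimal sketch of the position part of g = T_RC(T_CI(g̃)), assuming a pinhole camera with intrinsic matrix K standing in for T_CI and a calibrated 4 × 4 hand-eye matrix standing in for T_RC; both are assumed to come from the camera and hand-eye calibration mentioned above.

```python
import numpy as np

def image_point_to_base(u, v, depth, K, T_rc):
    """Back-project pixel (u, v) at the measured depth through the
    intrinsics K into the 3D camera frame (the role of T_CI), then map
    that point into the robotic arm base frame (the role of T_RC)."""
    x = (u - K[0, 2]) * depth / K[0, 0]   # T_CI: 2D image -> 3D camera frame
    y = (v - K[1, 2]) * depth / K[1, 1]
    p_cam = np.array([x, y, depth, 1.0])  # homogeneous camera-frame point
    p_base = T_rc @ p_cam                 # T_RC: camera frame -> base frame
    return p_base[:3]
```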
In addition, a set of grasps in image space is called a grasp map, denoted
G = (Φ, W, Q) ∈ R^(3×H×W)
where Φ, W and Q each lie in R^(H×W) and represent the grasp angle, grasp width and grasp success rate respectively; the grasp angle Φ is split into its cosine and sine images, and the pixel o of highest success rate picks out the corresponding values φ̃, ω̃ and q in Φ, W and Q.
Ideally, the grasp value of every pixel in the depth image I is computed directly, rather than sampling the input image randomly. To this end, a function M (the mapping M, or mapping function M) on depth images is defined as the transformation from the input depth image to the grasp-information images:
G = M(I)
The best grasp pose in image space can then be computed from G:
g̃* = argmax_Q G
Specifically, from the output grasp information G, the pixel with the maximum grasp success rate q in the Q image is selected first, and its coordinate o indexes Φ and W in the output G, giving the position, angle and width of the best grasp pose;
the best grasp pose in world coordinates then follows from g_best = T_RC(T_CI(g̃*)), as sketched below.
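Selecting g̃* amounts to an argmax over Q followed by indexing Φ and W at the same pixel. A sketch, assuming the angle is encoded as the cos(2Φ) and sin(2Φ) images introduced in Section 1.3 below:

```python
import numpy as np

def best_image_grasp(Q, cos_2phi, sin_2phi, W):
    """Pick the image-space grasp at the pixel o that maximises Q,
    reading the angle and width at the same location in Phi and W."""
    v, u = np.unravel_index(np.argmax(Q), Q.shape)
    phi = 0.5 * np.arctan2(sin_2phi[v, u], cos_2phi[v, u])  # undo the 2*phi encoding
    return (u, v), phi, W[v, u], Q[v, u]
```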
1.2 Approximating the mapping relation with a neural network
The following explains how the improved GG-CNN is used to determine the mapping function M.
A generative grasping convolutional neural network (GG-CNN) is used to approximate the mapping M: I → G. Let M_λ denote the neural network, where λ are the weights after training. Then M_λ(I) = (Q_λ, Φ_λ, W_λ) ≈ M(I), and the network is learned and trained with an L2 loss function over the training inputs I_train and corresponding outputs G_train, as follows:
λ = argmin_θ L(G_train, M_θ(I_train))
where G is the set of grasp parameters at the points p estimated in the Cartesian frame, one per pixel o; θ carries no special meaning and is used merely for convenience of description.
The grasp map G is expressed as a group of three images, Φ, W and Q. These are as follows:
Q is an image describing the grasp success rate of a grasp executed at each point (u, v). Its value is a scalar in the range [0, 1], where values close to 1 indicate a high probability of success.
Φ is an image describing the angle of the grasp executed at each point. Because grasps of ordinary objects are symmetric about ± π/2 radians, the angle lies in the range [−π/2, π/2].
W is an image describing the end-effector width of the grasp executed at each point. To preserve depth invariance, W takes values in the range [0, 150] pixels, which can be converted into a physical measurement using the depth camera parameters and the measured depth.
1.3 Construction and training of the GG-CNN
Existing data sets do not satisfy the training requirements of GG-CNN. To train the GG-CNN model, a data set matching its inputs and outputs is created from the Cornell Grasping Dataset (shown in Fig. 3). The Cornell Grasping Dataset contains RGB-D images of 885 real objects, with 5110 grasps labelled as positive and 2909 labelled as negative. Although this is a relatively small grasping data set compared with some newer synthetic data sets, it best satisfies the per-pixel grasping needs of the application, because each image provides multiple labelled grasp rectangles.
Random cropping, scaling and rotation are used to augment the Cornell Grasping Dataset, creating a set Gtrain of 8840 depth images and associated grasp images that effectively combines 51,100 grasp examples.
The Cornell Grasping Dataset represents the objects to be grasped as grasp rectangles in pixel coordinates, from which the position and rotation angle of the end effector are calibrated. To convert from the grasp-rectangle representation to the image-based representation G, the central third of each grasp rectangle is taken as the graspable area of the image, corresponding to the centre position of the end effector, and every other region is assumed not to be a valid grasp. The data set generation process is shown in Fig. 4.
Grasp success rate Q: whether each pixel of the Cornell Grasping Dataset is a valid grasp is treated as a binary label; the graspable areas of Q_train are set to 1 and all other pixels to 0.
Rotation angle Φ: the angle of each grasp rectangle in the range [−π/2, π/2] is computed and the corresponding region of Φ_train is set. To remove the discontinuities and excessive values that raw angles near ± π/2 can produce, the angle is decomposed into two vector components on the unit circle, yielding values in the range [−1, 1]. Since the mapping is symmetric about ± π/2 radians, the two components sin(2Φ_train) and cos(2Φ_train) are used; they are unique over Φ_train ∈ [−π/2, π/2].
Grasp width W: similarly to the angle, the width of each grasp rectangle (in pixels, up to the maximum value) is computed and set in the corresponding part of W_train, representing the gripper width. During training, the values of W_train are scaled by 1/150 so that they lie in [0, 1]. The camera parameters and the measured depth can be used to compute the physical width of the end effector.
Depth image input: since the Cornell Grasping Dataset was captured with a real camera, it already contains real sensor noise and no noise needs to be added. The depth images are inpainted with OpenCV to remove invalid values, and the mean of each depth image is subtracted so that its values are centred on 0, providing depth invariance.
With the definitions and operations above, the data set for training the GG-CNN model is generated from the Cornell Grasping Dataset. A sketch of this target generation follows.
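A sketch of this target generation for one example, assuming each labelled rectangle exposes its angle and width, and using a hypothetical helper centre_third_mask that returns the pixel mask of the centre third of a rectangle:

```python
import numpy as np

def make_training_targets(positive_rects, depth, shape=(300, 300)):
    """Build (depth, Q, cos2Phi, sin2Phi, W) training images from the
    positive grasp rectangles of one Cornell example, as described above."""
    Q = np.zeros(shape)
    Phi = np.zeros(shape)
    W = np.zeros(shape)
    for r in positive_rects:
        m = centre_third_mask(r, shape)   # hypothetical helper: centre third of the rectangle
        Q[m] = 1.0                        # graspable area -> 1, all other pixels stay 0
        Phi[m] = r.angle                  # angle in [-pi/2, pi/2]
        W[m] = r.width / 150.0            # width scaled by 1/150 into [0, 1]
    depth = depth - depth.mean()          # zero-centre for depth invariance
    return depth, Q, np.cos(2 * Phi), np.sin(2 * Phi), W
```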
The prior-art GG-CNN function model M_λ(I) = (Q_λ, Φ_λ, W_λ) approximately generates the grasp-information images G_λ directly from the input depth image I: it takes a 300 × 300 depth image as input and produces the final grasp-information maps through three convolutional layers and three deconvolutional layers. The complete GG-CNN structure is shown in Fig. 5.
Since the GG-CNN of Fig. 5 cannot raise the precision of recognition and grasping, the application improves the GG-CNN structure of Fig. 5 as shown in Fig. 10; the development is described below.
2 Improved GG-CNN model based on the residual network
The idea of the residual network is introduced first; next the two basic blocks (the identity residual block and the convolutional residual block) are described; finally the two basic blocks are combined into the residual module, and residual modules are used to build the residual network, whose structure is shown in Fig. 10.
2.1 Residual networks
The residual network borrows the cross-layer connection idea of the Highway Network but improves on it. By building residual blocks with shortcut connections, the input X is passed directly to the output as an initial result, and the output becomes
H(X) = F(X) + X
When F(X) = 0, H(X) = X, i.e. the identity mapping. ResNet thus changes the learning objective: instead of learning a complete output, it learns the difference between the target value H(X) and X, the so-called residual:
F(X) = H(X) − X
The subsequent training objective is therefore to drive the residual towards 0, so that accuracy does not fall as the network deepens.
This skipping residual structure breaks the convention of traditional neural networks that the output of layer n−1 can only serve as the input of layer n, and lets the output of a layer skip several layers to serve directly as the input of a later layer. Its significance lies in offering a new direction for the problem that stacking more layers fails to raise the accuracy of the whole learning model.
In ResNet (the residual network), shortcut connections let gradients propagate back to earlier layers. Fig. 6(a) shows the main path of a neural network and Fig. 6(b) a main path with a shortcut connection added; by stacking such ResNet blocks, a very deep neural network can be constructed.
Two major types of block are used in ResNet, the identity residual block and the convolutional residual block. The choice between them depends mainly on whether the input and output sizes are the same: if they are, the identity residual block is used; otherwise the convolutional residual block is used.
(1) Identity residual block
The identity residual block is the standard block used in ResNet and corresponds to the case where input and output have identical dimensions. The bypass path is a shortcut connection, and the convolutional layers form the main path. In Fig. 7, the convolutions are likewise followed by ReLU activations, and Batch regularization is added to speed up training and prevent over-fitting.
(2) Convolutional residual block
The convolutional residual block of ResNet is the other type of residual block and is used when the input and output sizes do not match, as shown in Fig. 8. The convolutional layer in the shortcut path resizes the input X so that the shortcut output matches the size of the main-path output; a minimal sketch of both blocks follows.
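A minimal tf.keras sketch of the two standard ResNet blocks just described; the kernel sizes are illustrative, since Figs. 7 and 8 fix the exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters):
    """Identity residual block (Fig. 7): input and output sizes match,
    so the shortcut passes x through unchanged."""
    y = layers.BatchNormalization()(x)        # Batch regularization
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    return layers.Add()([x, y])               # H(X) = F(X) + X

def conv_block(x, filters, stride):
    """Convolutional residual block (Fig. 8): a strided 1x1 convolution
    on the shortcut resizes X so the two branches can be added."""
    y = layers.BatchNormalization()(x)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, strides=stride, padding='same')(y)
    s = layers.Conv2D(filters, 1, strides=stride, padding='same')(x)
    return layers.Add()([y, s])
```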
2.2 Improving GG-CNN with the residual network
The application introduces the idea of the residual network into GG-CNN: by constructing residual modules, a deeper neural network model is built, improving the accuracy with which the GG-CNN model generates grasp poses and yielding a better network for generating optimal robotic grasp poses. The structure of the constructed residual module is shown in Fig. 9.
In this application, the constructed residual module is divided into a main path and a bypass path, where the bypass path takes one of two forms: a path with pooling and a convolution operation, or a shortcut path with no operation.
For clarity, let the input be X, and name the outputs of the individual paths F(X), W(X) and H(X) to distinguish them; this part describes a single residual module.
The operations on the main path are:
1) as shown in Fig. 9, the input X is first regularized, then passed through an activation layer using the ReLU activation function, and finally through a convolutional layer with filters/2 filters (where filters is the input parameter of the module function) and stride 1 × 1 to the next layer; here the filter size is 3 × 3;
2) the output of the previous layer is regularized, passed through an activation layer using the ReLU activation function, and finally through a convolutional layer with filters filters and stride strides (where strides is the input parameter of the module function), yielding F(X); here the filter size is 5 × 5.
The operations on the bypass path are:
1) when the module-function pooling parameter is true: the input X first passes through a max-pooling layer of size strides (where strides is the input parameter of the module function), then through a convolutional layer with filters filters and stride 1 × 1, yielding W(X); here the filter size is 5 × 5;
2) when the module-function pooling parameter is false: X is output directly without any operation.
The output of the main path is added to the output of the selected bypass path, giving the overall output H(X) of the residual module function, as sketched below.
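A sketch of this residual module in tf.keras, following the description above (one reading of Fig. 9, not the patented code itself):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_module(x, filters, strides=1, pool=False):
    """Residual module of Fig. 9. Main path: BN -> ReLU -> 3x3 conv with
    filters/2 filters and stride 1, then BN -> ReLU -> 5x5 conv with
    `filters` filters and stride `strides`, giving F(X). Bypass path:
    max pooling plus a 5x5 stride-1 convolution when pool=True, giving
    W(X); otherwise the identity shortcut. Output H(X) is their sum."""
    y = layers.BatchNormalization()(x)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters // 2, 3, strides=1, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 5, strides=strides, padding='same')(y)   # F(X)
    if pool:
        s = layers.MaxPooling2D(pool_size=strides, padding='same')(x)
        s = layers.Conv2D(filters, 5, strides=1, padding='same')(s)     # W(X)
    else:
        s = x                                                           # shortcut
    return layers.Add()([y, s])                                         # H(X)
```

With pool=True and a stride greater than 1, this module reduces the spatial size exactly as the convolutional residual modules listed below require; with pool=False it behaves as an identity residual module.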
The residual module built by the application itself is used to improve the GG-CNN model: on the premise that the original input and output sizes remain unchanged, the intermediate structure is constructed by stacking residual modules; the model structure is shown in Fig. 10.
Specifically, the residual-improved GG-CNN network shown in Fig. 10 comprises a convolutional part, a deconvolutional part and an output part.
The convolutional part comprises ten residual modules, wherein:
the first residual module is 1 convolutional residual module with a pooling layer, whose parameters are 4 filters with stride 3 × 3;
the second residual module is 5 identity residual modules, whose parameters are 4 filters with stride 1 × 1;
the third residual module is 1 convolutional residual module with a pooling layer, whose parameters are 8 filters with stride 2 × 2;
the fourth residual module is 5 identity residual modules, whose parameters are 8 filters with stride 1 × 1;
the fifth residual module is 1 convolutional residual module with a pooling layer, whose parameters are 16 filters with stride 2 × 2;
the sixth residual module is 5 identity residual modules, whose parameters are 16 filters with stride 1 × 1;
the seventh residual module is 1 convolutional residual module with a pooling layer, whose parameters are 32 filters with stride 5 × 5;
the eighth residual module is 5 identity residual modules, whose parameters are 32 filters with stride 1 × 1;
the ninth residual module is 1 convolutional residual module with a pooling layer, whose parameters are 64 filters with stride 1 × 1;
the tenth residual module is 5 identity residual modules, whose parameters are 64 filters with stride 1 × 1.
The deconvolutional part comprises 5 deconvolution (transposed convolution) layers with different parameters:
the first deconvolution layer has 64 filters, each of size 3 × 3, with stride 1 × 1;
the second deconvolution layer has 32 filters, each of size 5 × 5, with stride 5 × 5;
the third deconvolution layer has 16 filters, each of size 5 × 5, with stride 2 × 2;
the fourth deconvolution layer has 8 filters, each of size 7 × 7, with stride 2 × 2;
the fifth deconvolution layer has 4 filters, each of size 9 × 9, with stride 3 × 3.
The output part comprises four linearly mapped convolutional layers, each containing 1 filter; the four layers respectively map and output the grasp success rate, the cosine of the grasp angle, the sine of the grasp angle, and the grasp width.
That is, in this embodiment, the output of the residual part is transformed through the deconvolution layers to obtain the grasp map G required in the application; the deconvolution output is passed through linear activations and mapped to the grasp-success-rate image of the output layer, the angle images Φ composed of the sine and cosine images of the grasp angle, and the grasp-width image W, which together constitute the residual-improved GG-CNN network of the application. An end-to-end sketch of this architecture follows.
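Putting the pieces together, the following tf.keras sketch assembles the architecture of Fig. 10 from the residual_module sketch of Section 2.2 and the parameters listed above; the 1 × 1 kernels of the four output heads are an assumption, since the text states only that each head is one linearly mapped filter.

```python
from tensorflow.keras import Input, Model, layers

def build_residual_ggcnn():
    """Residual-improved GG-CNN of Fig. 10: five convolutional residual
    modules with strides 3, 2, 2, 5, 1 (each followed by five identity
    modules) shrink the 300x300 input to 5x5; five transposed
    convolutions bring it back to 300x300 for the four output heads."""
    inp = Input((300, 300, 1))
    x = inp
    for filters, stride in [(4, 3), (8, 2), (16, 2), (32, 5), (64, 1)]:
        x = residual_module(x, filters, strides=stride, pool=True)  # conv module
        for _ in range(5):                                          # 5 identity modules
            x = residual_module(x, filters, strides=1, pool=False)
    for filters, size, stride in [(64, 3, 1), (32, 5, 5), (16, 5, 2),
                                  (8, 7, 2), (4, 9, 3)]:
        x = layers.Conv2DTranspose(filters, size, strides=stride,
                                   padding='same')(x)
    heads = [layers.Conv2D(1, 1, activation='linear', name=n)(x)    # linear outputs
             for n in ('q', 'cos_2phi', 'sin_2phi', 'width')]
    return Model(inp, heads)
```

Note that the downsampling strides (3 · 2 · 2 · 5) and the deconvolution strides (5 · 2 · 2 · 3) cancel, so the 300 × 300 input size is preserved at the output, as required in step S0-2.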
3 Experimental results and analysis
The robotic grasping simulation experiments use the residual-network-improved GG-CNN model of the application. The experimental environment is an Ubuntu 16.04 system; the pose-generation and grasping algorithms are programmed in Python 2, and a laboratory server with a GTX 1080 graphics card accelerates the training process; the improvements were tested repeatedly.
In training and testing the network model, the Intersection-over-Union (IoU) concept from the object detection field is used to measure model accuracy. The IoU is defined as
IoU(C, G) = |C ∩ G| / |C ∪ G|
i.e. the ratio of the intersection to the union of the grasp rectangle generated by the network and the labelled grasp rectangle is taken as the grasp-generation accuracy of the application's network, computed as sketched below.
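A direct implementation of this measure over two binary region masks (a sketch; the acceptance threshold applied to the ratio is not stated here):

```python
import numpy as np

def iou(mask_c, mask_g):
    """Intersection over Union of two boolean region masks C and G."""
    inter = np.logical_and(mask_c, mask_g).sum()
    union = np.logical_or(mask_c, mask_g).sum()
    return inter / union if union else 0.0
```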
The network parameters of the original GG-CNN are tested, improved and optimized: the accuracy of the GG-CNN network is improved by adjusting the optimizer type, learning rate, regularization parameters, batch size, loss function, activation function and number of network layers. After many experiments, the Adam optimizer with learning-rate decay is selected, the batch size is set to 32, the loss function is MSE and the activation function is ReLU; the residual network is built from the constructed residual modules, and the deep residual network is constructed by stacking modules in multiple layers. A minimal sketch of this training configuration follows.
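A minimal training sketch matching the stated configuration: Adam with learning-rate decay, batch size 32, MSE loss and 100 epochs; the decay schedule values and the placeholder arrays standing in for Gtrain are assumptions.

```python
import numpy as np
import tensorflow as tf

model = build_residual_ggcnn()                 # sketch from Section 2.2
lr = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.95)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
              loss='mse')                      # one MSE term per output head

# Placeholder arrays standing in for the real Gtrain set (8840 examples);
# each label is a list of four 300x300 targets: Q, cos(2*Phi), sin(2*Phi), W.
x_train = np.zeros((32, 300, 300, 1), dtype='float32')
y_train = [np.zeros((32, 300, 300, 1), dtype='float32') for _ in range(4)]
model.fit(x_train, y_train, batch_size=32, epochs=100)
```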
Fig. 11 shows the accuracy curves of the residual-network-improved GG-CNN model of Fig. 10 and the original GG-CNN model (Fig. 5). As the number of epochs grows, the accuracy of the grasp poses generated by the models rises steadily, and after training for 100 epochs the accuracy of both models is essentially stable. Comparing the accuracy curves of the improved model and the original model, it can be clearly seen that the accuracy of the model before improvement stabilizes around 71%, while that of the improved model finally stabilizes around 88%.
With the residual-network-improved GG-CNN model, the accuracy of pose generation rises by 17%, showing that building a deep residual network from multi-layer residual modules, and thereby constructing a deeper grasp-generating convolutional neural network model, can very effectively raise the accuracy of the GG-CNN model and obtain more accurate optimal grasp poses for the robotic arm.
To test the effect of the grasp-pose generation network before and after improvement, the application creates a data set matching the inputs and outputs of GG-CNN from the Cornell Grasping Dataset. The RGB-D images of real objects in the Cornell Grasping Dataset, together with their positive-grasp and negative-grasp labels, are displayed in the images: the labelled grasp poses are indicated with rectangles, and the whole RGB image is placed at the top-left position. The corresponding depth image of the data set serves as output; graspable poses are likewise represented with light-grey rectangles, the dark-grey rectangle indicates the grasp pose generated by neural network training, and the whole depth image is placed at the top-right position. In the grasp-width and grasp-angle images output by the trained network, each pixel carries a corresponding grasp parameter value; the grasp-width image is placed at the bottom-left position and the grasp-angle image at the bottom-right position. The effect before and after improvement of the grasp-generation network is shown in groups of four images, illustrated with two objects, object 1 and object 2, in Fig. 12. In Fig. 12, (a) shows the output pose generated for object 1 before improvement, (b) the output pose generated for object 2 before improvement, (c) the output pose generated for object 1 after improvement, and (d) the output pose generated for object 2 after improvement.
Comparing the outputs of the network model before and after improvement, look first at the dark-grey rectangles generated by the grasp-generating convolutional neural network in the depth images. For object 1, the grasp rectangle generated before improvement is too narrow to satisfy an actual grasp, whereas the rectangle generated after the GG-CNN model improvement is both suitably wide and well positioned for grasping; for object 2, the grasp rectangles generated before and after improvement both meet the actual requirement, and both work well. Observing the grasp-width and grasp-angle images output by the grasp-generating convolutional neural network, for both object 1 and object 2 the distribution of graspable pixels in the images generated after the model improvement agrees better with the pixel distribution of the actual object depth image, and the grasp-width and angle values are closer to reality. The grasp-information images output by the network model are clearer in colour, and the improved network model perceives differences in object size, shape and position more sensitively, better reflecting changes in the grasp information.
By building the residual network to improve the GG-CNN model, the application significantly raises the accuracy with which the model generates grasp poses, and the grasping effect improves markedly.
The prior-art GG-CNN model pursues computation speed with an overly simple neural network structure, cutting the magnitude of the network parameters and sacrificing part of the network model's grasping accuracy. The application uses the idea of the residual network, constructs residual block functions suited to its own network model and rebuilds the structure of the GG-CNN model, greatly raising the accuracy with which the model predicts optimal robotic grasp poses. Although a deeper network means longer computation time, high-quality, high-precision grasping remains an important need in practical grasping, and the method has definite application value in fields with higher precision requirements.
It should be understood that the above description of specific embodiments of the present invention is given simply to illustrate the technical route and features of the invention, so that those skilled in the art can understand the content of the invention and implement it accordingly; the invention is not limited to the above specific embodiments. All changes and modifications made within the scope of the claims shall be covered by the protection scope of the present invention.
Claims (7)
1. A residual network deep learning method for robotic arm grasping pose estimation, characterized by comprising:
S2, obtaining a depth image of the target object to be grasped, acquired by the wrist camera of the initialized robotic arm, wherein the end of the robotic arm is adjusted so that the wrist camera is located at a preset height above the vertical XOY plane;
S3, pre-processing the acquired depth image to obtain a 300 × 300-pixel object depth image;
S4, mapping the object depth image with a pre-trained improved GG-CNN model, which outputs four 300 × 300-pixel grasp-information images: grasp success rate, cosine of the grasp angle, sine of the grasp angle, and grasp width;
S5, selecting the pixel with the highest value in the grasp success rate image, reading the corresponding pixels in the grasp-angle-cosine, grasp-angle-sine and grasp-width images, and obtaining the grasp angle and width at the position of highest success rate as the grasp information;
S6, transforming the obtained grasp information, first through the coordinate transformation of the wrist camera and then through the coordinate transformation between the wrist and the base of the robotic arm, finally obtaining the grasp angle and width of the target object to be grasped in the robotic arm base frame;
wherein the improved GG-CNN model builds a residual network from constructed residual modules on top of the existing GG-CNN model, strengthening the fitting capability and learning ability of the convolutional neural network.
2. The method according to claim 1, characterized in that, before step S2, the method comprises:
S0-1, creating from an existing data set a first data set Gtrain for training the inputs and outputs of the improved GG-CNN model; the first data set contains images whose labels are positive grasps and images whose labels are negative grasps, and each image in the first data set carries multiple labelled grasp rectangles;
S0-2, improving the existing GG-CNN model by constructing residual modules and building a residual network from them, so as to obtain the improved GG-CNN model, while keeping the input and output image sizes of the improved GG-CNN model unchanged;
S0-3, training the residual-improved GG-CNN model with the first data set Gtrain to obtain the trained improved GG-CNN model.
3. The method according to claim 2, characterized in that the residual-improved GG-CNN model comprises:
a convolutional part, a deconvolutional part and an output part;
the convolutional part comprises ten residual modules,
wherein the first residual module is 1 convolutional residual module with a pooling layer, whose parameters are 4 filters with stride 3 × 3;
the second residual module is 5 identity residual modules, whose parameters are 4 filters with stride 1 × 1;
the third residual module is 1 convolutional residual module with a pooling layer, whose parameters are 8 filters with stride 2 × 2;
the fourth residual module is 5 identity residual modules, whose parameters are 8 filters with stride 1 × 1;
the fifth residual module is 1 convolutional residual module with a pooling layer, whose parameters are 16 filters with stride 2 × 2;
the sixth residual module is 5 identity residual modules, whose parameters are 16 filters with stride 1 × 1;
the seventh residual module is 1 convolutional residual module with a pooling layer, whose parameters are 32 filters with stride 5 × 5;
the eighth residual module is 5 identity residual modules, whose parameters are 32 filters with stride 1 × 1;
the ninth residual module is 1 convolutional residual module with a pooling layer, whose parameters are 64 filters with stride 1 × 1;
the tenth residual module is 5 identity residual modules, whose parameters are 64 filters with stride 1 × 1;
the deconvolutional part comprises 5 deconvolution layers with different parameters:
the first deconvolution layer has 64 filters, each of size 3 × 3, with stride 1 × 1;
the second deconvolution layer has 32 filters, each of size 5 × 5, with stride 5 × 5;
the third deconvolution layer has 16 filters, each of size 5 × 5, with stride 2 × 2;
the fourth deconvolution layer has 8 filters, each of size 7 × 7, with stride 2 × 2;
the fifth deconvolution layer has 4 filters, each of size 9 × 9, with stride 3 × 3;
the output part comprises four linearly mapped convolutional layers, each containing 1 filter; the four layers respectively map and output the grasp success rate, the cosine of the grasp angle, the sine of the grasp angle, and the grasp width.
4. The method according to claim 3, characterized in that, in step S0-3, the grasp accuracy of the residual-improved GG-CNN network is measured with the following Intersection-over-Union (IoU) formula:
IoU(C, G) = |C ∩ G| / |C ∪ G|
where C and G represent two known regions; the IoU is the ratio of the intersection to the union of the two regions.
5. The method according to claim 1, characterized in that:
the depth image in S2 is I ∈ R^(H×W), where H is the height and W is the width, and a grasp in the depth image is described as
g̃ = (o, φ̃, ω̃, q);
the coordinate transformation of the robotic arm converts a grasp g̃ in image space into a grasp g in world coordinates:
g = T_RC(T_CI(g̃))
where o = (u, v) is the pixel position of maximum grasp success rate, φ̃ is the rotation angle in the camera frame and ω̃ is the grasp width in image coordinates; T_RC is the coordinate transformation from the camera frame to the robotic arm frame, and T_CI is the calibrated hand-eye transformation between the robotic arm and the camera based on the camera intrinsics;
the output images in S4 are expressed as G = (Φ, W, Q) ∈ R^(3×H×W);
Φ, W and Q each lie in R^(H×W) and represent the grasp angle, grasp width and grasp success rate respectively; the grasp angle Φ is split into its cosine and sine images, and the pixel o of highest success rate picks out the corresponding values φ̃, ω̃ and q in Φ, W and Q;
in step S4, the pre-trained improved GG-CNN model maps the object depth image as G = M(I);
the best grasp pose in image space is determined from G:
g̃* = argmax_Q G
specifically, from the output grasp information G, the pixel with the maximum grasp success rate q in the Q image is selected first, and its coordinate o indexes Φ and W in the output G, giving the position, angle and width of the best grasp pose;
further, the best grasp pose in world coordinates is computed as g_best = T_RC(T_CI(g̃*)).
6. The method according to claim 3, characterized in that the processing in each residual module of the convolutional part comprises:
each residual module comprises a main path and a bypass path;
the bypass path takes one of two forms: a path with pooling and a convolution operation, or a shortcut path with no operation;
specifically, the main path comprises:
1) the input data X is first regularized, then passed through an activation layer using the ReLU activation function, and finally through a filter and convolutional layer to the next layer;
2) the output of the previous layer is regularized, passed through an activation layer using the ReLU activation function, and finally through a filter and convolutional layer, yielding F(X);
the bypass path comprises:
1) when the module pooling parameter is true: the input data X first passes through a max-pooling layer, then through a convolutional layer with filter size 5 × 5, a number of filters given by the module parameter filters, and stride 1 × 1, yielding W(X);
2) when the module pooling parameter is false: X is output directly without any operation;
the output of the main path is added to the output of the selected bypass path, giving the overall output H(X) of the residual module function.
7. The method according to any one of claims 1 to 6, characterized in that, before step S2, the method further comprises step S1:
S1, initializing the robotic arm and adjusting it so that the wrist camera is located at a preset height above the vertical XOY plane;
correspondingly, after step S6, the method further comprises step S7:
S7, outputting the coordinate-transformed grasp position, angle and width information of the target object to be grasped, so as to control the robotic arm to grasp the target object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910192296.4A | 2019-03-14 | 2019-03-14 | Residual network deep learning method for robotic arm grasping pose estimation
Publications (2)
Publication Number | Publication Date |
---|---|
CN109934864A true CN109934864A (en) | 2019-06-25 |
CN109934864B CN109934864B (en) | 2023-01-20 |
Family
ID=66987254
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201910192296.4A | Residual network deep learning method for robotic arm grasping pose estimation | 2019-03-14 | 2019-03-14
Country Status (1)
Country | Link
---|---
CN | CN109934864B (en)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015153739A1 (en) * | 2014-04-01 | 2015-10-08 | University Of South Florida | Systems and methods for planning a robot grasp based upon a demonstrated grasp |
US10089575B1 (en) * | 2015-05-27 | 2018-10-02 | X Development Llc | Determining grasping parameters for grasping of an object by a robot grasping end effector |
CN106874914A (en) * | 2017-01-12 | 2017-06-20 | 华南理工大学 | A kind of industrial machinery arm visual spatial attention method based on depth convolutional neural networks |
CN108491880A (en) * | 2018-03-23 | 2018-09-04 | 西安电子科技大学 | Object classification based on neural network and position and orientation estimation method |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110633738A (en) * | 2019-08-30 | 2019-12-31 | 杭州电子科技大学 | Rapid classification method for industrial part images |
CN110633738B (en) * | 2019-08-30 | 2021-12-10 | 杭州电子科技大学 | Rapid classification method for industrial part images |
CN111015676A (en) * | 2019-12-16 | 2020-04-17 | 中国科学院深圳先进技术研究院 | Grabbing learning control method and system based on hands-free eye calibration, robot and medium |
CN111127548A (en) * | 2019-12-25 | 2020-05-08 | 深圳市商汤科技有限公司 | Grabbing position detection model training method, grabbing position detection method and grabbing position detection device |
CN113021333A (en) * | 2019-12-25 | 2021-06-25 | 沈阳新松机器人自动化股份有限公司 | Object grabbing method and system and terminal equipment |
CN111127548B (en) * | 2019-12-25 | 2023-11-24 | 深圳市商汤科技有限公司 | Grabbing position detection model training method, grabbing position detection method and grabbing position detection device |
CN113362353A (en) * | 2020-03-04 | 2021-09-07 | 上海分众软件技术有限公司 | Method for identifying advertising player frame by utilizing synthesis training picture |
CN111444917A (en) * | 2020-03-30 | 2020-07-24 | 合肥京东方显示技术有限公司 | License plate character recognition method and device, electronic equipment and storage medium |
CN112437349A (en) * | 2020-11-10 | 2021-03-02 | 杭州时趣信息技术有限公司 | Video stream recommendation method and related device |
CN112734727A (en) * | 2021-01-11 | 2021-04-30 | 安徽理工大学 | Apple picking method based on improved deep neural network |
CN113269112A (en) * | 2021-06-03 | 2021-08-17 | 梅卡曼德(北京)机器人科技有限公司 | Method and device for identifying capture area, electronic equipment and storage medium |
CN113327295A (en) * | 2021-06-18 | 2021-08-31 | 华南理工大学 | Robot rapid grabbing method based on cascade full convolution neural network |
CN113799138A (en) * | 2021-10-09 | 2021-12-17 | 中山大学 | Mechanical arm grabbing method for generating convolutional neural network based on grabbing |
CN114155294A (en) * | 2021-10-25 | 2022-03-08 | 东北大学 | Engineering machinery working device pose estimation method based on deep learning |
CN114241247A (en) * | 2021-12-28 | 2022-03-25 | 国网浙江省电力有限公司电力科学研究院 | Transformer substation safety helmet identification method and system based on deep residual error network |
CN115026836A (en) * | 2022-07-21 | 2022-09-09 | 深圳市华成工业控制股份有限公司 | Control method, device and equipment of five-axis manipulator and storage medium |
CN115319739A (en) * | 2022-08-02 | 2022-11-11 | 中国科学院沈阳自动化研究所 | Workpiece grabbing method based on visual mechanical arm |
CN115070781B (en) * | 2022-08-24 | 2022-12-13 | 绿盛环保材料(集团)有限公司 | Object grabbing method and two-mechanical-arm cooperation system |
CN115070781A (en) * | 2022-08-24 | 2022-09-20 | 绿盛环保材料(集团)有限公司 | Object grabbing method and two-mechanical-arm cooperation system |
CN117732827A (en) * | 2024-01-10 | 2024-03-22 | 深圳市林科超声波洗净设备有限公司 | Battery shell cleaning line feeding and discharging control system and method based on robot |
CN117732827B (en) * | 2024-01-10 | 2024-07-05 | 深圳市林科超声波洗净设备有限公司 | Battery shell cleaning line feeding and discharging control system and method based on robot |
CN118537682A (en) * | 2024-07-24 | 2024-08-23 | 青岛云世纪信息科技有限公司 | Deep learning-based automatic mechanical arm training method and system and electronic equipment |
CN118537682B (en) * | 2024-07-24 | 2024-10-22 | 青岛云世纪信息科技有限公司 | Deep learning-based automatic mechanical arm training method and system and electronic equipment |
Similar Documents
Publication | Title | Publication Date
---|---|---
CN109934864A (en) | Residual error network depth learning method towards mechanical arm crawl pose estimation | |
CN109255813B (en) | Man-machine cooperation oriented hand-held object pose real-time detection method | |
Wang et al. | 360sd-net: 360 stereo depth estimation with learnable cost volume | |
CN105787439B (en) | A kind of depth image human synovial localization method based on convolutional neural networks | |
CN104318569B (en) | Space salient region extraction method based on depth variation model | |
CN106709909B (en) | A kind of flexible robot's visual identity and positioning system based on deep learning | |
CN105069746B (en) | Video real-time face replacement method and its system based on local affine invariant and color transfer technology | |
Rae et al. | Recognition of human head orientation based on artificial neural networks | |
CN109344882A (en) | Robot based on convolutional neural networks controls object pose recognition methods | |
CN106447708A (en) | OCT eye fundus image data registration method | |
CN107392964A (en) | The indoor SLAM methods combined based on indoor characteristic point and structure lines | |
CN110268444A (en) | A kind of number of people posture tracing system for transcranial magnetic stimulation diagnosis and treatment | |
CN106485207B (en) | A kind of Fingertip Detection and system based on binocular vision image | |
CN106780619A (en) | A kind of human body dimension measurement method based on Kinect depth cameras | |
CA2801593A1 (en) | Parameterized model of 2d articulated human shape | |
CN108509887A (en) | A kind of acquisition ambient lighting information approach, device and electronic equipment | |
Li et al. | Tata: A universal jamming gripper with high-quality tactile perception and its application to underwater manipulation | |
CN109685716A (en) | A kind of image super-resolution rebuilding method of the generation confrontation network based on Gauss encoder feedback | |
CN108122219B (en) | Infrared and visible light image fusion method based on joint sparse and non-negative sparse | |
CN114529605A (en) | Human body three-dimensional attitude estimation method based on multi-view fusion | |
CN110135277B (en) | Human behavior recognition method based on convolutional neural network | |
CN110490797A (en) | A kind of depth image super resolution ratio reconstruction method based on double-current deep layer convolutional network | |
CN106846462A (en) | Insect identifying device and method based on three-dimensional simulation | |
CN110532865A (en) | Spacecraft structure recognition methods based on visible light and laser fusion | |
CN105930793A (en) | Human body detection method based on SAE characteristic visual learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |