CN108280481A - Joint object classification and 3D pose estimation method based on a residual network - Google Patents
Joint object classification and 3D pose estimation method based on a residual network
- Publication number
- CN108280481A (application CN201810077747.5A)
- Authority
- CN
- China
- Prior art keywords
- network
- loss function
- classification
- posture
- pose
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The present invention proposes a joint object classification and 3D pose estimation method based on a residual network. Its main contents are joint object classification and 3D pose estimation, the loss function, and training. The process is as follows. First, the fourth stage of ResNet-50 is used as the feature network, the fifth stage of ResNet-50 as the classification network, and a three-layer pose network as the pose network, so that a single residual-network architecture performs object classification and 3D pose estimation jointly. Next, a new mathematical representation and a new loss function are proposed for 3D pose: the sum of a pose loss function and a classification loss function characterizes the loss between the ground-truth pose, the ground-truth class label, and the proposed network output. Finally, the network is trained on the Pascal3D+ database. By constructing a new loss function on a residual-network architecture, the present invention achieves the goal of joint object classification and 3D pose estimation and reduces the time consumed by the algorithm.
Description
Technical field
The present invention relates to the fields of object classification and pose estimation, and in particular to a joint object classification and 3D pose estimation method based on a residual network.
Background technology
Scene perception is a key problem in computer vision and an important part of modern vision challenges. One way to understand a scene image is to describe the objects inside the scene, which involves object classification and pose estimation. Object classification compares a measured target with training samples of known targets one by one and answers same or different (true or false); pose estimation estimates attributes such as the shape and motion of different targets. Object classification and pose estimation have wide applications in many fields: face recognition, pedestrian detection, pedestrian tracking, and intelligent video analysis in security; traffic-scene object recognition, vehicle counting, wrong-way-driving detection, and license-plate detection and recognition in transportation; and content-based image retrieval and automatic photo-album clustering on the Internet. It can be said that object classification and pose estimation touch every aspect of people's daily lives. With the success of emerging technologies such as deep learning in image classification and 2D object detection, much current work uses convolutional neural networks to handle object classification and pose estimation. However, these works all use the output of a 2D object detection system as the input of the 3D pose estimation system. In effect, existing methods run as a pipeline that classifies the target object, detects its position, and then estimates its 3D pose in sequence, which leads to a series of problems such as excessive time consumption.
The present invention proposes a joint object classification and 3D pose estimation method based on a residual network. First, the fourth stage of ResNet-50 is used as the feature network, the fifth stage of ResNet-50 as the classification network, and a three-layer pose network as the pose network, so that object classification and 3D pose estimation are performed jointly within one residual-network architecture. Then, a new mathematical representation and a new loss function are proposed for 3D pose: the sum of a pose loss function and a classification loss function characterizes the loss between the ground-truth pose, the ground-truth class label, and the proposed network output. Finally, the network is trained on the Pascal3D+ database. The invention achieves the goal of joint object classification and 3D pose estimation and reduces the time consumed by the algorithm.
Summary of the invention
To address problems such as excessive time consumption, the present invention provides a joint object classification and 3D pose estimation method. First, the fourth stage of ResNet-50 is used as the feature network, the fifth stage of ResNet-50 as the classification network, and a three-layer pose network as the pose network, so that classification and 3D pose estimation are performed jointly within one residual-network architecture. Then, a new mathematical representation and a new loss function are proposed for 3D pose: the sum of a pose loss function and a classification loss function characterizes the loss between the ground-truth pose, the ground-truth class label, and the proposed network output. Finally, the network is trained on the Pascal3D+ database.
Specifically, the main contents of the invention include:
(1) joint object classification and 3D pose estimation;
(2) the loss function;
(3) training.
Wherein, the joint object classification and 3D pose estimation can be applied even when the object class label is unknown, and uses the residual network ResNet-50 as the feature network.
Further, for the classification, the features produced by the feature network serve as its input for estimating the object class label.
Further, using the residual network ResNet-50 as the feature network means that the fourth stage of ResNet-50 serves as the feature network, the fifth stage of ResNet-50 as the classification network, and a three-layer pose network as the pose network.
Wherein, for the loss function: when the object class label is unknown, the present invention constructs a new mathematical representation and a new loss function for 3D pose. First, the sum of a pose loss function and a classification loss function characterizes the loss between the ground-truth pose R*, the ground-truth class label c*, and the proposed network output (R, c), i.e.:
L((R*, c*), (R, c)) = L_pose(R*, R) + L_category(c*, c)
wherein the classification loss L_category uses the standard categorical cross-entropy loss function, and the pose loss L_pose depends on the representation chosen for the rotation matrix R.
Further, the rotation matrix R uses the axis-angle representation, i.e. R = expm(θ[v]×), where v is the rotation axis and [v]× denotes the skew-symmetric matrix generated by v = [v1, v2, v3]^T, i.e.:
[v]× = [[0, −v3, v2], [v3, 0, −v1], [−v2, v1, 0]]
and θ is the rotation angle; restricting θ ∈ [0, π) yields a one-to-one correspondence between the rotation matrix R and the axis-angle vector y = θv.
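The axis-angle map above can be checked numerically: for 3D rotations, the matrix exponential expm(θ[v]×) has the closed-form Rodrigues expansion R = I + sin θ [v]× + (1 − cos θ)[v]×² (a standard identity, not stated explicitly in the patent). A minimal NumPy sketch:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]x generated by v = [v1, v2, v3]^T."""
    v1, v2, v3 = v
    return np.array([[0.0, -v3,  v2],
                     [ v3, 0.0, -v1],
                     [-v2,  v1, 0.0]])

def axis_angle_to_R(y):
    """Rotation matrix R = expm(theta * [v]x) via the Rodrigues formula,
    where y = theta * v, theta = |y| in [0, pi), v a unit axis."""
    theta = np.linalg.norm(y)
    if theta < 1e-12:
        return np.eye(3)
    v = y / theta
    K = skew(v)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# A 90-degree rotation about the z-axis maps the x-axis to the y-axis.
R = axis_angle_to_R(np.array([0.0, 0.0, np.pi / 2]))
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))  # [0. 1. 0.]
```

Because θ is kept in [0, π), each rotation in this range corresponds to exactly one axis-angle vector y, which is what makes the per-category pose networks able to regress y directly.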
Further, through the correspondence between rotation matrices and axis-angle vectors, the pose loss between two axis-angle vectors y1 and y2 is L_pose(y1, y2) = ‖y1 − y2‖;
this serves as the loss function over the space in which the rotation matrices live.
Further, for the axis-angle vectors, let y_i be the output of the i-th pose network. When the object class is known, the pose output can be selected according to the correct class label, i.e. y = y_{c*};
when the ground-truth class label is unknown, a weighted loss function or a top loss function can be used to estimate the pose output.
Further, for the weighted loss function and the top loss function: assuming the output of the classification network is a probability vector over classes, the estimated pose is y_wgt = Σ_i y_i p(c = i), with loss function L_pose(y*, y_wgt);
and if the predicted class label is instead taken to be the single label with maximum probability, the estimated pose is y_top = y_{argmax_i p(c = i)}, with loss function L_pose(y*, y_top).
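The two label-unknown strategies can be sketched as follows (a minimal sketch; taking the Euclidean norm as the pose distance is an assumption, since the patent's equation images are not reproduced in the text):

```python
import numpy as np

def pose_loss(y1, y2):
    # Distance between two axis-angle vectors (Euclidean norm assumed).
    return np.linalg.norm(y1 - y2)

def weighted_pose(ys, p):
    # y_wgt = sum_i y_i * p(c = i): pose outputs averaged by class probability.
    return (p[:, None] * ys).sum(axis=0)

def top_pose(ys, p):
    # y_top = y_{argmax_i p(c = i)}: pose output of the most probable class.
    return ys[np.argmax(p)]

ys = np.array([[0.1, 0.0, 0.0],   # pose output of class 0
               [0.0, 0.4, 0.0]])  # pose output of class 1
p = np.array([0.25, 0.75])        # classifier output (probability vector)
y_true = np.array([0.0, 0.4, 0.0])

print(pose_loss(y_true, weighted_pose(ys, p)))  # weighted-loss variant
print(pose_loss(y_true, top_pose(ys, p)))       # top-loss variant: 0.0
```

The weighted variant blends all pose hypotheses by the classifier's confidence, while the top variant commits to the single most probable class, trading robustness for sharpness.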
Wherein, the training proceeds in the following steps:
Step 1: fix the feature network, with weights obtained by classification pre-training on ImageNet images;
Step 2: train the classification network and the category-specific pose networks, each independently of the other networks;
Step 3: use the weights obtained in the two steps above to initialize the whole network, then optimize the whole network with the new loss function at a lower learning rate, accomplishing the task of joint object classification and pose estimation.
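The joint objective optimized in Step 3 can be sketched as the sum of the two loss terms (a minimal sketch: the cross-entropy and the Euclidean axis-angle distance follow the description above, while any relative weighting between the two terms is an assumption left open by the patent):

```python
import numpy as np

def total_loss(p, ys, c_true, y_true):
    # L = L_category + L_pose: standard cross-entropy on the class
    # probabilities plus the axis-angle distance of the pose branch
    # selected by the ground-truth class label.
    l_cat = -np.log(p[c_true] + 1e-12)
    l_pose = np.linalg.norm(ys[c_true] - y_true)
    return l_cat + l_pose

p = np.array([0.1, 0.7, 0.2])            # classifier output for one image
ys = np.array([[0.0, 0.0, 0.0],
               [0.2, 0.0, 0.0],
               [0.0, 0.5, 0.0]])         # pose branch outputs
loss = total_loss(p, ys, c_true=1, y_true=np.array([0.2, 0.0, 0.1]))
print(round(float(loss), 4))  # 0.4567
```

Because both terms are differentiable in the network outputs, a single backpropagation pass through this sum trains the classifier and the pose branches jointly, which is the source of the time savings claimed above.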
Description of the drawings
Fig. 1 is the system flow chart of the joint object classification and 3D pose estimation method based on a residual network of the present invention.
Fig. 2 is the network architecture diagram of the joint object classification and 3D pose estimation method based on a residual network of the present invention.
Specific embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is the system flow chart of the joint object classification and 3D pose estimation method based on a residual network of the present invention. The method mainly comprises joint object classification and 3D pose estimation, the loss function, and training.
The joint object classification and 3D pose estimation can be applied even when the object class label is unknown, and uses the residual network ResNet-50 as the feature network.
The loss function characterizes the loss between the ground-truth pose R*, the ground-truth class label c*, and the proposed network output (R, c) as the sum of a pose loss function and a classification loss function, i.e.:
L((R*, c*), (R, c)) = L_pose(R*, R) + L_category(c*, c)
where the classification loss L_category uses the standard categorical cross-entropy loss function, and the pose loss L_pose depends on the representation of the rotation matrix R.
The rotation matrix R uses the axis-angle representation, i.e. R = expm(θ[v]×), where v is the rotation axis and [v]× denotes the skew-symmetric matrix generated by v = [v1, v2, v3]^T, i.e.:
[v]× = [[0, −v3, v2], [v3, 0, −v1], [−v2, v1, 0]]
and θ is the rotation angle; restricting θ ∈ [0, π) yields a one-to-one correspondence between the rotation matrix R and the axis-angle vector y = θv.
Through this correspondence, the pose loss between two axis-angle vectors y1 and y2 is L_pose(y1, y2) = ‖y1 − y2‖, which serves as the loss function over the space in which the rotation matrices live.
Let y_i be the output of the i-th pose network. When the object class is known, the pose output can be selected according to the correct class label, i.e. y = y_{c*}. When the ground-truth class label is unknown, a weighted loss function or a top loss function can be used to estimate the pose output.
For the weighted loss function and the top loss function: assuming the output of the classification network is a probability vector over classes, the estimated pose is y_wgt = Σ_i y_i p(c = i), with loss function L_pose(y*, y_wgt); and if the predicted class label is instead taken to be the single label with maximum probability, the estimated pose is y_top = y_{argmax_i p(c = i)}, with loss function L_pose(y*, y_top).
The network is trained in the following steps:
Step 1: fix the feature network, with weights obtained by classification pre-training on ImageNet images;
Step 2: train the classification network and the category-specific pose networks, each independently of the other networks;
Step 3: use the weights obtained in the two steps above to initialize the whole network, then optimize the whole network with the new loss function at a lower learning rate, accomplishing the task of joint object classification and pose estimation.
Fig. 2 is the network architecture diagram of the joint object classification and 3D pose estimation method based on a residual network of the present invention. The features produced by the feature network serve as input for estimating the object class label. The fourth stage of ResNet-50 serves as the feature network, the fifth stage of ResNet-50 as the classification network, and a three-layer pose network as the pose network.
For those skilled in the art, the present invention is not limited to the details of the above embodiments, and it can be realized in other specific forms without departing from its spirit or scope. Moreover, those skilled in the art may make various modifications and variations to the present invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and variations falling within the scope of the invention.
Claims (10)
1. A joint object classification and 3D pose estimation method based on a residual network, characterized in that it mainly comprises joint object classification and 3D pose estimation (1); a loss function (2); and training (3).
2. The joint object classification and 3D pose estimation (1) according to claim 1, characterized in that it can be applied even when the object class label is unknown, and uses the residual network ResNet-50 as the feature network.
3. The classification according to claim 2, characterized in that the features produced by the feature network serve as its input for estimating the object class label.
4. The use of the residual network ResNet-50 as the feature network according to claim 2, characterized in that the fourth stage of ResNet-50 serves as the feature network, the fifth stage of ResNet-50 as the classification network, and a three-layer pose network as the pose network.
5. The loss function (2) according to claim 1, characterized in that the sum of a pose loss function and a classification loss function characterizes the loss between the ground-truth pose R*, the ground-truth class label c*, and the proposed network output (R, c), i.e.:
L((R*, c*), (R, c)) = L_pose(R*, R) + L_category(c*, c)
wherein the classification loss L_category uses the standard categorical cross-entropy loss function, and the pose loss L_pose depends on the representation of the rotation matrix R.
6. The rotation matrix R according to claim 5, characterized in that R uses the axis-angle representation, i.e. R = expm(θ[v]×), where v is the rotation axis and [v]× denotes the skew-symmetric matrix generated by v = [v1, v2, v3]^T, i.e.:
[v]× = [[0, −v3, v2], [v3, 0, −v1], [−v2, v1, 0]]
and θ is the rotation angle; restricting θ ∈ [0, π) yields a one-to-one correspondence between the rotation matrix R and the axis-angle vector y = θv.
7. The correspondence between rotation matrices and axis-angle vectors according to claim 6, characterized in that the pose loss between two axis-angle vectors y1 and y2 is L_pose(y1, y2) = ‖y1 − y2‖;
this serves as the loss function over the space in which the rotation matrices live.
8. The axis-angle vector according to claim 7, characterized in that, letting y_i be the output of the i-th pose network, when the object class is known the pose output can be selected according to the correct class label, i.e. y = y_{c*};
when the ground-truth class label is unknown, a weighted loss function or a top loss function can be used to estimate the pose output.
9. The weighted loss function and top loss function according to claim 8, characterized in that, assuming the output of the classification network is a probability vector, the estimated pose is y_wgt = Σ_i y_i p(c = i), with loss function L_pose(y*, y_wgt);
and if the predicted class label is instead taken to be the single label with maximum probability, the estimated pose is y_top = y_{argmax_i p(c = i)}, with loss function L_pose(y*, y_top).
10. The training (3) according to claim 1, characterized in that the network is trained in the following steps:
Step 1: fix the feature network, with weights obtained by classification pre-training on ImageNet images;
Step 2: train the classification network and the category-specific pose networks, each independently of the other networks;
Step 3: use the weights obtained in the two steps above to initialize the whole network, then optimize the whole network with the new loss function at a lower learning rate, accomplishing the task of joint object classification and pose estimation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810077747.5A CN108280481A (en) | 2018-01-26 | 2018-01-26 | Joint object classification and 3D pose estimation method based on a residual network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810077747.5A CN108280481A (en) | 2018-01-26 | 2018-01-26 | Joint object classification and 3D pose estimation method based on a residual network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108280481A true CN108280481A (en) | 2018-07-13 |
Family
ID=62805107
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810077747.5A Withdrawn CN108280481A (en) | 2018-01-26 | 2018-01-26 | A kind of joint objective classification and 3 d pose method of estimation based on residual error network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108280481A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109063607A (en) * | 2018-07-17 | 2018-12-21 | 北京迈格威科技有限公司 | The method and device that loss function for identifying again determines |
CN110263638A (en) * | 2019-05-16 | 2019-09-20 | 山东大学 | A kind of video classification methods based on significant information |
CN110428464A (en) * | 2019-06-24 | 2019-11-08 | 浙江大学 | Multi-class out-of-order workpiece robot based on deep learning grabs position and orientation estimation method |
CN110929242A (en) * | 2019-11-20 | 2020-03-27 | 上海交通大学 | Method and system for carrying out attitude-independent continuous user authentication based on wireless signals |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177443A (en) * | 2013-03-07 | 2013-06-26 | 中国电子科技集团公司第十四研究所 | SAR (synthetic aperture radar) target attitude angle estimation method based on randomized hough transformations |
CN103345744A (en) * | 2013-06-19 | 2013-10-09 | 北京航空航天大学 | Human body target part automatic analytic method based on multiple images |
- 2018
- 2018-01-26: CN application CN201810077747.5A filed; published as CN108280481A (en); not active, withdrawn
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177443A (en) * | 2013-03-07 | 2013-06-26 | 中国电子科技集团公司第十四研究所 | SAR (synthetic aperture radar) target attitude angle estimation method based on randomized hough transformations |
CN103345744A (en) * | 2013-06-19 | 2013-10-09 | 北京航空航天大学 | Human body target part automatic analytic method based on multiple images |
Non-Patent Citations (1)
Title |
---|
Siddharth Mahendran et al., "Joint Object Category and 3D Pose Estimation from 2D Images", arXiv.org *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109063607A (en) * | 2018-07-17 | 2018-12-21 | 北京迈格威科技有限公司 | The method and device that loss function for identifying again determines |
CN109063607B (en) * | 2018-07-17 | 2022-11-25 | 北京迈格威科技有限公司 | Method and device for determining loss function for re-identification |
CN110263638A (en) * | 2019-05-16 | 2019-09-20 | 山东大学 | A kind of video classification methods based on significant information |
CN110428464A (en) * | 2019-06-24 | 2019-11-08 | 浙江大学 | Multi-class out-of-order workpiece robot based on deep learning grabs position and orientation estimation method |
CN110428464B (en) * | 2019-06-24 | 2022-01-04 | 浙江大学 | Multi-class out-of-order workpiece robot grabbing pose estimation method based on deep learning |
CN110929242A (en) * | 2019-11-20 | 2020-03-27 | 上海交通大学 | Method and system for carrying out attitude-independent continuous user authentication based on wireless signals |
CN110929242B (en) * | 2019-11-20 | 2020-07-10 | 上海交通大学 | Method and system for carrying out attitude-independent continuous user authentication based on wireless signals |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110414432B (en) | Training method of object recognition model, object recognition method and corresponding device | |
CN110569886B (en) | Image classification method for bidirectional channel attention element learning | |
CN108647583B (en) | Face recognition algorithm training method based on multi-target learning | |
Sun et al. | Deep learning face representation by joint identification-verification | |
JP4999101B2 (en) | How to combine boost classifiers for efficient multi-class object detection | |
CN108280481A (en) | Joint object classification and 3D pose estimation method based on a residual network | |
CN110580460A (en) | Pedestrian re-identification method based on combined identification and verification of pedestrian identity and attribute characteristics | |
Cao et al. | Transfer learning for pedestrian detection | |
CN111310583A (en) | Vehicle abnormal behavior identification method based on improved long-term and short-term memory network | |
CN111709311A (en) | Pedestrian re-identification method based on multi-scale convolution feature fusion | |
CN108492298B (en) | Multispectral image change detection method based on generation countermeasure network | |
CN106682696A (en) | Multi-example detection network based on refining of online example classifier and training method thereof | |
Gruhl et al. | A building block for awareness in technical systems: Online novelty detection and reaction with an application in intrusion detection | |
CN109800643A (en) | A kind of personal identification method of living body faces multi-angle | |
CN111079847A (en) | Remote sensing image automatic labeling method based on deep learning | |
CN111209799B (en) | Pedestrian searching method based on partial shared network and cosine interval loss function | |
WO2006105541A2 (en) | Object identification between non-overlapping cameras without direct feature matching | |
CN111582178B (en) | Vehicle weight recognition method and system based on multi-azimuth information and multi-branch neural network | |
CN107491729B (en) | Handwritten digit recognition method based on cosine similarity activated convolutional neural network | |
KR20180038169A (en) | Safety classification method of the city image using deep learning-based data feature | |
JP4221430B2 (en) | Classifier and method thereof | |
WO2021079451A1 (en) | Learning device, learning method, inference device, inference method, and recording medium | |
CN106709442A (en) | Human face recognition method | |
CN113936301B (en) | Target re-identification method based on center point prediction loss function | |
CN115630361A (en) | Attention distillation-based federal learning backdoor defense method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20180713 |