CN111209799A

CN111209799A - Pedestrian searching method based on partial shared network and cosine interval loss function

Info

Publication number: CN111209799A
Application number: CN201911337014.1A
Authority: CN
Inventors: 罗炬锋; 陈浩然; 李丹; 曹永长; 偰超; 张力; 崔笛扬; 郑春雷
Original assignee: Shanghai Internet Of Things Co ltd
Current assignee: Shanghai Internet Of Things Co ltd
Priority date: 2019-12-23
Filing date: 2019-12-23
Publication date: 2020-05-29
Anticipated expiration: 2039-12-23
Also published as: CN111209799B

Abstract

The invention relates to a pedestrian search method based on a partially shared network and a cosine interval loss function, comprising the following steps. First, a new neural network structure is designed in which the pedestrian detection part and the pedestrian re-identification part share only shallower-layer features. Each part can then focus more on its own task, which reduces the mutual interference between the two parts to a certain extent through an improved model structure. Second, the influence of the weight of the pedestrian re-identification loss on model optimization under joint multi-loss optimization is studied in depth. Setting reasonable loss-function parameters relieves the mutual interference between pedestrian detection and re-identification from the optimization side. Finally, a more robust lookup-table update strategy is proposed, and a cosine interval (margin) is added to the OIM loss function to reduce the distance between samples of the same class, so that the pedestrian features learned by the network are more discriminative. The invention can reduce the mutual interference between pedestrian detection and pedestrian re-identification.

Description

Pedestrian searching method based on partial shared network and cosine interval loss function

Technical Field

The invention relates to the technical field of computer vision applications, and in particular to a pedestrian search method based on a partially shared network and a cosine interval loss function.

Background

Pedestrian re-identification aims to match target pedestrians across a multi-camera surveillance system with non-overlapping fields of view. It is an important and fast-developing research area in computer vision. Pedestrian re-identification has already been applied in video surveillance, for example to search crowds for criminal suspects, to track pedestrians across cameras and to analyze pedestrian activity. It is of great significance for protecting public life and property, and in recent years it has therefore attracted extensive research in both academia and industry. Many data sets and algorithms for pedestrian re-identification have been proposed, but a large gap remains between the technology and its practical application. Most pedestrian re-identification studies use cropped pedestrian images as the query data set. In a real scene, however, the images acquired by a surveillance system are full scene images, and the target pedestrian can only be queried after all pedestrians have been detected in those images. In practical applications it is therefore necessary to combine pedestrian detection and pedestrian re-identification and to handle both tasks together. Pedestrian search aims to find a target pedestrian in scene images by handling pedestrian detection and pedestrian re-identification simultaneously. It is a new problem that has emerged in recent years and has attracted extensive attention in both academia and industry.

In the past few years, a number of pedestrian search algorithms have been proposed. They can be broadly divided into two categories: two-step algorithms and end-to-end algorithms. A two-step algorithm handles the two tasks with separate pedestrian detection and pedestrian re-identification models. It takes a scene image as input, first detects pedestrians, and then feeds the detected pedestrian crops to a pedestrian re-identification network for matching. An end-to-end algorithm handles both tasks in a single jointly optimized deep learning model, using a multi-task loss that contains both pedestrian detection and pedestrian re-identification terms. In theory, end-to-end pedestrian search has two advantages over the two-step approach. First, joint optimization lets pedestrian detection and re-identification reinforce each other, which reduces false pedestrian detections and improves identification accuracy. Second, because the two tasks share feature layers, pedestrian search is more time-efficient.

However, end-to-end models face three important issues. First, the features shared by pedestrian detection and pedestrian re-identification affect model performance. Second, the model is trained with a multi-task learning scheme, so its performance depends heavily on a reasonable weighting between the pedestrian detection and pedestrian re-identification loss functions. Third, the OIM loss has limited ability to discriminate between pedestrians. It is the pedestrian re-identification loss widely used in end-to-end pedestrian search networks, but it only considers correctly separating samples of different classes and ignores increasing the similarity between samples of the same class.

Disclosure of Invention

The invention aims to provide a pedestrian searching method based on a partial shared network and a cosine interval loss function, which can reduce the mutual interference between pedestrian detection and pedestrian re-identification.

The technical scheme adopted by the invention for solving the technical problems is as follows: the pedestrian searching method based on the partial shared network and the cosine interval loss function comprises the following steps:

(1) constructing a neural network structure comprising a pedestrian proposal network and a pedestrian re-identification network; the neural network structure uses ResNet-50 as the base network, where ResNet-50 consists of a convolutional layer Conv1 and four convolutional layer groups Conv2_x to Conv5_x, which contain 3, 4, 6 and 3 residual units respectively;

(2) inputting images into the neural network structure, where Conv1 to Conv3_4 serve as the backbone network that extracts a shallow feature map, and the shallow feature map is shared by the pedestrian proposal network and the pedestrian re-identification network;

(3) after the shallow feature map is extracted, sending it into two branches: the first branch passes the shallow feature map through a duplicated convolutional layer group Conv4_1 to Conv4_3 for further feature extraction, and the result is sent to the pedestrian proposal network to generate a number of pedestrian proposal boxes; the second branch sends the shallow feature map to a region-of-interest (RoI) pooling layer, which uses the pedestrian proposal boxes produced by the pedestrian proposal network as regions of interest and pools the pedestrian feature map corresponding to each box from the shallow feature map;

(4) inputting the pedestrian feature maps into a pedestrian re-identification network composed of the convolutional layer groups Conv4_1 to Conv5_3, which outputs deep pedestrian feature maps; these are pooled by a global average pooling layer, and each pedestrian feature map is finally mapped into an a-dimensional pedestrian feature vector.
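Steps (1) to (4) assign the ResNet-50 residual units to three parts: the shared backbone, the detection branch and the re-identification network. The hypothetical sketch below lists that assignment by name only (no tensors or weights). The `convN_k` naming and the helper and variable names are assumptions chosen for illustration, not taken from any released implementation.

```python
# Hypothetical sketch: which ResNet-50 units each part of the partially
# shared network uses. Names like "conv3_4" follow the ConvN_k naming in
# the text; the helper and constant names are illustrative only.

# Residual units per group in ResNet-50: Conv2_x..Conv5_x have 3, 4, 6, 3 units.
UNITS_PER_GROUP = {2: 3, 3: 4, 4: 6, 5: 3}


def units(group: int) -> list[str]:
    """Return the unit names of one layer group, e.g. conv4_1..conv4_6."""
    return [f"conv{group}_{k}" for k in range(1, UNITS_PER_GROUP[group] + 1)]


# Shared backbone: Conv1 plus every unit up to and including Conv3_4.
SHARED = ["conv1"] + units(2) + units(3)

# Detection branch: a separate copy of Conv4_1..Conv4_3 (no weight sharing
# with IDnet) whose output feeds the pedestrian proposal network.
PPN_BRANCH = units(4)[:3]

# Re-identification network: Conv4_1..Conv5_3, applied to RoI-pooled features.
IDNET = units(4) + units(5)

if __name__ == "__main__":
    print("shared:", SHARED)
    print("PPN branch:", PPN_BRANCH)
    print("IDnet:", IDNET)
```

The detection branch reuses only the first three Conv4 units, while IDnet runs all six Conv4 units plus Conv5. Only the layers up to Conv3_4 are shared.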

The neural network structure in step (1) is trained with a multi-loss function, and the total loss function has the form: total loss = pedestrian proposal network loss + r × pedestrian re-identification network loss, where r is an adjustment parameter. Adjusting r relieves the mutual interference between pedestrian detection and re-identification.
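The weighting can be written as a one-line function. This is a minimal sketch that assumes every loss term is already a scalar. The function and parameter names are hypothetical, and the detection losses are passed as a list because the detailed description mentions several detection terms.

```python
def total_loss(detection_losses, reid_loss, r):
    """Hypothetical combination: sum of detection losses plus r * re-ID loss.

    detection_losses: iterable of scalar detection-branch losses
    reid_loss: scalar pedestrian re-identification loss
    r: weight trading off re-identification against detection
    """
    return sum(detection_losses) + r * reid_loss


if __name__ == "__main__":
    # Sweeping r changes only the re-identification term's contribution.
    for r in (0.5, 1.0, 2.0):
        print(r, total_loss([0.1, 0.2, 0.3, 0.4], 1.0, r))
```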

The pedestrian re-identification network loss function is a cosine interval loss function, expressed as:

where N is the total number of labeled samples in a training batch, s is a scale parameter, θ_{j,i} is the angle between the i-th sample x_i and the pedestrian feature vector v_j of labeled class j, m is the introduced cosine interval, L is the total number of labeled classes, Q ≈ L, and ψ_{k,i} is the angle between the i-th sample x_i and the pedestrian feature vector u_k of an unlabeled class.
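The equation itself is not reproduced in this text. The reconstruction below is plausible given the symbol definitions above. It assumes the usual additive cosine-margin form, in which m is subtracted only from the cosine of the sample's own labeled class; writing that class as t is also an assumption.

```latex
\mathcal{L}_{\text{CM-OIM}} = -\frac{1}{N}\sum_{i=1}^{N}
\log \frac{e^{s(\cos\theta_{t,i} - m)}}
{e^{s(\cos\theta_{t,i} - m)} + \sum_{j=1,\, j\neq t}^{L} e^{s\cos\theta_{j,i}} + \sum_{k=1}^{Q} e^{s\cos\psi_{k,i}}}
```

Under this reading, setting m = 0 gives back the cosine form of the original OIM loss.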

The pedestrian feature vector v_j of a labeled class is updated as follows:

where N_t is the total number of samples of class t in a training batch, C_i is the confidence that the i-th sample x_i is a pedestrian, and μ is a variable that increases as training progresses.
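The update equation is also missing here. The form below is consistent with the variables named above, under three assumptions. First, μ weights the previous lookup-table entry, so a small μ gives a fast update, as the detailed description explains. Second, the confidence-weighted sum over the class-t samples in the batch is divided by N_t. Third, the class being updated is written as t.

```latex
v_t \leftarrow \mu\, v_t + (1-\mu)\,\frac{1}{N_t}\sum_{x_i \in \text{class } t} C_i\, x_i
```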

The pedestrian proposal network in step (1) adopts a Faster R-CNN structure.

After step (4), the method further comprises: mapping the a-dimensional pedestrian vector into a b-dimensional feature subspace with a fully connected layer, and then L2-normalizing the pedestrian features in the b-dimensional feature subspace to prevent overfitting.
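The projection and normalization step can be illustrated in pure Python with tiny dimensions. The embodiment uses a = 2048 and b = 256. This is a minimal sketch: the function name, the eps guard and the toy weights are assumptions, and a real network would use a framework's linear layer instead of nested lists.

```python
import math


def project_and_normalize(x, weight, bias, eps=1e-12):
    """Fully connected layer (a -> b dims) followed by L2 normalization.

    x: a-dimensional input as a list of floats
    weight: b rows, each a list of a floats; bias: list of b floats
    """
    # y = W x + bias, one output per weight row
    y = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
    # Divide by the Euclidean norm; eps avoids division by zero for an all-zero y
    norm = max(math.sqrt(sum(v * v for v in y)), eps)
    return [v / norm for v in y]


if __name__ == "__main__":
    # a = 4, b = 2: keep the first and last input components, then normalize.
    W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    feat = project_and_normalize([1.0, 2.0, 3.0, 4.0], W, [0.0, 0.0])
    print(feat, sum(v * v for v in feat))
```

After normalization every feature has unit length, which is what lets the later loss be written purely in terms of cosines.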

Advantageous effects

Compared with the prior art, the technical scheme of the invention has the following advantages and positive effects. The partially shared network reasonably reduces the number of layers shared by the pedestrian detection part and the pedestrian re-identification part. This effectively reduces the mutual interference between the two parts at the level of model structure, so that each part can focus on its own task. Setting an appropriate loss weight lets the whole network converge to a better region of the parameter space. The invention also introduces a cosine interval into the OIM loss function, which makes samples of the same class more compact and the learned features more discriminative. The invention further designs a new update strategy for the pedestrian re-identification loss, which experiments show to be more robust. Finally, extensive experiments on two standard data sets show that the method outperforms existing end-to-end pedestrian search models on the mAP and top-1 evaluation metrics and is more practical.

Drawings

Fig. 1 is a diagram of a network architecture of the present invention.

Detailed Description

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.

This embodiment relates to an end-to-end pedestrian search method based on a partially shared network and a cosine interval loss function. First, a new neural network structure is designed in which the pedestrian detection and pedestrian re-identification parts share only shallower features. Each part can then concentrate more on its own task, which relieves the mutual interference between the two parts to a certain extent through an improved model structure. Second, the influence of the weight of the pedestrian re-identification loss on model optimization under joint multi-loss optimization is studied in depth. Setting reasonable loss-function parameters relieves the mutual interference between pedestrian detection and re-identification from the optimization side. Third, the widely used OIM loss function is improved in two ways. A more robust lookup-table update strategy is proposed, and a cosine interval is added to the OIM loss function to reduce the distance between samples of the same class, so that the pedestrian features learned by the network are more discriminative. The overall neural network architecture is shown in Fig. 1.

The partially shared network of this embodiment is an improvement on a previously proposed end-to-end pedestrian search network. Specifically, ResNet-50 is used as the base network. ResNet-50 consists of a convolutional layer (Conv1) and four convolutional layer groups (Conv2_x to Conv5_x) with 3, 4, 6 and 3 residual units respectively. In this embodiment, Conv1 to Conv3_4 serve as the backbone network, and the shallow features it extracts are shared by a Pedestrian Proposal Network (PPN) and a Pedestrian re-IDentification network (IDnet). Once extracted, the shallow features enter two branches. The first branch passes the shallow feature map through a duplicate of Conv4_1 to Conv4_3 for further feature extraction; these layers do not share parameters with the corresponding layers in IDnet. The PPN then generates pedestrian proposal boxes from the result. The second branch sends the shallow feature map to the region-of-interest (RoI) pooling layer. This layer uses the pedestrian proposal boxes produced by the PPN as regions of interest and pools the pedestrian feature map corresponding to each box from the shallow feature map. The pooled feature map for each box has size 512 × 40 × 20, and the PPN has the same structure as the region proposal network in Faster R-CNN. IDnet consists of Conv4_1 to Conv5_3. It takes the pooled pedestrian feature maps as input and outputs deep pedestrian feature maps with high-level semantics. These maps are pooled by a Global Average Pooling (GAP) layer, and each pedestrian feature map is finally mapped to a 2048-dimensional pedestrian feature vector. Strictly speaking, IDnet should focus only on the pedestrian re-identification task and should not include any part of pedestrian detection.
However, experiments showed that keeping the pedestrian/background classification layer and the box regression layer after IDnet improves detection, because its high-level semantics can be used to eliminate further false pedestrian detections. For the pedestrian re-identification part, the 2048-dimensional pedestrian vector is first mapped into a 256-dimensional feature subspace by a fully connected layer, and the 256-dimensional pedestrian features are then L2-normalized to prevent overfitting. The whole network is trained with multiple loss functions, and the total loss function is composed as shown in equation 1.

Here, the first four loss terms constitute the Fast R-CNN losses. In the partially shared network, the layers shared by the pedestrian detection and pedestrian re-identification tasks are Conv1 to Conv3_4. The two tasks therefore share only shallower features, and each has more dedicated layers with which to focus on its own task. In addition, the partially shared network fixes the height-to-width ratio of the pooled pedestrian feature map at 2:1, a pooling scheme better suited to pedestrian proposal boxes. This compromise effectively mitigates the mutual interference that shared features cause between the pedestrian detection and re-identification tasks, while preserving useful shared shallow feature information.
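The fixed 2:1 pooled size (40 × 20 per box in this embodiment) can be illustrated with a simplified single-channel RoI pooling routine. This sketch rests on assumptions. The text does not say which RoI pooling variant is used, so max pooling in the style of the original RoI pooling layer is assumed here. Box coordinates are assumed to be integer feature-map cells, and the function name is hypothetical.

```python
import math


def roi_max_pool(fmap, box, out_h, out_w):
    """Max-pool one channel inside box into an out_h x out_w grid.

    fmap: 2-D list indexed fmap[row][col]; box: (x1, y1, x2, y2) in integer
    feature-map cells with x2/y2 exclusive. Bin edges use floor/ceil, so
    every bin covers at least one cell.
    """
    x1, y1, x2, y2 = box
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    pooled = []
    for i in range(out_h):
        r0 = y1 + math.floor(i * bin_h)
        r1 = min(y1 + math.ceil((i + 1) * bin_h), y2)  # clamp to the box
        row = []
        for j in range(out_w):
            c0 = x1 + math.floor(j * bin_w)
            c1 = min(x1 + math.ceil((j + 1) * bin_w), x2)
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    # An 80 x 40 map pooled to the 2:1 grid of 40 x 20 bins.
    fmap = [[r * 40 + c for c in range(40)] for r in range(80)]
    out = roi_max_pool(fmap, (0, 0, 40, 80), 40, 20)
    print(len(out), len(out[0]))
```

Because the output grid is always 40 rows by 20 columns, a tall pedestrian box is divided into bins that roughly match its shape. A square grid would stretch it.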

Once the L2-normalized 256-dimensional pedestrian feature vectors are obtained, an existing algorithm uses its Online Instance Matching (OIM) loss function as the loss function of the pedestrian re-identification task. The OIM loss maintains two additional online-updated buffers: a lookup table (LUT)

$$V \in \mathbb{R}^{D \times L}$$

and a circular queue (CQ)

$$U \in \mathbb{R}^{D \times Q}$$

where D is the vector dimension, L is the total number of labeled classes, and Q is a value close to L. V stores the pedestrian feature vectors of the labeled classes, and U stores the pedestrian feature vectors of unlabeled classes. The OIM loss function is given in equation 2.

where N is the total number of labeled samples in a training batch and x_i is the i-th sample, a 256-dimensional L2-normalized feature vector with label t (t ∈ [1, C], where C is the total number of classes). After each iteration the LUT and the CQ are updated; the update strategy of the existing algorithm is shown in formula 3:

$$v_t \leftarrow \gamma\, v_t + (1-\gamma)\, x \qquad (3)$$

where x is a pedestrian feature vector of class t, and the formula is applied in sequence to each pedestrian feature vector of class t. Feature vectors applied later in the sequence therefore carry a greater weight in v_t. In contrast, the present embodiment proposes a more effective update rule:

where N_t is the total number of samples of class t in a training batch, and C_i is the confidence that sample x_i is a pedestrian, obtained from the classification layer of IDnet. In addition, μ is a variable that increases as training progresses, whereas γ in formula 3 is a fixed constant.

This update strategy rests on two considerations. (1) The pedestrian feature vectors extracted from one batch have no inherent order, so updating with their weighted average is more robust than applying them one by one in sequence. (2) At the beginning of training the extracted pedestrian features are not very accurate, so v_t is not yet reliable. μ must therefore start small so that v_t is updated quickly. Once training reaches a certain stage and v_t has become more stable, μ is increased so that v_t resists uncertain, noisy updates.
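The order dependence of formula 3, and the order-independent batch update proposed in its place, can be checked on toy data. This hypothetical sketch makes three assumptions. The proposed rule is taken to have the form v_t ← μ·v_t + (1 − μ)·(1/N_t)·Σ C_i·x_i, with the sum over the class-t samples in the batch. The μ schedule is linear, with illustrative endpoints. Updates operate on plain Python lists rather than network tensors. A practical implementation would presumably also re-normalize v_t to unit length after each update so that the cosine form of the loss still holds; that step is omitted here.

```python
def sequential_update(v, feats, gamma):
    """Formula 3: apply v <- gamma * v + (1 - gamma) * x once per feature, in order."""
    for x in feats:
        v = [gamma * a + (1 - gamma) * b for a, b in zip(v, x)]
    return v


def batch_update(v, feats, confs, mu):
    """Assumed form of the proposed rule:
    v <- mu * v + (1 - mu) * (1 / N_t) * sum_i C_i * x_i over class-t samples."""
    n_t = len(feats)
    weighted_mean = [sum(c * x[d] for c, x in zip(confs, feats)) / n_t for d in range(len(v))]
    return [mu * a + (1 - mu) * b for a, b in zip(v, weighted_mean)]


def mu_schedule(epoch, num_epochs, mu_start=0.1, mu_end=0.9):
    """Hypothetical linear schedule: mu grows from mu_start to mu_end during training."""
    frac = min(epoch / max(num_epochs - 1, 1), 1.0)
    return mu_start + (mu_end - mu_start) * frac


if __name__ == "__main__":
    v0 = [0.0]
    # The same two class-t features, presented in opposite orders.
    print(sequential_update(v0, [[1.0], [0.0]], gamma=0.5))  # → [0.25]
    print(sequential_update(v0, [[0.0], [1.0]], gamma=0.5))  # → [0.5]
    print(batch_update(v0, [[1.0], [0.0]], [1.0, 1.0], mu=0.5))
    print(batch_update(v0, [[0.0], [1.0]], [1.0, 1.0], mu=0.5))
    print([mu_schedule(e, 5) for e in range(5)])
```

Swapping the order of the two features changes the result of the sequential update but not that of the batch update. This is the first consideration above. The growing μ reflects the second.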

Since all features are L2-normalized, equation 2 can be rewritten as:

where θ_{j,i} is the angle between x_i and v_j, ψ_{k,i} is the angle between x_i and u_k, and s is a scale parameter equal to 1/τ.
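Equations 2 and 5 are not reproduced in this text. Suppose equation 2 has the standard OIM form with temperature τ, a softmax of x_i's similarity to its labeled-class entry v_t over all LUT and CQ entries. The rewriting then follows directly, because unit-length vectors satisfy v_j^T x_i = cos θ_{j,i} and u_k^T x_i = cos ψ_{k,i}. The derivation below is offered under that assumption.

```latex
\mathcal{L}_{\text{OIM}}
= -\frac{1}{N}\sum_{i=1}^{N}\log
\frac{e^{v_t^{\top}x_i/\tau}}{\sum_{j=1}^{L} e^{v_j^{\top}x_i/\tau} + \sum_{k=1}^{Q} e^{u_k^{\top}x_i/\tau}}
= -\frac{1}{N}\sum_{i=1}^{N}\log
\frac{e^{s\cos\theta_{t,i}}}{\sum_{j=1}^{L} e^{s\cos\theta_{j,i}} + \sum_{k=1}^{Q} e^{s\cos\psi_{k,i}}},
\qquad s = 1/\tau
```

The loss depends on x_i only through cosines, so a margin can be subtracted directly from the target-class cosine.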

Equation 5 shows that the OIM loss aims to maximize the cosine similarity between x and the v vector of its label. However, it does not explicitly reduce the differences between samples of the same class, so the learned pedestrian features are not discriminative enough. To address this, the present embodiment introduces a cosine interval into the OIM loss function, which fits naturally into the cosine form of the OIM loss.

Formally, the proposed cosine interval OIM loss function (CM-OIM) expression is:

where m is a hyperparameter representing the introduced cosine interval. The CM-OIM loss enlarges the differences between samples of different classes in the feature space and also strengthens the similarity between samples of the same class, so that the learned features are more discriminative.
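The effect of the cosine interval can be seen numerically on a single sample. This sketch assumes the usual additive-margin form: m is subtracted from the cosine of the sample's labeled class only, both that logit and all others are multiplied by s, and a softmax is taken over all L lookup-table entries and Q queue entries. Under this form, m = 0 reduces to the plain OIM loss. The function names and the default s = 10.0 are illustrative assumptions, not values from the text.

```python
import math


def cm_oim_loss(cos_labeled, cos_unlabeled, target, s=10.0, m=0.0):
    """Per-sample loss under the assumed CM-OIM form; m = 0 gives plain OIM.

    cos_labeled: cosines between x_i and every LUT entry v_1..v_L
    cos_unlabeled: cosines between x_i and every CQ entry u_1..u_Q
    target: index of x_i's labeled class within cos_labeled
    s: scale (1 / tau); m: cosine interval applied to the target logit only
    """
    logits = [s * (c - m) if j == target else s * c for j, c in enumerate(cos_labeled)]
    logits += [s * c for c in cos_unlabeled]
    # Numerically stable log-softmax: subtract the max before exponentiating
    top = max(logits)
    log_denom = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_denom - logits[target]


def batch_loss(samples, s=10.0, m=0.0):
    """Average the per-sample loss over the N labeled samples of a batch."""
    return sum(cm_oim_loss(cl, cu, t, s, m) for cl, cu, t in samples) / len(samples)


if __name__ == "__main__":
    sample = ([0.8, 0.1, -0.2], [0.3], 0)  # target class 0 has cosine 0.8
    print("OIM   :", cm_oim_loss(*sample, m=0.0))
    print("CM-OIM:", cm_oim_loss(*sample, m=0.35))
```

With m > 0 the same sample incurs a larger loss. Minimizing it therefore pushes cos θ_{t,i} higher than plain OIM would, pulling samples of the same class closer to their lookup-table entry.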

The method was verified experimentally on two widely used large-scale data sets, CUHK-SYSU and PRW. It achieves 83.5% mAP on CUHK-SYSU and 32.8% mAP on PRW, and the results show better pedestrian search performance than other existing methods.

Claims

1. A pedestrian searching method based on a partially shared network and a cosine interval loss function is characterized by comprising the following steps:

2. The pedestrian search method based on the partially shared network and the cosine interval loss function as claimed in claim 1, wherein the neural network structure in step (1) is trained with a multi-loss function, and the total loss function has the form: total loss = pedestrian proposal network loss + r × pedestrian re-identification network loss, where r is an adjustment parameter, and adjusting r relieves the mutual interference between pedestrian detection and re-identification.

3. The pedestrian searching method based on the partially shared network and the cosine interval loss function according to claim 2, wherein the pedestrian re-identification network loss function adopts the cosine interval loss function, and is expressed as:

4. The pedestrian search method based on the partially shared network and the cosine interval loss function as claimed in claim 3, wherein the pedestrian feature vector v_j of a labeled class is updated as follows:

5. The pedestrian search method based on the partially shared network and the cosine interval loss function according to claim 1, wherein the pedestrian proposal network in step (1) adopts a Faster R-CNN structure.

6. The pedestrian search method based on the partially shared network and the cosine interval loss function according to claim 1, wherein step (4) is further followed by: mapping the a-dimensional pedestrian vector into a b-dimensional feature subspace with a fully connected layer, and then L2-normalizing the pedestrian features in the b-dimensional feature subspace to prevent overfitting.