CN111738164A - Pedestrian detection method based on deep learning - Google Patents
- Publication number
- CN111738164A (application CN202010586392.XA)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- matrix
- pixel
- height
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the technical field of pedestrian detection systems, and in particular to a pedestrian detection method based on deep learning. The method comprises the following steps: S1, arranging a camera, acquiring video of the current analysis scene, and labeling bounding boxes of pedestrians in the video as a training data set for deep learning; S2, calculating a corresponding pedestrian pixel width matrix W and pedestrian pixel height matrix H from the pixel width and height of pedestrians at each position of the image in the training data set; S3, calculating the anchor-box Scale and aspect Ratio for each region position in the image using the matrices W and H; S4, training a deep learning Faster R-CNN model; S5, acquiring the position coordinates, width, and height of each pedestrian with the trained Faster R-CNN model, and counting the total number of people or the local density of the current scene. The invention improves detection accuracy and reduces false detections.
Description
Technical Field
The invention relates to the fields of computer vision and deep learning, in particular to the technical field of pedestrian detection systems for security monitoring and intelligent video analysis, and specifically to a pedestrian detection method based on deep learning.
Background
With the rapid development of computer science, the use of computer vision to automatically and intelligently analyze pedestrian targets in surveillance scenes has gradually become a research hotspot. Pedestrian detection enables high-density crowd alerts, pedestrian flow counting, and similar functions, replacing traditional manned monitoring. In recent years, deep neural networks have advanced by leaps and bounds and are gradually replacing traditional object detection methods. Traditional algorithms typically require hand-designed features for a specific domain and can generally detect only one class of target. Deep learning lets a deep neural network learn target features automatically through supervised training on a large-scale sample set; it not only achieves higher accuracy than traditional algorithms but can also detect multiple target classes simultaneously. Current deep learning object detection methods fall into two categories. One-stage methods, such as SSD and YOLO, regress target position coordinates directly from the image, but they often have low detection accuracy on small targets and produce many missed detections. Two-stage methods, such as Faster R-CNN, first extract a feature map from the image through a backbone network, then generate candidate regions through an RPN, and finally perform target classification with a classifier. These multi-class deep learning detectors are general-purpose object detection algorithms, and their detection performance on pedestrians in surveillance scenes is poor.
Disclosure of Invention
In order to solve the above problems, the invention provides a pedestrian detection method based on deep learning, with the following specific technical scheme:
a pedestrian detection method based on deep learning comprises the following steps:
s1: arranging a camera at the scene to be analyzed, acquiring video of the current analysis scene, labeling bounding boxes of pedestrians in the video, and storing the pedestrian annotation data file as the training data set for deep learning;
s2: calculating the corresponding pedestrian pixel width matrix W and pedestrian pixel height matrix H according to the pixel width and height of pedestrians at each position in the training-set images;
s3: calculating the Scale and aspect Ratio of the anchor box for each region position in the image using the pedestrian pixel width matrix W and the pedestrian pixel height matrix H;
s4: performing deep learning Faster R-CNN model training on the current scene image data set to obtain the weight parameters of a trained Faster R-CNN model;
s5: deploying a pedestrian detection system for the current scene, using the trained Faster R-CNN model to acquire the position coordinates, width, and height of each pedestrian, and counting the total number of people or the local density of the current scene.
Preferably, the step S2 includes the steps of:
s21: reading the locally stored pedestrian annotation data file;
s22: creating a pedestrian pixel width matrix W and a pedestrian pixel height matrix H, whose dimensions respectively correspond to the pixel width and height of the camera image;
s23: if a pixel in a training-set image is the center point of a pedestrian bounding box, writing the pixel width and pixel height of that box into the corresponding positions of the matrices W and H respectively; if the corresponding positions already hold values, averaging the new value with the stored value and writing the result back into the corresponding positions of W and H;
s24: checking whether the matrices W and H obtained above contain zero-valued points; if so, computing the pixel value at each such position by distance interpolation from neighboring pixels, and writing it into the corresponding positions of W and H.
Preferably, the step S3 includes the steps of:
s31: performing n×n gridding of the pedestrian pixel width matrix W and the pedestrian pixel height matrix H, and traversing the training data set to obtain the maximum pedestrian pixel height H1 and the minimum pedestrian pixel height H2 in the current analysis scene, wherein n = H1/H2 rounded down to an integer;
s32: calculating the average pixel value in each grid of the pedestrian pixel width matrix W as the Scale of the corresponding training RPN network;
s33: calculating the average values of the pedestrian pixel width matrix W and the pedestrian pixel height matrix H within each grid cell, denoted W̄ and H̄ respectively; the aspect Ratio within each grid cell is then {1 : H̄/W̄}.
Preferably, in step S4, during RPN network training, the Scale and aspect Ratio of the anchor box are obtained from the center coordinates of the candidate boxes generated by the RPN network.
Preferably, the step S5 includes the steps of:
s51: automatically acquiring the real-time video stream of the analysis scene through the camera SDK, and decoding it to obtain images of the analysis scene;
s52: loading the weight parameters of the trained Faster R-CNN model, and inputting the decoded image into the model to obtain a convolutional feature map;
s53: performing RPN network inference on the convolutional feature map to obtain target candidate regions, then performing Fast R-CNN classification and bounding-box regression to obtain the target classification score and the pedestrian width and height.
The invention has the beneficial effects that it improves pedestrian detection performance in surveillance scenes, increases detection accuracy, and reduces false detections.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of anchor box selection according to the present invention.
Detailed Description
For a better understanding of the present invention, reference is made to the following detailed description taken in conjunction with the accompanying drawings in which:
as shown in fig. 1, a pedestrian detection method based on deep learning includes the steps of:
s1: arranging a camera at the scene to be analyzed and acquiring a video recording of the current analysis scene. Because pedestrians move slowly in the scene, the video is sampled every 50 frames, and images containing pedestrians are extracted to form the original data set. The pedestrians in these images are labeled with bounding boxes using the open-source labeling tool LabelImg, and the pedestrian annotation data file is saved as the training data set for deep learning.
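The frame-extraction step can be sketched in a few lines. The following is an illustrative sketch assuming OpenCV (cv2) for video decoding; the function name, output paths, and JPEG format are assumptions, and only the 50-frame stride comes from the description above.

```python
import cv2

def extract_frames(video_path, out_dir, stride=50):
    """Sample one frame every `stride` frames from the scene video (step S1)."""
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:  # pedestrians move slowly, so a sparse sample suffices
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```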
S2: calculating the corresponding pedestrian pixel width matrix W and pedestrian pixel height matrix H according to the pixel width and height of pedestrians at each position in the training-set images; the method comprises the following steps (a code sketch follows the sub-steps):
s21: reading the locally stored pedestrian annotation data file;
s22: creating a pedestrian pixel width matrix W and a pedestrian pixel height matrix H, whose dimensions respectively correspond to the pixel width and height of the camera image;
s23: if a pixel in a training-set image is the center point of a pedestrian bounding box, writing the pixel width and pixel height of that box into the corresponding positions of the matrices W and H respectively; if the corresponding positions already hold values, averaging the new value with the stored value and writing the result back into the corresponding positions of W and H;
s24: checking whether the matrices W and H obtained above contain zero-valued points; if so, computing the pixel value at each such position by distance interpolation from neighboring pixels, and writing it into the corresponding positions of W and H.
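Steps S21–S24 might be realized roughly as below. This sketch assumes the annotation file has already been parsed into (cx, cy, w, h) box centers and sizes; SciPy's nearest-neighbour fill stands in for the adjacent-pixel distance interpolation of S24, and all names are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def build_wh_matrices(boxes, img_w, img_h):
    """boxes: iterable of (cx, cy, w, h) pedestrian bounding boxes (steps S22-S24)."""
    W = np.zeros((img_h, img_w), dtype=np.float32)
    H = np.zeros((img_h, img_w), dtype=np.float32)
    for cx, cy, w, h in boxes:
        cx, cy = int(cx), int(cy)
        if W[cy, cx] > 0:                       # S23: average with the stored value
            W[cy, cx] = (W[cy, cx] + w) / 2
            H[cy, cx] = (H[cy, cx] + h) / 2
        else:
            W[cy, cx], H[cy, cx] = w, h
    # S24: fill zero-valued points from the nearest labelled pixel
    mask = W == 0
    idx = distance_transform_edt(mask, return_distances=False, return_indices=True)
    W[mask] = W[idx[0][mask], idx[1][mask]]
    H[mask] = H[idx[0][mask], idx[1][mask]]
    return W, H
```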
S3: calculating the anchor-box Scale and aspect Ratio for each region position in the image using the pedestrian pixel width matrix W and the pedestrian pixel height matrix H; the method comprises the following steps (a code sketch follows the sub-steps):
s31: performing n×n gridding of the pedestrian pixel width matrix W and the pedestrian pixel height matrix H, and traversing the training data set to obtain the maximum pedestrian pixel height H1 (nearest target) and the minimum pedestrian pixel height H2 (farthest target) in the current analysis scene, wherein n = H1/H2 rounded down to an integer;
s32: calculating the average pixel value in each grid of the pedestrian pixel width matrix W as the Scale of the corresponding training RPN network;
s33: calculating the average values of the pedestrian pixel width matrix W and the pedestrian pixel height matrix H within each grid cell, denoted W̄ and H̄ respectively; the aspect Ratio within each grid cell is then {1 : H̄/W̄}.
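Under the same assumptions, steps S31–S33 could look like the following sketch; the grid count n = ⌊H1/H2⌋ and the per-cell averages follow the steps above, while the function and variable names are illustrative.

```python
import numpy as np

def grid_anchor_params(W, H, h1_max, h2_min):
    """Per-cell anchor Scale and aspect Ratio from the W and H matrices (S31-S33)."""
    n = int(h1_max // h2_min)                   # S31: n x n gridding
    rows = np.array_split(np.arange(W.shape[0]), n)
    cols = np.array_split(np.arange(W.shape[1]), n)
    scales = np.zeros((n, n))
    ratios = np.zeros((n, n))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            w_mean = W[np.ix_(r, c)].mean()     # S32: average width -> Scale
            h_mean = H[np.ix_(r, c)].mean()
            scales[i, j] = w_mean
            ratios[i, j] = h_mean / w_mean      # S33: aspect Ratio 1 : h/w
    return scales, ratios
```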
s4: performing deep learning Faster R-CNN model training on the current scene image data set to obtain the weight parameters of a trained Faster R-CNN model. Specifically, the public VGG-16 model is used as the feature-extraction backbone, of which the first 13 convolutional layers are used. During RPN network training, the Scale and aspect Ratio of each anchor box are obtained from the center coordinates of the candidate boxes generated by the RPN network. As shown in fig. 2, rectangular box 1 is generated by this method and wraps the pedestrian tightly, while rectangular boxes 2-4 are generated by the conventional method and differ greatly from the actual pedestrian size. Because the pixel width and height of the candidate boxes generated during RPN training are close to the pedestrian pixel sizes in the analysis scene, the accuracy of the candidate boxes generated by the RPN is improved, which in turn improves the final Faster R-CNN pedestrian detection accuracy. In this step, the final model is trained with the end-to-end Faster R-CNN back-propagation algorithm. After 100,000 iterations, the weight parameters of the trained Faster R-CNN model are saved to the local hard disk.
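The per-position anchor lookup described above could be realized roughly as below. The sketch only shows how a candidate-box center selects its scene-adapted Scale and Ratio from the gridded tables, replacing the fixed anchor set of standard Faster R-CNN; the surrounding RPN training loop is omitted and all names are assumptions.

```python
def anchor_for_center(cx, cy, scales, ratios, img_w, img_h):
    """Return the (x1, y1, x2, y2) anchor box for a candidate center (step S4)."""
    n = scales.shape[0]
    i = min(int(cy * n / img_h), n - 1)         # grid cell containing the center
    j = min(int(cx * n / img_w), n - 1)
    w = scales[i, j]                            # scene-adapted Scale
    h = w * ratios[i, j]                        # height from the cell's aspect Ratio
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```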
S5: deploying a pedestrian detection system for the current scene, using the trained Faster R-CNN model to acquire the position coordinates, width, and height of each pedestrian and to count the total number of people or the local density of the current scene; the method comprises the following steps (a code sketch follows the sub-steps):
s51: automatically acquiring the real-time video stream of the analysis scene through the camera SDK, and decoding it to obtain images of the analysis scene;
s52: loading the weight parameters of the trained Faster R-CNN model, and inputting the decoded image into the model to obtain a convolutional feature map;
s53: performing RPN network inference on the convolutional feature map to obtain target candidate regions, then performing Fast R-CNN classification and bounding-box regression to obtain the target classification score and the pedestrian width and height.
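Steps S51–S53 amount to a per-frame inference loop. The sketch below substitutes OpenCV for the camera SDK and torchvision's off-the-shelf Faster R-CNN (ResNet-50 FPN backbone) for the trained VGG-16 model, purely for illustration; the stream URL, score threshold, and person label are assumptions.

```python
import cv2
import torch
import torchvision

# Illustrative stand-in for the trained VGG-16 Faster R-CNN of step S4.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def count_pedestrians(stream_url, score_thr=0.5):
    """Decode one frame from the live stream (S51) and detect pedestrians (S52-S53)."""
    cap = cv2.VideoCapture(stream_url)          # OpenCV in place of the camera SDK
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return 0
    rgb = frame[:, :, ::-1].copy()              # BGR -> RGB
    img = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255
    with torch.no_grad():
        out = model([img])[0]                   # RPN inference + Fast R-CNN head
    keep = (out["labels"] == 1) & (out["scores"] > score_thr)  # COCO label 1 = person
    return int(keep.sum())                      # total people in the current frame
```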
In this embodiment, an Intel i7-8700K CPU and an NVIDIA GTX 1080 Ti GPU are used for training, and the traditional Faster R-CNN neural network algorithm and the method of the present invention are compared on the CityPersons data set; the results are shown in Table 1:
TABLE 1 comparative results
1. Set an IOU threshold for the class (0.5 in this embodiment), meaning that a predicted bounding box for a human target whose intersection-over-union with a ground-truth box exceeds 0.5 is counted as a TP, and the other predicted boxes are counted as FP (TP denotes a correctly detected positive sample; FP denotes a false positive; FN, the number of missed positives, is obtained by subtracting TP from the number of true positive samples in the test set). Count the number of ground-truth boxes M (the denominator for recall) and the number of detection boxes N. The IOU is the intersection-over-union, i.e., the ratio of the intersection to the union of the predicted box and the ground-truth box.
2. Initialize in computer memory a two-dimensional array A (i = 1, 2, ..., N; j = 1, 2), whose first column stores the prediction score of the target classification and whose second column marks whether the detection box is a TP.
3. Initialize in computer memory a precision-recall result matrix P (i = 1, 2, ..., N; j = 1, 2), whose first column stores recall values and whose second column stores the corresponding precision values, where precision = TP / (TP + FP) and recall = TP / (TP + FN).
4. For each image, compute the IOU of every detection box against every ground-truth box; boxes whose IOU exceeds the threshold are marked TP and the rest FP, and the results are written into array A.
5. Sort array A by the first-column prediction scores in descending order (the second column follows the ordering of the first).
6. Walk through array A row by row, computing the cumulative precision and recall at each row, and assign each resulting (recall, precision) pair to row i of matrix P.
7. Compute the AP (Average Precision) for the class: plot the P-R curve from matrix P, then interpolate at the 11 points (0, 0.1, 0.2, ..., 1) — or at all points on the curve — and take the area under the curve as the AP. The area under the P-R curve evaluates the capability of the corresponding model: the larger the area, the better the model performance.
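The seven evaluation steps above amount to the standard 11-point interpolated AP. A minimal sketch, assuming the per-detection scores and TP flags of array A have already been produced by step 4 and that num_gt is the ground-truth count M:

```python
import numpy as np

def average_precision_11pt(scores, is_tp, num_gt):
    """11-point interpolated AP from detection scores and TP flags (steps 5-7)."""
    order = np.argsort(-np.asarray(scores))     # step 5: sort by score, descending
    tp = np.asarray(is_tp, dtype=np.float64)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt                    # step 6: running recall ...
    precision = cum_tp / (cum_tp + cum_fp)      # ... and precision, row by row
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):         # step 7: interpolate at 0, 0.1, ..., 1
        p = precision[recall >= r]
        ap += (p.max() if p.size else 0.0) / 11.0
    return ap
```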
The above is merely a preferred embodiment of the present invention; the invention is not limited thereto, and any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
1. A pedestrian detection method based on deep learning, characterized by comprising the following steps:
s1: arranging a camera at the scene to be analyzed, acquiring video of the current analysis scene, labeling bounding boxes of pedestrians in the video, and storing the pedestrian annotation data file as the training data set for deep learning;
s2: calculating the corresponding pedestrian pixel width matrix W and pedestrian pixel height matrix H according to the pixel width and height of pedestrians at each position in the training-set images;
s3: calculating the Scale and aspect Ratio of the anchor box for each region position in the image using the pedestrian pixel width matrix W and the pedestrian pixel height matrix H;
s4: performing deep learning Faster R-CNN model training on the current scene image data set to obtain the weight parameters of a trained Faster R-CNN model;
s5: deploying a pedestrian detection system for the current scene, using the trained Faster R-CNN model to acquire the position coordinates, width, and height of each pedestrian, and counting the total number of people or the local density of the current scene.
2. The pedestrian detection method based on deep learning of claim 1, wherein: the step S2 includes the steps of:
s21: reading the locally stored pedestrian annotation data file;
s22: creating a pedestrian pixel width matrix W and a pedestrian pixel height matrix H, whose dimensions respectively correspond to the pixel width and height of the camera image;
s23: if a pixel in a training-set image is the center point of a pedestrian bounding box, writing the pixel width and pixel height of that box into the corresponding positions of the matrices W and H respectively; if the corresponding positions already hold values, averaging the new value with the stored value and writing the result back into the corresponding positions of W and H;
s24: checking whether the matrices W and H obtained above contain zero-valued points; if so, computing the pixel value at each such position by distance interpolation from neighboring pixels, and writing it into the corresponding positions of W and H.
3. The pedestrian detection method based on deep learning of claim 1, wherein: the step S3 includes the steps of:
s31: performing n×n gridding of the pedestrian pixel width matrix W and the pedestrian pixel height matrix H, and traversing the training data set to obtain the maximum pedestrian pixel height H1 and the minimum pedestrian pixel height H2 in the current analysis scene, wherein n = H1/H2 rounded down to an integer;
s32: calculating the average pixel value in each grid of the pedestrian pixel width matrix W as the Scale of the corresponding training RPN network;
4. The pedestrian detection method based on deep learning of claim 1, wherein: in step S4, during RPN network training, the Scale and aspect Ratio of the anchor box are obtained from the center coordinates of the candidate boxes generated by the RPN network.
5. The pedestrian detection method based on deep learning of claim 1, wherein: the step S5 includes the steps of:
s51: automatically acquiring the real-time video stream of the analysis scene through the camera SDK, and decoding it to obtain images of the analysis scene;
s52: loading the weight parameters of the trained Faster R-CNN model, and inputting the decoded image into the model to obtain a convolutional feature map;
s53: performing RPN network inference on the convolutional feature map to obtain target candidate regions, then performing Fast R-CNN classification and bounding-box regression to obtain the target classification score and the pedestrian width and height.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010586392.XA CN111738164B (en) | 2020-06-24 | 2020-06-24 | Pedestrian detection method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010586392.XA CN111738164B (en) | 2020-06-24 | 2020-06-24 | Pedestrian detection method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111738164A true CN111738164A (en) | 2020-10-02 |
CN111738164B CN111738164B (en) | 2021-02-26 |
Family ID: 72650864
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010586392.XA Active CN111738164B (en) | 2020-06-24 | 2020-06-24 | Pedestrian detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111738164B (en) |
- 2020-06-24: application CN202010586392.XA filed; granted as CN111738164B (status: active)
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180096595A1 (en) * | 2016-10-04 | 2018-04-05 | Street Simplified, LLC | Traffic Control Systems and Methods |
CN108021848A (en) * | 2016-11-03 | 2018-05-11 | 浙江宇视科技有限公司 | Passenger flow volume statistical method and device |
CN106874894A (en) * | 2017-03-28 | 2017-06-20 | 电子科技大学 | A kind of human body target detection method based on the full convolutional neural networks in region |
CN107437099A (en) * | 2017-08-03 | 2017-12-05 | 哈尔滨工业大学 | A kind of specific dress ornament image recognition and detection method based on machine learning |
CN108830152A (en) * | 2018-05-07 | 2018-11-16 | 北京红云智胜科技有限公司 | The pedestrian detection method and system that deep learning network and manual features are combined |
US20190043178A1 (en) * | 2018-07-10 | 2019-02-07 | Intel Corporation | Low-light imaging using trained convolutional neural networks |
CN109117806A (en) * | 2018-08-22 | 2019-01-01 | 歌尔科技有限公司 | A kind of gesture identification method and device |
CN109902806A (en) * | 2019-02-26 | 2019-06-18 | 清华大学 | Method is determined based on the noise image object boundary frame of convolutional neural networks |
CN109977812A (en) * | 2019-03-12 | 2019-07-05 | 南京邮电大学 | A kind of Vehicular video object detection method based on deep learning |
CN110136098A (en) * | 2019-04-15 | 2019-08-16 | 江苏大学 | A kind of order of cables detection method based on deep learning |
CN110263712A (en) * | 2019-06-20 | 2019-09-20 | 江南大学 | A kind of coarse-fine pedestrian detection method based on region candidate |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112633168A (en) * | 2020-12-23 | 2021-04-09 | 长沙中联重科环境产业有限公司 | Garbage truck and method and device for identifying barrel turning action of garbage truck |
CN112633168B (en) * | 2020-12-23 | 2023-10-31 | 长沙中联重科环境产业有限公司 | Garbage truck and method and device for identifying garbage can overturning action of garbage truck |
CN112598738A (en) * | 2020-12-25 | 2021-04-02 | 南京大学 | Figure positioning method based on deep learning |
CN112598738B (en) * | 2020-12-25 | 2024-03-19 | 南京大学 | Character positioning method based on deep learning |
CN113361370A (en) * | 2021-06-02 | 2021-09-07 | 南京工业大学 | Abnormal behavior detection method based on deep learning |
CN113361370B (en) * | 2021-06-02 | 2023-06-23 | 南京工业大学 | Abnormal behavior detection method based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN111738164B (en) | 2021-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113963445B (en) | Pedestrian falling action recognition method and equipment based on gesture estimation | |
CN111738164B (en) | Pedestrian detection method based on deep learning | |
Li et al. | Adaptive deep convolutional neural networks for scene-specific object detection | |
CN107633226B (en) | Human body motion tracking feature processing method | |
CN110135354B (en) | Change detection method based on live-action three-dimensional model | |
CN107784288B (en) | Iterative positioning type face detection method based on deep neural network | |
CN101470809B (en) | Moving object detection method based on expansion mixed gauss model | |
CN114648665B (en) | Weak supervision target detection method and system | |
CN111814597A (en) | Urban function partitioning method coupling multi-label classification network and YOLO | |
CN113657414B (en) | Object identification method | |
CN114758288A (en) | Power distribution network engineering safety control detection method and device | |
CN111275010A (en) | Pedestrian re-identification method based on computer vision | |
CN113033516A (en) | Object identification statistical method and device, electronic equipment and storage medium | |
CN112419202A (en) | Wild animal image automatic identification system based on big data and deep learning | |
CN117437382B (en) | Updating method and system for data center component | |
CN111091101A (en) | High-precision pedestrian detection method, system and device based on one-step method | |
CN114332473A (en) | Object detection method, object detection device, computer equipment, storage medium and program product | |
CN112149665A (en) | High-performance multi-scale target detection method based on deep learning | |
CN112861970A (en) | Fine-grained image classification method based on feature fusion | |
CN114495266A (en) | Non-standing posture detection method and device, computer equipment and storage medium | |
CN116091946A (en) | Yolov 5-based unmanned aerial vehicle aerial image target detection method | |
CN113139540B (en) | Backboard detection method and equipment | |
CN116805415A (en) | Cage broiler health status identification method based on lightweight improved YOLOv5 | |
Li | A crowd density detection algorithm for tourist attractions based on monitoring video dynamic information analysis | |
CN117115824A (en) | Visual text detection method based on stroke region segmentation strategy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |