CN107784288A - Iterative positioning face detection method based on a deep neural network - Google Patents
Iterative positioning face detection method based on a deep neural network
- Publication number
- CN107784288A (application number CN201711034973.7A; granted publication CN107784288B)
- Authority
- CN
- China
- Prior art keywords
- face
- net
- candidate
- model
- offset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an iterative positioning face detection method based on a deep neural network, comprising the following steps: extracting region image blocks from the AFLW public image data set as training-set input and preprocessing them; defining a face candidate frame extraction model P-Net and a face offset fine-tuning model A-Net, and training these models on the above training set; applying a full convolution strategy to the trained P-Net to obtain a global detection result matrix for a sample; during testing, inputting a picture into P-Net to obtain face candidate frames, then iteratively fine-tuning the candidate frame positions through A-Net, and applying non-maximum suppression to obtain the final result. The method lets a computer detect faces automatically in complex environments, with high accuracy, fast recognition speed and stable performance.
Description
Technical Field
The invention relates to the technical field of image-based face detection, in particular to an iterative positioning type face detection method based on a deep neural network.
Background
1. Definition of face detection
Face detection means that, given an arbitrary image, all faces (if any) in the image are automatically detected by a computer and their positions are returned.
2. Importance of face detection
The human face is a visual pattern carrying a large amount of information, and the visual information it conveys plays an important role in people's life and work. Today face recognition is widely applied in social life, and face detection is a key link in it: if a face detection algorithm performs poorly, the subsequent recognition algorithm is inevitably affected. In addition, image-based recognition algorithms such as age recognition, gender recognition and emotion recognition also need a face detection algorithm as a basic step. The wide application of these technologies has raised the importance of face detection algorithms to a new height.
3. Technical development of face detection
Research on face detection dates back to the 1970s. Early work mainly focused on template matching, subspace methods, deformable template matching and the like. These early methods were usually aimed at frontal face detection against a simple, unchanging background and did not detect faces well in complex environments. From the 1990s to the early 2000s, face detection methods based on cascade structures developed greatly; building on the AdaBoost algorithm, Viola and Jones used Haar-like wavelet features and the integral image to detect faces, greatly improving detection accuracy and real-time performance, though still unable to handle complex scenes. In recent years, with the rapid development of deep learning, face detection algorithms based on deep learning have advanced greatly. Such methods include:
wu Suwen, wary vicavine face detection based on selective search and convolutional neural networks, 2016 [ J ] 9/28. Computer applications research, 2017 (2); chen Weidong, zhang Yang, yang Xiaolong face detection methods based on skin color features and depth models [ J ]. Industrial control computer, 2017,30 (3): 26-28; chen Rui, linda. Face key point localization based on cascaded convolutional neural networks [ J ]. Proceedings of Sichuan institute of technology (self edition), 2017,30 (1): 32-37; zhang Bailing, xia Yizhang, qian Rongjiang, etc. face occlusion detection method based on deep convolutional neural network, CN 106485215A [ P ].2017.
4. Shortcomings of current face detection methods: accuracy and speed
However, deep-learning-based methods often have no advantage in speed, because the forward pass of a deep neural network is time-consuming and may need to be run many times for a single picture, leading to excessive time cost. In addition, existing methods pay insufficient attention to the localization accuracy of face detection, even though localization accuracy affects the performance of downstream algorithms such as face recognition and emotion recognition. The present algorithm therefore uses two convolutional neural networks with different tasks, combined with a detection result matrix, to perform face detection and iterative positioning of face candidate frames, obtaining good accuracy and real-time performance.
Disclosure of Invention
The invention aims to remedy the above defects of the prior art and provides an iterative positioning face detection method based on a deep neural network. A multi-task deep neural network is designed and trained on the mass data of a public data set; during testing, a face candidate frame extraction model produces preliminary face candidate frames, and a face offset fine-tuning model then iteratively positions each face several times to obtain more accurate localization. The algorithm detects faces in real time in complex environments and is characterized by high accuracy and stable performance.
The purpose of the invention can be achieved by adopting the following technical scheme:
an iterative positioning type face detection method based on a deep neural network comprises the following steps:
s1, defining a face candidate frame extraction model P-Net and a face offset fine adjustment model A-Net;
s2, extracting data and corresponding labels required by training P-Net and A-Net based on the AFLW public image data set;
s3, fine tuning training P-Net and A-Net based on a classical convolution neural network by using the data obtained in the last step;
s4, adopting a full convolution strategy to the trained P-Net model to obtain a global detection result matrix of the input picture;
S5, for a picture to be tested, inputting it at multiple scales into P-Net to obtain detection result matrices at those scales, and obtaining candidate face frames from these matrices with a narrowed non-maximum suppression algorithm;
S6, iteratively inputting the candidate face frames into A-Net for fine-tuning, according to a face position judging condition, until the condition is met;
S7, removing repeated face candidate frames with the narrowed non-maximum suppression algorithm and outputting the final detection result.
Further, the face candidate frame extraction model P-Net and the face offset fine-tuning model A-Net defined in step S1 both adopt the AlexNet architecture, with their output layers modified to 2 classes and 45 classes respectively.
Further, the training data required by P-Net in step S2 are two classes, face and non-face; the training data required by A-Net are 45 classes, namely face candidate frames under the various offset modes.
Further, the training method in step S3 adopts stochastic gradient descent together with learning rate decay and momentum; the loss function is the cross-entropy loss, of the form:

$$L = -\frac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{d}\left[x_i^{(k)}\log z_i^{(k)} + \left(1 - x_i^{(k)}\right)\log\left(1 - z_i^{(k)}\right)\right]$$

where x denotes the original signal and z the reconstructed signal, both represented as vectors of length d (the sum over components can equally be written as a vector inner product), and K denotes the number of samples in one iteration.
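As an illustration, a minimal NumPy sketch of this loss follows; the variable names are ours, and the element-wise two-term form is one standard way of writing the cross entropy consistent with the definitions above:

```python
import numpy as np

def cross_entropy_loss(x, z, eps=1e-12):
    """Mean cross-entropy between original signals x and reconstructed
    signals z, both of shape (K, d): K samples per iteration, length d."""
    z = np.clip(z, eps, 1.0 - eps)  # guard against log(0)
    per_sample = -np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z), axis=1)
    return per_sample.mean()        # average over the K samples

# Example with K = 2 samples of length d = 3
x = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
z = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.3]])
print(cross_entropy_loss(x, z))
```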
Further, the fully convolution strategy in step S4 is to store parameters of the fully connected layer, then replace the fully connected layer with the convolution layer of the same size, and assign the parameters of the previously stored fully connected layer to the new convolution layer.
Further, the narrowed non-maximum suppression in step S5 is a non-maximum suppression algorithm customized to the object shape; it works better for rectangular candidate frames whose aspect ratio differs from 1:1, such as human faces. The specific steps are as follows:
Before non-maximum suppression is carried out on a plurality of partially overlapping candidate frames, each original square candidate frame is narrowed about its center, with the narrowing formula:

$$x_1' = x_1 + r\,(x_2 - x_1),\qquad x_2' = x_2 - r\,(x_2 - x_1),\qquad y_1' = y_1,\qquad y_2' = y_2$$

where (x_1, y_1) is the upper-left coordinate, (x_2, y_2) is the lower-right coordinate, and r = narrowRate is the narrowing rate, set to 0.08; that is, the candidate frame keeps its original height and center point while its width is reduced to 1 - 2 × 0.08 = 0.84 times the original width;
Non-maximum suppression and de-duplication are then performed on the narrowed frames; after de-duplication, the narrowing is restored with the formula:

$$x_1'' = x_1' - \frac{r}{1-2r}\,(x_2' - x_1'),\qquad x_2'' = x_2' + \frac{r}{1-2r}\,(x_2' - x_1'),\qquad y_1'' = y_1',\qquad y_2'' = y_2'$$

where (x_1', y_1') is the upper-left coordinate and (x_2', y_2') the lower-right coordinate of a narrowed frame, and r = narrowRate = 0.08; the candidate frame is restored so as to keep the center point and height, with the width enlarged back to its size before narrowing.
Further, in step S6, for the face candidate frame obtained by the model P-Net in the previous step, the frame candidate image is input to a-Net for performing offset mode classification, the model a-Net outputs classification confidences of the frame candidate image for 45 offset modes, and the offset condition of the frame candidate is integrated by using the classification result, as follows:
wherein, [ s, x, y ]]Is the final integration result; n is the number of offset patterns, N =45, [ s ] n ,x n ,y n ]Is an offset pattern of class n, where n is an offset pattern subscript, following the 45 offset pattern settings previously; z is the number of offset patterns exceeding the threshold, and I is a weight calculation formula for calculating the respective weights of the offset patterns exceeding the threshold. The calculation formula for z and I is as follows:
wherein the definition of I is the same as that of the above weight calculation formula I
Wherein c is n Is the weight of the offset pattern n that exceeds the threshold, t is the threshold;
Then, according to the classification result obtained above, the frame is fine-tuned against the direction of the estimated offset to obtain a more accurate face position, specifically:

for a candidate frame with upper-left coordinates (x, y) and width and height (w, h), with integrated offset pattern [s', x', y'] obtained from the A-Net classification, the frame [x_new, y_new, w_new, h_new] after fine-tuning against the offset direction is:

$$x_{new} = x - x'w,\qquad y_{new} = y - y'h,\qquad w_{new} = w/s',\qquad h_{new} = h/s'$$
Further, in step S6, after fine-tuning against the offset direction, the fine-tuned candidate frame is input into model A-Net again and the offset is re-estimated to decide whether the current candidate frame has reached the most suitable position; if so, fine-tuning stops and the next step is taken; if not, the fine-tuning step is repeated until the condition is met or the number of iterations exceeds the set threshold.
The condition for judging that a candidate frame has reached the most suitable position is:

$$[s, x, y] = [1, 0, 0]$$

where [s, x, y] is the integrated offset pattern of the current candidate frame computed by the formula above, i.e. no residual scale or translation offset is detected; meanwhile the maximum number of iterations is set to 10, i.e. fine-tuning ends after at most 10 iterations.
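A minimal sketch of this fine-tune-and-check loop follows. The `integrate_offset` helper stands for the A-Net forward pass plus the integration formula above; its name and the exact inverse-update rule are illustrative assumptions, not fixed by the patent text:

```python
def refine_box(box, integrate_offset, max_iters=10):
    """Iteratively fine-tune a candidate frame (x, y, w, h).
    integrate_offset(box) -> (s, dx, dy), the integrated offset pattern
    that A-Net estimates for the current crop."""
    x, y, w, h = box
    for _ in range(max_iters):
        s, dx, dy = integrate_offset((x, y, w, h))
        if s == 1.0 and dx == 0.0 and dy == 0.0:  # judging condition met
            break
        # Move against the estimated offset direction (assumed update rule).
        x, y = x - dx * w, y - dy * h
        w, h = w / s, h / s
    return x, y, w, h

# Toy stand-in for A-Net: first reports "shifted right by 17% of the
# width", then reports no offset, which stops the loop.
toy = iter([(1.0, 0.17, 0.0), (1.0, 0.0, 0.0)])
print(refine_box((100, 100, 50, 50), lambda b: next(toy)))
```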
Compared with the prior art, the invention has the following advantages and effects:
1. The method adopts a convolutional neural network and trains it on mass data, letting it automatically learn convolution kernels with strong expressive power and ways of combining them, so as to obtain a better facial feature representation;
2. The method sets up a face candidate frame extraction model P-Net and a face offset fine-tuning model A-Net, which perform face/non-face classification and offset mode classification respectively; the two functions mutually reinforce the face detection effect.
3. The method adopts a full convolution strategy, so that pictures of arbitrary size can be input and the detection result matrix is obtained in a single forward pass of the convolutional neural network, giving a higher detection speed.
4. The method adopts iterative face candidate frame positioning, which helps move a face candidate frame to a more suitable position, i.e. improves positioning precision.
5. Compared with the traditional method, the method has the advantages of high accuracy, high detection speed and stable performance, and has certain market value and popularization value.
Drawings
FIG. 1 is a flowchart illustrating steps of an iterative localization-based face detection method based on a deep neural network disclosed in the present invention;
fig. 2 is a schematic diagram of an iterative positioning method in a prediction process in the iterative positioning type face detection method based on the deep neural network disclosed in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Examples
An iterative positioning type face detection method based on a deep neural network comprises the following steps:
s1, defining a face candidate frame extraction model P-Net and a face offset fine adjustment model A-Net:
In step S1, the two required model functions are defined as face candidate frame extraction and face offset fine-tuning; the original AlexNet model is then downloaded from a public model source and its numbers of output nodes are modified to 2 and 45 respectively, to fit the tasks in this embodiment.
S2, extracting the data and corresponding labels required for training model P-Net and model A-Net, based on the AFLW public image data set. In step S2:
1. For the face/background classification task, data are obtained by cropping pictures in the AFLW public image data set. The AFLW data set contains 25000 pictures with 50000 faces in total; the position of each ground-truth face frame in a picture is marked by its upper-left coordinates (x1, y1) and lower-right coordinates (x2, y2) and expressed as:
(x1,y1,x2,y2)
Face pictures are cropped using this coordinate information. To enlarge the data set, a cropped face frame is allowed a certain displacement, as long as the following condition is met: the intersection-over-union between the shifted frame b and the ground-truth frame g stays above a set threshold.

A background picture is defined as one whose intersection-over-union with every ground-truth face frame stays below a set threshold, and background pictures are obtained by random cropping.

Here b is the candidate frame used to crop the picture and g is a ground-truth face frame in the data set; IOU is the intersection-over-union, i.e. the ratio of the overlap area of rectangles b and g to the area of their union (an IOU sketch follows this subsection).
2. For the offset mode classification task, data are likewise obtained by cropping pictures in the AFLW public image data set. For each face, a cropped image is produced for every offset pattern

[x_n, y_n, s_n],

where

s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}
x_n ∈ {-0.17, 0, 0.17}
y_n ∈ {-0.17, 0, 0.17}

giving 5 × 3 × 3 = 45 offset face images per face (a crop is discarded if it extends beyond the picture boundary); each cropped picture is labeled with one of the 45 categories (see the enumeration sketch below).
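The IOU used in the cropping rules of this step can be computed as in the following sketch (the function name is our own):

```python
def iou(b, g):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b[0], g[0]), max(b[1], g[1])
    ix2, iy2 = min(b[2], g[2]), min(b[3], g[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    return inter / float(area_b + area_g - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```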
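And a short sketch enumerating the 45 offset patterns and producing one shifted/scaled crop box per pattern; the class ordering and the exact crop rule are our assumptions, the patent fixes only the value sets:

```python
from itertools import product

S = [0.83, 0.91, 1.0, 1.10, 1.21]
X = [-0.17, 0, 0.17]
Y = [-0.17, 0, 0.17]

# Class index n -> offset pattern (x_n, y_n, s_n); 5 * 3 * 3 = 45 classes.
OFFSET_PATTERNS = [(x, y, s) for s, x, y in product(S, X, Y)]
assert len(OFFSET_PATTERNS) == 45

def crop_for_pattern(face, pattern):
    """Shift and scale a ground-truth face box (x1, y1, x2, y2) by one
    offset pattern to produce the crop box for that class."""
    x1, y1, x2, y2 = face
    w, h = x2 - x1, y2 - y1
    dx, dy, s = pattern
    nx1, ny1 = x1 + dx * w, y1 + dy * h
    return (nx1, ny1, nx1 + w * s, ny1 + h * s)
```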
S3, fine tuning a training model P-Net and a model A-Net based on a classical convolution neural network by using the data obtained in the previous step;
In step S3, the models are trained by stochastic gradient descent together with learning rate decay and momentum, with the following parameters:
| Model | Learning rate | Maximum iterations | Batch size | Learning rate decay rate |
|---|---|---|---|---|
| Face candidate frame extraction model (P-Net) | 0.001 | 90000 | 128 | 0.1 |
| Face offset fine-tuning model (A-Net) | 0.002 | 60000 | 128 | 0.12 |
The training process is implemented on the TensorFlow framework.
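A minimal TensorFlow 1.x sketch of this training setup for P-Net; the decay step interval and the momentum value are assumptions, since the table above fixes only the initial rate, iteration count, batch size and decay rate:

```python
import tensorflow as tf  # TensorFlow 1.x API

global_step = tf.Variable(0, trainable=False)
# P-Net schedule from the table; decay_steps=30000 is an assumed interval.
lr = tf.train.exponential_decay(0.001, global_step,
                                decay_steps=30000, decay_rate=0.1,
                                staircase=True)
# SGD with momentum; momentum=0.9 is a common default, assumed here.
optimizer = tf.train.MomentumOptimizer(lr, momentum=0.9)
# With a cross-entropy `loss` tensor defined as above:
# train_op = optimizer.minimize(loss, global_step=global_step)
```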
S4, adopting a full convolution strategy to the trained model P-Net to obtain a global detection result matrix of the input picture;
the full convolution strategy in step S4 is to store the parameters of the full link layer, replace the full link layer with the convolution layer of the same size, and assign the parameters of the previously stored full link layer to the new convolution layer.
S5, for a picture to be tested, inputting the picture in a multi-scale form into a model P-Net to obtain detection result matrixes in multiple scales, and obtaining a candidate face frame through the matrixes and a narrowing non-maximum suppression algorithm:
In step S5, a picture to be tested is scaled with 6 scaling rates, namely 0.79, 1, 1.26, 1.59, 2.0 and 5.0, yielding 6 pictures at different scales. Inputting these 6 pictures into model P-Net yields 6 global detection result matrices, where each data point represents the face/non-face classification result of a square region in the original picture. Candidate frames classified as faces are screened out by analyzing the data of the 6 matrices, with the screening criterion:
p(face) > 0.85
Where p (face) represents the confidence that the candidate box is a face.
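A sketch of this multi-scale screening, mapping result-matrix points back to boxes in the original image; the 227-pixel window comes from the text, while the 32-pixel stride is an assumption about the fully convolutional network:

```python
import numpy as np

SCALES = [0.79, 1.0, 1.26, 1.59, 2.0, 5.0]
WINDOW, STRIDE = 227, 32  # 227x227 receptive field; stride 32 assumed

def candidates_from_maps(prob_maps):
    """prob_maps[i]: 2-D array of face confidences for the picture scaled
    by SCALES[i]. Returns candidate boxes in original-image coordinates,
    each as (x1, y1, x2, y2, confidence)."""
    boxes = []
    for scale, pm in zip(SCALES, prob_maps):
        rows, cols = np.where(pm > 0.85)  # screening criterion from the text
        for r, c in zip(rows, cols):
            x1, y1 = c * STRIDE / scale, r * STRIDE / scale
            side = WINDOW / scale         # map the window back to the original
            boxes.append((x1, y1, x1 + side, y1 + side, pm[r, c]))
    return boxes
```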
For the screened candidate frames, a narrowing non-maximum suppression algorithm is used to obtain candidate face frames, which specifically comprises the following steps:
Before non-maximum suppression is applied to the partially overlapping candidate frames, each original square candidate frame is narrowed about its center, with the narrowing formula:

$$x_1' = x_1 + r\,(x_2 - x_1),\qquad x_2' = x_2 - r\,(x_2 - x_1),\qquad y_1' = y_1,\qquad y_2' = y_2$$

where (x_1, y_1) is the upper-left coordinate, (x_2, y_2) the lower-right coordinate and r = narrowRate = 0.08: the candidate frame keeps its original height and center point while its width is reduced to 0.84 times the original width;
Non-maximum suppression and de-duplication are then performed on the narrowed frames. The non-maximum suppression algorithm proceeds as follows:

All candidate frames are sorted by their face-classification confidence; the frame with the highest confidence is taken as the target frame, every other frame whose overlap ratio with it exceeds 0.3 is removed, and the target frame is kept. The frame with the highest confidence among the remaining candidates is then selected as the next target, and the process repeats until no candidate frames remain.
After de-duplication, the narrowing is restored with the formula:

$$x_1'' = x_1' - \frac{r}{1-2r}\,(x_2' - x_1'),\qquad x_2'' = x_2' + \frac{r}{1-2r}\,(x_2' - x_1')$$

where (x_1', y_1') is the upper-left coordinate, (x_2', y_2') the lower-right coordinate and r = narrowRate = 0.08; the candidate frame is restored so as to keep the center point and height, with the width enlarged back to its size before narrowing.
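A compact Python sketch of the whole narrow / suppress / restore cycle; the function names are ours, while the overlap ratio 0.3 and narrowRate = 0.08 come from the text:

```python
def _iou(a, b):
    """Intersection-over-union of boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def narrow(box, rate=0.08):
    """Shrink the width about the center; height and center are kept."""
    x1, y1, x2, y2 = box
    w = x2 - x1
    return (x1 + rate * w, y1, x2 - rate * w, y2)

def restore(box, rate=0.08):
    """Undo narrow(): expand the width back to its pre-narrowing size."""
    x1, y1, x2, y2 = box
    pad = rate / (1.0 - 2.0 * rate) * (x2 - x1)
    return (x1 - pad, y1, x2 + pad, y2)

def narrowed_nms(boxes, scores, overlap=0.3):
    """Narrow, run standard NMS at the 0.3 overlap ratio, then restore."""
    slim = [narrow(b) for b in boxes]
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)                    # highest-confidence target frame
        keep.append(i)
        order = [j for j in order if _iou(slim[i], slim[j]) <= overlap]
    return [restore(slim[i]) for i in keep]
```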
s6, inputting the candidate face frame iterative formula to A-Net for fine adjustment according to the face position judging condition until the judging condition is met;
In step S6, the face candidate frames obtained in step S5 are cropped out, scaled to 227 × 227 and input into A-Net. A-Net outputs the classification confidences of a candidate frame image over the 45 offset modes, i.e. a vector of length 45. The offset of the candidate frame is then integrated from this classification result, as follows:
$$[s, x, y] = \sum_{n=1}^{N} c_n\,[s_n, x_n, y_n]$$

where [s, x, y] is the final integration result; N is the number of offset patterns, N = 45; [s_n, x_n, y_n] is the offset pattern of class n, with n the offset pattern subscript following the 45 offset pattern settings above; z is the number of offset patterns whose confidence exceeds the threshold, and c_n is the weight of such a pattern, computed by the weight formula I:

$$z = \sum_{n=1}^{N} \mathbb{1}(p_n > t), \qquad c_n = I(n) = \frac{p_n\,\mathbb{1}(p_n > t)}{\sum_{m=1}^{N} p_m\,\mathbb{1}(p_m > t)}$$

where p_n is the classification confidence of offset pattern n and t is the threshold.
the mathematical meaning of the above formula is that we choose the weighted direction of the shift directions of those shift modes whose confidence is greater than the threshold as the estimate of the image shift mode of the candidate frame.
Then, according to the classification result obtained in the above process, the direction of the inverse classification result is finely adjusted to obtain more accurate face positioning. The method comprises the following specific steps:
for a candidate box with coordinates (x, y) at the upper left corner and length (w, h), obtaining the offset mode [ x, y, s ] of the candidate box through A-Net classification]Then x after fine tuning of the reverse offset mode direction new ,y new ,w new ,h new ]Comprises the following steps:
s7, removing repeated face candidate frames by using a narrowing non-maximum suppression algorithm, and outputting a final detection result;
the narrowing non-maximum suppression algorithm used in the above step S7 is the same as that in step S5, and the result is output until this time as the final result.
Example two
This embodiment describes the iterative positioning face detection method based on a deep neural network in terms of framework setup, data set preparation, model training and actual testing; the specific processes are as follows.
1. The framework building process is as follows:
1. installing an Nvidia GPU driver and a related computing library on a Linux server;
2. and compiling and installing a deep learning framework TensorFlow.
2. The data set preparation process is as follows:
1. Writing a tool script in Python to build the training set: the script crops the images of the AFLW public image data set using four threads and automatically records the face/background labels and the offset mode labels;
2. For these data, adopting random duplication for the face/non-face training set to keep the face/non-face ratio at 1:3, and using control parameters in the tool script to keep the various offset-mode training samples balanced;
3. Carrying out normalization processing on the picture data;
4. Converting the processed face pictures and their labels into the tfrecords data format, which can be stored compactly in large quantities and read at higher speed (a writing sketch follows this list).
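A minimal TensorFlow 1.x sketch of the tfrecords conversion; the feature field names are our choice, not from the patent:

```python
import tensorflow as tf  # TensorFlow 1.x API

def write_tfrecords(samples, path):
    """samples: iterable of (image_bytes, label) pairs; path: output file."""
    with tf.python_io.TFRecordWriter(path) as writer:
        for image_bytes, label in samples:
            example = tf.train.Example(features=tf.train.Features(feature={
                'image': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[image_bytes])),
                'label': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[label])),
            }))
            writer.write(example.SerializeToString())
```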
3. The training process is as follows:
1. Downloading an AlexNet convolutional neural network model from a public TensorFlow model publishing platform;
2. Keeping the parameters of the convolution layers, the down-sampling layers and the first two fully connected layers of AlexNet, and modifying the number of output nodes of the final output layer to 2 and 45, so as to adapt the networks to the face candidate frame extraction model P-Net and the face offset fine-tuning model A-Net respectively;
3. For the tfrecords-format data obtained during data set preparation, inputting a fixed number of samples per batch into the convolutional neural network;
4. outputting a feature map through a plurality of convolutional layers and downsampling layers in an AlexNet convolutional neural network model;
5. Mapping the feature map to the fully connected layers through a flattening (concatenation) operation;
6. calculating the classification result of the sample through a Softmax classifier, and sending the result to a loss function layer;
7. Calculating the system loss and the gradient to be back-propagated from the result and the loss function;
8. Adjusting the parameters of the convolutional neural network through the back-propagation algorithm, which uses the following hyper-parameters, obtained by repeated cross-validation:
| Model | Learning rate | Maximum iterations | Batch size | Learning rate decay rate |
|---|---|---|---|---|
| Face candidate frame extraction model (P-Net) | 0.001 | 90000 | 128 | 0.1 |
| Face offset fine-tuning model (A-Net) | 0.002 | 60000 | 128 | 0.12 |
9. Repeating steps 3 to 8 until the number of iterations reaches the set maximum, at which point training terminates.
10. Applying the full convolution strategy to the trained model P-Net to obtain the global detection result matrix of a sample.
4. The test procedure was as follows:
writing a test script by using Python language, wherein the test script comprises the following operations:
1. Normalizing the picture to be tested and applying the multi-scale operation to obtain at most 6 input pictures at multiple scales;
2. Loading the trained models P-Net and A-Net;
3. Inputting the processed pictures into model P-Net, analyzing the resulting (at most 6) global result matrices, and applying narrowed non-maximum suppression to obtain face candidate frames;
4. Cropping face pictures from the original picture according to the face candidate frames, inputting them into model A-Net, fine-tuning according to the result and deciding whether to iterate the fine-tuning;
5. Processing the remaining candidate frames once more with the narrowed non-maximum suppression algorithm and outputting the result as the final result.
In summary, the invention extracts region image blocks from the AFLW public image data set as training-set input and preprocesses them; defines a face candidate frame extraction model P-Net and a face offset fine-tuning model A-Net and fine-tunes and trains these models on the training set; and applies a full convolution strategy to the trained model P-Net to obtain the global detection result matrix of a sample. During testing, a picture is input into model P-Net to obtain face candidate frames, the positions of the candidate frames are then iteratively fine-tuned through model A-Net, and non-maximum suppression is applied to obtain the final result. The method lets a computer detect faces automatically in complex environments and has the advantages of high accuracy, fast recognition speed and stable performance.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (9)
1. An iterative positioning type face detection method based on a deep neural network is characterized by comprising the following steps:
s1, defining a face candidate frame extraction model P-Net and a face offset fine adjustment model A-Net;
s2, extracting data and corresponding labels required by a training model P-Net and a model A-Net based on the AFLW common image data set;
s3, fine-tuning a training model P-Net and a model A-Net based on a classical convolution neural network by using the data obtained in the previous step;
s4, adopting a full convolution strategy to the trained model P-Net to obtain a global detection result matrix of the input picture;
s5, inputting a picture to be tested into the model P-Net in a multi-scale mode to obtain detection result matrixes in multiple scales, and obtaining a candidate face frame through the matrixes and a narrowed non-maximum suppression algorithm;
S6, iteratively inputting the candidate face frames into model A-Net for fine-tuning, according to a face position judging condition, until the condition is met;
and S7, removing repeated face candidate frames by using a narrowing non-maximum suppression algorithm, and outputting a final detection result.
2. The iterative positioning face detection method according to claim 1, wherein in the face offset fine-tuning model A-Net of step S1, the model is set as an N-class classification model, the N classes of face offset modes being used to evaluate the degree of offset of a face candidate frame relative to the ground-truth face frame; a face offset mode is measured by three factors, namely horizontal-axis shift, vertical-axis shift and scaling rate, set as follows:

a set of offset patterns is defined:

{[x_n, y_n, s_n] | n = 1, 2, …, N}

where x_n denotes the shift rate of the candidate frame along the x-axis relative to its own width, y_n denotes the shift rate of the candidate frame along the y-axis relative to its own height, s_n denotes the ratio by which the candidate frame should be scaled relative to itself, N denotes the number of offset pattern classes, and n is the class index.
3. The iterative positioning face detection method based on a deep neural network according to claim 2, wherein the number of offset pattern classes N = 45, with n as the class index, and x_n, y_n, s_n are assigned as follows, giving 5 × 3 × 3 = 45 categories:

s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}
x_n ∈ {-0.17, 0, 0.17}
y_n ∈ {-0.17, 0, 0.17}
4. the iterative localization-type face detection method based on the deep neural network as claimed in claim 1, wherein the full convolution strategy in step S4 is to store parameters of the full connection layer, replace the full connection layer with the convolution layer of the same size, and assign the parameters of the full connection layer stored before to the new convolution layer.
5. The iterative positioning face detection method according to claim 1, wherein each point of a detection result matrix in step S5 represents the detection result of a square region of 227 × 227 pixels in the original picture; the face candidate frames are obtained by restoring these detections to candidate frames in the original picture and applying the narrowed non-maximum suppression algorithm according to the overlap of the candidate frames.
6. The iterative positioning face detection method according to claim 1, wherein the narrowed non-maximum suppression in step S5 is a non-maximum suppression algorithm customized to the object shape, which works better for rectangular candidate frames whose aspect ratio differs from 1:1, such as human faces, as follows:

before non-maximum suppression is applied to the partially overlapping candidate frames, each original square candidate frame is narrowed about its center, with the narrowing formula:

$$x_1' = x_1 + r\,(x_2 - x_1),\qquad x_2' = x_2 - r\,(x_2 - x_1),\qquad y_1' = y_1,\qquad y_2' = y_2$$

where (x_1, y_1) is the upper-left coordinate, (x_2, y_2) the lower-right coordinate and r = narrowRate = 0.08, so the candidate frame keeps its original height and center point while its width is reduced to 0.84 times the original width;

non-maximum suppression and de-duplication are then performed on the narrowed frames, and after de-duplication the narrowing is restored with the formula:

$$x_1'' = x_1' - \frac{r}{1-2r}\,(x_2' - x_1'),\qquad x_2'' = x_2' + \frac{r}{1-2r}\,(x_2' - x_1')$$

where (x_1', y_1') is the upper-left coordinate and (x_2', y_2') the lower-right coordinate of a narrowed frame; the candidate frame is restored to keep its original center point and height, with the width enlarged back to its size before narrowing.
7. The iterative positioning face detection method based on a deep neural network according to claim 1, wherein in step S6, for a face candidate frame obtained from model P-Net, the candidate frame image is input into model A-Net for offset mode classification; model A-Net outputs the classification confidences of the candidate frame image over the N offset modes, and these are integrated into an estimate of the frame's offset, as follows:

$$[s, x, y] = \sum_{n=1}^{N} c_n\,[s_n, x_n, y_n]$$

where [s, x, y] is the final integration result; N is the number of offset patterns, N = 45; [s_n, x_n, y_n] is the offset pattern of class n, with n the offset pattern subscript following the 45 offset pattern settings above; z is the number of offset patterns whose confidence exceeds the threshold, and c_n is the weight of such a pattern, computed by the weight formula I:

$$z = \sum_{n=1}^{N} \mathbb{1}(p_n > t), \qquad c_n = I(n) = \frac{p_n\,\mathbb{1}(p_n > t)}{\sum_{m=1}^{N} p_m\,\mathbb{1}(p_m > t)}$$

where p_n is the classification confidence of offset pattern n and t is the threshold;
then, according to the classification result obtained above, the frame is fine-tuned against the direction of the estimated offset to obtain a more accurate face position, specifically:

for a candidate frame with upper-left coordinates (x, y) and width and height (w, h), with integrated offset pattern [s', x', y'] obtained from the A-Net classification, the fine-tuned frame [x_new, y_new, w_new, h_new] is:

$$x_{new} = x - x'w,\qquad y_{new} = y - y'h,\qquad w_{new} = w/s',\qquad h_{new} = h/s'$$
8. The iterative positioning face detection method based on a deep neural network according to claim 1, wherein in step S6, after fine-tuning against the offset direction, the fine-tuned candidate frame is input into model A-Net again and the offset is re-estimated to decide whether the current candidate frame has reached the most suitable position; if so, fine-tuning stops and the next step is taken; if not, the fine-tuning step is repeated until the condition is met or the number of iterations exceeds the set threshold.
9. The iterative positioning face detection method based on a deep neural network according to claim 8, wherein the condition for judging that a candidate frame has reached the most suitable position is:

$$[s, x, y] = [1, 0, 0]$$

where [s, x, y] is the integrated offset pattern of the current candidate frame computed by the formula above; meanwhile the maximum number of iterations is set to 10, i.e. fine-tuning ends after at most 10 iterations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711034973.7A CN107784288B (en) | 2017-10-30 | 2017-10-30 | Iterative positioning type face detection method based on deep neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711034973.7A CN107784288B (en) | 2017-10-30 | 2017-10-30 | Iterative positioning type face detection method based on deep neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107784288A true CN107784288A (en) | 2018-03-09 |
CN107784288B CN107784288B (en) | 2020-01-14 |
Family
ID=61432442
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711034973.7A Active CN107784288B (en) | 2017-10-30 | 2017-10-30 | Iterative positioning type face detection method based on deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107784288B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160004904A1 (en) * | 2010-06-07 | 2016-01-07 | Affectiva, Inc. | Facial tracking with classifiers |
US20130108171A1 (en) * | 2011-10-28 | 2013-05-02 | Raymond William Ptucha | Image Recomposition From Face Detection And Facial Features |
CN105701467A (en) * | 2016-01-13 | 2016-06-22 | 河海大学常州校区 | Many-people abnormal behavior identification method based on human body shape characteristic |
CN106874868A (en) * | 2017-02-14 | 2017-06-20 | 北京飞搜科技有限公司 | A kind of method for detecting human face and system based on three-level convolutional neural networks |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109031262A (en) * | 2018-06-05 | 2018-12-18 | 长沙大京网络科技有限公司 | Vehicle system and method are sought in a kind of positioning |
CN109145798A (en) * | 2018-08-13 | 2019-01-04 | 浙江零跑科技有限公司 | A kind of Driving Scene target identification and travelable region segmentation integrated approach |
CN109145798B (en) * | 2018-08-13 | 2021-10-22 | 浙江零跑科技股份有限公司 | Driving scene target identification and travelable region segmentation integration method |
CN109344762A (en) * | 2018-09-26 | 2019-02-15 | 北京字节跳动网络技术有限公司 | Image processing method and device |
CN109684920B (en) * | 2018-11-19 | 2020-12-11 | 腾讯科技(深圳)有限公司 | Object key point positioning method, image processing method, device and storage medium |
CN109684920A (en) * | 2018-11-19 | 2019-04-26 | 腾讯科技(深圳)有限公司 | Localization method, image processing method, device and the storage medium of object key point |
US11450080B2 (en) | 2018-11-19 | 2022-09-20 | Tencent Technology (Shenzhen) Company Limited | Image processing method and apparatus, and storage medium |
CN109840565A (en) * | 2019-01-31 | 2019-06-04 | 成都大学 | A kind of blink detection method based on eye contour feature point aspect ratio |
CN110321841A (en) * | 2019-07-03 | 2019-10-11 | 成都汇纳智能科技有限公司 | A kind of method for detecting human face and system |
CN110472640A (en) * | 2019-08-15 | 2019-11-19 | 山东浪潮人工智能研究院有限公司 | A kind of target detection model prediction frame processing method and processing device |
CN110472640B (en) * | 2019-08-15 | 2022-03-15 | 山东浪潮科学研究院有限公司 | Target detection model prediction frame processing method and device |
CN110647813A (en) * | 2019-08-21 | 2020-01-03 | 成都携恩科技有限公司 | Human face real-time detection and identification method based on unmanned aerial vehicle aerial photography |
CN112183351A (en) * | 2020-09-28 | 2021-01-05 | 普联国际有限公司 | Face detection method, device and equipment combined with skin color information and readable storage medium |
CN112183351B (en) * | 2020-09-28 | 2024-03-29 | 普联国际有限公司 | Face detection method, device and equipment combined with skin color information and readable storage medium |
CN113139460A (en) * | 2021-04-22 | 2021-07-20 | 广州织点智能科技有限公司 | Face detection model training method, face detection method and related device thereof |
CN115830411A (en) * | 2022-11-18 | 2023-03-21 | 智慧眼科技股份有限公司 | Biological feature model training method, biological feature extraction method and related equipment |
CN115830411B (en) * | 2022-11-18 | 2023-09-01 | 智慧眼科技股份有限公司 | Biological feature model training method, biological feature extraction method and related equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107784288B (en) | 2020-01-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107784288B (en) | Iterative positioning type face detection method based on deep neural network | |
CN110738207B (en) | Character detection method for fusing character area edge information in character image | |
CN108427920B (en) | Edge-sea defense target detection method based on deep learning | |
CN110348319B (en) | Face anti-counterfeiting method based on face depth information and edge image fusion | |
CN108470320B (en) | Image stylization method and system based on CNN | |
EP3388978B1 (en) | Image classification method, electronic device, and storage medium | |
WO2019233166A1 (en) | Surface defect detection method and apparatus, and electronic device | |
CN111814794B (en) | Text detection method and device, electronic equipment and storage medium | |
Kadam et al. | Detection and localization of multiple image splicing using MobileNet V1 | |
WO2016054779A1 (en) | Spatial pyramid pooling networks for image processing | |
CN111274981B (en) | Target detection network construction method and device and target detection method | |
CN109919145B (en) | Mine card detection method and system based on 3D point cloud deep learning | |
CN111709313B (en) | Pedestrian re-identification method based on local and channel combination characteristics | |
CN112861970B (en) | Fine-grained image classification method based on feature fusion | |
CN116310850B (en) | Remote sensing image target detection method based on improved RetinaNet | |
CN109101984B (en) | Image identification method and device based on convolutional neural network | |
CN114998756A (en) | Yolov 5-based remote sensing image detection method and device and storage medium | |
CN112329771A (en) | Building material sample identification method based on deep learning | |
CN115861595B (en) | Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning | |
CN109543716B (en) | K-line form image identification method based on deep learning | |
CN116597275A (en) | High-speed moving target recognition method based on data enhancement | |
CN116563227A (en) | Tea bud detection method, device, equipment and storage medium | |
CN116977265A (en) | Training method and device for defect detection model, computer equipment and storage medium | |
CN116206302A (en) | Three-dimensional object detection method, three-dimensional object detection device, computer equipment and storage medium | |
CN112396648B (en) | Target identification method and system capable of positioning mass center of target object |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20210918
Address after: 011599, west third and west fourth floors, enterprise headquarters, Shengle Modern Service Industry Cluster, Shengle Economic Park, Helinger County, Hohhot City, Inner Mongolia Autonomous Region
Patentee after: INNER MONGOLIA KEDIAN DATA SERVICE Co.,Ltd.
Address before: 510006 South China University of Technology, Guangzhou University City, Panyu District, Guangzhou City, Guangdong Province
Patentee before: SOUTH CHINA University OF TECHNOLOGY