CN102799901A - Method for multi-angle face detection

Info

Publication number: CN102799901A
Application number: CN2012102369651A
Authority: CN
Inventors: 李�灿
Original assignee: WISLAND TECHNOLOGY (BEIJING) Pte Ltd
Current assignee: Chen Yuchun
Priority date: 2012-07-10
Filing date: 2012-07-10
Publication date: 2012-11-28
Anticipated expiration: 2032-07-10
Also published as: CN102799901B

Abstract

The invention provides a method for multi-angle face detection. The method comprises the following steps: detecting the input image with a maximally stable extremal region (MSER) feature detector to obtain the maximally stable extremal regions of the image; normalizing the obtained regions to obtain rectangular to-be-detected images of candidate faces; extracting the LBP features of the to-be-detected images with an LBP algorithm; using an RBF neural network as a face pose classifier on the LBP features to divide the images into three pose subclasses (left, front, and right); processing the images of each pose subclass with a continuous Adaboost algorithm to obtain the face features under the three pose subclasses; and finally combining the results under the three pose subclasses to detect the faces in the image.

Description

A multi-angle face detection method

Technical field

The invention belongs to the technical field of person-target query and retrieval, and in particular relates to a multi-angle face detection method.

Background technology

Current research divides methods for detecting faces in a single image into four types:

(1) Knowledge-based methods. These methods encode prior knowledge of what constitutes a typical face; usually the prior knowledge includes the relationships between facial features. Methods of this class are mainly used for face localization. One difficulty is how to translate knowledge about faces into well-defined rules. If the rules are too detailed, some faces will be missed because they do not pass all the rules. If the rules are too coarse, many false positives may result. In addition, the approach is hard to extend to faces in different poses, because enumerating all possible cases is difficult. On the other hand, such heuristics work well for detecting frontal faces in specific scenes.

(2) Feature-invariant methods. The goal of these algorithms is to find structural features that remain unchanged when pose, viewpoint, or illumination varies, and then to use these features to locate faces. These methods are mainly used for face localization. In contrast to knowledge-based methods, researchers have long tried to find invariant features of the face for detection. Global features such as skin color, size, and shape are used to find candidate faces, which are then verified with local features such as the eyebrows, nose, and hair. A typical approach first detects skin-like regions, then groups face-like pixels using connected-component analysis or a clustering algorithm. If a connected region has an elliptical or oval shape, it becomes a candidate face.

(3) Template matching methods. Several standard face templates are first stored, describing either the whole face or separate facial features. Detection is then performed by computing the correlation between the input image and the stored templates. These methods can be used for both face detection and face localization.

Early on, Sakai et al. built a face model from sub-templates of the eyes, nose, mouth, and face contour. Miao et al. proposed a hierarchical template matching method for face detection: the input image is rotated in fixed steps to form an image hierarchy, and edges are extracted with the Laplacian operator. The face template comprises the edges of six facial components: two eyebrows, two eyes, a nose, and a mouth. Finally, heuristics are applied to decide whether a face is present. Sinha used a set of spatial image invariants to describe the spatial characteristics of face patterns. While illumination changes alter the brightness of individual parts of the face, the relative brightness of these parts remains largely unchanged. Determining the pairwise brightness ratios of a few such regions and retaining only the general direction of these ratios, such as whether one region is brighter or darker than another, provides a good invariant. The observed brightness regularities are therefore encoded as a coarse spatial ratio template of the face, which contains suitably chosen sub-regions corresponding to the main facial features, such as the eyes, cheeks, and forehead. The brightness constraints between facial features are captured by a suitable set of brighter-darker relations between the sub-regions.

(4) Appearance-based methods. In contrast to template matching, the templates here are learned from a set of training images, which should capture the representative variability of facial appearance. These methods are mainly used for face detection.

Appearance-based methods can be understood in a probabilistic framework. An image vector or feature vector derived from an image is regarded as a random variable x, whose value is characterized by class-conditional density functions. Bayesian classification or the maximum likelihood method can then be used to decide whether a candidate image location is a face or a non-face. Another approach among appearance-based methods is to find a discriminant function between the face and non-face classes. Conventionally, image patterns are projected to a lower-dimensional space and a discriminant function is then formed for classification, or a nonlinear decision surface is formed with a multilayer neural network.

In terms of speed, the development of face detection technology can be divided into several stages. Early research focused mainly on improving detection accuracy and on multi-view face detection, and paid relatively little attention to speed. Representative work includes the k-means clustering method, which builds multiple face templates in feature space by clustering and uses a neural network to learn the distances from the training samples to each template. Skin-color-based face detection assumes that facial skin color is consistent and can be described by a unified model. When skin color is used for face detection, different modeling methods can be adopted, chiefly the Gaussian model, the Gaussian mixture model, and non-parametric estimation. Non-parametric kernel density estimation can also be used to build a skin-color model, on top of which the mean shift method can detect and track faces. This algorithm improves face detection speed and is somewhat robust to occlusion and illumination, but it has difficulty with complex backgrounds and multiple faces. To address the illumination problem, some researchers have proposed compensating for different lighting before detecting skin-color regions.

The turning point for face detection speed was the Adaboost and Cascade algorithm proposed by P. Viola, which realized a real-time face detection system and made face detection truly practical. The learning algorithm is based on AdaBoost: it selects a small number of critical features from a very large feature set and thereby produces an extremely efficient classifier. Systems based on the Boosting and Cascade algorithms have a great advantage in speed. Building on a set of Haar-like features, the Boosting algorithm learns a number of weak classifiers and combines them into a strong classifier. A single strong classifier is generally not enough to complete the task satisfactorily, so a series of such strong classifiers is cascaded. Further improving detection accuracy requires cascading more strong classifiers, which in turn reduces detection speed. We believe that, on the one hand, better yet computationally simpler image feature representations can be adopted; on the other hand, Adaboost can be combined with other strong classifiers.

The above survey and analysis show that, to date, no real-time HD video enhancement system exists on the domestic market, that face detection methods are rather limited in variety, and that they can hardly satisfy the demands of image enhancement in complex environments. Basic research on image enhancement also remains to be deepened and improved.

After nearly 40 years of research, the field of face recognition has achieved great success and provides abundant methods and experience for further study. So far, however, every method has its own application conditions and limitations, and none adapts fully to all situations.

Summary of the invention

To remedy the above defects, namely that face detection in images is restricted by application conditions and limited in scope, the present invention proposes a multi-angle face detection method. Its technical scheme is as follows. The method first detects the input image by maximally stable extremal region (MSER) feature detection to obtain the maximally stable extremal regions of the image, and normalizes the obtained regions to obtain rectangular to-be-detected images of candidate faces. It then extracts LBP features from the to-be-detected images with the LBP algorithm and, based on the LBP features, uses an RBF neural network as a face pose classifier to divide the images into three subclasses by face pose: left, front, and right. The images of each pose subclass are then processed with a continuous Adaboost algorithm to compute the face features under the three pose subclasses. Finally, the results under the three pose subclasses are fused to detect the faces in the image.

The maximally stable extremal region feature detection identifies, in the input image, local image regions whose interior pixel gray values are all greater than, or all less than, the gray values of the surrounding pixels. Its steps are:

(1) sort the pixels of the image by gray value;

(2) binarize the gray-level image with each of the 256 values in the gray-level range [0, 255], then determine the connected regions in each binary image;

(3) let region Q_i be any connected region in the binary image obtained with binarization threshold i; when the threshold changes from i to i+Δ and i-Δ, the connected region becomes Q_{i+Δ} and Q_{i-Δ} respectively; regions are selected at the local minima of the function q(i):

q(i) = \frac{|Q_{i+\Delta} - Q_{i-\Delta}|}{|Q_{i}|}

which picks out the stable regions whose relative area changes little as the binarization threshold varies, that is, the maximally stable extremal regions; in the function, |Q_i| denotes the area of region Q_i, and Q_{i+Δ} - Q_{i-Δ} denotes the area remaining after region Q_{i-Δ} is removed from region Q_{i+Δ};

(4) normalize the maximally stable extremal regions obtained in step (3).
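The selection in step (3) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `stability` and `most_stable_thresholds` and the area profile are hypothetical, and the sketch assumes the regions are nested, so that the area of Q_{i+Δ} - Q_{i-Δ} equals the difference of the two areas.

```python
def stability(areas, delta):
    """Return {i: q(i)} for every threshold i where i - delta and i + delta exist.

    areas[i] is the pixel count |Q_i| of one connected region at threshold i.
    Assumption: the regions are nested, so the area of the set difference
    Q_{i+delta} - Q_{i-delta} equals the difference of the two areas.
    """
    q = {}
    for i in range(delta, len(areas) - delta):
        if areas[i] > 0:
            q[i] = abs(areas[i + delta] - areas[i - delta]) / areas[i]
    return q


def most_stable_thresholds(q):
    """Return the thresholds at which q(i) is a local minimum."""
    keys = sorted(q)
    return [
        i for prev, i, nxt in zip(keys, keys[1:], keys[2:])
        if q[i] <= q[prev] and q[i] <= q[nxt]
    ]


# Hypothetical area profile of one region as the threshold grows.
areas = [10, 12, 30, 31, 31, 32, 60, 90, 95]
q = stability(areas, delta=1)
print(most_stable_thresholds(q))
```

In this made-up profile the area barely changes around thresholds 3 and 4, so those are the thresholds at which the region would be reported as maximally stable.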

In step (4) above, the obtained maximally stable extremal regions are normalized by mapping them to an image of fixed size M×N through bilinear interpolation, where M is the image height and N is the image width.
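As an illustration of this normalization, the following Python sketch resizes a small gray-level patch to M rows by N columns with bilinear interpolation. The function name `resize_bilinear` and the corner-aligned sampling grid are assumptions made for the example; the patent does not specify how the sampling positions are chosen.

```python
def resize_bilinear(img, M, N):
    """Resize a 2-D list of gray values to M rows (height) by N columns (width)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(M):
        # Map the output row onto the source grid (corner-aligned; an assumption).
        y = r * (h - 1) / (M - 1) if M > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for c in range(N):
            x = c * (w - 1) / (N - 1) if N > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Interpolate along the row first, then between the two rows.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# Hypothetical 2x2 patch enlarged to 3x3.
patch = [[0, 10],
         [20, 30]]
normalized = resize_bilinear(patch, 3, 3)
```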

In this technical scheme, the continuous Adaboost algorithm applied to the images of each pose is:

1. Let the sample set be S = {(x_1, y_1), ..., (x_m, y_m)}, x_i ∈ X, y_i ∈ Y = {-1, +1}, i = 1, ..., m, where x_i is the LBP feature extracted from a maximally stable extremal region and y_i is the label indicating face or non-face;

2. Initialize the samples from step 1: for each (x_i, y_i) ∈ S, set w_t(x_i, y_i) = 1/m, where w_t(x_i, y_i) is the sample weight, and obtain the classification result x_i → y_i under a classifier h_t;

3. Update the sample weights

w_{t+1}(x_i, y_i) = \frac{w_t(x_i, y_i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}

where Z_t is the normalization constant of the sample weights, chosen so that the updated weights sum to 1;

4. When the iteration ends, the final cascade classifier is:

H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right).
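The weight update of step 3 and the final decision of step 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented training procedure: the weak-classifier outputs and the value of alpha are hypothetical inputs (the text does not say how α_t is computed), and a zero score is mapped to +1.

```python
import math


def update_weights(weights, labels, outputs, alpha):
    """One round of w_{t+1} = w_t * exp(-alpha * y * h(x)) / Z_t."""
    raw = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, labels, outputs)]
    z = sum(raw)  # Z_t, the normalization constant
    return [r / z for r in raw]


def strong_classify(alphas, outputs):
    """H(x) = sign(sum_t alpha_t * h_t(x)) for one sample; a zero score maps to +1 (assumption)."""
    score = sum(a * h for a, h in zip(alphas, outputs))
    return 1 if score >= 0 else -1


labels = [+1, +1, -1, -1]              # y_i: face (+1) or non-face (-1)
outputs = [0.8, -0.3, -0.6, 0.1]       # hypothetical real-valued h_t(x_i)
w = [1 / len(labels)] * len(labels)    # initial weights 1/m
w = update_weights(w, labels, outputs, alpha=1.0)
```

After the update, the two samples whose weak-classifier output disagrees in sign with the label carry more weight than the two that agree, which is the purpose of the exponential factor.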

In this technical scheme, the rectangular detection results obtained under the three pose subclasses are fused with a fusion method based on the maximum rule:

If the three detected rectangles R_1, R_2, R_3 do not overlap one another, the detection result is R = {R_1, R_2, R_3};

If two of the rectangles R_x, R_y ∈ {R_1, R_2, R_3}, R_x ≠ R_y, overlap, the fused result is R = {R_z, argmax(f(R_x), f(R_y))}, where R_z is the rectangle other than R_x and R_y, and f(R_x) is the detection confidence,

f(R_x) = \sum_{t=1}^{T} \alpha_t h_t(x);

If all three rectangles overlap one another, the detection result is R = {argmax(f(R_1), f(R_2), f(R_3))}.
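The three fusion cases can be reproduced by a short Python sketch that visits the detections in order of decreasing confidence and keeps a rectangle only if it overlaps none of those already kept. The rectangle format (x, y, width, height) and the function names are hypothetical. The greedy ordering is an assumption made here to cover the case where R_1 overlaps R_2 and R_2 overlaps R_3 but R_1 does not overlap R_3, which the text does not address.

```python
def overlaps(a, b):
    """True if axis-aligned rectangles (x, y, width, height) share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def fuse_max_rule(detections):
    """detections: list of (rect, confidence); keep the most confident of overlapping rectangles."""
    kept = []
    for rect, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if not any(overlaps(rect, k) for k, _ in kept):
            kept.append((rect, conf))
    return kept


# Hypothetical detections from the left, front, and right detectors.
detections = [((0, 0, 10, 10), 0.9), ((5, 5, 10, 10), 0.5), ((40, 0, 10, 10), 0.7)]
fused = fuse_max_rule(detections)
```

Here the first two rectangles overlap, so only the more confident one survives, and the third rectangle is kept because it overlaps neither.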

The invention introduces the maximally stable extremal region (MSER) feature to alleviate the problems of illumination change and viewpoint change. MSER regions are first extracted from the image. MSER features have a degree of invariance to viewpoint and illumination changes, which reduces the missed detections and false alarms that these two factors cause. LBP features are then extracted from the normalized MSER regions and a multi-pose face detector performs the detection. On the one hand, the method alleviates the illumination problem; on the other hand, it excludes a large number of non-face regions when extracting the MSERs, and can therefore greatly increase face detection speed.

Description of drawings

Fig. 1 is a flow diagram of the detection method of an embodiment of the present invention;

Fig. 2 is a schematic diagram of an LBP operator with P=4, R=1;

Fig. 3 is a schematic diagram of an LBP operator with P=8, R=1;

Fig. 4 is a schematic diagram of an LBP operator with P=16, R=1;

Fig. 5 is a flow diagram of the LBP texture feature computation.

Embodiment

The present invention is further described below with reference to the accompanying drawings.

Fig. 1 shows one implementation of the method. In this multi-angle face detection method, the input image is first detected by maximally stable extremal region feature detection to obtain the maximally stable extremal regions of the image, and the obtained regions are normalized to obtain rectangular to-be-detected images of candidate faces. LBP features are then extracted from the to-be-detected images with the LBP algorithm and, based on the LBP features, an RBF neural network is used as a face pose classifier to divide the images into three subclasses by face pose: left, front, and right. The images of each pose subclass are then processed with a continuous Adaboost algorithm to compute the face features under the three pose subclasses. Finally, the results under the three pose subclasses are fused to detect the faces in the image.

The maximally stable extremal regions are selected at the local minima of the function q(i):

q(i) = \frac{|Q_{i+\Delta} - Q_{i-\Delta}|}{|Q_{i}|}

In step (4) above, the obtained maximally stable extremal regions are normalized by mapping them to an image of fixed size M×N through bilinear interpolation, where M is the image height and N is the image width.

In this technical scheme, after normalization and division into the three pose subclasses, the continuous Adaboost algorithm applied to the images of each pose subclass is:

3. Update the sample weights

w_{t+1}(x_i, y_i) = \frac{w_t(x_i, y_i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}

where Z_t is the normalization constant of the sample weights, chosen so that the updated weights sum to 1;

4. When the iteration ends, the final cascade classifier is:

H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right).

The detection confidence of a rectangle R_x is

f(R_x) = \sum_{t=1}^{T} \alpha_t h_t(x);

If all three rectangles overlap one another, the detection result is R = {argmax(f(R_1), f(R_2), f(R_3))}.

When obtaining the maximally stable extremal regions, the method binarizes against the gray values 0 to 255, giving 256 binarizations in total. In practice the sweep is generally controlled by a step Δ: the image is binarized repeatedly at thresholds 0, 0+Δ, 0+2Δ, ..., to find a stable binarization threshold.
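A threshold sweep with step Δ might look like the following Python sketch, which binarizes a small image at thresholds 0, Δ, 2Δ, ... and counts the connected regions at each threshold. The function names, the 4-connectivity, and the polarity of the binarization (pixels at or above the threshold are foreground) are assumptions made for the example.

```python
def count_regions(img, threshold):
    """Binarize img (pixel >= threshold is foreground) and count 4-connected regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] >= threshold and not seen[r][c]:
                regions += 1
                stack = [(r, c)]
                seen[r][c] = True
                while stack:  # flood fill the region that contains (r, c)
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and img[ny][nx] >= threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return regions


def sweep(img, delta):
    """Return {threshold: region count} for thresholds 0, delta, 2*delta, ... up to 255."""
    return {t: count_regions(img, t) for t in range(0, 256, delta)}


# Hypothetical image: a bright 2x2 block and a dimmer 2-pixel column on a dark background.
img = [[200, 200, 10, 90],
       [200, 200, 10, 90],
       [10, 10, 10, 10]]
counts = sweep(img, delta=64)
```

A larger Δ means fewer binarizations and a faster sweep, at the cost of a coarser search for the stable threshold.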

In this method, LBP features of the image are used for pose division. The LBP feature extraction describes the texture of the image by the joint distribution T = t(g_c, g_0, ..., g_{P-1}) of each pixel and the P pixels on its circular neighborhood of radius R, where g_c is the gray value of the center point of the local neighborhood and g_p (p = 0, 1, ..., P-1) are the gray values of the P equally spaced points on the circle of radius R. Different (P, R) combinations give different LBP operators; Figs. 2, 3, and 4 show three different LBP operators: R=1, P=4 in Fig. 2; R=1, P=8 in Fig. 3; and R=2, P=16 in Fig. 4.

To make the texture feature invariant to gray level, the gray value g_c of the center point is subtracted from the gray values g_p (p = 0, 1, ..., P-1) of the P equally spaced points on the circular neighborhood, so that the joint distribution T becomes

T = t(g_c, g_0 - g_c, g_1 - g_c, \ldots, g_{P-1} - g_c)    (I)

Treating g_c and g_p as independent, formula I is approximately factorized as

T \approx t(g_c) \, t(g_0 - g_c, g_1 - g_c, \ldots, g_{P-1} - g_c)    (II)

In formula II, t(g_c) describes the gray-level distribution of the whole image and has no influence on the local texture feature distribution of the image; the image texture can therefore be described by the joint distribution of the differences, namely

T \approx t(g_0 - g_c, g_1 - g_c, \ldots, g_{P-1} - g_c)    (III)

When the illumination of the image undergoes an additive change, the relative order of the gray values of the center pixel and the pixels on its circular neighborhood generally does not change; that is, g_p - g_c is unaffected by the additive illumination change. The texture of the image can therefore be described by the sign of the difference between the neighborhood pixels and the center pixel instead of its exact value, namely

T \approx t(s(g_0 - g_c), s(g_1 - g_c), \ldots, s(g_{P-1} - g_c))    (IV)

In formula IV, s is the sign function

s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}
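The invariance to additive illumination change can be checked in one line. Assuming, for illustration, that the change adds the same constant offset c (a symbol introduced here, not in the original text) to every gray value in the neighborhood, and writing g_p' = g_p + c and g_c' = g_c + c:

```latex
g_p' - g_c' = (g_p + c) - (g_c + c) = g_p - g_c
\quad\Longrightarrow\quad
s(g_p' - g_c') = s(g_p - g_c), \qquad p = 0, 1, \ldots, P-1 .
```

Every term of the sign-based description is therefore unchanged by the offset.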

Ordering the results of the joint distribution T according to a fixed order of the pixels on the circular neighborhood yields a 0/1 sequence; in this embodiment the sequence is built counterclockwise, starting from the neighbor to the right of the center pixel. By assigning each term s(g_p - g_c) the binomial factor 2^p, the local spatial texture structure of the pixel is expressed as a unique decimal number, called the LBP_{P,R} number; this is why the texture operator is called the local binary pattern (Local Binary Pattern). The LBP_{P,R} number is computed by

LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c) 2^{p}
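The formula can be illustrated for P=8, R=1 with a short Python sketch. The neighbor ordering follows the text (start at the right neighbor, proceed counterclockwise), interpreted in image coordinates where rows grow downward; that interpretation, the name `lbp_code`, and the patch values are assumptions made for the example.

```python
# Offsets (row, col) of the 8 neighbours for P=8, R=1, starting at the right
# neighbour and proceeding counterclockwise (rows grow downward in an image).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def lbp_code(img, r, c):
    """LBP_{8,1} of the interior pixel (r, c): sum of s(g_p - g_c) * 2**p."""
    g_c = img[r][c]
    code = 0
    for p, (dr, dc) in enumerate(OFFSETS):
        g_p = img[r + dr][c + dc]
        s = 1 if g_p - g_c >= 0 else 0   # sign function s(x)
        code += s * (2 ** p)
    return code


# Hypothetical 3x3 patch; the values are made up for illustration.
patch = [[140, 120, 150],
         [110, 131, 135],
         [100, 160, 90]]
code = lbp_code(patch, 1, 1)
```

Adding the same constant to every value of the patch leaves the code unchanged, because only the signs of the differences enter the sum.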

The concrete LBP texture feature computation is described with reference to Fig. 5 (in the figure, P=8, R=1).

The template on the left of Fig. 5 is thresholded: each neighborhood pixel is compared with the center pixel (131), and is set to 1 where the difference is greater than 0 and to 0 where it is less than 0, giving the 0/1 table in the middle. The 0/1 sequence (10100101) is then built counterclockwise starting from the lower-right corner, and finally the corresponding decimal number (165) is computed, so the LBP texture feature value of this pixel is 165. Computing the LBP feature value for every pixel in the image yields the LBP texture feature map of the image. Because the LBP texture features at the image border are less affected by the neighborhood, the original gray values are kept for the pixels on the image border.
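The feature-map computation, including the border handling, can be sketched as follows. The sketch assumes P=8, R=1 with the neighbors taken counterclockwise from the right neighbor (rows growing downward); the function name `lbp_map` and the test image are hypothetical. It also confirms the conversion of the sequence quoted in the example from binary to decimal.

```python
def lbp_map(img):
    """Return the LBP_{8,1} feature map; border pixels keep their original gray values."""
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]          # copy, so the border stays untouched
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            bits = [1 if img[r + dr][c + dc] >= img[r][c] else 0 for dr, dc in offsets]
            out[r][c] = sum(b << p for p, b in enumerate(bits))
    return out


# The 0/1 sequence from the example, read as a binary number, gives 165.
assert int("10100101", 2) == 165

# Hypothetical 3x4 image with two interior pixels.
img = [[5, 5, 5, 5],
       [5, 9, 1, 5],
       [5, 5, 5, 5]]
features = lbp_map(img)
```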

After the LBP operator has extracted all the image features, the present invention uses an RBF neural network as the face pose classifier. Taking the direction perpendicular to the image plane as the reference, the extracted face features are divided into three poses, left, front, and right: [-30°, 30°] is the frontal pose, [30°, 90°] is the right pose, and [-90°, -30°] is the left pose. After the pose subclasses have been divided, the images of each pose subclass are processed, the results for the three pose subclasses are fused, and the face in the image to be detected is finally detected.
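For labeling training samples, the division into pose subclasses can be written as a small Python helper. It reads the three ranges as [-90°, -30°] for left, [-30°, 30°] for front, and [30°, 90°] for right. The function name `pose_subclass` is hypothetical, and because the ranges share their endpoints, assigning exactly ±30° to the frontal subclass is an assumption made here. The helper only illustrates the angle ranges; the pose classifier itself is the RBF neural network operating on LBP features.

```python
def pose_subclass(yaw_deg):
    """Map a yaw angle in degrees to 'left', 'front' or 'right'."""
    if not -90 <= yaw_deg <= 90:
        raise ValueError("yaw outside the [-90, 90] degree range of the three subclasses")
    if yaw_deg < -30:
        return "left"
    if yaw_deg <= 30:      # the shared endpoints -30 and 30 go to 'front' (assumption)
        return "front"
    return "right"


labels = [pose_subclass(a) for a in (-75, -30, 0, 30, 45)]
```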

To address the illumination and viewpoint changes that face detection methods commonly encounter in visible-light images, the present invention introduces the MSER feature. MSER regions are first extracted from the image; MSERs have a degree of invariance to viewpoint and illumination changes, which reduces the missed detections and false alarms that these two factors cause. LBP features are then extracted from the normalized MSERs and a multi-pose face detector performs the detection. On the one hand, the method alleviates the illumination problem; on the other hand, it excludes a large number of non-face regions when extracting the MSERs, and can therefore greatly increase face detection speed.

The above is merely a preferred embodiment of the present invention and does not limit the invention in any way. Any simple modification, change, or equivalent structural variation made to the above embodiment in accordance with the technical essence of the invention still falls within the scope of protection of the technical scheme of the invention.

Claims

1. A multi-angle face detection method, characterized in that the input image is first detected by maximally stable extremal region feature detection to obtain the maximally stable extremal regions of the image; the obtained regions are normalized to obtain rectangular to-be-detected images of candidate faces; LBP features are then extracted from the to-be-detected images with the LBP algorithm; based on the LBP features, an RBF neural network is used as a face pose classifier to divide the images into three subclasses by face pose, namely left, front, and right; the images of each pose subclass are then processed with a continuous Adaboost algorithm to compute the face features under the three pose subclasses; and finally the results under the three pose subclasses are fused to detect the faces in the image.

2. The multi-angle face detection method according to claim 1, characterized in that the maximally stable extremal region feature detection identifies, in the input image, local image regions whose interior pixel gray values are all greater than, or all less than, the gray values of the surrounding pixels, its steps being:

q(i) = \frac{|Q_{i+\Delta} - Q_{i-\Delta}|}{|Q_{i}|}

3. The multi-angle face detection method according to claim 2, characterized in that in said step (4) the obtained maximally stable extremal regions are normalized by mapping them to an image of fixed size M×N through bilinear interpolation, where M is the image height and N is the image width.

4. The multi-angle face detection method according to claim 1, characterized in that the continuous Adaboost algorithm applied to the images of each pose is:

3. Update the sample weights

w_{t+1}(x_i, y_i) = \frac{w_t(x_i, y_i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}

where Z_t is the normalization constant of the sample weights, chosen so that the updated weights sum to 1;

4. When the iteration ends, the final cascade classifier is:

H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right).

5. The multi-angle face detection method according to claim 1, characterized in that the rectangular detection results obtained under the three pose subclasses are fused with a fusion method based on the maximum rule,

f(R_x) = \sum_{t=1}^{T} \alpha_t h_t(x);

If all three rectangles overlap one another, the detection result is R = {argmax(f(R_1), f(R_2), f(R_3))}.