CN111523345A - Face real-time tracking system and method - Google Patents
Face real-time tracking system and method
- Publication number
- CN111523345A (application CN201910103409.9A)
- Authority
- CN
- China
- Prior art keywords
- face
- feature point
- feature points
- order
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
Abstract
The invention discloses a real-time face tracking system and method. The face detection module calls different face detection models for each frame of image, searches for faces, records the position of each detected face, and sends that position to the face feature point positioning module; the face feature point positioning module locates the coordinates of the second-order feature points of the face and corrects the coordinates of each feature point; the face tracking module tracks the face in the video to obtain its continuous spatial pose. The system and method improve the accuracy, processing speed, and stability of real-time face tracking.
Description
Technical Field
The invention belongs to the technical field of face tracking, relates to a face real-time tracking system, and particularly relates to a face real-time tracking system and method based on second-order functional gradient.
Background
Face tracking refers to determining the motion trajectory of a face across a continuous video sequence. It has received wide attention and study in fields such as computer vision and artificial intelligence, and has been widely applied in video surveillance, robotics, human-computer interaction, and elsewhere. With the explosive growth of mobile devices, more and more applications such as mobile payment, virtual makeup, face beautification, and selfie capture rely on face tracking; traditional face tracking is difficult to apply in these settings, so intensive research combining the latest algorithms is needed.
Traditional face tracking algorithms face two main problems on mobile devices: 1) high computing-resource cost, making them difficult to port directly. Mobile devices such as phones have weak computing power and little memory, while traditional high-precision models involve heavy computation and need large amounts of memory; once such a model is simplified, accuracy inevitably drops and precise tracking becomes difficult. 2) Poor robustness: for a face at a large lateral angle or partially occluded, localization deviates and tracking fails.
Early face tracking required modeling of the face. Shape modeling methods include the deformable template, the point distribution model (Active Shape Model), graph models, and others. Appearance modeling methods comprise global and local appearance modeling: global methods include the Active Appearance Model (a generative model) and the Boosted Appearance Model, while local appearance modeling describes the appearance of a local region and includes color models, projection models, profile-line models, and the like. Modeling-based methods are limited by the model and have low accuracy; in particular, simple models struggle to express the difficult factors met in practice, including illumination, occlusion, and variable pose.
Regression-based methods instead use a regression model to directly learn a mapping function from face appearance to face shape (or to the parameters of a face shape model), establishing a correspondence from appearance to shape. They need no complicated shape or appearance modeling and are easy to apply. Many comparative tests show that this approach is particularly suited to uncontrolled, uncooperative scenes, which are exactly the main application scenario of mobile devices. In addition, face alignment methods based on deep learning have achieved remarkable results: combining deep learning with the shape regression framework can further improve the accuracy of the positioning model, and this has become one of the mainstream methods for feature localization. However, deep learning models are huge (often containing tens of millions of variables), so they are unsuitable for mobile devices and are not discussed further below.
In view of the above, there is an urgent need to design a face tracking method to overcome the above-mentioned defects of the existing face tracking methods.
Disclosure of Invention
The invention provides a face real-time tracking system and a face real-time tracking method, which can improve the accuracy, processing speed and stability of real-time face information tracking.
In order to solve the technical problem, according to one aspect of the present invention, the following technical solutions are adopted:
a real-time face tracking system, the real-time face tracking system comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module is used for calling different face detection models for each frame of image and searching for faces; each face detection model traverses every position of the image, judges whether a face appears there, and returns a reliability; for a given position, a face is considered to truly appear only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning module for locating the coordinates $(x_i, y_i)$ of the second-order feature points of the face; the face feature point positioning module generates a sequence of decision trees, each taking the current coordinates of the second-order feature points as input, with the goal of reducing the value of a second-order functional; the coordinates $(x_i, y_i)$ are improved along the optimizing gradient direction, where the gradient at each feature point $(x_i, y_i)$ is $dx_i = -\frac{1}{|I_j|}\sum_{k \in I_j} g_k^x$, $dy_i = -\frac{1}{|I_j|}\sum_{k \in I_j} g_k^y$, with $I_j$ the leaf node where the feature point lands and $|I_j|$ the number of feature points on that leaf; the coordinates of each feature point are then corrected as $x_i = x_i + \eta\,dx_i$, $y_i = y_i + \eta\,dy_i$, where $\eta$ is a set constant;

a face tracking module for tracking the face in the video to obtain its continuous spatial pose; for each video frame the second-order feature points of the face can be accurately located, and the sequence formed by the feature points represents the motion trajectory of the face; the spatial pose of the face is judged from the feature points; the change of the feature points between frames reflects the various changes and motions of the face.
As an embodiment of the present invention, the face detection module considers that a face does appear only if the reliability of a certain position is greater than 0.95.
As an embodiment of the present invention, the face detection module calls at least one face detection model for each frame of image of the video, and the models traverse every position of the image, giving a confidence for each possible face position; the combined confidence is then computed as:
$$rate = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where $R_1, R_2, \dots$ are the confidences returned by the different face detection models, and $\omega_1, \omega_2, \dots$ are set coefficients;
if the combined confidence is greater than 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module; if no face is detected, detection continues with the next frame of image.
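A minimal sketch of this confidence fusion in Python follows. The 0.95 threshold and the weighted sum follow the text; the weights and detector scores in the example are hypothetical.

```python
import numpy as np

def fuse_confidences(rates, weights, threshold=0.95):
    """Fuse per-detector confidences as rate = w1*R1 + w2*R2 + ...
    and compare against the set threshold."""
    rate = float(np.dot(weights, rates))
    return rate > threshold, rate

# Example: two detectors report 0.98 and 0.93 for the same candidate box.
is_face, rate = fuse_confidences(rates=[0.98, 0.93], weights=[0.6, 0.4])
print(is_face, rate)  # 0.6*0.98 + 0.4*0.93 = 0.96 > 0.95, so True
```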
As an embodiment of the present invention, the face feature point positioning module invokes a second-order feature point positioning algorithm on the region where each face is located;

B1. Initialize the coordinates $(x_i, y_i)$ of the N face feature points according to a standard template; the initial coordinates of each point come from an average face, i.e., the faces of a number of labeled people are taken as samples, and each feature point is set to the sample mean;

B2. Second-order functional definition and optimization solution:

Construct a decision tree with T leaf nodes; its input is the current coordinates of the feature points, and the goal is to reduce the value of the following second-order functional:

$$L = \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i\right)\omega_j^2\right]$$

where $I_j$ is the leaf node where sample $i$ lands, $(dx_j, dy_j)$ is the value of the optimal solution at that leaf node, $g_i$ is the first derivative of the loss function at each feature point, $h_i$ is the second derivative of the loss function at each feature point, and $\omega_j^* = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i}$ is the optimal solution at each leaf; taking the extremum of the second-order functional thus reduces to computing the optimal values of the T leaf nodes;

B3. Take the minimum of the functional and correct the coordinate of each feature point accordingly; that is, the correction of the second-order feature points comes from minimizing the second-order functional; for leaf node $T_j$ containing $|I_j|$ samples, the correction is defined as:

$$dx_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^x, \qquad dy_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^y$$

B4. Correct the coordinates of each feature point: if the leaf node where sample $i$ lands is $T_j$, then $x_i = x_i + \eta\,dx_j$, $y_i = y_i + \eta\,dy_j$, where $\eta$ is a set constant (a common value is 0.01);

B5. Continue generating decision trees; if the correction $(\Sigma|dx|, \Sigma|dy|)$ between two consecutive decision trees is smaller than a set threshold, the module has converged: record the feature points and send them to the face tracking module; otherwise, return to step B2.
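The loop below is a minimal training-time sketch of steps B1-B5 under the squared loss used in the text (so $h_i = 1$ and each leaf correction is the negative mean of the gradients on that leaf). `assign_leaves` is a hypothetical stand-in for the trained decision tree $q$; at tracking time the per-leaf corrections $(dx_j, dy_j)$ would be the stored leaf values rather than recomputed from a target shape.

```python
import numpy as np

def refine_landmarks(points, target, assign_leaves, eta=0.01, tol=1e-3, max_trees=500):
    """points, target: (N, 2) arrays of current and reference landmark coordinates."""
    for _ in range(max_trees):                   # B5: keep generating trees
        leaves = assign_leaves(points)           # q(x_i) = j for each feature point
        g = points - target                      # B2: g_i for the squared loss
        delta = np.zeros_like(points)
        for j in np.unique(leaves):
            mask = leaves == j
            delta[mask] = -g[mask].mean(axis=0)  # B3: (dx_j, dy_j) = -mean of g_i
        points = points + eta * delta            # B4: x_i += eta*dx_j, y_i += eta*dy_j
        if np.abs(eta * delta).sum() < tol:      # B5: Sum|dx| + Sum|dy| below threshold
            break
    return points
```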
As an embodiment of the present invention, the face tracking module computes the spatial pose of the face from its second-order feature points; the second-order feature points determine the key positions of the face; the module judges blinking from the second-order feature points at the eyes, judges mouth opening from the second-order feature points at the mouth, and judges the left-right and up-down pose of the face from its second-order feature points as a whole;

the face tracking module also analyzes the feature point sequence to obtain the motion trajectory of the face.
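The patent does not fix a formula for the blink test, so the sketch below uses a common eye-aspect-ratio heuristic over six eye landmarks as one illustrative realization; the 0.2 threshold and the landmark ordering are assumptions.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) array of second-order feature points around one eye,
    ordered: corner, upper lid x2, corner, lower lid x2."""
    v1 = np.linalg.norm(eye[1] - eye[5])  # vertical lid distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])   # horizontal corner-to-corner distance
    return (v1 + v2) / (2.0 * h)

def is_blinking(eye, threshold=0.2):
    # A closed eye collapses the vertical distances, driving the ratio down.
    return eye_aspect_ratio(eye) < threshold
```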
A real-time face tracking system, the real-time face tracking system comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module is used for calling different face detection models for each frame of image and searching for faces; each face detection model traverses every position of the image, judges whether a face appears there, and returns a reliability; for a given position, a face is considered to truly appear only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
the face feature point positioning module is used for positioning the coordinates of the second-order feature points of the face;
the face tracking module is used for tracking the face in the video to obtain its continuous spatial pose; for each video frame the second-order feature points of the face can be accurately located, and the sequence formed by the feature points represents the motion trajectory of the face; the spatial pose of the face is judged from the feature points; the change of the feature points between frames reflects the various changes and motions of the face.
A real-time face tracking method comprises the following steps:
a video frame image acquisition step, wherein each frame image of a video is acquired;
a face detection step of calling different face detection models for each frame of image to search for faces; the face detection models traverse every position of the image, judge whether a face appears there, and return a reliability; for a given position, a face is considered to truly appear only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning step of locating the coordinates $(x_i, y_i)$ of the second-order feature points of the face; a sequence of decision trees is generated, each taking the current coordinates of the second-order feature points as input, with the goal of reducing the value of the second-order functional, i.e., improving the coordinates $(x_i, y_i)$ along the optimizing gradient direction, where the gradient at each feature point $(x_i, y_i)$ is $dx_i = -\frac{1}{|I_j|}\sum_{k\in I_j} g_k^x$, $dy_i = -\frac{1}{|I_j|}\sum_{k\in I_j} g_k^y$, with $I_j$ the leaf node where the feature point lands and $|I_j|$ the number of feature points on that leaf; the coordinates of each feature point are corrected as $x_i = x_i + \eta\,dx_i$, $y_i = y_i + \eta\,dy_i$;

a face tracking step of tracking the face in the video to obtain its continuous spatial pose; for each video frame the second-order feature points of the face can be accurately located, and the sequence formed by the feature points represents the motion trajectory of the face; the spatial pose of the face is judged from the feature points; the change of the feature points between frames reflects the various changes and motions of the face.
As an embodiment of the present invention, in the face detection step, different face detection models are called for each frame of the video and each position of the image is traversed, giving a confidence for each possible face position; the combined confidence is computed as:

$$rate = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where $R_1, R_2, \dots$ are the confidences returned by the different face detection models, and $\omega_1, \omega_2, \dots$ are set coefficients. If the combined confidence $rate$ is greater than 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module; if no face is detected, detection continues with the next frame of image.
In the face feature point positioning step, a second-order feature point positioning process is called for each face region;
Step B1: initialize the N feature points $(x_i, y_i)$ according to the standard template; the coordinates of each point come from the average face;

Step B2: construct a decision tree that takes the current coordinates of the feature points as input, with the goal of reducing the value of the following second-order functional:

$$L = \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i\right)\omega_j^2\right]$$

where $I_j$ is the leaf node where the feature point lands, $\omega_j$ is the value of that leaf node, $g_i$ is the first derivative of the loss function at each feature point, $h_i$ is the second derivative of the loss function at each feature point, and $\omega_j^* = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i}$ is the optimal solution at each leaf;

Step B3: take the minimum of the functional and correct the coordinate of each feature point accordingly; that is, the second-order feature points are corrected by minimizing the second-order functional, with the correction defined as:

$$dx_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^x, \qquad dy_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^y$$

Step B4: correct the coordinates of each feature point: $x_i = x_i + \eta\,dx_i$, $y_i = y_i + \eta\,dy_i$, where $\eta$ is a set constant;

Step B5: generate decision trees in succession; if the correction $(dx_i, dy_i)$ between two consecutive decision trees is smaller than a set threshold, the module has converged: record the feature points and go to the face tracking step; otherwise, return to step B2.
A real-time face tracking method comprises the following steps:
a video frame image acquisition step, wherein each frame image of a video is acquired;
a face detection step of calling different face detection models for each frame of image to search for faces; the face detection models traverse every position of the image, judge whether a face appears there, and return a reliability; for a given position, a face is considered to truly appear only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning step, namely positioning coordinates of second-order feature points of the face;
a face tracking step of tracking the face in the video to obtain its continuous spatial pose; for each video frame the second-order feature points of the face can be accurately located, and the sequence formed by the feature points represents the motion trajectory of the face; the spatial pose of the face is judged from the feature points; the change of the feature points between frames reflects the various changes and motions of the face.
As an embodiment of the present invention, a new second-order functional gradient boosting algorithm (gradient boosting on a second-order functional target) is adopted to realize real-time face tracking on mobile devices. The specific scheme is as follows:
Second-order functional gradient
For a data set with N samples and M features, a gradient boosting algorithm trains a series of additive functions $\{f_1, f_2, f_3, \dots\}$ to predict the output:

$$\hat{y}_i = \sum_t f_t(x_i)$$
during the training process, the target value y of each sample is knowniThen predict the valueAnd a target value yiDifference between them using loss functionTo characterize. The training process of gradient boosting is to reduce the loss along the current gradient direction. On the basis of the step (t-1) of the previous step, the functional of the step (t) is defined as follows:
setting decision tree q to each sample xiMapping to leaf node TjI.e. q (x)i) J, the leaf node having a value ωjThen (1) can be simplified as follows:
if the high-order margin is omitted, the functional (2) is further expanded as follows
Let the mapping corresponding to the decision tree be $I_j = \{\, i \mid q(x_i) = j \,\}$; grouping by leaf, the functional expands as:

$$L^{(t)} \approx \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i \right) \omega_j^2 \right] + \text{const}$$
Taking the extremum of the above equation at each leaf node gives:

$$\omega_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}$$

with corresponding extreme value:

$$L^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i}$$
if the LOSS function is defined as a second order form,
then h isi1, so each leaf node takes the value ωjIs actually giIs measured. Namely, it is
Gradient boosting algorithm based on a sparse random forest
Combining multiple decision trees $q$ and introducing random selection improves accuracy and generalization, forming a random forest. Multistage regression based on random forests is the current mainstream algorithm: it can quickly locate the feature points of each face and then judge the pose and motion trajectory of the face from those points. The method is accurate and fast at locating feature points, processing hundreds of frames per second on a PC; even on a mobile device with weak computing power, some twenty frames per second can be processed, enough for real-time tracking.
But the standard model requires hundreds of megabytes of space, and simple simplifications (e.g., reducing the number of trees or the number of regression stages) only reduce accuracy. As shown in fig. 4, observing and analyzing the characteristics of the random forest model shows that its feature vectors are sparse: in the feature vector corresponding to each node, often only a few components are large, while the other components are unimportant. Therefore, a sparse representation algorithm is adopted for each node, which is expected to achieve a compression rate of more than 10x, so that the model fits on an ordinary mobile phone without affecting precision. The smaller model also improves speed.
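One way to realize this sparse representation idea is to keep only the k largest-magnitude components of each node's feature vector and store them as index/value pairs; a sketch follows, in which the vector length and k are illustrative rather than values from the text.

```python
import numpy as np

def sparsify_node(vec, k=16):
    """Keep the k dominant components of one node's feature vector."""
    idx = np.argsort(np.abs(vec))[-k:]
    return idx.astype(np.int32), vec[idx].astype(np.float32)

def densify_node(idx, vals, length):
    """Rebuild the dense vector when the node is evaluated."""
    out = np.zeros(length, dtype=np.float32)
    out[idx] = vals
    return out

# A 512-dim float32 vector (2048 bytes) stored as 16 index/value pairs
# (128 bytes) gives a 16x reduction, in line with the >10x target above.
```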
The invention has the beneficial effects that: the face real-time tracking system and the face real-time tracking method can improve the accuracy, processing speed and stability of real-time face information tracking.
On the basis of a regression model, the invention adopts an innovative second-order functional gradient boosting algorithm (gradient boosting on a second-order functional target) to realize real-time face tracking on mobile devices. It adopts a sparse random-forest regression algorithm to reach a processing speed of 20 frames per second on an ordinary mobile phone, achieving real-time tracking. It uses a continuous manifold model that treats the moving face sequence as a continuous manifold transformation; this is strongly robust and can accurately locate the face even under difficult conditions such as large lateral angles and partial occlusion.
The invention has the following advantages: 1) more accurate: the second-order functional gradient rests on a rigorous mathematical foundation, guaranteeing the accuracy of the whole algorithm; 2) faster: the sparse random-forest regression algorithm reaches a processing speed of 20 frames per second on an ordinary mobile phone, achieving real-time tracking; 3) more stable: the continuous manifold model treats the moving face sequence as a continuous manifold transformation, is strongly robust, and can accurately locate the face under difficult conditions such as large lateral angles and partial occlusion.
Drawings
Fig. 1 is a schematic diagram illustrating a real-time face tracking system according to an embodiment of the present invention.
Fig. 2 is a flowchart of a real-time face tracking method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of face detection performed by a conventional random forest-based multistage regression algorithm.
Fig. 4 is a schematic diagram of detecting feature vectors based on a random forest model in the prior art.
Fig. 5 is a schematic diagram of a conventional face detection effect (including a front face and a side face).
Fig. 6 is a schematic diagram of the present invention using a continuous manifold model for face detection.
Fig. 7 is a schematic diagram of face landmarking using the conventional 68-point standard model.
Fig. 8 is a schematic diagram of the secondary adaptive sampling of feature points used by the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
For a further understanding of the invention, reference will now be made to the preferred embodiments of the invention by way of example, and it is to be understood that the description is intended to further illustrate features and advantages of the invention, and not to limit the scope of the claims.
The description in this section covers only several exemplary embodiments, and the present invention is not limited to the scope of the embodiments described. Interchanging some features of the embodiments with the same or similar prior-art means also falls within the scope of the disclosure and protection of the present invention.
The invention discloses a human face real-time tracking system, and FIG. 1 is a schematic diagram of the human face real-time tracking system in one embodiment of the invention; referring to fig. 1, in an embodiment of the present invention, the real-time face tracking system includes: the system comprises a video frame image acquisition module 1, a face detection module 2, a face feature point positioning module 3 and a face tracking module 4.
The video frame image acquiring module 1 is used for acquiring each frame image of the video.
The face detection module 2 is used for calling different face detection models for each frame of image to search for faces; these models traverse every position of the image, determine whether a face is present there, and return a reliability. The module integrates several models, further improving the reliability of face detection; for a given position, a face is considered to truly appear only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module.
The face feature point positioning module 3 is used to locate the coordinates $(x_i, y_i)$ of the second-order feature points of the face. In an embodiment of the present invention, the module generates a sequence of decision trees step by step based on the gradient boosting algorithm; each decision tree takes the current coordinates of the second-order feature points as input, with the goal of reducing the value of the second-order functional, improving the coordinates $(x_i, y_i)$ along the optimizing gradient direction. The gradient at each feature point $(x_i, y_i)$ is $dx_i = -\frac{1}{|I_j|}\sum_{k\in I_j} g_k^x$, $dy_i = -\frac{1}{|I_j|}\sum_{k\in I_j} g_k^y$, where $I_j$ is the leaf node where the feature point lands and $|I_j|$ is the number of feature points on that leaf; the coordinates of each feature point are corrected as $x_i = x_i + \eta\,dx_i$, $y_i = y_i + \eta\,dy_i$, where $\eta$ is a set constant.

The face tracking module 4 is used for tracking the face in the video to obtain its continuous spatial pose; for each video frame the second-order feature points of the face can be accurately located, and the sequence formed by the feature points represents the motion trajectory of the face; the spatial pose of the face is judged from the feature points; the change of the feature points between frames reflects the various changes and motions of the face.
In an embodiment of the present invention, the face detection module considers that a face does appear only if the reliability of a certain position is greater than 0.95.
In an embodiment of the present invention, the face detection module calls different face detection models (such as MTCNN, YOLOv3, etc.) for each frame of the video and traverses every position of the image, giving a confidence for each possible face position; the combined confidence is computed as:

$$rate = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where $R_1, R_2, \dots$ are the confidences returned by the different face detection models, and $\omega_1, \omega_2, \dots$ are set coefficients. If the combined confidence $rate$ is greater than 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module; if no face is detected, detection continues with the next frame of image.
In an embodiment of the present invention, the face feature point positioning module invokes a second-order feature point positioning process for each area where a face is located;
Step B1: initialize the N feature points $(x_i, y_i)$ according to the standard template; the coordinates of each point come from the average face.

Step B2: construct a decision tree that takes the current coordinates of the feature points as input, with the goal of reducing the value of the following second-order functional:

$$L = \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i\right)\omega_j^2\right]$$

where $I_j$ is the leaf node where the feature point lands, $\omega_j$ is the value of that leaf node, $g_i$ is the first derivative of the loss function at each feature point, $h_i$ is the second derivative of the loss function at each feature point, and $\omega_j^* = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i}$ is the optimal solution at each leaf.

Step B3: take the minimum of the functional and correct the coordinate of each feature point accordingly; that is, the second-order feature points are corrected by minimizing the second-order functional, with the correction defined as:

$$dx_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^x, \qquad dy_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^y$$

Step B4: correct the coordinates of each feature point: $x_i = x_i + \eta\,dx_i$, $y_i = y_i + \eta\,dy_i$, where $\eta$ is a set constant; in one embodiment of the present invention, $\eta = 0.01$.

Step B5: generate decision trees in succession; if the correction $(dx_i, dy_i)$ between two consecutive decision trees is smaller than a set threshold (for example, one thousandth of the coordinate value), the module has converged: record the feature points and send them to the face tracking module; otherwise, return to step B2.
In an embodiment of the present invention, the face tracking module is configured to calculate a spatial pose of a face according to a second-order feature point of the face; the second-order feature points are used for determining key positions of the face; the human face tracking module is used for judging whether to blink according to the second-order feature points at the eyes, and the human face tracking module is used for judging whether to open the mouth according to the second-order feature points at the mouth; the face tracking module is used for judging the left-right, up-down postures of the face according to the second-order feature points of the face;
the human face tracking module is used for analyzing the characteristic point sequence to obtain the motion trail of the human face.
The invention also discloses a face real-time tracking method, and FIG. 2 is a flow chart of the face real-time tracking method in an embodiment of the invention; referring to fig. 2, in an embodiment of the present invention, the method for real-time tracking a human face includes:
step S1, a video frame image obtaining step, wherein each frame image of the video is obtained;
Step S2, the face detection step: call different face detection models for each frame of image to search for faces; the face detection models traverse every position of the image, judge whether a face appears there, and return a reliability. This step integrates several models, further improving the reliability of face detection; for a given position, a face is considered to truly appear only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module.
Step S3, the face feature point positioning step: locate the coordinates $(x_i, y_i)$ of the second-order feature points of the face. In an embodiment of the invention, a sequence of decision trees is generated step by step based on the gradient boosting algorithm; each decision tree takes the current coordinates of the second-order feature points as input, with the goal of reducing the value of the second-order functional, improving the coordinates $(x_i, y_i)$ along the optimizing gradient direction. The gradient at each feature point $(x_i, y_i)$ is $dx_i = -\frac{1}{|I_j|}\sum_{k\in I_j} g_k^x$, $dy_i = -\frac{1}{|I_j|}\sum_{k\in I_j} g_k^y$, where $I_j$ is the leaf node where the feature point lands and $|I_j|$ is the number of feature points on that leaf; the coordinates of each feature point are corrected as $x_i = x_i + \eta\,dx_i$, $y_i = y_i + \eta\,dy_i$.

Step S4, the face tracking step: track the face in the video to obtain its continuous spatial pose. For each video frame the second-order feature points of the face can be accurately located, and the sequence formed by the feature points represents the motion trajectory of the face; the spatial pose of the face is judged from the feature points; the change of the feature points between frames reflects the various changes and motions of the face.
In an embodiment of the present invention, in the face detection step, different face detection models are called for each frame of the video and each position of the image is traversed, giving a confidence for each possible face position; the combined confidence is computed as:

$$rate = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where $R_1, R_2, \dots$ are the confidences returned by the different face detection models, and $\omega_1, \omega_2, \dots$ are set coefficients. If the combined confidence $rate$ is greater than 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module; if no face is detected, detection continues with the next frame of image.
In an embodiment of the present invention, in the step of locating the face feature points, a second-order feature point locating process is invoked for an area where each face is located;
Step B1: initialize the N feature points $(x_i, y_i)$ according to the standard template; the coordinates of each point come from the average face.

Step B2: construct a decision tree that takes the current coordinates of the feature points as input, with the goal of reducing the value of the following second-order functional:

$$L = \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i\right)\omega_j^2\right]$$

where $I_j$ is the leaf node where the feature point lands, $\omega_j$ is the value of that leaf node, $g_i$ is the first derivative of the loss function at each feature point, $h_i$ is the second derivative of the loss function at each feature point, and $\omega_j^* = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i}$ is the optimal solution at each leaf.

Step B3: take the minimum of the functional and correct the coordinate of each feature point accordingly; that is, the second-order feature points are corrected by minimizing the second-order functional, with the correction defined as:

$$dx_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^x, \qquad dy_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^y$$

Step B4: correct the coordinates of each feature point: $x_i = x_i + \eta\,dx_i$, $y_i = y_i + \eta\,dy_i$, where $\eta$ is a set constant.

Step B5: generate decision trees in succession; if the correction $(dx_i, dy_i)$ between two consecutive decision trees is smaller than a set threshold (for example, one thousandth of the coordinate value), the module has converged: record the feature points and go to the face tracking step; otherwise, return to step B2.
In an embodiment of the invention, the face real-time tracking system and method adopt a sparse random-forest regression algorithm. Multistage regression based on random forests is the current mainstream algorithm, shown in fig. 3: it can quickly locate the feature points of each face and then judge the pose and motion trajectory of the face from those points. The method is accurate and fast at locating feature points, processing hundreds of frames per second on a PC; even on a mobile device with weak computing power, some twenty frames per second can be processed, enough for real-time tracking. But this relies on a well-trained big-data model, and the standard model requires hundreds of megabytes. Simple simplifications (e.g., reducing the number of trees or the number of regression stages) only reduce accuracy. As shown in fig. 4, observing and analyzing the characteristics of the random forest model shows that its feature vectors are sparse: in the feature vector corresponding to each node, often only a few components are large, while the other components are unimportant. Therefore, a sparse representation algorithm is adopted for each node, which is expected to achieve a compression rate of more than 10x, so that the model fits on an ordinary mobile phone without affecting precision. The smaller model also improves speed.
In an embodiment of the invention, the real-time face tracking system and method adopt a continuous manifold model. Feature points of a frontal face are located well, but as the deflection angle of the face grows larger, even down to only half a face, many feature points disappear or their corresponding positions become unknown (as shown in fig. 5). These missing feature points cannot be located and instead act as interference. In one embodiment of the present invention, a continuous manifold transformation model handles this problem: a moving face sequence is considered to lie in a continuous manifold space. Fig. 6 is a schematic diagram of recognition by real-time face tracking in an embodiment of the present invention; as shown in fig. 6, the reference pose of the face is obtained by determining these spatial transformations. The pose at each moment determines which feature points are visible; the disappeared feature points are not used for positioning. This scheme is strongly robust and is expected to locate the face accurately under difficult conditions such as large lateral angles and partial occlusion.
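A hedged sketch of the visibility test described above: given an estimated yaw, feature points whose canonical (frontal) x-coordinate lies on the far half of a strongly turned face are excluded from positioning. The canonical coordinates, the sign convention, and the 40-degree limit are all assumptions for illustration.

```python
import numpy as np

def visible_mask(canonical_x, yaw_deg, limit=40.0):
    """canonical_x: per-landmark x on a canonical frontal face, in [-1, 1].
    Returns a boolean mask of feature points to keep for positioning."""
    if abs(yaw_deg) < limit:
        return np.ones_like(canonical_x, dtype=bool)  # near-frontal: keep all
    far_side = np.sign(yaw_deg)           # which half of the face turns away
    return canonical_x * far_side <= 0.3  # keep central and near-side points
```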
In an embodiment of the invention, the real-time face tracking system and method adopt secondary adaptive sampling of feature points. The conventional approach is based on fixed face feature points, i.e., the number and positions of the feature points are fixed, as in the 68-point standard model shown in fig. 7. Here, the feature points are automatically densified on top of the standard feature points, as shown in fig. 8; various constraint relationships between these densified points and the reference points further improve positioning accuracy. Moreover, since the points are densified moderately around the reference points, the added computation is small and real-time tracking can still be achieved.
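As a minimal sketch of the densification step, new points can be inserted midway between neighboring reference points so that each added point carries a simple midpoint constraint relative to the 68-point base shape; the neighbor pairs used here are an assumption.

```python
import numpy as np

def densify(points, pairs):
    """points: (68, 2) base landmarks; pairs: (i, j) index pairs of neighbors.
    Returns the base points plus one constrained midpoint per pair."""
    mids = np.array([(points[i] + points[j]) / 2.0 for i, j in pairs])
    return np.vstack([points, mids])

# Example: densify along the jawline between consecutive contour points.
jaw_pairs = [(i, i + 1) for i in range(16)]
```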
In summary, the face real-time tracking system and method provided by the invention can improve the accuracy, processing speed and stability of real-time tracking of face information.
The description and applications of the invention herein are illustrative and are not intended to limit the scope of the invention to the embodiments described above. Variations and modifications of the embodiments disclosed herein are possible, and alternative and equivalent various components of the embodiments will be apparent to those skilled in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements, proportions, and with other components, materials, and parts, without departing from the spirit or essential characteristics thereof. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.
Claims (10)
1. A real-time face tracking system, comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module is used for calling different face detection models for each frame of image and searching for faces; each face detection model traverses every position of the image, judges whether a face appears there, and returns a reliability; for a given position, a face is considered to truly appear only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning module for locating the coordinates $(x_i, y_i)$ of the second-order feature points of the face; the face feature point positioning module generates a sequence of decision trees, each taking the current coordinates of the second-order feature points as input, with the goal of reducing the value of a second-order functional; the coordinates $(x_i, y_i)$ are improved along the optimizing gradient direction, where the gradient at each feature point $(x_i, y_i)$ is $dx_i = -\frac{1}{|I_j|}\sum_{k\in I_j} g_k^x$, $dy_i = -\frac{1}{|I_j|}\sum_{k\in I_j} g_k^y$, with $I_j$ the leaf node where the feature point lands and $|I_j|$ the number of feature points on that leaf; the coordinates of each feature point are corrected as $x_i = x_i + \eta\,dx_i$, $y_i = y_i + \eta\,dy_i$, where $\eta$ is a set constant;

a face tracking module for tracking the face in the video to obtain its continuous spatial pose; for each video frame the second-order feature points of the face can be accurately located, and the sequence formed by the feature points represents the motion trajectory of the face; the spatial pose of the face is judged from the feature points; the change of the feature points between frames reflects the various changes and motions of the face.
2. The real-time face tracking system of claim 1, wherein:
the face detection module considers that the face really appears only if the reliability of a certain position is greater than 0.95.
3. The real-time face tracking system of claim 1, wherein:
the face detection module calls at least one face detection model for each frame of image of the video, and the models traverse every position of the image, giving a confidence for each possible face position; the combined confidence is computed as:

$$rate = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where $R_1, R_2, \dots$ are the confidences returned by the different face detection models, and $\omega_1, \omega_2, \dots$ are set coefficients;

if the combined confidence $rate$ is greater than 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module; if no face is detected, detection continues with the next frame of image.
4. The real-time face tracking system of claim 1, wherein:
the face feature point positioning module calls a second-order feature point positioning algorithm for the area where each face is located;
B1. Initialize the coordinates $(x_i, y_i)$ of the N face feature points according to a standard template; the initial coordinates of each point come from an average face, i.e., the faces of a number of labeled people are taken as samples, and each feature point is set to the sample mean;

B2. Second-order functional definition and optimization solution:

Construct a decision tree with T leaf nodes; its input is the current coordinates of the feature points, and the goal is to reduce the value of the following second-order functional:

$$L = \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i\right)\omega_j^2\right]$$

where $I_j$ is the leaf node where sample $i$ lands, $(dx_j, dy_j)$ is the value of the optimal solution at that leaf node, $g_i$ is the first derivative of the loss function at each feature point, $h_i$ is the second derivative of the loss function at each feature point, and $\omega_j^* = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i}$ is the optimal solution at each leaf; taking the extremum of the second-order functional thus reduces to computing the optimal values of the T leaf nodes;

B3. Take the minimum of the functional and correct the coordinate of each feature point accordingly; that is, the correction of the second-order feature points comes from minimizing the second-order functional; for leaf node $T_j$ containing $|I_j|$ samples, the correction is defined as:

$$dx_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^x, \qquad dy_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^y$$

B4. Correct the coordinates of each feature point: if the leaf node where sample $i$ lands is $T_j$, then $x_i = x_i + \eta\,dx_j$, $y_i = y_i + \eta\,dy_j$, where $\eta$ is a set constant (a common value is 0.01);

B5. Continue generating decision trees; if the correction $(\Sigma|dx|, \Sigma|dy|)$ between two consecutive decision trees is smaller than a set threshold, the module has converged: record the feature points and send them to the face tracking module; otherwise, return to step B2.
5. The real-time face tracking system of claim 1, wherein:
the human face tracking module is used for calculating the spatial attitude of the human face according to the second-order feature points of the human face; the second-order feature points are used for determining key positions of the face; the human face tracking module is used for judging whether to blink according to the second-order feature points at the eyes, and the human face tracking module is used for judging whether to open the mouth according to the second-order feature points at the mouth; the face tracking module is used for judging the left-right, up-down postures of the face according to the second-order feature points of the face;
the human face tracking module is used for analyzing the characteristic point sequence to obtain the motion trail of the human face.
6. A real-time face tracking system, comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module is used for calling different face detection models for each frame of image and searching for faces; each face detection model traverses every position of the image, judges whether a face appears there, and returns a reliability; for a given position, a face is considered to truly appear only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
the face feature point positioning module is used for positioning the coordinates of the second-order feature points of the face;
the face tracking module is used for tracking the face in the video to obtain its continuous spatial pose; for each video frame the second-order feature points of the face can be accurately located, and the sequence formed by the feature points represents the motion trajectory of the face; the spatial pose of the face is judged from the feature points; the change of the feature points between frames reflects the various changes and motions of the face.
7. A real-time face tracking method is characterized by comprising the following steps:
a video frame image acquisition step, wherein each frame image of a video is acquired;
a face detection step of calling different face detection models for each frame of image to search for faces; the face detection models traverse every position of the image, judge whether a face appears there, and return a reliability; for a given position, a face is considered to truly appear only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning step of locating the coordinates $(x_i, y_i)$ of the second-order feature points of the face; a sequence of decision trees is generated, each taking the current coordinates of the second-order feature points as input, with the goal of reducing the value of the second-order functional, i.e., improving the coordinates $(x_i, y_i)$ along the optimizing gradient direction, where the gradient at each feature point $(x_i, y_i)$ is $dx_i = -\frac{1}{|I_j|}\sum_{k\in I_j} g_k^x$, $dy_i = -\frac{1}{|I_j|}\sum_{k\in I_j} g_k^y$, with $I_j$ the leaf node where the feature point lands and $|I_j|$ the number of feature points on that leaf; the coordinates of each feature point are corrected as $x_i = x_i + \eta\,dx_i$, $y_i = y_i + \eta\,dy_i$;

a face tracking step of tracking the face in the video to obtain its continuous spatial pose; for each video frame the second-order feature points of the face can be accurately located, and the sequence formed by the feature points represents the motion trajectory of the face; the spatial pose of the face is judged from the feature points; the change of the feature points between frames reflects the various changes and motions of the face.
8. The real-time face tracking method according to claim 7, characterized in that:
in the face detection step, different face detection models are called for each frame of image of the video, and each position of the image is traversed, giving a confidence for each possible face position; the combined confidence is computed as:

$$rate = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where $R_1, R_2, \dots$ are the confidences returned by the different face detection models, and $\omega_1, \omega_2, \dots$ are set coefficients;

if the combined confidence $rate$ is greater than 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module; if no face is detected, detection continues with the next frame of image.
9. The real-time face tracking method according to claim 7, characterized in that:
in the step of positioning the face feature points, a second-order feature point positioning process is called for the area where each face is located;
Step B1: initialize the N feature points $(x_i, y_i)$ according to the standard template; the coordinates of each point come from the average face;

Step B2: construct a decision tree that takes the current coordinates of the feature points as input, with the goal of reducing the value of the following second-order functional:

$$L = \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i\right)\omega_j^2\right]$$

where $I_j$ is the leaf node where the feature point lands, $\omega_j$ is the value of that leaf node, $g_i$ is the first derivative of the loss function at each feature point, $h_i$ is the second derivative of the loss function at each feature point, and $\omega_j^* = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i}$ is the optimal solution at each leaf;

Step B3: take the minimum of the functional and correct the coordinate of each feature point accordingly; that is, the second-order feature points are corrected by minimizing the second-order functional, with the correction defined as:

$$dx_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^x, \qquad dy_j = -\frac{1}{|I_j|}\sum_{i\in I_j} g_i^y$$

Step B4: correct the coordinates of each feature point: $x_i = x_i + \eta\,dx_i$, $y_i = y_i + \eta\,dy_i$, where $\eta$ is a set constant;

Step B5: generate decision trees in succession; if the correction $(dx_i, dy_i)$ between two consecutive decision trees is smaller than a set threshold, the module has converged: record the feature points and go to the face tracking step; otherwise, return to step B2.
10. A real-time face tracking method is characterized by comprising the following steps:
a video frame image acquisition step, wherein each frame image of a video is acquired;
a face detection step of calling different face detection models for each frame of image to search for faces; the face detection models traverse every position of the image, judge whether a face appears there, and return a reliability; for a given position, a face is considered to truly appear only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning step, namely positioning coordinates of second-order feature points of the face;
a face tracking step of tracking the face in the video to obtain its continuous spatial pose; for each video frame the second-order feature points of the face can be accurately located, and the sequence formed by the feature points represents the motion trajectory of the face; the spatial pose of the face is judged from the feature points; the change of the feature points between frames reflects the various changes and motions of the face.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910103409.9A CN111523345B (en) | 2019-02-01 | 2019-02-01 | Real-time human face tracking system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111523345A true CN111523345A (en) | 2020-08-11 |
CN111523345B CN111523345B (en) | 2023-06-23 |
Family
ID=71899996
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910103409.9A Active CN111523345B (en) | 2019-02-01 | 2019-02-01 | Real-time human face tracking system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111523345B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101344922A (en) * | 2008-08-27 | 2009-01-14 | 华为技术有限公司 | Human face detection method and device |
CN104182718A (en) * | 2013-05-21 | 2014-12-03 | 腾讯科技(深圳)有限公司 | Human face feature point positioning method and device thereof |
CN103310204A (en) * | 2013-06-28 | 2013-09-18 | 中国科学院自动化研究所 | Feature and model mutual matching face tracking method based on increment principal component analysis |
WO2014205768A1 (en) * | 2013-06-28 | 2014-12-31 | 中国科学院自动化研究所 | Feature and model mutual matching face tracking method based on increment principal component analysis |
Non-Patent Citations (1)
Title |
---|
Zhan Jiangtao; Liu Qiang; Chai Chunlei: "Face feature point tracking method based on a three-dimensional model and Gabor wavelets" *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114140864A (en) * | 2022-01-29 | 2022-03-04 | 深圳市中讯网联科技有限公司 | Trajectory tracking method and device, storage medium and electronic equipment |
CN114140864B (en) * | 2022-01-29 | 2022-07-05 | 深圳市中讯网联科技有限公司 | Trajectory tracking method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111523345B (en) | 2023-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022217840A1 (en) | Method for high-precision multi-target tracking against complex background | |
Rikert et al. | Gaze estimation using morphable models | |
CN109472198B (en) | Gesture robust video smiling face recognition method | |
CN108363973B (en) | Unconstrained 3D expression migration method | |
CN108171133B (en) | Dynamic gesture recognition method based on characteristic covariance matrix | |
CN108229416A (en) | Robot SLAM methods based on semantic segmentation technology | |
US20230230305A1 (en) | Online streamer avatar generation method and apparatus | |
CN116309731A (en) | Multi-target dynamic tracking method based on self-adaptive Kalman filtering | |
CN110276785A (en) | One kind is anti-to block infrared object tracking method | |
Nunes et al. | Robust event-based vision model estimation by dispersion minimisation | |
Feng et al. | Kalman filter for spatial-temporal regularized correlation filters | |
CN111402303A (en) | Target tracking architecture based on KFSTRCF | |
CN105608710A (en) | Non-rigid face detection and tracking positioning method | |
CN111860243A (en) | Robot action sequence generation method | |
US20050185834A1 (en) | Method and apparatus for scene learning and three-dimensional tracking using stereo video cameras | |
CN113129332A (en) | Method and apparatus for performing target object tracking | |
CN110598595A (en) | Multi-attribute face generation algorithm based on face key points and postures | |
CN111523345A (en) | Face real-time tracking system and method | |
CN107194947B (en) | Target tracking method with self-adaptive self-correction function | |
Wang et al. | A Visual SLAM Algorithm Based on Image Semantic Segmentation in Dynamic Environment | |
CN115496859A (en) | Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning | |
Cordea et al. | 3-D head pose recovery for interactive virtual reality avatars | |
CN112507940A (en) | Skeleton action recognition method based on difference guidance representation learning network | |
CN113674323A (en) | Visual target tracking algorithm based on multidimensional confidence evaluation learning | |
CN113762149A (en) | Feature fusion human behavior recognition system and method based on segmentation attention |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||