
CN111523345A - Face real-time tracking system and method - Google Patents

Face real-time tracking system and method Download PDF

Info

Publication number
CN111523345A
Authority
CN
China
Prior art keywords
face
feature point
feature points
order
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910103409.9A
Other languages
Chinese (zh)
Other versions
CN111523345B (en)
Inventor
陈英时
耿敢超
左建锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kankan Intelligent Technology Co ltd
Original Assignee
Shanghai Kankan Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kankan Intelligent Technology Co ltd filed Critical Shanghai Kankan Intelligent Technology Co ltd
Priority to CN201910103409.9A priority Critical patent/CN111523345B/en
Publication of CN111523345A publication Critical patent/CN111523345A/en
Application granted granted Critical
Publication of CN111523345B publication Critical patent/CN111523345B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time face tracking system and method. The system comprises a video frame image acquisition module, a face detection module, a face feature point positioning module and a face tracking module. The face detection module is used for calling different face detection models for each frame image, searching whether a face appears, recording the position of any face that appears and sending the corresponding position to the face feature point positioning module; the face feature point positioning module is used for positioning the coordinates of the second-order feature points of the face and correcting the coordinates of each feature point; the face tracking module is used for tracking the face in the video to obtain the continuous spatial pose of the face. The real-time face tracking system and method can improve the accuracy, processing speed and stability of real-time face information tracking.

Description

Face real-time tracking system and method
Technical Field
The invention belongs to the technical field of face tracking, relates to a real-time face tracking system, and in particular relates to a real-time face tracking system and method based on second-order functional gradients.
Background
Face tracking mainly refers to determining the motion track of a face in a continuous video sequence. It has received wide attention and research in fields such as computer vision and artificial intelligence, and has been widely applied in video monitoring, robotics, human-computer interaction, etc. With the explosive growth of mobile devices, more and more applications such as mobile payment, virtual makeup, facial beautification and selfies require face tracking, yet traditional face tracking is difficult to apply in these settings, so intensive research combining the latest algorithms is needed.
When applied to mobile devices, traditional face tracking algorithms mainly suffer from the following two problems: 1) high computing resource cost, making them difficult to port directly to mobile devices. Mobile devices such as mobile phones have weak computing power and little memory, while traditional high-precision models involve a large amount of computation and require a large amount of memory; after such a model is simplified, its precision inevitably drops and accurate tracking becomes difficult. 2) Poor robustness: for a face at a large lateral angle or partially occluded, the positioning deviates and tracking fails.
Early face tracking required modeling of the face. Shape modeling methods include the deformable template, the point distribution model (Active Shape Model), the graph model, and the like. Appearance modeling methods comprise global appearance modeling and local appearance modeling: the global methods include Active Appearance Models (a generative model) and the Boosted Appearance Model, while local appearance modeling models the appearance information of a local area and includes the color model, the projection model, the profile line model and the like. Modeling-based methods are limited by the model and have low accuracy; simple models struggle to express the difficult factors of practical applications, including illumination, occlusion and variable poses.
In shape-regression methods, a regression model is used to directly learn a mapping function from the appearance of a face to the shape of the face (or to the parameters of a face shape model), thereby establishing a correspondence from appearance to shape. Such methods need no complicated shape and appearance modeling and are easy to apply. Many comparative tests show that this class of methods is particularly suitable for uncontrolled and uncooperative scenes, which are the main application scenes of mobile devices. In addition, face alignment methods based on deep learning have also achieved remarkable results; combining deep learning with the shape regression framework can further improve the accuracy of the positioning model, and this has become one of the mainstream approaches to feature positioning. However, deep learning models are huge (often containing tens of millions of variables), so they are not suitable for mobile devices and are not discussed below.
In view of the above, there is an urgent need to design a face tracking method to overcome the above-mentioned defects of the existing face tracking methods.
Disclosure of Invention
The invention provides a face real-time tracking system and a face real-time tracking method, which can improve the accuracy, processing speed and stability of real-time face information tracking.
In order to solve the technical problem, according to one aspect of the present invention, the following technical solutions are adopted:
a real-time face tracking system, the real-time face tracking system comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module is used for calling different face detection models for each frame image and searching whether a face appears; the face detection models traverse every position of the image, judge whether a face appears at each position and return a reliability; for a certain position, the face is considered to have really appeared only if the reliability is higher than a set value; the position of the face is then recorded and the corresponding position is sent to the face feature point positioning module;
a face feature point positioning module for positioning the coordinates (x_i, y_i) of the second-order feature points of the face; the face feature point positioning module gradually generates a plurality of decision trees, each decision tree taking the current coordinates of the second-order feature points as input, with the goal of reducing the value of the second-order functional, i.e. improving the coordinates (x_i, y_i) along the direction of the optimized gradient; the gradient of each feature point (x_i, y_i) is

dx_i = −(1/|I_j|)·Σ_{k∈I_j} g_k^x,  dy_i = −(1/|I_j|)·Σ_{k∈I_j} g_k^y

wherein I_j is the leaf node where the feature point is located and |I_j| is the number of all feature points on that leaf node; the coordinates of each feature point are then corrected as: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;
the face tracking module is used for tracking the face in the video to obtain the continuous spatial pose of the face; for each video frame, the second-order feature points of the face can be accurately positioned, and the sequence formed by these feature points represents the motion track of the face; the spatial pose of the face is judged on the basis of the feature points; the change of the feature points between different frames reflects the various changes and motions of the face.
As an embodiment of the present invention, the face detection module considers that a face does appear only if the reliability of a certain position is greater than 0.95.
As an embodiment of the present invention, the face detection module calls at least one face detection model for each frame image of the video, and the models traverse every position of the image, giving a confidence for each possible face position; the combined confidence of these confidences is calculated as:

rate = ω_1·R_1 + ω_2·R_2 + ……

wherein R_1, R_2, … are the confidences returned by the different face detection models and ω_1, ω_2, … are set coefficients;

if the combined confidence rate is greater than 0.95, a face is present at that position; the position is recorded and the position information is sent to the face feature point positioning module; if no face can be detected, detection continues with the next frame image.
As an embodiment of the present invention, the face feature point positioning module calls a second-order feature point positioning algorithm on the area where each face is located;

B1, initializing the coordinates (x_i, y_i) of the N face feature points according to the standard template; the initial coordinates of each point come from an average face, i.e. the labelled faces of a plurality of people are taken as samples and, for each feature point, the average value over the samples is taken;

B2, second-order functional definition and optimization solution:

a decision tree with T leaf nodes is constructed; its input is the current coordinates of the feature points, and the target is to reduce the value of the following second-order functional:

F̃ = Σ_{j=1..T} [ (Σ_{i∈I_j} g_i)·ω_j + ½·(Σ_{i∈I_j} h_i + λ)·ω_j² ]

wherein I_j is the set of samples on leaf node j, ω_j is the value of that leaf node, and (dx_j, dy_j) is the value of the optimal solution at that leaf node for the x and y coordinates respectively; g_i is the first derivative of the loss function at each feature point and h_i is the second derivative of the loss function at each feature point; the extreme value problem of the second-order functional is thereby converted into an optimal value calculation at the T leaf nodes;

B3, taking the minimum value of the functional and correspondingly correcting the coordinates of each feature point; i.e. the correction of the second-order feature points comes from taking the minimum value of the second-order functional; for leaf node T_j, containing |I_j| samples, it is defined as follows:

dx_j = −(Σ_{i∈I_j} g_i^x) / (Σ_{i∈I_j} h_i + λ),  dy_j = −(Σ_{i∈I_j} g_i^y) / (Σ_{i∈I_j} h_i + λ)

B4, correcting the coordinates of each feature point:

if the leaf node where sample i is located is T_j, then: x_i = x_i + η·dx_j, y_i = y_i + η·dy_j, where η is a set constant;

B5, continuously generating a plurality of decision trees; if the correction (Σ|dx|, Σ|dy|) between two successive decision trees is smaller than the set threshold, the module has converged, and the feature points are recorded and sent to the face tracking module; otherwise step B2 is continued.
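The loop B1–B5 can be sketched as follows, assuming a squared loss so that h_i = 1 and the leaf correction reduces to the (negative) mean of the per-point gradients; the tree builder (tree_fn) and gradient source (grad_fn) are stand-ins, not the patent's own implementations:

```python
# Illustrative sketch of second-order functional gradient boosting over
# feature-point coordinates (steps B1-B5), under the assumptions above.
import numpy as np

def leaf_corrections(g, leaf_of, n_leaves, lam=1.0):
    """(dx_j, dy_j) = -Σ_{i∈I_j} g_i / (Σ_{i∈I_j} h_i + λ), with h_i = 1."""
    d = np.zeros((n_leaves, 2))
    for j in range(n_leaves):
        mask = leaf_of == j
        if mask.any():
            d[j] = -g[mask].sum(axis=0) / (mask.sum() + lam)
    return d

def fit_feature_points(points, grad_fn, tree_fn, eta=0.01, tol=1e-3):
    """B5: keep adding trees until the total correction (Σ|dx|, Σ|dy|) is small."""
    while True:
        g = grad_fn(points)                   # per-point gradients, shape (N, 2)
        leaf_of, n_leaves = tree_fn(points)   # decision tree mapping q(x_i) = j
        step = eta * leaf_corrections(g, leaf_of, n_leaves)[leaf_of]
        points = points + step                # B4: x_i += η·dx_j, y_i += η·dy_j
        if np.abs(step).sum() < tol:          # convergence test between trees
            return points
```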
As an embodiment of the present invention, the face tracking module is configured to calculate the spatial pose of the face according to the second-order feature points of the face; the second-order feature points are used for determining key positions of the face; the face tracking module judges whether the eyes blink according to the second-order feature points at the eyes, judges whether the mouth opens according to the second-order feature points at the mouth, and judges the left-right and up-down pose of the face according to the second-order feature points of the face;
the face tracking module also analyses the feature point sequence to obtain the motion track of the face.
A real-time face tracking system, the real-time face tracking system comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module is used for calling different face detection models for each frame image and searching whether a face appears; the face detection models traverse every position of the image, judge whether a face appears at each position and return a reliability; for a certain position, the face is considered to have really appeared only if the reliability is higher than a set value; the position of the face is then recorded and the corresponding position is sent to the face feature point positioning module;
the face feature point positioning module is used for positioning the coordinates of the second-order feature points of the face;
the face tracking module is used for tracking the face in the video to obtain the continuous spatial pose of the face; for each video frame, the second-order feature points of the face can be accurately positioned, and the sequence formed by these feature points represents the motion track of the face; the spatial pose of the face is judged on the basis of the feature points; the change of the feature points between different frames reflects the various changes and motions of the face.
A real-time face tracking method comprises the following steps:
a video frame image acquisition step, wherein each frame image of a video is acquired;
a face detection step, namely calling different face detection models for each frame image to find whether a face appears; the face detection models traverse every position of the image, judge whether a face appears there and return a reliability; for a certain position, the face is considered to have really appeared only if the reliability is higher than a set value; the position of the face is then recorded and the corresponding position is sent to the face feature point positioning module;
a face feature point positioning step, namely positioning the coordinates (x_i, y_i) of the second-order feature points of the face; a plurality of decision trees are generated step by step, each decision tree taking the current coordinates of the second-order feature points as input, with the aim of reducing the value of the second-order functional, i.e. improving the coordinates (x_i, y_i) along the direction of the optimized gradient; the gradient of each feature point (x_i, y_i) is

dx_i = −(1/|I_j|)·Σ_{k∈I_j} g_k^x,  dy_i = −(1/|I_j|)·Σ_{k∈I_j} g_k^y

wherein I_j is the leaf node where the feature point is located and |I_j| is the number of all feature points on that leaf node; the coordinates of each feature point are then corrected as: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i;

a face tracking step, namely tracking the face in the video to obtain the continuous spatial pose of the face; for each video frame, the second-order feature points of the face can be accurately positioned, and the sequence formed by these feature points represents the motion track of the face; the spatial pose of the face is judged on the basis of the feature points; the change of the feature points between different frames reflects the various changes and motions of the face.
As an embodiment of the present invention, in the face detection step, different face detection models are called for each frame image of the video and every position of the image is traversed, giving a confidence for each possible face position; the combined confidence of these confidences is calculated as:

rate = ω_1·R_1 + ω_2·R_2 + ……

wherein R_1, R_2, … are the confidences returned by the different face detection models and ω_1, ω_2, … are set coefficients;

if the combined confidence rate is greater than 0.95, a face is present at that position; the position is recorded and the position information is sent to the face feature point positioning module; if no face can be detected, detection continues with the next frame image.
As an embodiment of the present invention, in the face feature point positioning step, a second-order feature point positioning process is called for the area where each face is located;

step B1, initializing the N feature points (x_i, y_i) according to the standard template; the coordinates of each point come from the average face;

step B2, constructing a decision tree whose input is the current coordinates of the feature points and whose aim is to reduce the value of the following second-order functional:

F̃ = Σ_{j=1..T} [ (Σ_{i∈I_j} g_i)·ω_j + ½·(Σ_{i∈I_j} h_i + λ)·ω_j² ]

wherein I_j is the leaf node where the feature point is located and ω_j is the value of that leaf node; g_i is the first derivative and h_i the second derivative of the loss function at each feature point;

step B3, taking the minimum value of the functional and correspondingly correcting the coordinates of each feature point; i.e. the second-order feature points are corrected by taking the minimum value of the second-order functional, whose optimal leaf value is defined as follows:

ω_j* = −(Σ_{i∈I_j} g_i) / (Σ_{i∈I_j} h_i + λ)

the corrections (dx_i, dy_i) of the feature points on leaf node j are taken from this optimal value, computed separately for the x and y coordinates;

step B4, correcting the coordinates of each feature point:

x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;

step B5, generating a plurality of decision trees in succession; if the correction (dx_i, dy_i) between two successive decision trees is less than the set threshold, the module has converged, the feature points are recorded, and the method proceeds to the face tracking step; otherwise it returns to step B2.
A real-time face tracking method comprises the following steps:
a video frame image acquisition step, wherein each frame image of a video is acquired;
a face detection step, namely calling different face detection models for each frame image to find whether a face appears; the face detection models traverse every position of the image, judge whether a face appears there and return a reliability; for a certain position, the face is considered to have really appeared only if the reliability is higher than a set value; the position of the face is then recorded and the corresponding position is sent to the face feature point positioning module;
a face feature point positioning step, namely positioning coordinates of second-order feature points of the face;
a face tracking step, namely tracking the face in the video to obtain the continuous spatial pose of the face; for each video frame, the second-order feature points of the face can be accurately positioned, and the sequence formed by these feature points represents the motion track of the face; the spatial pose of the face is judged on the basis of the feature points; the change of the feature points between different frames reflects the various changes and motions of the face.
As an embodiment of the present invention, a new second-order functional gradient boosting algorithm (gradient boosting on a second-order functional target) is adopted to realize real-time face tracking on a mobile device. The specific scheme is as follows:
second order functional gradient
For a data set with N samples and M characteristics, a Gradient Boosting algorithm is trained to obtain a series of addable functions { f }1,f2,f3,…To predict the output:
Figure BDA0001966164310000061
during the training process, the target value y of each sample is knowniThen predict the value
Figure BDA0001966164310000062
And a target value yiDifference between them using loss function
Figure BDA0001966164310000063
To characterize. The training process of gradient boosting is to reduce the loss along the current gradient direction. On the basis of the step (t-1) of the previous step, the functional of the step (t) is defined as follows:
Figure BDA0001966164310000064
setting decision tree q to each sample xiMapping to leaf node TjI.e. q (x)i) J, the leaf node having a value ωjThen (1) can be simplified as follows:
Figure BDA0001966164310000065
if the high-order margin is omitted, the functional (2) is further expanded as follows
Figure BDA0001966164310000066
Wherein g isi,hiAre respectively a loss function
Figure BDA0001966164310000067
First and second derivatives of (i.e. of)
Figure BDA0001966164310000068
Due to the fact that
Figure BDA0001966164310000069
Is a constant, and the second-order functional (4) is further simplified into
Figure BDA00019661643100000610
Let the mapping corresponding to the decision tree be Ij={i|q(xi) The corresponding functional expansion is as follows
Figure BDA00019661643100000611
Extremum for the above equation, at each leaf node:
Figure BDA0001966164310000071
the corresponding extreme values are as follows:
Figure BDA0001966164310000072
if the LOSS function is defined as a second order form,
Figure BDA0001966164310000073
then h isi1, so each leaf node takes the value ωjIs actually giIs measured. Namely, it is
Figure BDA0001966164310000074
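As a small numeric check of equations (7) and (8), the following snippet evaluates one leaf; the gradient values are invented purely for illustration:

```python
# Tiny numeric check of the leaf optimum (7) and extreme value (8).
g = [0.4, -0.2, 0.3]   # first derivatives g_i on one leaf I_j (invented)
h = [1.0, 1.0, 1.0]    # second derivatives h_i (squared loss gives h_i = 1)
lam = 0.0

w_opt = -sum(g) / (sum(h) + lam)             # (7): ω_j* = -Σ g_i / (Σ h_i + λ)
f_min = -0.5 * sum(g) ** 2 / (sum(h) + lam)  # this leaf's contribution to (8)

print(w_opt)  # -0.1666…, i.e. the negative mean of g_i when h_i = 1, λ = 0
print(f_min)  # -0.0416…
```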
Gradient boosting algorithm based on sparse random forests
Combining a plurality of decision trees q together and introducing random selection to improve accuracy and generalization ability forms a random forest. Multistage regression based on random forests is currently the mainstream algorithm. It can quickly position the feature points of each face and then judge the pose and motion track of the face from these feature points. The method has high accuracy, positions feature points quickly, and can process hundreds of frames per second on a PC; mobile devices have weak computing power, but can still detect about twenty frames per second, which is enough for real-time tracking.

However, the standard model requires hundreds of megabytes of space, and simple simplifications (e.g., reducing the number of trees or the number of regression stages) only reduce the accuracy. As shown in fig. 4, observing and analyzing the characteristics of the random forest model shows that the feature vectors are sparse: in the feature vector corresponding to each node, often only a few components are large, while the other components are unimportant. Therefore, by adopting a sparse representation algorithm for each node, a compression rate of more than 10× is expected, so that the model can be stored on an ordinary mobile phone without affecting precision. At the same time, the smaller model also improves speed.
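The per-node sparse representation can be sketched as follows; the choice k = 8 and the int16/float16 storage format are illustrative assumptions, not values fixed by the invention:

```python
# Sparse storage of a random-forest node's feature vector: keep only the
# k largest-magnitude components as (index, value) pairs.
import numpy as np

def sparsify_node(vec: np.ndarray, k: int = 8):
    """Return (indices, values) of the k largest-magnitude components."""
    idx = np.argsort(np.abs(vec))[-k:]
    return idx.astype(np.int16), vec[idx].astype(np.float16)

def densify_node(idx: np.ndarray, vals: np.ndarray, dim: int) -> np.ndarray:
    """Rebuild the dense vector when the node is evaluated."""
    out = np.zeros(dim, dtype=np.float32)
    out[idx] = vals.astype(np.float32)
    return out

# A 256-dim float32 node vector (1024 bytes) stored as 8 (int16, float16)
# pairs (32 bytes) gives a ~32x reduction for that node.
```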
The invention has the beneficial effects that: the face real-time tracking system and the face real-time tracking method can improve the accuracy, processing speed and stability of real-time face information tracking.
On the basis of a regression model, the invention adopts an innovative second-order functional gradient boosting algorithm (gradient boosting on a second-order functional target) to realize real-time face tracking on mobile devices. The invention adopts a sparse random forest regression algorithm to achieve a processing speed of 20 frames per second on an ordinary mobile phone, thereby achieving real-time tracking. The invention uses a continuous manifold model, regarding the moving face sequence as a continuous manifold transformation; this is strongly robust and can accurately position the face under difficult conditions such as large lateral angles and partial occlusion.
The invention has the following advantages: 1) more accurate: the second-order functional gradient has a rigorous mathematical foundation, which guarantees the accuracy of the whole algorithm; 2) faster: the sparse random forest regression algorithm achieves a processing speed of 20 frames per second on an ordinary mobile phone, realizing real-time tracking; 3) more stable: the continuous manifold model treats the moving face sequence as a continuous manifold transformation, is strongly robust, and can accurately position the face under difficult conditions such as large lateral angles and partial occlusion.
Drawings
Fig. 1 is a schematic diagram illustrating a real-time face tracking system according to an embodiment of the present invention.
Fig. 2 is a flowchart of a real-time face tracking method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of face detection performed by a conventional random forest-based multistage regression algorithm.
Fig. 4 is a schematic diagram of detecting feature vectors based on a random forest model in the prior art.
Fig. 5 is a schematic diagram of a conventional face detection effect (including a front face and a side face).
FIG. 6 is a schematic diagram of the present invention using a continuous manifold model for face detection.
Fig. 7 is a schematic diagram of face detection by the conventional 68-point landmark model.
FIG. 8 is a schematic diagram of the present invention using secondary adaptive sampling of feature points.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
For a further understanding of the invention, reference will now be made to the preferred embodiments of the invention by way of example, and it is to be understood that the description is intended to further illustrate features and advantages of the invention, and not to limit the scope of the claims.
The description in this section is for several exemplary embodiments only, and the present invention is not limited only to the scope of the embodiments described. It is within the scope of the present disclosure and protection that the same or similar prior art means and some features of the embodiments may be interchanged.
The invention discloses a human face real-time tracking system, and FIG. 1 is a schematic diagram of the human face real-time tracking system in one embodiment of the invention; referring to fig. 1, in an embodiment of the present invention, the real-time face tracking system includes: the system comprises a video frame image acquisition module 1, a face detection module 2, a face feature point positioning module 3 and a face tracking module 4.
The video frame image acquiring module 1 is used for acquiring each frame image of the video.
The face detection module 2 is used for calling different face detection models for each frame image to find whether a face appears; these models traverse every position of the image, judge whether a face appears there, and return a reliability. The module integrates a plurality of models, which further improves the reliability of face recognition; for a certain position, the face is considered to have really appeared only if the reliability is higher than a set value; the position of the face is recorded, and the corresponding position is sent to the face feature point positioning module;
the face feature point positioning module 3 is used for positioning the coordinates (x) of the second-order feature points of the facei,yi). In an embodiment of the present invention, the facial feature point location module generates multiple decision trees step by step (in an embodiment of the present invention, the facial feature point location module generates multiple decision trees step by step based on a gradient lifting algorithm), the current coordinates of the input second-order feature points of each decision tree are input, and the goal is to reduce the value of the second-order functional; improving the coordinate (x) along the direction of the optimized gradienti,yi) A value of (d); each feature point (x)i,yi) Gradient of (2)
Figure BDA0001966164310000091
Wherein, IjIs the leaf node, | I, where the feature point is locatedjI is the number of all feature points on the leaf node; and correcting the coordinates of each characteristic point as follows: x is the number ofi=xi+ηdxi,yi=yi+ηdxiη is a set constant.
The face tracking module 4 is used for tracking the face in the video to obtain the continuous spatial pose of the face; for each video frame, the second-order feature points of the face can be accurately positioned, and the sequence formed by these feature points represents the motion track of the face; the spatial pose of the face is judged on the basis of the feature points; the change of the feature points between different frames reflects the various changes and motions of the face.
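Taken together, the four modules form a simple per-frame loop; a minimal sketch follows, in which the three callables stand in for modules 2–4 and are assumptions, not the patent's interfaces:

```python
# Per-frame pipeline of the four modules in fig. 1 (sketch only).
def track_video(frames, detect, locate, update_track):
    """Module 1 supplies frames; each frame is detected, located, tracked."""
    feature_point_sequence = []
    for frame in frames:                          # video frame acquisition (1)
        for box in detect(frame):                 # face detection module (2)
            points = locate(frame, box)           # feature point positioning (3)
            feature_point_sequence.append(points)
            update_track(feature_point_sequence)  # face tracking module (4)
    return feature_point_sequence
```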
In an embodiment of the present invention, the face detection module considers that a face does appear only if the reliability of a certain position is greater than 0.95.
In an embodiment of the present invention, the face detection module calls different face detection models (such as MTCNN, YOLOv3, etc.) for each frame image of the video and traverses every position of the image, giving a confidence for each possible face position; the combined confidence of these confidences is calculated as:

rate = ω_1·R_1 + ω_2·R_2 + ……

wherein R_1, R_2, … are the confidences returned by the different face detection models and ω_1, ω_2, … are set coefficients;

if the combined confidence rate is greater than 0.95, a face is present at that position; the position is recorded and the position information is sent to the face feature point positioning module; if no face can be detected, detection continues with the next frame image.
In an embodiment of the present invention, the face feature point positioning module calls a second-order feature point positioning process on the area where each face is located;

step B1, initializing the N feature points (x_i, y_i) according to the standard template; the coordinates of each point come from the average face;

step B2, constructing a decision tree whose input is the current coordinates of the feature points and whose aim is to reduce the value of the following second-order functional:

F̃ = Σ_{j=1..T} [ (Σ_{i∈I_j} g_i)·ω_j + ½·(Σ_{i∈I_j} h_i + λ)·ω_j² ]

wherein I_j is the leaf node where the feature point is located and ω_j is the value of that leaf node; g_i is the first derivative and h_i the second derivative of the loss function at each feature point;

step B3, taking the minimum value of the functional and correspondingly correcting the coordinates of each feature point; i.e. the second-order feature points are corrected by taking the minimum value of the second-order functional, whose optimal leaf value is defined as follows:

ω_j* = −(Σ_{i∈I_j} g_i) / (Σ_{i∈I_j} h_i + λ)

step B4, correcting the coordinates of each feature point:

x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant; in one embodiment of the present invention, η = 0.01.

Step B5, generating a plurality of decision trees in succession; if the correction (dx_i, dy_i) between two successive decision trees is less than the set threshold (for example, one thousandth of the coordinate value), the module has converged, and the feature points are recorded and sent to the face tracking module; otherwise step B2 is continued.
In an embodiment of the present invention, the face tracking module is configured to calculate the spatial pose of the face according to the second-order feature points of the face; the second-order feature points are used for determining key positions of the face; the face tracking module judges whether the eyes blink according to the second-order feature points at the eyes, judges whether the mouth opens according to the second-order feature points at the mouth, and judges the left-right and up-down pose of the face according to the second-order feature points of the face;
the face tracking module also analyses the feature point sequence to obtain the motion track of the face.
The invention also discloses a face real-time tracking method, and FIG. 2 is a flow chart of the face real-time tracking method in an embodiment of the invention; referring to fig. 2, in an embodiment of the present invention, the method for real-time tracking a human face includes:
step S1, a video frame image obtaining step, wherein each frame image of the video is obtained;
step S2, a face detection step, calling different face detection models for each frame image to find whether a face appears; the face detection models traverse every position of the image, judge whether a face appears there, and return a reliability. A plurality of models are integrated, which further improves the reliability of face recognition; for a certain position, the face is considered to have really appeared only if the reliability is higher than a set value; the position of the face is recorded, and the corresponding position is sent to the face feature point positioning module;
step S3, a face feature point positioning step, locating the coordinates (x_i, y_i) of the second-order feature points of the face. In an embodiment of the invention, a plurality of decision trees are generated step by step based on a gradient boosting algorithm; each decision tree takes the current coordinates of the second-order feature points as input, with the goal of reducing the value of the second-order functional, i.e. improving the coordinates (x_i, y_i) along the direction of the optimized gradient; the gradient of each feature point (x_i, y_i) is

dx_i = −(1/|I_j|)·Σ_{k∈I_j} g_k^x,  dy_i = −(1/|I_j|)·Σ_{k∈I_j} g_k^y

wherein I_j is the leaf node where the feature point is located and |I_j| is the number of all feature points on that leaf node; the coordinates of each feature point are then corrected as: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i.
Step S4, a face tracking step: the face in the video is tracked to obtain the continuous spatial pose of the face; for each video frame, the second-order feature points of the face can be accurately positioned, and the sequence formed by these feature points represents the motion track of the face; the spatial pose of the face is judged on the basis of the feature points; the change of the feature points between different frames reflects the various changes and motions of the face.
In an embodiment of the present invention, in the face detection step, different face detection models are called for each frame image of the video and every position of the image is traversed, giving a confidence for each possible face position; the combined confidence of these confidences is calculated as:

rate = ω_1·R_1 + ω_2·R_2 + ……

wherein R_1, R_2, … are the confidences returned by the different face detection models and ω_1, ω_2, … are set coefficients;

if the combined confidence rate is greater than 0.95, a face is present at that position; the position is recorded and the position information is sent to the face feature point positioning module; if no face can be detected, detection continues with the next frame image.
In an embodiment of the present invention, in the face feature point positioning step, a second-order feature point positioning process is called for the area where each face is located;

step B1, initializing the N feature points (x_i, y_i) according to the standard template; the coordinates of each point come from the average face;

step B2, constructing a decision tree whose input is the current coordinates of the feature points and whose aim is to reduce the value of the following second-order functional:

F̃ = Σ_{j=1..T} [ (Σ_{i∈I_j} g_i)·ω_j + ½·(Σ_{i∈I_j} h_i + λ)·ω_j² ]

wherein I_j is the leaf node where the feature point is located and ω_j is the value of that leaf node; g_i is the first derivative and h_i the second derivative of the loss function at each feature point;

step B3, taking the minimum value of the functional and correspondingly correcting the coordinates of each feature point; i.e. the second-order feature points are corrected by taking the minimum value of the second-order functional, whose optimal leaf value is defined as follows:

ω_j* = −(Σ_{i∈I_j} g_i) / (Σ_{i∈I_j} h_i + λ)

step B4, correcting the coordinates of each feature point:

x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;

step B5, generating a plurality of decision trees in succession; if the correction (dx_i, dy_i) between two successive decision trees is less than the set threshold (for example, one thousandth of the coordinate value), the module has converged, the feature points are recorded, and the method proceeds to the face tracking step; otherwise it returns to step B2.
In an embodiment of the invention, the real-time face tracking system and method adopt a sparse random forest regression algorithm. Multistage regression based on random forests is currently the mainstream algorithm, shown in fig. 3. The algorithm can quickly position the feature points of each face and then judge the pose and motion track of the face from these feature points. It has high accuracy, positions feature points quickly, and can process hundreds of frames per second on a PC; mobile devices have weak computing power but can still detect about twenty frames per second, enough for real-time tracking. This, however, relies on a well-trained big-data model, which for the standard model requires hundreds of megabytes. Simple simplifications (e.g., reducing the number of trees or the number of regression stages) only reduce the accuracy. As shown in fig. 4, observing and analyzing the characteristics of the random forest model shows that the feature vectors are sparse: in the feature vector corresponding to each node, often only a few components are large, while the other components are unimportant. Therefore, by adopting a sparse representation algorithm for each node, a compression rate of more than 10× is expected, so that the model can be stored on an ordinary mobile phone without affecting precision. At the same time, the smaller model also improves speed.
In an embodiment of the invention, the real-time face tracking system and method adopt a continuous manifold model. The feature points of a frontal face are positioned well, but when the deflection angle of the face becomes larger and larger, even to the point that only half the face is visible, many feature points disappear or their corresponding positions become unknown (as shown in fig. 5). These missing feature points cannot be located and instead play a disturbing role. In one embodiment of the present invention, a continuous manifold transformation model is used to deal with this problem: a moving face sequence is regarded as lying in a continuous manifold space. Fig. 6 is a schematic diagram of recognition by real-time face tracking in an embodiment of the present invention; as shown in fig. 6, the reference pose of the face is determined by estimating these spatial transformations. The pose at that moment then determines which feature points are visible, and the disappeared feature points are not used for positioning. This scheme is strongly robust and is expected to accurately position the face under difficult conditions such as large lateral angles and partial occlusion.
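A hedged sketch of the visibility decision that follows from the estimated pose; the landmark index split, the sign convention and the 40° limit are assumptions made for illustration:

```python
# Drop landmarks expected to be self-occluded at large yaw before fitting.
import numpy as np

LEFT_SIDE = np.arange(0, 34)    # hypothetical indices of left-side landmarks
RIGHT_SIDE = np.arange(34, 68)  # hypothetical indices of right-side landmarks

def visible_points(points: np.ndarray, yaw_deg: float, limit: float = 40.0):
    """Return only the landmarks expected to be visible at this yaw."""
    mask = np.ones(len(points), dtype=bool)
    if yaw_deg > limit:          # face turned far to one side
        mask[LEFT_SIDE] = False
    elif yaw_deg < -limit:       # ...or far to the other
        mask[RIGHT_SIDE] = False
    return points[mask], mask
```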
In an embodiment of the invention, the real-time face tracking system and method adopt secondary adaptive sampling of feature points. Conventional methods are based on fixed face feature points, i.e. the number and positions of the feature points are fixed, as in the 68-point standard model shown in fig. 7. Here the feature points are automatically densified on the basis of the standard feature points, as shown in fig. 8. The various constraint relationships between these densified points and the reference points further improve the accuracy of the positioning. Moreover, since the points are densified appropriately around the reference points, the amount of computation stays small and real-time tracking can still be achieved.
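A minimal sketch of this densification, assuming extra points are inserted midway between neighbouring reference points; densifying the jawline of a 68-point shape is an illustrative choice, not fixed by the patent:

```python
# Secondary adaptive sampling: append midpoints between reference-point pairs.
import numpy as np

def densify(points: np.ndarray, pairs) -> np.ndarray:
    """Append the midpoint of each (i, j) reference pair as a new point."""
    extra = np.array([(points[i] + points[j]) / 2.0 for i, j in pairs])
    return np.vstack([points, extra])

jaw_pairs = [(i, i + 1) for i in range(16)]   # successive jawline points 0..16
# dense = densify(landmarks_68, jaw_pairs)    # 68 + 16 = 84 feature points
```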
In summary, the face real-time tracking system and method provided by the invention can improve the accuracy, processing speed and stability of real-time tracking of face information.
The description and applications of the invention herein are illustrative and are not intended to limit the scope of the invention to the embodiments described above. Variations and modifications of the embodiments disclosed herein are possible, and alternative and equivalent various components of the embodiments will be apparent to those skilled in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements, proportions, and with other components, materials, and parts, without departing from the spirit or essential characteristics thereof. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.

Claims (10)

1. A real-time face tracking system, comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module is used for calling different face detection models for each frame image and searching whether a face appears; the face detection models traverse every position of the image, judge whether a face appears at each position and return a reliability; for a certain position, the face is considered to have really appeared only if the reliability is higher than a set value; the position of the face is then recorded and the corresponding position is sent to the face feature point positioning module;
a face feature point positioning module for positioning the coordinates (x_i, y_i) of the second-order feature points of the face; the face feature point positioning module gradually generates a plurality of decision trees, each decision tree taking the current coordinates of the second-order feature points as input, with the goal of reducing the value of the second-order functional, i.e. improving the coordinates (x_i, y_i) along the direction of the optimized gradient; the gradient of each feature point (x_i, y_i) is

dx_i = −(1/|I_j|)·Σ_{k∈I_j} g_k^x,  dy_i = −(1/|I_j|)·Σ_{k∈I_j} g_k^y

wherein I_j is the leaf node where the feature point is located and |I_j| is the number of all feature points on that leaf node; the coordinates of each feature point are then corrected as: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;
the face tracking module is used for tracking the face in the video to obtain the continuous spatial pose of the face; for each video frame, the second-order feature points of the face can be accurately positioned, and the sequence formed by these feature points represents the motion track of the face; the spatial pose of the face is judged on the basis of the feature points; the change of the feature points between different frames reflects the various changes and motions of the face.
2. The real-time face tracking system of claim 1, wherein:
the face detection module considers that the face really appears only if the reliability of a certain position is greater than 0.95.
3. The real-time face tracking system of claim 1, wherein:
the face detection module calls at least one face detection model for each frame image of the video, and the models traverse every position of the image, giving a confidence for each possible face position; the combined confidence of these confidences is calculated as:

rate = ω_1·R_1 + ω_2·R_2 + ……

wherein R_1, R_2, … are the confidences returned by the different face detection models and ω_1, ω_2, … are set coefficients;

if the combined confidence rate is greater than 0.95, a face is present at that position; the position is recorded and the position information is sent to the face feature point positioning module; if no face can be detected, detection continues with the next frame image.
4. The real-time face tracking system of claim 1, wherein:
the face feature point positioning module calls a second-order feature point positioning algorithm on the area where each face is located;

B1, initializing the coordinates (x_i, y_i) of the N face feature points according to the standard template; the initial coordinates of each point come from an average face, i.e. the labelled faces of a plurality of people are taken as samples and, for each feature point, the average value over the samples is taken;

B2, second-order functional definition and optimization solution:

a decision tree with T leaf nodes is constructed; its input is the current coordinates of the feature points, and the target is to reduce the value of the following second-order functional:

F̃ = Σ_{j=1..T} [ (Σ_{i∈I_j} g_i)·ω_j + ½·(Σ_{i∈I_j} h_i + λ)·ω_j² ]

wherein I_j is the set of samples on leaf node j, ω_j is the value of that leaf node, and (dx_j, dy_j) is the value of the optimal solution at that leaf node for the x and y coordinates respectively; g_i is the first derivative of the loss function at each feature point and h_i is the second derivative of the loss function at each feature point; the extreme value problem of the second-order functional is thereby converted into an optimal value calculation at the T leaf nodes;

B3, taking the minimum value of the functional and correspondingly correcting the coordinates of each feature point; i.e. the correction of the second-order feature points comes from taking the minimum value of the second-order functional; for leaf node T_j, containing |I_j| samples, it is defined as follows:

dx_j = −(Σ_{i∈I_j} g_i^x) / (Σ_{i∈I_j} h_i + λ),  dy_j = −(Σ_{i∈I_j} g_i^y) / (Σ_{i∈I_j} h_i + λ)

B4, correcting the coordinates of each feature point:

if the leaf node where sample i is located is T_j, then: x_i = x_i + η·dx_j, y_i = y_i + η·dy_j, where η is a set constant;

B5, continuously generating a plurality of decision trees; if the correction (Σ|dx|, Σ|dy|) between two successive decision trees is smaller than the set threshold, the module has converged, and the feature points are recorded and sent to the face tracking module; otherwise step B2 is continued.
5. The real-time face tracking system of claim 1, wherein:
the face tracking module is used for calculating the spatial pose of the face according to the second-order feature points of the face; the second-order feature points are used for determining key positions of the face; the face tracking module is used for judging whether the eyes blink according to the second-order feature points at the eyes, for judging whether the mouth opens according to the second-order feature points at the mouth, and for judging the left-right and up-down pose of the face according to the second-order feature points of the face;
the face tracking module is used for analysing the feature point sequence to obtain the motion track of the face.
6. A real-time face tracking system, comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module is used for calling different face detection models for each frame image and searching whether a face appears; the face detection models traverse every position of the image, judge whether a face appears at each position and return a reliability; for a certain position, the face is considered to have really appeared only if the reliability is higher than a set value; the position of the face is then recorded and the corresponding position is sent to the face feature point positioning module;
the face feature point positioning module is used for positioning the coordinates of the second-order feature points of the face;
the face tracking module is used for tracking the face in the video to obtain the continuous spatial pose of the face; for each video frame, the second-order feature points of the face can be accurately positioned, and the sequence formed by these feature points represents the motion track of the face; the spatial pose of the face is judged on the basis of the feature points; the change of the feature points between different frames reflects the various changes and motions of the face.
7. A real-time face tracking method is characterized by comprising the following steps:
a video frame image acquisition step, wherein each frame image of a video is acquired;
a face detection step, namely calling different face detection models for each frame image to find whether a face appears; the face detection models traverse every position of the image, judge whether a face appears there and return a reliability; for a certain position, the face is considered to have really appeared only if the reliability is higher than a set value; the position of the face is then recorded and the corresponding position is sent to the face feature point positioning module;
a face feature point positioning step, namely positioning the coordinates (x_i, y_i) of the second-order feature points of the face; a plurality of decision trees are generated step by step, each decision tree taking the current coordinates of the second-order feature points as input, with the aim of reducing the value of the second-order functional, i.e. improving the coordinates (x_i, y_i) along the direction of the optimized gradient; the gradient of each feature point (x_i, y_i) is

dx_i = −(1/|I_j|)·Σ_{k∈I_j} g_k^x,  dy_i = −(1/|I_j|)·Σ_{k∈I_j} g_k^y

wherein I_j is the leaf node where the feature point is located and |I_j| is the number of all feature points on that leaf node; the coordinates of each feature point are then corrected as: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i;

a face tracking step, namely tracking the face in the video to obtain the continuous spatial pose of the face; for each video frame, the second-order feature points of the face can be accurately positioned, and the sequence formed by these feature points represents the motion track of the face; the spatial pose of the face is judged on the basis of the feature points; the change of the feature points between different frames reflects the various changes and motions of the face.
8. The real-time face tracking method according to claim 7, characterized in that:
in the face detection step, different face detection models are called for each frame image of the video and every position of the image is traversed, giving a confidence for each possible face position; the combined confidence of these confidences is calculated as:

rate = ω_1·R_1 + ω_2·R_2 + ……

wherein R_1, R_2, … are the confidences returned by the different face detection models and ω_1, ω_2, … are set coefficients;

if the combined confidence rate is greater than 0.95, a face is present at that position; the position is recorded and the position information is sent to the face feature point positioning module; if no face can be detected, detection continues with the next frame image.
9. The real-time face tracking method according to claim 7, characterized in that:
in the face feature point positioning step, a second-order feature point positioning process is called for the area where each face is located;

step B1, initializing the N feature points (x_i, y_i) according to the standard template; the coordinates of each point come from the average face;

step B2, constructing a decision tree whose input is the current coordinates of the feature points and whose aim is to reduce the value of the following second-order functional:

F̃ = Σ_{j=1..T} [ (Σ_{i∈I_j} g_i)·ω_j + ½·(Σ_{i∈I_j} h_i + λ)·ω_j² ]

wherein I_j is the leaf node where the feature point is located and ω_j is the value of that leaf node; g_i is the first derivative and h_i the second derivative of the loss function at each feature point;

step B3, taking the minimum value of the functional and correspondingly correcting the coordinates of each feature point; i.e. the second-order feature points are corrected by taking the minimum value of the second-order functional, whose optimal leaf value is defined as follows:

ω_j* = −(Σ_{i∈I_j} g_i) / (Σ_{i∈I_j} h_i + λ)

step B4, correcting the coordinates of each feature point:

x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;

step B5, generating a plurality of decision trees in succession; if the correction (dx_i, dy_i) between two successive decision trees is less than the set threshold, the module has converged, the feature points are recorded, and the method proceeds to the face tracking step; otherwise it returns to step B2.
10. A real-time face tracking method is characterized by comprising the following steps:
a video frame image acquisition step, wherein each frame image of a video is acquired;
a face detection step, namely calling different face detection models for each frame image to find whether a face appears; the face detection models traverse every position of the image, judge whether a face appears there and return a reliability; for a certain position, the face is considered to have really appeared only if the reliability is higher than a set value; the position of the face is then recorded and the corresponding position is sent to the face feature point positioning module;
a face feature point positioning step, namely positioning coordinates of second-order feature points of the face;
a face tracking step, namely tracking the face in the video to obtain the continuous spatial pose of the face; for each video frame, the second-order feature points of the face can be accurately positioned, and the sequence formed by these feature points represents the motion track of the face; the spatial pose of the face is judged on the basis of the feature points; the change of the feature points between different frames reflects the various changes and motions of the face.
CN201910103409.9A 2019-02-01 2019-02-01 Real-time human face tracking system and method Active CN111523345B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910103409.9A CN111523345B (en) 2019-02-01 2019-02-01 Real-time human face tracking system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910103409.9A CN111523345B (en) 2019-02-01 2019-02-01 Real-time human face tracking system and method

Publications (2)

Publication Number Publication Date
CN111523345A true CN111523345A (en) 2020-08-11
CN111523345B CN111523345B (en) 2023-06-23

Family

ID=71899996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910103409.9A Active CN111523345B (en) 2019-02-01 2019-02-01 Real-time human face tracking system and method

Country Status (1)

Country Link
CN (1) CN111523345B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101344922A (en) * 2008-08-27 2009-01-14 华为技术有限公司 Human face detection method and device
CN104182718A (en) * 2013-05-21 2014-12-03 腾讯科技(深圳)有限公司 Human face feature point positioning method and device thereof
CN103310204A (en) * 2013-06-28 2013-09-18 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis
WO2014205768A1 (en) * 2013-06-28 2014-12-31 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
战江涛; 刘强; 柴春雷: "Face feature point tracking method based on a three-dimensional model and Gabor wavelets" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140864A (en) * 2022-01-29 2022-03-04 深圳市中讯网联科技有限公司 Trajectory tracking method and device, storage medium and electronic equipment
CN114140864B (en) * 2022-01-29 2022-07-05 深圳市中讯网联科技有限公司 Trajectory tracking method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN111523345B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
WO2022217840A1 (en) Method for high-precision multi-target tracking against complex background
Rikert et al. Gaze estimation using morphable models
CN109472198B (en) Gesture robust video smiling face recognition method
CN108363973B (en) Unconstrained 3D expression migration method
CN108171133B (en) Dynamic gesture recognition method based on characteristic covariance matrix
CN108229416A (en) Robot SLAM methods based on semantic segmentation technology
US20230230305A1 (en) Online streamer avatar generation method and apparatus
CN116309731A (en) Multi-target dynamic tracking method based on self-adaptive Kalman filtering
CN110276785A (en) One kind is anti-to block infrared object tracking method
Nunes et al. Robust event-based vision model estimation by dispersion minimisation
Feng et al. Kalman filter for spatial-temporal regularized correlation filters
CN111402303A (en) Target tracking architecture based on KFSTRCF
CN105608710A (en) Non-rigid face detection and tracking positioning method
CN111860243A (en) Robot action sequence generation method
US20050185834A1 (en) Method and apparatus for scene learning and three-dimensional tracking using stereo video cameras
CN113129332A (en) Method and apparatus for performing target object tracking
CN110598595A (en) Multi-attribute face generation algorithm based on face key points and postures
CN111523345A (en) Face real-time tracking system and method
CN107194947B (en) Target tracking method with self-adaptive self-correction function
Wang et al. A Visual SLAM Algorithm Based on Image Semantic Segmentation in Dynamic Environment
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
Cordea et al. 3-D head pose recovery for interactive virtual reality avatars
CN112507940A (en) Skeleton action recognition method based on difference guidance representation learning network
CN113674323A (en) Visual target tracking algorithm based on multidimensional confidence evaluation learning
CN113762149A (en) Feature fusion human behavior recognition system and method based on segmentation attention

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant