CN118609189A - Face angle determining method, device, equipment and medium based on computer vision
- Publication number
- CN118609189A (application CN202410834513.6A)
- Authority
- CN
- China
- Prior art keywords: face, target, key point, angle, basic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a face angle determination method, apparatus, device, and medium based on computer vision. The method comprises the following steps: acquiring current video data that corresponds to a target area and contains a current video frame; recognizing the current video frame with a target area detection model to generate a basic face recognition area containing face detection information, and performing expansion cropping on the basic face recognition area based on the face detection information to generate a target face recognition area corresponding to the current video frame; extracting key points from the target face recognition area with an improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame; and calculating and evaluating the target face key point set against a basic three-dimensional face key point template and a preset large-angle judgment criterion to determine a target face angle corresponding to the current video frame. This technical scheme calculates the face angle accurately and improves its calculation accuracy.
Description
Technical Field
The present invention relates to the field of computer vision, and in particular, to a method, an apparatus, a device, and a medium for determining a face angle based on computer vision.
Background
With the rapid development of computer vision technology, face angle and pose estimation is widely used in security monitoring systems, virtual reality and augmented reality applications, medical image analysis, biometric recognition, and other fields. Accurately calculating the face angle has therefore become important.
The prior art generally calculates the face angle from two-dimensional (2D) face key points detected on an image. However, such methods are prone to large errors, especially when the face is deflected at a large angle. How to calculate the face angle accurately and improve its calculation accuracy is therefore a problem to be solved urgently.
Disclosure of Invention
The invention provides a face angle determination method, apparatus, device, and medium based on computer vision, which address the low calculation accuracy of existing face angle calculation methods.
According to an aspect of the present invention, there is provided a face angle determining method based on computer vision, including:
Acquiring current video data corresponding to a target area; wherein the current video data comprises at least one current video frame;
Recognizing the current video frame based on a target area detection model to generate a basic face recognition area containing face detection information, and performing expansion cropping on the basic face recognition area based on the face detection information to generate a target face recognition area corresponding to the current video frame; wherein the face detection information includes coordinate position information corresponding to the basic face recognition area;
Extracting key points from the target face recognition area based on an improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame;
and calculating and evaluating the target face key point set based on a basic three-dimensional face key point template and a preset large-angle judgment criterion to determine a target face angle corresponding to the current video frame.
According to another aspect of the present invention, there is provided a face angle determining apparatus based on computer vision, including:
The data acquisition module is used for acquiring current video data corresponding to the target area; wherein the current video data comprises at least one current video frame;
The region recognition module is configured to recognize the current video frame based on a target area detection model to generate a basic face recognition area containing face detection information, and to perform expansion cropping on the basic face recognition area based on the face detection information to generate a target face recognition area corresponding to the current video frame; wherein the face detection information includes coordinate position information corresponding to the basic face recognition area;
the key point extraction module is configured to extract key points from the target face recognition area based on the improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame;
and the angle calculation module is configured to calculate and evaluate the target face key point set based on the basic three-dimensional face key point template and a preset large-angle judgment criterion to determine the target face angle corresponding to the current video frame.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the computer vision-based face angle determination method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the computer vision-based face angle determination method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a computer vision-based face angle determination method according to any of the embodiments of the present invention.
According to this technical scheme, the current video frame corresponding to the target area is recognized by the target area detection model to generate a basic face recognition area containing face detection information, and the basic face recognition area is expansion-cropped based on the face detection information to generate a target face recognition area corresponding to the current video frame. Key points are then extracted from the target face recognition area by the improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame. Finally, the target face key point set is calculated and evaluated against the basic three-dimensional face key point template and the preset large-angle judgment criterion to determine the target face angle corresponding to the current video frame. This solves the problem of low face angle calculation accuracy: combining three-dimensional face key points with large-angle judgment avoids the errors that inaccurately detected key points would otherwise introduce at large angles, so the face angle can be calculated accurately and its calculation accuracy is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of a face angle determining method based on computer vision according to a first embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an improved target three-dimensional key point detection model according to the first embodiment of the present invention;
fig. 3 is a flowchart of a face angle determining method based on computer vision according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of a face angle determining device based on computer vision according to a third embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device implementing a face angle determining method based on computer vision according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," "target," and the like in the description and claims of the present invention and in the above drawings are used to distinguish between similar objects and do not necessarily describe a particular order or sequence. It is to be understood that data so termed may be interchanged where appropriate, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to the steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Notably, the acquisition, storage, use, and processing of data in the technical scheme of the present application all comply with the relevant provisions of national laws and regulations.
Embodiment 1
Fig. 1 is a flowchart of a face angle determining method based on computer vision according to the first embodiment of the present invention. The method may be performed by a computer-vision-based face angle determining apparatus, which may be implemented in hardware and/or software and configured in an electronic device, for example a server. As shown in fig. 1, the method includes:
S110, acquiring current video data corresponding to a target area; wherein the current video data comprises at least one current video frame.
The target area may refer to an area in which the face angle needs to be calculated, and the current video data may refer to real-time video corresponding to the target area. For example, the current video data may be captured by a camera installed in the target area.
The current video frame may refer to each video frame contained in the current video data; typically, a current video frame contains at least one face image. Note that, in the embodiments of the present invention, the current video data is video captured with the user's authorization. A minimal sketch of this acquisition step follows.
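The capture interface is not fixed by the text, so the following sketch simply assumes an OpenCV video source; the function name `read_current_frames` and the default device index are illustrative assumptions, not part of the patent.

```python
import cv2

def read_current_frames(source=0):
    """Yield current video frames from a camera covering the target area.

    `source` is a hypothetical device index or stream URL; the text does not
    specify the capture interface, so this choice is an assumption.
    """
    capture = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:          # stream ended or camera unavailable
                break
            yield frame         # each yielded frame is one current video frame
    finally:
        capture.release()
```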
S120, recognizing the current video frame based on a target area detection model to generate a basic face recognition area containing face detection information, and performing expansion cropping on the basic face recognition area based on the face detection information to generate a target face recognition area corresponding to the current video frame; wherein the face detection information includes coordinate position information corresponding to the basic face recognition area.
The target area detection model may refer to a pre-trained face area detection model; for example, a Sample and Computation Redistribution for Efficient Face Detection (SCRFD) face detection model. Faces in the current video frame can be accurately identified and boxed by the target area detection model. The basic face recognition area may refer to the partial area of the current video frame that contains only a face image; typically, one face image corresponds to one basic face recognition area. The face detection information may refer to the coordinate position information corresponding to the basic face recognition area; for example, the upper-left and lower-right corner positions of the detection box of the basic face recognition area may be taken as this coordinate position information.
Expansion cropping may refer to enlarging the detection box before cropping. For example, the length and width of the detection box may be enlarged to 1.2 times their original size (an expansion ratio of 1:1.2) and the enlarged box then cropped. The target face recognition area may refer to the image area generated by expansion-cropping the basic face recognition area; in general, one current video frame may contain multiple target face recognition areas.
Specifically, the target area detection model performs face detection on the current video frame to obtain a basic face recognition area comprising a face detection box and the coordinate position information of that box, and the detection box is then expansion-cropped according to this coordinate position information to generate the target face recognition area corresponding to the basic face recognition area. This ensures that the target face recognition area contains a larger range of the face image, avoids the detection box cutting off the edge of the face, and provides an effective basis for subsequent operations; a sketch of the cropping step is shown below.
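A minimal sketch of the expansion cropping, assuming the detection box arrives as `(x1, y1, x2, y2)` pixel coordinates; the function name and box format are illustrative assumptions, while the 1.2 default follows the example above.

```python
def expand_and_crop(frame, box, ratio=1.2):
    """Crop a face region after enlarging the detection box around its centre.

    `frame` is an H x W x C image array; `box` is (x1, y1, x2, y2). The box
    format is an assumption; the 1.2 default mirrors the 1:1.2 example.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0      # box centre
    w, h = (x2 - x1) * ratio, (y2 - y1) * ratio    # enlarged width and height
    img_h, img_w = frame.shape[:2]
    # Clamp the enlarged box to the frame so the crop never leaves the image.
    nx1, ny1 = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
    nx2, ny2 = min(int(cx + w / 2), img_w), min(int(cy + h / 2), img_h)
    return frame[ny1:ny2, nx1:nx2]
```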
S130, extracting key points from the target face recognition area based on the improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame.
The target face key point set may refer to the set of all key point coordinates corresponding to the same face area in the current video frame; typically, one face area corresponds to one target face key point set. The target three-dimensional key point detection model may refer to a pre-trained three-dimensional key point detection model; for example, an improved Position Map Regression Network for joint 3D face reconstruction and dense alignment (Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network, PRNet). The three-dimensional face key points in the target face recognition area can be extracted by the target three-dimensional key point detection model.
In an alternative embodiment, the feature extraction component of the improved target three-dimensional key point detection model is a re-parameterized Visual Geometry Group convolutional neural network (Making VGG-style ConvNets Great Again, RepVGG) backbone model. Fig. 2 is a schematic structural diagram of the improved target three-dimensional key point detection model according to an embodiment of the present invention. Specifically, the original Residual Network (ResNet) backbone in PRNet is replaced by the RepVGG backbone, and the result of each preceding addition operation is fed into every subsequent addition operation; that is, several branches that carry shallow semantic information downward are added, so that the deep feature maps contain sufficient shallow semantic information. In this way, the nonlinear activation (Rectified Linear Unit, ReLU) and the convolution kernel weights can be separated and the two processes decoupled, which simplifies the network structure and improves the interpretability and training efficiency of the network, allowing the model to achieve better results without increasing the number of parameters. A sketch of the cumulative-addition pattern follows.
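The exact stage widths, layer counts, and position-map head of the modified PRNet are not given, so this PyTorch sketch only illustrates the cumulative-addition idea under those unstated assumptions; it is not the patented architecture.

```python
import torch
import torch.nn as nn

class CumulativeStage(nn.Module):
    """A stage whose output addition also receives the running sum of all
    earlier additions, carrying shallow semantic information downward."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # 1x1 projection so the carried sum matches this stage's channel width.
        self.carry = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x, running_sum):
        out = self.conv(x) + self.carry(running_sum)  # addition sees shallow info
        return out, out                               # features and new running sum

# Tiny demo: the running sum is threaded through every stage's addition.
feat = carry = torch.randn(1, 32, 64, 64)
for stage in [CumulativeStage(32, 64), CumulativeStage(64, 64), CumulativeStage(64, 128)]:
    feat, carry = stage(feat, carry)
```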
S140, calculating and evaluating the target face key point set based on the basic three-dimensional face key point template and a preset large-angle judgment criterion to determine a target face angle corresponding to the current video frame.
The basic three-dimensional face key point template may refer to a standard three-dimensional face key point set constructed in advance, against which each target face key point set is matched and evaluated. The preset large-angle judgment criterion may refer to a preset standard for evaluating large angles. When the face is deflected at a large angle, some key points may be detected inaccurately, so the deflection angle calculated from the basic three-dimensional face key point template may carry a large error; the preset large-angle judgment criterion therefore provides a secondary check on the deflection angle, ensuring the accuracy of the final target face angle.
The target face angle may refer to the deflection angle of a face image in the current video frame. Note that, in the embodiments of the present invention, the target face angle corresponds to the basic face recognition area; that is, one basic face recognition area corresponds to one target face angle, and if the same current video frame contains multiple basic face recognition areas, the current video frame likewise corresponds to multiple target face angles.
According to this technical scheme, the current video frame corresponding to the target area is recognized by the target area detection model to generate a basic face recognition area containing face detection information, and the basic face recognition area is expansion-cropped based on the face detection information to generate a target face recognition area corresponding to the current video frame. Key points are then extracted from the target face recognition area by the improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame. Finally, the target face key point set is calculated and evaluated against the basic three-dimensional face key point template and the preset large-angle judgment criterion to determine the target face angle corresponding to the current video frame. This solves the problem of low face angle calculation accuracy: combining three-dimensional face key points with large-angle judgment avoids the errors that inaccurately detected key points would otherwise introduce at large angles, so the face angle can be calculated accurately and its calculation accuracy is improved.
Embodiment 2
Fig. 3 is a flowchart of a face angle determining method based on computer vision according to the second embodiment of the present invention, which refines the foregoing embodiment. In this embodiment, the operation of calculating and evaluating the target face key point set based on a basic three-dimensional face key point template and a preset large-angle judgment criterion to determine the target face angle corresponding to the current video frame is refined as follows: matching the target face key point set against the basic three-dimensional face key point template to generate deviation coordinates corresponding to the target face key point set; determining a basic face angle corresponding to the deviation coordinates based on a preset rotation matrix; and determining an angle category corresponding to the target face key points based on the preset large-angle judgment criterion, and, if the angle category is a preset angle category, taking the basic face angle as the target face angle corresponding to the current video frame. As shown in fig. 3, the method includes:
S210, acquiring current video data corresponding to a target area.
Wherein the current video data comprises at least one current video frame.
S220, recognizing the current video frame based on a target area detection model to generate a basic face recognition area containing face detection information, and performing expansion cropping on the basic face recognition area based on the face detection information to generate a target face recognition area corresponding to the current video frame.
Wherein the face detection information includes coordinate position information corresponding to the basic face recognition area.
In an alternative embodiment, the method further comprises: acquiring historical video data and collecting data from it to generate a historical image data set corresponding to the historical video data, wherein the historical image data set comprises historical image data frames; performing face labeling on each historical image data frame based on a preset area detection model to generate target face detection samples containing face bounding boxes, and combining the target face detection samples to generate a target face detection data set; and training a basic area detection model on the target face detection data set to obtain the trained target area detection model.
The historical video data may refer to video containing face images collected during a historical period; it may be collected in the target area or in other areas, which the embodiments of the present invention do not limit. The historical image data frames may refer to the individual video frames contained in the historical video data, and the historical image data set may refer to the data set containing all historical image data frames of the same historical video data. Generally, the historical video data is extracted frame by frame to obtain its historical image data frames, which are then arranged in order to form the historical image data set.
The preset area detection model may refer to a pre-existing area detection model. In general, it is a trained area detection model of a certain accuracy that can directly perform face detection on the historical image data frames. The face bounding box may refer to the face area detection box; for example, a rectangular box. Face labeling may refer to framing a face area with a face bounding box.
The target face detection sample may refer to sample data containing only a face area; typically, one face area corresponds to one target face detection sample. The target face detection data set may refer to the data set composed of all target face detection samples corresponding to the historical video data. Note that, to ensure the accuracy of the sample data, after face labeling each historical image data frame with the preset area detection model, missed or false labels in the model output may be corrected manually, and the corrected results are taken as the target face detection samples.
The basic area detection model may refer to the face area detection model selected for subsequent face recognition; for example, an SCRFD face detection model. Note that, in the embodiments of the present invention, the basic area detection model and the target area detection model are the same area detection model in different states and share the same structure; that is, the target area detection model is obtained by training the basic area detection model. Specifically, when training the basic area detection model on the target face detection data set, a regression loss function or a classification loss function may be used, and a regularization term or other customized loss components may be added to further improve the training effect and generalization ability of the model; one possible combined objective is sketched below.
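The text does not name the exact losses, so this sketch combines conventional stand-ins (binary cross-entropy for classification, smooth-L1 for box regression); the function name and weighting are assumptions, and regularization would typically enter via the optimizer's weight decay rather than the loss itself.

```python
import torch.nn.functional as F

def detection_loss(cls_logits, cls_targets, box_preds, box_targets, reg_weight=1.0):
    """One plausible combined objective for training the basic area
    detection model: classification loss plus box regression loss."""
    cls_loss = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)
    reg_loss = F.smooth_l1_loss(box_preds, box_targets)  # robust box regression
    return cls_loss + reg_weight * reg_loss
```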
S230, extracting key points from the target face recognition area based on the improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame.
In an alternative embodiment, after face labeling each historical image data frame based on the preset area detection model to generate target face detection samples containing face bounding boxes and combining the target face detection samples into a target face detection data set, the method further comprises: performing expansion cropping on each target face detection sample in the target face detection data set to obtain initial key point labeling samples; extracting key points from each initial key point labeling sample based on a preset two-dimensional key point detection model to generate initial key point samples containing two-dimensional face key points; rotating each initial key point sample based on a preset angle enhancement technique to generate a basic key point sample corresponding to each initial key point sample; converting the basic key point samples into target key point samples containing three-dimensional key point coordinates based on a preset three-dimensional conversion technique, and combining all target key point samples to generate a target three-dimensional key point data set; and training the improved basic three-dimensional key point detection model on the target three-dimensional key point data set to obtain the trained target three-dimensional key point detection model.
The initial key point labeling sample may refer to the basic sample from which key points are extracted. The preset two-dimensional key point detection model may refer to a pre-existing two-dimensional key point extraction model; in general, it is a trained two-dimensional key point detection model of a certain accuracy that can directly extract key points from the initial key point labeling samples. The initial key point sample may refer to the two-dimensional face key points extracted by the preset two-dimensional key point detection model; typically, an initial key point sample contains 68 key point coordinates, and one target face detection sample corresponds to one initial key point sample. Note that, to ensure the accuracy of the sample data, the model output may be corrected manually after key point extraction, and the corrected results are taken as the initial key point samples.
The preset angle enhancement technique may refer to a preset face angle enhancement technique, by which the labeled frontal initial key point samples can be randomly rotated through various angles to enhance the diversity of the data samples; a sketch follows. The basic key point samples may refer to the two-dimensional key point samples generated after angle enhancement.
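A minimal sketch of the enhancement step, assuming the 2D key points arrive as an `(N, 2)` array; the rotation range `max_angle` is an assumption, since the text does not give one.

```python
import cv2
import numpy as np

def rotate_sample(image, keypoints, max_angle=90.0, rng=None):
    """Randomly rotate a frontal face sample together with its 2D key points."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    angle = rng.uniform(-max_angle, max_angle)               # random angle
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)  # 2x3 affine matrix
    rotated = cv2.warpAffine(image, m, (w, h))
    # Apply the same affine transform to the key points in homogeneous form.
    pts = np.hstack([keypoints, np.ones((len(keypoints), 1))])
    return rotated, pts @ m.T
```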
The preset three-dimensional conversion technique may refer to a preset three-dimensional key point conversion technique, by which the two-dimensional key point coordinates in the basic key point samples are converted into three-dimensional key point coordinates; for example, a three-dimensional deformable face model (3D Morphable Face Model, 3DMM). The target key point sample may refer to the three-dimensional key point sample matching a basic key point sample, and the target three-dimensional key point data set may refer to the data set composed of the target key point samples.
The basic three-dimensional key point detection model may refer to the model selected for subsequent three-dimensional key point extraction; for example, the improved PRNet model. Note that, in the embodiments of the present invention, the basic three-dimensional key point detection model and the target three-dimensional key point detection model are the same model in different states and share the same structure; that is, the target three-dimensional key point detection model is obtained by training the basic three-dimensional key point detection model.
S240, matching the target face key point set against the basic three-dimensional face key point template to generate deviation coordinates corresponding to the target face key point set.
The deviation coordinates may refer to coordinate deviation values of the target face key point set relative to the basic three-dimensional face key point template.
In an alternative embodiment, before calculating and evaluating the target face key point set based on the basic three-dimensional face key point template and the preset large-angle judgment criterion to determine the target face angle corresponding to the current video frame, the method further comprises: determining a target number of standard face images from a standard face database based on a preset screening criterion; determining the standard face key points corresponding to each standard face image based on a preset three-dimensional conversion technique; and averaging the basic key point coordinates at each set position across the standard face key points to generate the basic three-dimensional face key point template.
The preset screening criterion may refer to a preset screening condition; for example, face deflection angle, age, or face pixel size. The standard face database may refer to an existing public database of high-quality face data, and the standard face image refers to each face image it contains. The target number may refer to the number of standard face images to extract; for example, if the standard face database contains 20 million face images, the target number may be 5 million. The standard face key points may refer to the three-dimensional face key point coordinates corresponding to each standard face image, and the basic key point coordinates may refer to the key point coordinates selected from the standard face key points for calculation. Mean processing may refer to averaging the basic key point coordinates at the same set position.
Specifically, after the target number of standard face images is determined from the standard face database based on the preset screening criterion, the standard face key points of each standard face image can be extracted with the preset three-dimensional conversion technique to obtain its three-dimensional key point coordinates, and the basic key point coordinates at the same position across all standard face key points are then averaged. For example, suppose 5 million standard face images are selected from the standard face database, i.e., there are 5 million sets of standard face key points, each containing 34 three-dimensional key point coordinates. The key point coordinates at position 0 across the 5 million sets are averaged, then those at position 1, and so on until all 34 positions have been averaged; the averaged coordinate of each position is recorded as the coordinate of that position in the basic three-dimensional face key point template. This completes the construction of the basic three-dimensional face key point template and provides an effective basis for angle calculation; a sketch of the averaging is given below.
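A minimal sketch of the averaging, assuming the screened key point sets are stacked into a `(num_faces, 34, 3)` array as in the example above; the function name is illustrative.

```python
import numpy as np

def build_template(standard_keypoints):
    """Average position-aligned 3D key points into the basic template.

    Position 0 of every face is averaged into template position 0,
    position 1 into position 1, and so on for all 34 positions.
    """
    pts = np.asarray(standard_keypoints, dtype=np.float64)  # (num_faces, 34, 3)
    return pts.mean(axis=0)                                 # (34, 3) template
```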
S250, determining a basic face angle corresponding to the deviation coordinates based on a preset rotation matrix.
The preset rotation matrix may refer to a preset angle calculation function; for example, a rotation matrix of the open source computer vision library (Open Source Computer Vision Library, OpenCV). The basic face angle may refer to the face deflection angle initially calculated from the basic three-dimensional face key point template; one conventional realization is sketched below.
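The text only states that an OpenCV rotation matrix is used; a common way to realize this for two 3D point sets is a least-squares (Kabsch) rotation fit followed by `cv2.RQDecomp3x3` to read out Euler angles. The following sketch rests on that assumption and is not necessarily the patented computation.

```python
import cv2
import numpy as np

def estimate_face_angle(detected, template):
    """Fit the rotation taking the (N, 3) template key points onto the
    detected key points, then decompose it into Euler angles in degrees."""
    d = detected - detected.mean(axis=0)   # centre both point sets
    t = template - template.mean(axis=0)
    u, _, vt = np.linalg.svd(t.T @ d)      # Kabsch fit: R @ t_i ~ d_i
    r = vt.T @ u.T
    if np.linalg.det(r) < 0:               # reflection guard: keep a rotation
        vt[-1] *= -1
        r = vt.T @ u.T
    angles, *_ = cv2.RQDecomp3x3(r)        # rotations about x, y, z in degrees
    return angles
```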
S260, determining an angle category corresponding to the target face key points based on a preset large-angle judgment criterion, and, if the angle category is a preset angle category, taking the basic face angle as the target face angle corresponding to the current video frame.
The angle category may refer to a category evaluating the range of face angle deflection. For example, angle categories may be divided into a small-angle category with a small deflection range and a large-angle category with a large deflection range. The preset angle category may refer to the small-angle category.
In an alternative embodiment, determining the angle category corresponding to the target face key points based on the preset large-angle judgment criterion comprises: determining the inter-eye distance and the eye-lip distance corresponding to the current video frame based on the target face key point set; and determining the angle category corresponding to the target face key points based on a first deviation relation between the inter-eye distance and a first set threshold and a second deviation relation between the eye-lip distance and a second set threshold.
The inter-eye distance may refer to the distance between the two eyes in the same face area, i.e., between the centers of the two eyeballs. The first set threshold may refer to a preset value for evaluating the inter-eye distance; in general, it may be set according to the actual angle of the camera in the target area. The first deviation relation may refer to the magnitude relation between the inter-eye distance and the first set threshold.
The eye-lip distance may refer to the distance between the center of the eyes and the center of the mouth corners in the same face area. The second set threshold may refer to a preset value for evaluating the eye-lip distance; in general, it may likewise be set according to the actual angle of the camera in the target area. The second deviation relation may refer to the magnitude relation between the eye-lip distance and the second set threshold.
Specifically, the coordinate positions of the two eyes and of the two mouth corners in the current face area are determined from the target face key point set, and the inter-eye distance and the eye-lip distance are calculated. The inter-eye distance is then compared with the first set threshold and the eye-lip distance with the second set threshold: if the inter-eye distance is greater than the first set threshold and the eye-lip distance is greater than the second set threshold, the target face key points are classified into the small-angle category; otherwise, into the large-angle category. In this way, the first deviation relation detects a large-angle side (profile) face, while the second deviation relation detects a sharply lowered or raised head, providing an effective basis for the subsequent secondary screening of basic face angles by angle category; a sketch of the decision follows.
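A minimal sketch of the judgment, assuming the common 68-point key point layout for the eye and mouth-corner indices; the thresholds are illustrative placeholders, since the patent sets both from the camera's actual angle.

```python
import numpy as np

def classify_angle(kps, eye_thresh=40.0, eye_lip_thresh=55.0):
    """Return 'small' or 'large' from the inter-eye and eye-lip distances.

    `kps` is an (N, 2) or (N, 3) key point array; the slice indices follow
    the common 68-point layout and the thresholds are placeholders.
    """
    left_eye = kps[36:42].mean(axis=0)        # centre of one eye
    right_eye = kps[42:48].mean(axis=0)       # centre of the other eye
    eye_dist = np.linalg.norm(right_eye - left_eye)
    eye_centre = (left_eye + right_eye) / 2.0
    mouth_centre = (kps[48] + kps[54]) / 2.0  # midpoint of the mouth corners
    eye_lip_dist = np.linalg.norm(mouth_centre - eye_centre)
    if eye_dist > eye_thresh and eye_lip_dist > eye_lip_thresh:
        return "small"   # keep the basic face angle as the target face angle
    return "large"       # key points unreliable at this deflection
```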
Note that, in the embodiments of the present invention, after the angle category corresponding to the target face key points is determined based on the preset large-angle judgment criterion, if the angle category is the small-angle category, the basic face angle is taken as the target face angle corresponding to the current video frame. The timing information corresponding to the current video frame may then be acquired, combined with the target face angle, and stored in a background memory for subsequent use.
According to this technical scheme, the current video frame corresponding to the target area is recognized by the target area detection model to generate a basic face recognition area containing face detection information, and the basic face recognition area is expansion-cropped based on the face detection information to generate a target face recognition area corresponding to the current video frame. Key points are then extracted from the target face recognition area by the improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame. The target face key point set is matched against the basic three-dimensional face key point template to generate deviation coordinates, a basic face angle corresponding to the deviation coordinates is determined based on the preset rotation matrix, and the angle category corresponding to the target face key points is finally determined based on the preset large-angle judgment criterion; if the angle category is the preset angle category, the basic face angle is taken as the target face angle corresponding to the current video frame. This solves the problem of low face angle calculation accuracy, avoids the errors that inaccurately detected key points would otherwise introduce at large angles, and thus enables the face angle to be calculated accurately and improves its calculation accuracy.
Embodiment 3
Fig. 4 is a schematic structural diagram of a face angle determining apparatus based on computer vision according to the third embodiment of the present invention. As shown in fig. 4, the apparatus includes: a data acquisition module 310, a region recognition module 320, a key point extraction module 330, and an angle calculation module 340;
The data acquisition module 310 is configured to acquire current video data corresponding to the target area; wherein the current video data comprises at least one current video frame;
The region recognition module 320 is configured to recognize the current video frame based on a target area detection model to generate a basic face recognition area containing face detection information, and to perform expansion cropping on the basic face recognition area based on the face detection information to generate a target face recognition area corresponding to the current video frame; wherein the face detection information includes coordinate position information corresponding to the basic face recognition area;
The key point extraction module 330 is configured to extract key points from the target face recognition area based on an improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame;
the angle calculation module 340 is configured to calculate and evaluate the target face key point set based on the basic three-dimensional face key point template and a preset large-angle judgment criterion to determine a target face angle corresponding to the current video frame.
According to this technical scheme, the current video frame corresponding to the target area is recognized by the target area detection model to generate a basic face recognition area containing face detection information, and the basic face recognition area is expansion-cropped based on the face detection information to generate a target face recognition area corresponding to the current video frame. Key points are then extracted from the target face recognition area by the improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame. Finally, the target face key point set is calculated and evaluated against the basic three-dimensional face key point template and the preset large-angle judgment criterion to determine the target face angle corresponding to the current video frame. This solves the problem of low face angle calculation accuracy: combining three-dimensional face key points with large-angle judgment avoids the errors that inaccurately detected key points would otherwise introduce at large angles, so the face angle can be calculated accurately and its calculation accuracy is improved.
Optionally, the angle calculation module 340 may specifically include: a coordinate matching unit, a first angle calculation unit, and a second angle calculation unit;
The coordinate matching unit is configured to match the target face key point set against the basic three-dimensional face key point template to generate deviation coordinates corresponding to the target face key point set;
The first angle calculation unit is configured to determine a basic face angle corresponding to the deviation coordinates based on a preset rotation matrix;
The second angle calculation unit is configured to determine an angle category corresponding to the target face key points based on a preset large-angle judgment criterion and, if the angle category is a preset angle category, take the basic face angle as the target face angle corresponding to the current video frame.
Optionally, the second angle calculating unit may specifically be configured to:
determining the inter-eye distance and the eye-lip distance corresponding to the current video frame based on the target face key point set;
and determining the angle category corresponding to the target face key points based on a first deviation relation between the inter-eye distance and a first set threshold and a second deviation relation between the eye-lip distance and a second set threshold.
Optionally, the face angle determining device based on computer vision may further include: the first model training module is used for acquiring historical video data, carrying out data acquisition on the historical video data and generating a historical image data set corresponding to the historical video data; wherein the historical image dataset comprises historical image data frames; performing face labeling on each historical image data frame based on a preset region detection model to generate target face detection samples containing face boundary frames, and combining and processing each target face detection sample to generate a target face detection data set; and training the basic region detection model based on the target face detection data set to obtain a trained target region detection model.
Optionally, the face angle determining device based on computer vision may further include: the second model training module is used for carrying out face labeling on each historical image data frame based on the preset region detection model, generating target face detection samples comprising a face boundary frame, combining and processing each target face detection sample, and carrying out expansion cutting on each target face detection sample in the target face detection data set after generating a target face detection data set to obtain an initial key point labeling sample; extracting key points from each initial key point labeling sample based on a preset two-dimensional key point detection model, and generating an initial key point sample containing two-dimensional face key points; performing angle rotation on each initial key point sample based on a preset angle enhancement technology, and generating a basic key point sample corresponding to each initial key point sample; converting the basic key point sample into target key point samples containing three-dimensional key point coordinates based on a preset three-dimensional conversion technology, and combining and processing all the target key point samples to generate a target three-dimensional key point data set; and training the improved basic three-dimensional key point detection model based on the target three-dimensional key point data set to obtain a trained target three-dimensional key point detection model.
Optionally, the face angle determining device based on computer vision may further include: the template generation module is used for determining standard face images of target quantity in a standard face database based on preset screening criteria before calculating and evaluating the target face key point set based on the basic three-dimensional face key point template and the preset large-angle judgment criteria and determining the target face angle corresponding to the current video frame; determining standard face key points corresponding to each standard face image based on a preset three-dimensional conversion technology; and (5) processing basic key point coordinates of the same set position in the key points of each standard face by the average value to generate a basic three-dimensional face key point template.
Optionally, the feature extraction part in the improved target three-dimensional key point detection model is a visual geometry group convolutional neural network re-highlighting trunk model.
The face angle determining apparatus based on computer vision provided by the embodiments of the present invention can execute the face angle determining method based on computer vision provided by any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the executed method.
Embodiment 4
Fig. 5 shows a schematic diagram of an electronic device 410 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 410 includes at least one processor 420, and a memory communicatively coupled to the at least one processor 420, such as a Read Only Memory (ROM) 430 and a Random Access Memory (RAM) 440, in which a computer program executable by the at least one processor is stored; the processor 420 may perform various suitable actions and processes according to the computer program stored in the ROM 430 or loaded from the storage unit 490 into the RAM 440. In the RAM 440, various programs and data required for the operation of the electronic device 410 may also be stored. The processor 420, the ROM 430, and the RAM 440 are connected to each other by a bus 450. An input/output (I/O) interface 460 is also connected to the bus 450.
Various components in the electronic device 410 are connected to the I/O interface 460, including: an input unit 470 such as a keyboard, a mouse, etc.; an output unit 480 such as various types of displays, speakers, and the like; a storage unit 490, such as a magnetic disk, an optical disk, or the like; and a communication unit 4100, such as a network card, modem, wireless communication transceiver, etc. The communication unit 4100 allows the electronic device 410 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunications networks.
Processor 420 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of processor 420 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, or microcontroller. The processor 420 performs the various methods and processes described above, such as the face angle determination method based on computer vision.
The method comprises the following steps:
Acquiring current video data corresponding to a target area; wherein the current video data comprises at least one current video frame;
Recognizing the current video frame based on a target area detection model to generate a basic face recognition area containing face detection information, and performing expansion cropping on the basic face recognition area based on the face detection information to generate a target face recognition area corresponding to the current video frame; wherein the face detection information includes coordinate position information corresponding to the basic face recognition area;
Extracting key points from the target face recognition area based on an improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame;
and calculating and evaluating the target face key point set based on the basic three-dimensional face key point template and a preset large-angle judgment criterion to determine a target face angle corresponding to the current video frame.
In some embodiments, the computer vision based face angle determination method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 490. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 410 via the ROM 430 and/or the communication unit 4100. When the computer program is loaded into RAM 440 and executed by processor 420, one or more of the steps of the computer vision-based face angle determination method described above may be performed. Alternatively, in other embodiments, the processor 420 may be configured to perform the computer vision based face angle determination method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server (also called a cloud computing server or cloud host), a host product in the cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.
An embodiment of the present application further discloses a computer program product, which includes a computer program that, when executed by a processor, implements the computer vision-based face angle determination method provided by any embodiment of the present application. The program product and the computer vision-based face angle determination method disclosed in the embodiments of the present application belong to the same inventive concept, and details already described are not repeated here.
It should be appreciated that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. A face angle determination method based on computer vision, comprising:
Acquiring current video data corresponding to a target area; wherein the current video data comprises at least one current video frame;
Recognizing the current video frame based on a target region detection model to generate a basic face recognition region containing face detection information, and performing outward-expansion cropping on the basic face recognition region based on the face detection information to generate a target face recognition region corresponding to the current video frame; wherein the face detection information includes coordinate position information corresponding to the basic face recognition region;
Extracting key points from the target face recognition region based on an improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame;
And calculating and evaluating the target face key point set based on a basic three-dimensional face key point template and a preset large-angle judgment criterion to determine a target face angle corresponding to the current video frame.
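One plausible reading of the outward-expansion cropping step in claim 1 is sketched below; the margin ratio and the clamping to image bounds are assumptions, since the claim fixes neither.

```python
import numpy as np

def expand_and_crop(frame: np.ndarray, box, margin: float = 0.2) -> np.ndarray:
    """frame: HxWxC image; box: (x1, y1, x2, y2) from the region detection model."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    dx, dy = margin * (x2 - x1), margin * (y2 - y1)   # expand by a fixed ratio
    x1, y1 = max(0, int(x1 - dx)), max(0, int(y1 - dy))
    x2, y2 = min(w, int(x2 + dx)), min(h, int(y2 + dy))
    return frame[y1:y2, x1:x2]                        # target face region crop
```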
2. The method according to claim 1, wherein the calculating and evaluating of the target face key point set based on the basic three-dimensional face key point template and the preset large-angle judgment criterion to determine the target face angle corresponding to the current video frame includes:
Matching the target face key point set against the basic three-dimensional face key point template to generate deviation coordinates corresponding to the target face key point set;
Determining a basic face angle corresponding to the deviation coordinates based on a preset rotation matrix;
And determining an angle category corresponding to the target face key point set based on the preset large-angle judgment criterion, and taking the basic face angle as the target face angle corresponding to the current video frame if the angle category is a preset angle category.
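Claim 2 does not fix the computation behind the "preset rotation matrix". One standard realization, offered only as an assumption, is to align the target key point set to the template with the Kabsch algorithm and read Euler angles off the resulting rotation:

```python
import numpy as np

def face_angle_from_keypoints(target: np.ndarray, template: np.ndarray):
    """target, template: (N, 3) corresponding 3D key point sets."""
    # Deviation coordinates: center both sets on their centroids.
    t = target - target.mean(axis=0)
    s = template - template.mean(axis=0)
    # Kabsch: rotation r that best maps the template onto the target face.
    u, _, vt = np.linalg.svd(s.T @ t)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Euler angles in degrees, R = Rz(roll) @ Ry(yaw) @ Rx(pitch) convention.
    pitch = np.degrees(np.arctan2(r[2, 1], r[2, 2]))
    yaw = np.degrees(np.arcsin(np.clip(-r[2, 0], -1.0, 1.0)))
    roll = np.degrees(np.arctan2(r[1, 0], r[0, 0]))
    return pitch, yaw, roll
```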
3. The method according to claim 2, wherein the determining of the angle category corresponding to the target face key point set based on the preset large-angle judgment criterion includes:
Determining an eye distance and an eye-lip distance corresponding to the current video frame based on the target face key point set;
And determining the angle category corresponding to the target face key point set based on a first deviation relation between the eye distance and a first set threshold and a second deviation relation between the eye-lip distance and a second set threshold.
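The geometric intuition behind claim 3: a large yaw shrinks the projected eye distance, and a large pitch shrinks the eye-to-lip distance. In the sketch below, the key point indices (borrowed from the common 68-point layout) and both thresholds are illustrative assumptions:

```python
import numpy as np

LEFT_EYE, RIGHT_EYE, UPPER_LIP = 36, 45, 51   # hypothetical indices

def angle_category(kps: np.ndarray, eye_thresh: float, eye_lip_thresh: float) -> str:
    """kps: (N, 3) target face key point set; returns the angle category."""
    eye_dist = np.linalg.norm(kps[LEFT_EYE, :2] - kps[RIGHT_EYE, :2])
    mid_eyes = (kps[LEFT_EYE, :2] + kps[RIGHT_EYE, :2]) / 2.0
    eye_lip_dist = np.linalg.norm(mid_eyes - kps[UPPER_LIP, :2])
    # First deviation relation: eye distance vs. the first set threshold.
    # Second deviation relation: eye-lip distance vs. the second set threshold.
    if eye_dist < eye_thresh or eye_lip_dist < eye_lip_thresh:
        return "large"
    return "normal"
```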
4. The method according to claim 1, wherein the method further comprises:
Acquiring historical video data, and performing frame extraction on the historical video data to generate a historical image data set corresponding to the historical video data; wherein the historical image data set comprises historical image data frames;
Performing face labeling on each historical image data frame based on a preset region detection model to generate target face detection samples containing face bounding boxes, and combining the target face detection samples to generate a target face detection data set;
And training a basic region detection model based on the target face detection data set to obtain a trained target region detection model.
5. The method of claim 4, wherein after the performing face labeling on each historical image data frame based on the preset region detection model to generate the target face detection samples containing face bounding boxes and the combining of the target face detection samples to generate the target face detection data set, the method further comprises:
Performing outward-expansion cropping on each target face detection sample in the target face detection data set to obtain initial key point labeling samples;
Extracting key points from each initial key point labeling sample based on a preset two-dimensional key point detection model to generate initial key point samples containing two-dimensional face key points;
Performing angle rotation on each initial key point sample based on a preset angle augmentation technique to generate a basic key point sample corresponding to each initial key point sample;
Converting each basic key point sample into a target key point sample containing three-dimensional key point coordinates based on a preset three-dimensional conversion technique, and combining all the target key point samples to generate a target three-dimensional key point data set;
And training an improved basic three-dimensional key point detection model based on the target three-dimensional key point data set to obtain a trained target three-dimensional key point detection model.
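Among the claim-5 steps, the angle augmentation is the most mechanical to illustrate. A minimal sketch, assuming augmentation means rotating each two-dimensional key point sample about the face center by a preset list of angles (the list here is invented for illustration):

```python
import numpy as np

def rotate_keypoints(kps: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate (N, 2) two-dimensional face key points about their centroid."""
    center = kps.mean(axis=0)
    theta = np.radians(degrees)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (kps - center) @ rot.T + center

def augment_sample(kps: np.ndarray, angles=(-30.0, -15.0, 15.0, 30.0)):
    """Generate basic key point samples for one initial key point sample."""
    return [rotate_keypoints(kps, a) for a in angles]
```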
6. The method of claim 1, wherein before the calculating and evaluating of the target face key point set based on the basic three-dimensional face key point template and the preset large-angle judgment criterion to determine the target face angle corresponding to the current video frame, the method further comprises:
Determining a target number of standard face images from a standard face database based on a preset screening criterion;
Determining standard face key points corresponding to each standard face image based on a preset three-dimensional conversion technique;
And averaging the basic key point coordinates at the same set positions across the standard face key points of all the standard face images to generate the basic three-dimensional face key point template.
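Claim 6 reduces to a per-position mean over the standard faces' key point sets. A minimal sketch, assuming every set uses the same key point ordering:

```python
import numpy as np

def build_template(standard_keypoint_sets) -> np.ndarray:
    """standard_keypoint_sets: list of (N, 3) arrays, one per standard face."""
    stack = np.stack(standard_keypoint_sets, axis=0)   # (num_faces, N, 3)
    return stack.mean(axis=0)                          # (N, 3) basic template
```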
7. The method of claim 1, wherein the feature extraction component of the improved target three-dimensional key point detection model is a Visual Geometry Group (VGG) convolutional neural network with a re-parameterized backbone.
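The translated phrase in claim 7 reads like a RepVGG-style re-parameterized VGG backbone; that reading is an interpretation, not confirmed by the source. Such a block trains with parallel 3x3, 1x1, and identity branches that can later be fused into a single 3x3 convolution for inference. A minimal train-time block in PyTorch:

```python
import torch
import torch.nn as nn

class RepStyleBlock(nn.Module):
    """Train-time block with 3x3, 1x1, and identity branches (summed)."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels))
        self.identity = nn.BatchNorm2d(channels)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # After training, the three branches can be re-parameterized into
        # one 3x3 convolution, keeping VGG-style plain-stack inference.
        return self.act(self.branch3(x) + self.branch1(x) + self.identity(x))
```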
8. A face angle determining apparatus based on computer vision, comprising:
The data acquisition module is used for acquiring current video data corresponding to the target area; wherein the current video data comprises at least one current video frame;
The region recognition module is used for recognizing the current video frame based on a target region detection model to generate a basic face recognition region containing face detection information, and performing outward-expansion cropping on the basic face recognition region based on the face detection information to generate a target face recognition region corresponding to the current video frame; wherein the face detection information includes coordinate position information corresponding to the basic face recognition region;
The key point extraction module is used for extracting key points from the target face recognition region based on the improved target three-dimensional key point detection model to generate a target face key point set corresponding to the current video frame;
And the angle calculation module is used for calculating and evaluating the target face key point set based on the basic three-dimensional face key point template and a preset large-angle judgment criterion to determine a target face angle corresponding to the current video frame.
9. An electronic device, the electronic device comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the computer vision-based face angle determination method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the computer vision based face angle determination method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410834513.6A CN118609189A (en) | 2024-06-26 | 2024-06-26 | Face angle determining method, device, equipment and medium based on computer vision |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118609189A true CN118609189A (en) | 2024-09-06 |
Family
ID=92563304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410834513.6A Pending CN118609189A (en) | 2024-06-26 | 2024-06-26 | Face angle determining method, device, equipment and medium based on computer vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118609189A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |