
WO2016165614A1 - Method for expression recognition in instant video and electronic equipment - Google Patents

Method for expression recognition in instant video and electronic equipment

Info

Publication number
WO2016165614A1
WO2016165614A1 (PCT/CN2016/079115)
Authority
WO
WIPO (PCT)
Prior art keywords
feature point
point coordinate
face
feature
instant video
Prior art date
Application number
PCT/CN2016/079115
Other languages
French (fr)
Chinese (zh)
Inventor
武俊敏
Original Assignee
美国掌赢信息科技有限公司
武俊敏
Priority date
Filing date
Publication date
Application filed by 美国掌赢信息科技有限公司, 武俊敏
Publication of WO2016165614A1 publication Critical patent/WO2016165614A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Definitions

  • the present invention relates to the field of video, and in particular, to an expression recognition method and an electronic device in an instant video.
  • With the popularity of instant video applications on mobile terminals, more and more users interact with others through instant video applications. An expression recognition method for instant video is therefore needed to satisfy users' personalized needs when they interact with others through an instant video application, and to improve the user experience in such interactive scenarios.
  • The prior art provides an expression recognition method that acquires the current frame picture to be recognized from a pre-recorded video, identifies the facial expression in that frame picture, and repeats these steps on the other frame pictures, thereby identifying the facial expressions in the video's frame pictures.
  • However, this method cannot recognize the facial expressions in an instant video in real time, and because it occupies a large amount of the device's processing and storage resources, it places high demands on the device. It therefore cannot be applied to mobile terminals such as smart phones and tablets, cannot meet users' diverse needs, and degrades the user experience.
  • To meet users' diverse needs and improve the user experience, the embodiments of the present invention provide an expression recognition method for instant video and an electronic device. The technical solution is as follows:
  • In a first aspect, an expression recognition method in an instant video is provided, the method comprising: acquiring a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face; identifying the feature vector corresponding to the at least one feature point to generate a recognition result; and determining, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.
  • The feature vector includes feature point coordinates and texture feature point coordinates under a standard pose matrix, and the texture feature points are used to uniquely determine the feature points.
  • Acquiring the feature vector corresponding to the at least one feature point of the face in the instant video frame includes: acquiring the at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix; and generating, according to those coordinates, the feature vector corresponding to the at least one feature point.
  • Acquiring the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix includes: acquiring the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame; and normalizing the at least one feature point to obtain the coordinates under the standard pose matrix.
  • Normalizing the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix includes: acquiring, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point; and rotating the current pose matrix into the standard pose matrix and acquiring the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
  • Identifying the feature vector corresponding to the at least one feature point includes: inputting the feature vector into a preset expression model library for calculation and obtaining a calculation result in at least one preset expression model, the calculation result representing the recognition result.
  • Determining, according to the recognition result, that the current expression is one of the plurality of pre-stored expressions includes: if the recognition result is within a preset range, determining that the expression corresponding to the feature vector is one of the plurality of pre-stored expressions.
  • In a second aspect, an electronic device is provided, the electronic device comprising:
  • An acquiring module configured to acquire a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe a current expression of the face;
  • An identification module configured to identify a feature vector corresponding to the at least one feature point, and generate a recognition result
  • a determining module configured to determine, according to the recognition result, the current expression as one of a plurality of pre-stored expressions.
  • the acquiring module is further configured to acquire the at least one feature point coordinate and the at least one texture feature point coordinate in the standard pose matrix;
  • the identification module is further configured to generate a feature vector corresponding to the at least one feature point according to the at least one feature point coordinate and the at least one texture feature point coordinate in the standard pose matrix.
  • the acquiring module is further configured to acquire the at least one feature point coordinate of the face in the instant video frame and the at least one texture feature point coordinate;
  • the device further includes a processing module, configured to perform normalization processing on the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate in the standard pose matrix.
  • the acquiring module is further configured to acquire, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point;
  • the processing module is further configured to rotate the current pose matrix into a standard pose matrix, and acquire the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
  • the device further includes:
  • a calculation module configured to input a feature vector corresponding to the at least one feature point into a preset expression model library for calculation, and obtain the recognition result.
  • the determining module is specifically configured to:
  • the recognition result is within the preset range, it is determined that the expression corresponding to the feature vector is one of a plurality of pre-stored expressions.
  • In a third aspect, an electronic device is provided, including a video input module, a video output module, a sending module, a receiving module, a memory, and a processor connected to the video input module, the video output module, the sending module, the receiving module, and the memory, wherein the memory stores a set of program code and the processor is configured to invoke the program code stored in the memory to perform the following operations:
  • acquiring a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face; identifying the feature vector corresponding to the at least one feature point to generate a recognition result; and determining, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.
  • the processor is further configured to invoke program code stored in the memory, and perform the following operations:
  • acquiring the at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix, and generating, according to those coordinates, the feature vector corresponding to the at least one feature point.
  • The processor is further configured to invoke the program code stored in the memory and perform the following operations: acquiring the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, and normalizing the at least one feature point to obtain the coordinates under the standard pose matrix.
  • The processor is further configured to invoke the program code stored in the memory and perform the following operations: acquiring, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point, and rotating the current pose matrix into a standard pose matrix to acquire the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
  • The processor is further configured to invoke the program code stored in the memory and perform the following operation: inputting the feature vector corresponding to the at least one feature point into a preset expression model library for calculation, and obtaining the recognition result.
  • The processor is further configured to invoke the program code stored in the memory and perform the following operation: if the recognition result is within a preset range, determining that the expression corresponding to the feature vector is one of a plurality of pre-stored expressions.
  • An embodiment of the present invention provides an expression recognition method for instant video and an electronic device. The method includes: acquiring a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face; identifying the feature vector corresponding to the at least one feature point to generate a recognition result; and determining, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.
  • By acquiring feature points that describe the current expression of the face in the instant video, the feature vector obtained from those feature points represents the current expression more accurately. Recognizing the expression from this feature vector simplifies the face recognition algorithm for instant video, so the method provided by the embodiments of the present invention can run on a mobile terminal, meeting users' diverse needs and improving the user experience.
  • FIG. 1 is a schematic diagram of an interaction system according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of an interaction system according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an interaction system according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of an expression recognition method in an instant video according to an embodiment of the present invention;
  • FIG. 5 is a flowchart of an expression recognition method in an instant video according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
  • An embodiment of the present invention provides an expression recognition method for instant video, applied to an interactive system including at least two mobile terminals and a server. A mobile terminal can run an instant video program, and a user can interact with others by running the instant video program on the mobile terminal.
  • the mobile terminal may be a smart phone, a tablet computer, or another mobile terminal.
  • the specific mobile terminal is not limited in the embodiment of the present invention.
  • The mobile terminal includes at least a video input module and a video display module; the video input module may include a camera and the video display module may include a display screen. The instant video program can implement instant video input by controlling the mobile terminal's video input module, and can display instant video by controlling the video display module.
  • This interactive system is shown in FIG. 1, in which the mobile terminal 1 is the instant video sender, the mobile terminal 2 is the instant video receiver, and the instant video sent by the mobile terminal 1 is forwarded to the mobile terminal 2 via the server;
  • the user of the mobile terminal 1 and the user of the mobile terminal 2 can interact through the interactive system.
  • The execution body of the method provided by this embodiment of the present invention may be any one of the mobile terminal 1, the mobile terminal 2, and the server. If the execution body is the mobile terminal 1, then after receiving the instant video input through its own video input module, the mobile terminal 1 performs expression recognition on the face in the instant video, forwards the recognition result to the mobile terminal 2 via the server, and/or outputs the recognition result through its own display screen. If the execution body is the server, then after the mobile terminal 1 and/or the mobile terminal 2 input the instant video through their own video input modules, the instant video is sent to the server, the server recognizes the facial expression in the instant video, and the recognition result is sent to the mobile terminal 1 and/or the mobile terminal 2. If the execution body is the mobile terminal 2, then the mobile terminal 1, after inputting the instant video through its own video input module, sends the instant video to the server, the server sends the instant video to the mobile terminal 2, and the mobile terminal 2 performs expression recognition on the face in the instant video, forwards the recognition result to the mobile terminal 1 via the server, and/or outputs the recognition result through its own display screen.
  • the specific implementation body of the method in the interaction system is not limited in the embodiment of the present invention.
  • The method provided by this embodiment of the present invention can also be applied to an interactive system including only the mobile terminal 1 and the mobile terminal 2. This interactive system is shown in FIG. 2; the mobile terminals in it are the same as those in the interactive system shown in FIG. 1, and details are not described here again.
  • The execution body of the method may be either the mobile terminal 1 or the mobile terminal 2. If the execution body is the mobile terminal 1, then after inputting the instant video through its own video input module, the mobile terminal 1 performs expression recognition on the face in the instant video, transmits the recognition result to the mobile terminal 2, and/or outputs the recognition result through its own display screen. If the execution body is the mobile terminal 2, then the mobile terminal 1, after inputting the instant video through its own video input module, sends the instant video to the mobile terminal 2, and the mobile terminal 2 performs expression recognition on the face in the instant video, transmits the recognition result to the mobile terminal 1, and/or outputs the recognition result through its own display screen.
  • the specific implementation body of the method in the interaction system is not limited in the embodiment of the present invention.
  • The method provided by this embodiment of the present invention can also be applied to an interactive system including only the mobile terminal 1 and its user. This interactive system is shown in FIG. 3. The mobile terminal 1 includes at least a video input module and a video display module; the video input module may include a camera and the video display module may include a display screen. At least one instant video program can run on the mobile terminal, controlling the mobile terminal's video input module and video display module to carry out instant video. The mobile terminal receives the instant video input by the user, performs facial expression recognition on the instant video, and outputs the recognition result through its own display screen.
  • the mobile terminal in the embodiment of the present invention may be one or multiple, and the specific mobile terminal is not limited in the embodiment of the present invention.
  • embodiment of the present invention may further include other application scenarios, and the specific application scenario is not limited in the embodiment of the present invention.
  • An embodiment of the present invention provides an expression recognition method in an instant video. As shown in FIG. 4, the method includes:
  • the feature vector includes feature point coordinates and texture feature point coordinates under a standard pose matrix, and the texture feature points are used to uniquely determine feature points.
  • acquiring the feature vector corresponding to the at least one feature point of the face in the instant video frame includes:
  • the process of obtaining at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix may be:
  • the at least one feature point is normalized, and at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix are obtained.
  • the process of normalizing at least one feature point and acquiring at least one texture feature point coordinate of at least one feature point under the standard pose matrix may be:
  • Rotating the current pose matrix into a standard pose matrix and acquiring at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix.
  • the feature vector corresponding to the at least one feature point is input into the preset expression model library for calculation, and the recognition result is obtained.
  • the recognition result is within the preset range, it is determined that the expression corresponding to the feature vector is one of a plurality of pre-stored expressions.
  • The embodiments of the present invention provide an expression recognition method for instant video and an electronic device. By acquiring feature points that describe the current expression of a face in the instant video, the feature vector obtained from those feature points represents the current expression more accurately; recognizing the expression from this feature vector simplifies the face recognition algorithm for instant video, so the method can run on a mobile terminal, meeting users' diverse needs and improving the user experience.
  • An embodiment of the present invention provides an expression recognition method in an instant video. Referring to FIG. 5, the method flow includes:
  • the at least one feature point is used to describe a current expression of a face in an instant video.
  • the at least one feature point is used to describe the outline of the face detail, and the face detail includes at least the eyes, the mouth, the eyebrows, and the nose.
  • the manner in which the face feature points are obtained is not limited in the embodiment of the present invention.
  • The feature parameter may include the coordinates of the feature point within the region that includes at least the face, and may further include the scale and direction of the vector indicated by the feature point within that region.
  • a texture feature point is acquired near each feature point, and the texture feature point is used to uniquely determine the feature point, and the texture feature point does not change with changes in light, angle, and the like.
  • Feature points and texture feature points can be extracted from the face by a preset extraction model or extraction algorithm, or by other means; the specific extraction model, extraction algorithm, and extraction method are not limited in this embodiment of the present invention.
  • Because a texture feature point describes the region where its feature point is located, the texture feature point can be used to uniquely determine the feature point. The face detail is thus determined from both the feature point and the texture feature point, which guarantees that a feature point in the instant video stays at the same position as the actual feature point, ensures the recognition quality of the image details, and thereby improves the reliability of the expression recognition, as sketched below.
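  • A minimal illustrative sketch of acquiring texture feature points near each feature point, in Python with NumPy; the landmark input, the helper name sample_texture_points, and the four-point sampling pattern are assumptions for illustration, since the patent does not name a specific extraction model or algorithm:

```python
import numpy as np

def sample_texture_points(image, feature_points, offsets=None):
    """For each facial feature point, sample nearby texture feature points.

    `image` is a 2-D grayscale array; `feature_points` is an (N, 2) array of
    (x, y) landmark coordinates from any face-alignment detector (the patent
    does not prescribe one). The sampling offsets are illustrative.
    """
    if offsets is None:
        # Four texture points around each landmark; spacing is an assumption.
        offsets = np.array([[-3, 0], [3, 0], [0, -3], [0, 3]])
    h, w = image.shape
    texture_points = []
    for x, y in feature_points:
        for dx, dy in offsets:
            # Clamp to the image bounds so border landmarks stay valid.
            tx = int(np.clip(x + dx, 0, w - 1))
            ty = int(np.clip(y + dy, 0, h - 1))
            texture_points.append((tx, ty))
    return np.array(texture_points)
```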
  • The pose matrix is used to indicate the scale and direction of the vector indicated by the three-dimensional coordinates of the feature point and of the texture feature points corresponding to the feature point.
  • The normalization process may be as follows: because the scale and direction used here are those in two-dimensional coordinates, a preset conversion algorithm converts the coordinates, scales, and directions corresponding to the at least one feature point and to the texture feature points of each feature point into two-dimensional coordinates, yielding the coordinates, scales, and directions of the at least one feature point and of each feature point's texture feature points in two dimensions; the specific algorithm and conversion mode are not limited in this embodiment of the present invention.
  • step c can also be implemented in the following manner:
  • the current pose matrix is used to indicate the scale and direction of the vector indicated by the feature point;
  • The embodiment of the present invention does not limit the specific manner of rotating the current pose matrix into a standard pose matrix.
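  • Because the rotation method is left open, the following Python sketch shows one conventional choice, a Kabsch (orthogonal Procrustes) alignment of the current 3-D feature points to a standard frontal template; this is an assumed stand-in, not the claimed implementation:

```python
import numpy as np

def rotate_to_standard_pose(points, standard_points):
    """Rotate current 3-D feature/texture points into the standard pose.

    `points` and `standard_points` are (N, 3) arrays of corresponding
    coordinates; the standard template plays the role of the standard pose
    matrix. Uses the Kabsch algorithm (one conventional choice).
    """
    # Center both point sets so only a rotation remains to be estimated.
    p = points - points.mean(axis=0)
    q = standard_points - standard_points.mean(axis=0)
    # Optimal rotation from the SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return (rotation @ p.T).T  # coordinates under the standard pose
```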
  • Steps 502 to 503 normalize the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix; other methods may also be adopted, and the specific manner is not limited in this embodiment of the present invention.
  • This embodiment of the present invention normalizes the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video, so that the acquired pose matrix is unaffected by, for example, illumination and viewing-angle changes. Compared with traditional expression recognition, expression recognition in the instant video is thus invariant to pose and scale changes, making the recognition more accurate.
  • Steps 501 to 503 are the process of acquiring the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix; the process may also be implemented in other manners, which are not limited in this embodiment of the present invention.
  • Because the at least one feature point and the at least one texture feature point are acquired under the standard pose matrix, the influence of external factors such as illumination and angle on the face in the instant video is excluded, so the acquired feature points and texture feature points are more comparable, making expression recognition in the instant video more accurate.
  • The pose matrix indicates the direction and scale of the feature points, so the at least one feature point coordinate and the at least one texture feature point coordinate corresponding to the at least one feature point may be acquired according to the standard pose matrix.
  • This embodiment of the present invention does not limit the manner in which the feature vector corresponding to the at least one feature point is generated from the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
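  • Since the generation manner is not limited, one simple assumed construction is to concatenate the normalized coordinates into the feature vector x; the flattening order below is illustrative:

```python
import numpy as np

def build_feature_vector(feature_coords, texture_coords):
    """Concatenate feature point and texture feature point coordinates.

    Both inputs are coordinate arrays under the standard pose matrix; the
    result is the feature vector x fed to the expression models.
    """
    return np.concatenate([feature_coords.ravel(), texture_coords.ravel()])
```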
  • Steps 501 to 504 are the process of acquiring the feature vector corresponding to the at least one feature point of the face in the instant video frame; the process may also be implemented in other manners, which are not limited in this embodiment of the present invention.
  • the feature vector is input into a preset expression model corresponding to each expression for calculation.
  • The preset expression model can be a regression equation in which A is the regression coefficient, x is the feature vector, and y is the recognition result.
  • The result y is calculated from the feature vector in the preset expression model corresponding to each expression, giving the recognition result in at least one preset expression model.
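  • A hedged sketch of this scoring step: each pre-stored expression model is reduced to a coefficient vector A, and y = 1/(1 + e^(-A·x)) is assumed because the training step below refers to a logistic regression equation; the exact formula is not reproduced in the text, and the 0.5 threshold standing in for the "preset range" is likewise an assumption:

```python
import numpy as np

def recognize(x, expression_models, threshold=0.5):
    """Score feature vector x against each pre-stored expression model.

    `expression_models` maps an expression name to its regression
    coefficient vector A. Returns (expression, score) when the best score
    falls within the preset range, else None.
    """
    best_name, best_y = None, -np.inf
    for name, A in expression_models.items():
        y = 1.0 / (1.0 + np.exp(-np.dot(A, x)))  # recognition result y
        if y > best_y:
            best_name, best_y = name, y
    if best_y >= threshold:  # stand-in for the "preset range" check
        return best_name, best_y
    return None
```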
  • This step identifies the feature vector corresponding to the at least one feature point and generates the recognition result; the process may also be implemented in other manners, which are not limited in this embodiment of the present invention.
  • the current expression is determined to be one of a plurality of pre-stored expressions according to the y value included in the recognition result of the feature vector in the preset expression model corresponding to each expression.
  • Step 506 is the process of determining, according to the recognition result, that the current expression is one of the plurality of pre-stored expressions; the process may also be implemented in other manners, which are not limited in this embodiment of the present invention.
  • the method process further includes:
  • During the instant video, the number n of frames used for recognizing the expression is determined, the scores of each acquired expression over those n frames are summed, and the expression with the highest summed score is taken as the recognized expression over the n frames, where n is an integer greater than or equal to 2.
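  • A small sketch of this multi-frame vote, assuming each frame has already produced a per-expression score dictionary (the data layout is an assumption):

```python
from collections import defaultdict

def recognize_over_frames(frame_scores):
    """Combine per-frame results over n >= 2 instant video frames.

    `frame_scores` is a list of {expression: score} dicts, one per frame;
    the expression with the highest summed score across the n frames is
    taken as the recognized expression.
    """
    totals = defaultdict(float)
    for scores in frame_scores:
        for expression, score in scores.items():
            totals[expression] += score
    return max(totals, key=totals.get)
```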
  • Because the facial expression in an instant video changes constantly, at least one recognition result is generated by recognizing the facial expression in two or more instant video frames, and the facial expression in the instant video is then determined from these recognition results. Compared with generating a recognition result from a single frame and determining the expression from it alone, this makes the recognition result more accurate, further improves the reliability of expression recognition, and improves the user experience.
  • the method process further includes:
  • The model of each expression is trained separately: the preset expression to be established is taken as the positive sample, the other preset expressions are used as negative samples, and the logistic regression equation indicated in step 505 is used for training. The process may be: the expression to be trained is taken as the positive sample, for which the output result y is 1, and the other expressions are used as negative samples, for which the output result y is 0.
  • The process of acquiring the parameter A in the logistic regression equation can be: all of the users' instant expressions acquired in the instant video are input into a preset optimization formula to generate the parameter A, where J(A) represents the cost of the parameter A, y_i is the value predicted by the prediction function, and y_i' is the true value.
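  • A sketch of this one-vs-rest training step, reading the preset optimization formula as a squared-error cost J(A) = sum_i (y_i - y_i')^2 minimized by gradient descent; both the cost form and the optimizer are assumptions, since the text names only the terms of the formula:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_expression_model(X, labels, lr=0.1, epochs=200):
    """Train one expression model one-vs-rest.

    Rows of X are feature vectors; `labels` holds y' = 1 for samples of the
    expression being trained (positive) and y' = 0 for the other expressions
    (negative). Minimizes J(A) = sum_i (y_i - y_i')^2 by gradient descent,
    an assumed reading of the optimization formula.
    """
    A = np.zeros(X.shape[1])
    for _ in range(epochs):
        y = sigmoid(X @ A)                             # predicted values y_i
        residual = y - labels                          # y_i - y_i'
        grad = X.T @ (2.0 * residual * y * (1.0 - y))  # dJ/dA by chain rule
        A -= lr * grad / len(labels)
    return A
```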
  • Step 508 does not need to be performed every time steps 501 to 506 are performed.
  • The embodiments of the present invention provide an expression recognition method for instant video and an electronic device. The feature vector obtained from the feature points represents the current expression of the face more accurately, and obtaining the recognition result from this feature vector simplifies the face recognition algorithm for instant video, so the method can run on a mobile terminal, meeting users' diverse needs and improving the user experience.
  • Because a texture feature point describes the region where its feature point is located, the texture feature point can be used to uniquely determine the feature point, so the face detail is determined from both the feature point and the texture feature point. This guarantees that a feature point in the instant video stays at the same position as the actual feature point, ensures the recognition quality of the image details, reduces distortion in image processing, and increases the reliability of both image processing and expression recognition.
  • The acquired pose matrix is unaffected by, for example, illumination and viewing-angle changes, so expression recognition in the instant video is invariant to pose and scale changes, making it more accurate. Because the at least one feature point and the at least one texture feature point are acquired under the standard pose matrix, the influence of external factors such as illumination and angle on the face in the instant video is excluded, so the acquired feature points and texture feature points are more comparable, making expression recognition in the instant video more accurate.
  • The computational complexity is reduced, so the face is recognized faster during the instant video, the occupation of system processes, processing resources, and storage resources is reduced, and the operating efficiency of the processor is improved.
  • An embodiment of the present invention provides an electronic device 6.
  • the electronic device 6 includes:
  • the obtaining module 61 is configured to acquire a feature vector corresponding to at least one feature point of the face in an instant video frame, where the feature points are used to describe the current expression of the face;
  • the identification module 62 is configured to identify a feature vector corresponding to the at least one feature point, and generate a recognition result
  • the determining module 63 is configured to determine, according to the recognition result, that the current expression is one of a plurality of expressions stored in advance.
  • the obtaining module 61 is further configured to acquire at least one feature point coordinate and at least one texture feature point coordinate in the standard pose matrix;
  • the identification module 62 is further configured to generate a feature vector corresponding to the at least one feature point according to the at least one feature point coordinate and the at least one texture feature point coordinate in the standard pose matrix.
  • the obtaining module 61 is further configured to: acquire at least one feature point coordinate of the face in the instant video frame and at least one texture feature point coordinate;
  • the device further includes a processing module, configured to normalize the at least one feature point to obtain at least one feature point coordinate and at least one texture feature point coordinate in the standard pose matrix.
  • the obtaining module 61 is further configured to acquire, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point;
  • the processing module is further configured to rotate the current pose matrix into a standard pose matrix, and acquire at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix.
  • the electronic device 6 further includes:
  • the calculation module is configured to input the feature vector corresponding to the at least one feature point into the preset expression model library for calculation, and obtain the recognition result.
  • the determining module 63 is specifically configured to:
  • the recognition result is within the preset range, it is determined that the expression corresponding to the feature vector is one of a plurality of pre-stored expressions.
  • An embodiment of the present invention provides an electronic device that acquires, in an instant video, feature points describing the current expression of a face, so that the feature vector obtained from the feature points represents the current expression more accurately. The recognition result is then obtained by identifying the feature vector, which simplifies the face recognition algorithm for instant video, enables the method provided by this embodiment of the present invention to run on a mobile terminal, meets users' diverse needs, and improves the user experience.
  • The electronic device 7 includes a video input module 71, a video output module 72, a sending module 73, a receiving module 74, a memory 75, and a processor 76 connected to the video input module 71, the video output module 72, the sending module 73, the receiving module 74, and the memory 75, wherein the memory 75 stores a set of program code and the processor 76 is configured to call the program code stored in the memory 75 to perform the following operations: acquiring a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face; identifying the feature vector to generate a recognition result; and determining, according to the recognition result, that the current expression is one of a plurality of expressions stored in advance.
  • The processor 76 is further configured to call the program code stored in the memory 75 and perform the following operations: acquiring the at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix, and generating, according to those coordinates, the feature vector corresponding to the at least one feature point.
  • The processor 76 is further configured to call the program code stored in the memory 75 and perform the following operations: acquiring the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, and normalizing the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
  • the processor 76 is configured to call the program code stored in the memory 75, and perform the following operations:
  • Rotating the current pose matrix into a standard pose matrix and acquiring at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix.
  • the processor 76 is configured to call the program code stored in the memory 75, and perform the following operations:
  • the feature vector corresponding to the at least one feature point is input into the preset expression model library for calculation, and the recognition result is obtained.
  • the processor 76 is configured to call the program code stored in the memory 75, and perform the following operations:
  • the recognition result is within the preset range, it is determined that the expression corresponding to the feature vector is one of a plurality of pre-stored expressions.
  • An embodiment of the present invention provides an electronic device that acquires feature points describing the current expression of a face in an instant video, so that the feature vector obtained from the feature points represents the current expression of the face more accurately. Identifying the feature vector and obtaining the recognition result from it simplifies the face recognition algorithm for instant video, so the method provided by this embodiment of the present invention can run on a mobile terminal, meeting users' diverse needs and improving the user experience.
  • The electronic device provided by the foregoing embodiments is illustrated only by the division of the above functional modules; in practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure of the electronic device may be divided into different functional modules to complete all or part of the functions described above. In addition, the electronic device of the foregoing embodiments belongs to the same concept as the method embodiments; its specific implementation process is described in detail in the method embodiments and is not repeated here.
  • The storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A method for expression recognition in an instant video belongs to the field of videos. The method comprises: acquiring a characteristic vector corresponding to at least one characteristic point of a human face in an instant video frame, the characteristic point being used to describe a current expression of the human face (401); recognizing the characteristic vector corresponding to the at least one characteristic point to generate a recognition result (402); and determining, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions (403). In the method, human facial expressions in an instant video are recognized according to characteristic vectors, so that the diversified demands of users are met and the user experience is improved.

Description

Expression recognition method and electronic device in instant video

Technical Field

The present invention relates to the field of video, and in particular to an expression recognition method and an electronic device for instant video.

Background Art

With the popularity of instant video applications on mobile terminals, more and more users interact with others through instant video applications. An expression recognition method for instant video is therefore needed to satisfy users' personalized needs when they interact with others through an instant video application, and to improve the user experience in such interactive scenarios.

The prior art provides an expression recognition method that acquires the current frame picture to be recognized from a pre-recorded video, identifies the facial expression in that frame picture, and repeats these steps on the other frame pictures, thereby identifying the facial expressions in the video's frame pictures.

However, this method cannot recognize the facial expressions in an instant video in real time, and because it occupies a large amount of the device's processing and storage resources, it places high demands on the device. It therefore cannot be applied to mobile terminals such as smart phones and tablets, cannot meet users' diverse needs, and degrades the user experience.

Summary of the Invention

To meet users' diverse needs and improve the user experience, the embodiments of the present invention provide an expression recognition method for instant video and an electronic device. The technical solution is as follows:
In a first aspect, an expression recognition method in an instant video is provided, the method comprising:

acquiring a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face;

identifying the feature vector corresponding to the at least one feature point to generate a recognition result; and

determining, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.

With reference to the first aspect, in a first possible implementation, the feature vector includes feature point coordinates and texture feature point coordinates under a standard pose matrix, the texture feature points are used to uniquely determine the feature points, and acquiring the feature vector corresponding to the at least one feature point of the face in the instant video frame includes: acquiring the at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix; and generating, according to the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix, the feature vector corresponding to the at least one feature point.

With reference to the first possible implementation of the first aspect, in a second possible implementation, acquiring the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix includes: acquiring the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame; and normalizing the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.

With reference to the second possible implementation of the first aspect, in a third possible implementation, normalizing the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix includes: acquiring, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point; and rotating the current pose matrix into a standard pose matrix and acquiring the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.

With reference to the third possible implementation of the first aspect, in a fourth possible implementation, identifying the feature vector corresponding to the at least one feature point includes: inputting the feature vector corresponding to the at least one feature point into a preset expression model library for calculation, and obtaining a calculation result in at least one preset expression model, the calculation result being used to represent the recognition result.

With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, determining, according to the recognition result, that the current expression is one of the plurality of pre-stored expressions includes: if the recognition result is within a preset range, determining that the expression corresponding to the feature vector is one of the plurality of pre-stored expressions.
In a second aspect, an electronic device is provided, the electronic device comprising:

an acquiring module, configured to acquire a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face;

an identification module, configured to identify the feature vector corresponding to the at least one feature point and generate a recognition result; and

a determining module, configured to determine, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.

With reference to the second aspect, in a first possible implementation, the acquiring module is further configured to acquire the at least one feature point coordinate and at least one texture feature point coordinate under a standard pose matrix; and the identification module is further configured to generate the feature vector corresponding to the at least one feature point according to the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.

With reference to the first possible implementation of the second aspect, in a second possible implementation, the acquiring module is further configured to acquire the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame; and the device further includes a processing module, configured to normalize the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.

With reference to the second possible implementation of the second aspect, in a third possible implementation, the acquiring module is further configured to acquire, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point; and the processing module is further configured to rotate the current pose matrix into a standard pose matrix and acquire the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.

With reference to the first or second possible implementation of the second aspect, in a fourth possible implementation, the device further includes a calculation module, configured to input the feature vector corresponding to the at least one feature point into a preset expression model library for calculation and obtain the recognition result.

With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the determining module is specifically configured to: if the recognition result is within a preset range, determine that the expression corresponding to the feature vector is one of the plurality of pre-stored expressions.
In a third aspect, an electronic device is provided, including a video input module, a video output module, a sending module, a receiving module, a memory, and a processor connected to the video input module, the video output module, the sending module, the receiving module, and the memory, wherein the memory stores a set of program code and the processor is configured to invoke the program code stored in the memory to perform the following operations:

acquiring a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face;

identifying the feature vector corresponding to the at least one feature point to generate a recognition result; and

determining, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.

With reference to the third aspect, in a first possible implementation, the processor is further configured to invoke the program code stored in the memory and perform the following operations: acquiring the at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix; and generating, according to the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix, the feature vector corresponding to the at least one feature point.

With reference to the first possible implementation of the third aspect, in a second possible implementation, the processor is further configured to invoke the program code stored in the memory and perform the following operations: acquiring the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame; and normalizing the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.

With reference to the second possible implementation of the third aspect, in a third possible implementation, the processor is further configured to invoke the program code stored in the memory and perform the following operations: acquiring, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point; and rotating the current pose matrix into a standard pose matrix and acquiring the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.

With reference to the third possible implementation of the third aspect, in a fourth possible implementation, the processor is further configured to invoke the program code stored in the memory and perform the following operation: inputting the feature vector corresponding to the at least one feature point into a preset expression model library for calculation, and obtaining the recognition result.

With reference to the fourth possible implementation of the third aspect, in a fifth possible implementation, the processor is further configured to invoke the program code stored in the memory and perform the following operation: if the recognition result is within a preset range, determining that the expression corresponding to the feature vector is one of the plurality of pre-stored expressions.
The embodiments of the present invention provide an expression recognition method for instant video and an electronic device, including: acquiring a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face; identifying the feature vector corresponding to the at least one feature point to generate a recognition result; and determining, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions. By acquiring feature points that describe the current expression of the face in the instant video, the feature vector obtained from those feature points represents the current expression more accurately; recognizing the expression from this feature vector simplifies the face recognition algorithm for instant video, so the method provided by the embodiments of the present invention can run on a mobile terminal, meeting users' diverse needs and improving the user experience.
Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art may obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an interaction system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an interaction system according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an interaction system according to an embodiment of the present invention;

FIG. 4 is a flowchart of an expression recognition method in an instant video according to an embodiment of the present invention;

FIG. 5 is a flowchart of an expression recognition method in an instant video according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
具体实施方式detailed description
为使本发明的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. Some embodiments of the invention, rather than all of the embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.
An embodiment of the present invention provides an expression recognition method in instant video. The method is applied to an interaction system including at least two mobile terminals and a server. A mobile terminal can run an instant video program, and a user interacts with others by running the instant video program on the mobile terminal. The mobile terminal may be a smart phone, a tablet computer, or another mobile terminal; the embodiment of the present invention does not limit the specific mobile terminal. The mobile terminal includes at least a video input module and a video display module; the video input module may include a camera, and the video display module may include a display screen. The instant video program can implement instant video input by controlling the video input module of the mobile terminal, and can implement instant video display by controlling the video display module.
The interaction system may be as shown in FIG. 1. In this system, mobile terminal 1 is the instant video sender and mobile terminal 2 is the instant video receiver; the instant video sent by mobile terminal 1 is forwarded to mobile terminal 2 via the server, and the user of mobile terminal 1 and the user of mobile terminal 2 interact through the interaction system.
In particular, the execution body of the method provided by the embodiment of the present invention, namely the electronic device, may be any one of mobile terminal 1, mobile terminal 2, and the server. If the execution body is mobile terminal 1, then after receiving the instant video input through its own video input module, mobile terminal 1 performs expression recognition on the face in the instant video, forwards the recognition result to mobile terminal 2 via the server, and/or outputs the recognition result through its own display screen. If the execution body is the server, then after mobile terminal 1 and/or mobile terminal 2 input instant video through their own video input modules, the instant video is sent to the server, the server recognizes the facial expression in the instant video, and the recognition result is then sent to mobile terminal 1 and/or mobile terminal 2. If the execution body is mobile terminal 2, then after mobile terminal 1 inputs instant video through its own video input module, it sends the instant video to the server, the server sends the instant video to mobile terminal 2, and mobile terminal 2 performs expression recognition on the face in the instant video, forwards the recognition result to mobile terminal 1 via the server, and/or outputs the recognition result through its own display screen. The embodiment of the present invention does not limit the specific execution body of the method in the interaction system.
In addition, the method provided by the embodiment of the present invention may also be applied to an interaction system including only mobile terminal 1 and mobile terminal 2, as shown in FIG. 2; the mobile terminals in the interaction system shown in FIG. 2 are the same as those in the interaction system shown in FIG. 1, and details are not repeated here.
In particular, the execution body of the method provided by the embodiment of the present invention, namely the electronic device, may be either of mobile terminal 1 and mobile terminal 2. If the execution body is mobile terminal 1, then after inputting instant video through its own video input module, mobile terminal 1 performs expression recognition on the face in the instant video, then sends the recognition result to mobile terminal 2 and/or outputs the recognition result through its own display screen. If the execution body is mobile terminal 2, then after inputting instant video through its own video input module, mobile terminal 1 sends the instant video to mobile terminal 2; mobile terminal 2 performs expression recognition on the face in the instant video, then sends the recognition result to mobile terminal 1 and/or outputs the recognition result through its own display screen. The embodiment of the present invention does not limit the specific execution body of the method in the interaction system.
In addition, the method provided by the embodiment of the present invention may also be applied to an interaction system including only mobile terminal 1 and a user, as shown in FIG. 3. Mobile terminal 1 includes at least a video input module and a video display module; the video input module may include a camera, and the video display module may include a display screen. At least one instant video program can run on the mobile terminal, and the instant video program controls the video input module and the video display module of the mobile terminal to carry out instant video. Specifically, the mobile terminal receives the instant video input by the user, performs facial expression recognition on the instant video, and outputs the recognition result through its own display screen.
It should be noted that there may be one or more mobile terminals in the embodiments of the present invention; the embodiments of the present invention do not limit the specific mobile terminal.
In addition, the embodiments of the present invention may also cover other application scenarios; the embodiments of the present invention do not limit the specific application scenario.
Embodiment 1
An embodiment of the present invention provides an expression recognition method in instant video. As shown in FIG. 4, the method includes the following steps:
401. Acquire a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face.
The feature vector includes feature point coordinates and texture feature point coordinates under a standard pose matrix, where a texture feature point is used to uniquely determine a feature point.
Specifically, acquiring the feature vector corresponding to the at least one feature point of the face in the instant video frame includes:
acquiring at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix.
It should be noted that the process of acquiring the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix may be:
acquiring at least one feature point coordinate and at least one texture feature point coordinate of the face in the instant video frame; and
normalizing the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
It should be noted that the process of normalizing the at least one feature point and obtaining the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix may be:
acquiring, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, a current pose matrix corresponding to the at least one feature point and the at least one texture feature point of the face in the instant video frame; and
rotating the current pose matrix into the standard pose matrix, and obtaining the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
After the at least one feature point is normalized and the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix are obtained, the following step is performed:
generating, according to the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix, the feature vector corresponding to the at least one feature point.
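As an illustration only, the following Python sketch shows one plausible way to assemble such a feature vector: the normalized feature point coordinates and texture feature point coordinates are concatenated into a single vector. The function name and array layout are assumptions for the sketch, not part of the disclosed method.

```python
import numpy as np

def build_feature_vector(feature_pts, texture_pts):
    """Concatenate normalized coordinates into one feature vector.

    feature_pts: (N, 2) feature point coordinates under the standard
                 pose matrix (hypothetical layout).
    texture_pts: (M, 2) texture feature point coordinates under the
                 same standard pose matrix.
    Returns a 1-D vector describing the current expression.
    """
    feature_pts = np.asarray(feature_pts, dtype=np.float64)
    texture_pts = np.asarray(texture_pts, dtype=np.float64)
    # Any fixed, consistent ordering of the (x, y) pairs works, as long
    # as the same ordering is used at training and recognition time.
    return np.concatenate([feature_pts.ravel(), texture_pts.ravel()])
```

Because both coordinate sets are expressed under the same standard pose matrix, vectors built this way remain comparable across frames, users, and devices.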
402. Recognize the feature vector corresponding to the at least one feature point, and generate a recognition result.
Specifically, the feature vector corresponding to the at least one feature point is input into a preset expression model library for calculation, and the recognition result is obtained.
403. Determine, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.
Specifically, if the recognition result is within a preset range, it is determined that the expression corresponding to the feature vector is one of the plurality of pre-stored expressions.
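To make the flow of steps 401 to 403 concrete, a per-frame skeleton might look as follows; the callables `detect`, `featurize`, and `classify` are hypothetical placeholders for the operations detailed in Embodiment 2, not names defined by this disclosure.

```python
def recognize_frame(frame, detect, featurize, classify):
    """Illustrative per-frame pipeline for steps 401-403."""
    feature_pts, texture_pts = detect(frame)  # locate feature and texture points
    x = featurize(feature_pts, texture_pts)   # step 401: build the feature vector
    return classify(x)                        # steps 402-403: recognize and decide
```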
Embodiments of the present invention provide an expression recognition method and an electronic device for instant video. By acquiring, in the instant video, feature points that describe the current expression of the face, the feature vectors obtained from those feature points represent the current expression of the face more accurately. By then recognizing the feature vectors and obtaining the recognition result from them, the complexity of the algorithm for recognizing faces in instant video is reduced, so that the method provided by the embodiments of the present invention can run on a mobile terminal, meeting users' diverse needs and improving the user experience.
Embodiment 2
An embodiment of the present invention provides an expression recognition method in instant video. Referring to FIG. 5, the method includes the following steps:
501. Acquire at least one feature point coordinate and at least one texture feature point coordinate of a face in an instant video frame.
Specifically, the at least one feature point is used to describe the current expression of the face in the instant video.
Since the expression of a face is determined by its facial details, the at least one feature point is used to describe the contours of those details, which include at least the eyes, mouth, eyebrows, and nose. The embodiment of the present invention does not limit the specific manner of acquiring the facial feature points.
According to the acquired feature points of the face, feature parameters describing each feature point are obtained. A feature parameter may include the coordinates of the feature point in a vector covering at least the face, and may also include the scale and direction of the vector indicated by the feature point within at least the face region.
According to the acquired feature point parameters, the coordinates of the feature point in the vector covering at least the face are obtained.
A texture feature point is acquired near each feature point; the texture feature point is used to uniquely determine the feature point, and it does not change with variations in lighting, angle, and the like.
It should be noted that the feature points and texture feature points may be extracted from the face by a preset extraction model or extraction algorithm, or in other ways; the embodiment of the present invention does not limit the specific extraction model, extraction algorithm, or extraction manner.
Since a texture feature point describes the region where its feature point is located, it can be used to uniquely determine that feature point, so that facial details are determined from both feature points and texture feature points. This keeps a feature point in the instant video at the same position as the actual feature point, ensures the recognition quality of the image details, and thereby improves the reliability of expression recognition.
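The embodiment does not prescribe a particular texture descriptor, but the property it relies on can be sketched as follows: a small patch sampled around a feature point and normalized for brightness and contrast is relatively stable under lighting changes, so it can serve to re-identify the same feature point across frames. The patch size and grayscale input are illustrative assumptions.

```python
import numpy as np

def texture_descriptor(gray, x, y, half=4):
    """Sample a (2*half+1) x (2*half+1) patch around (x, y) and normalize it.

    Mean subtraction and scaling to unit norm make the patch insensitive
    to global brightness and contrast changes. Assumes (x, y) lies far
    enough from the image border for the slice to be complete.
    """
    patch = gray[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    patch -= patch.mean()
    norm = np.linalg.norm(patch)
    return patch.ravel() / norm if norm > 0 else patch.ravel()
```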
502. Acquire, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, a current pose matrix corresponding to the at least one feature point and the at least one texture feature point.
Specifically, the pose matrix is used to indicate the scale and direction of the vector indicated by the three-dimensional coordinates of a feature point and of the texture feature point corresponding to that feature point.
The process may be as follows:
a. Normalize the at least one feature point and the at least one texture feature point to obtain the current pose matrix corresponding to the at least one feature point and the at least one texture feature point of the face in the instant video frame. The normalization process may be:
b. Obtain the three-dimensional coordinates, scale, and direction corresponding to the at least one feature point and to the texture feature point corresponding to each feature point.
Since the coordinates of the feature points acquired in the instant video picture, and of the texture feature points corresponding to them, are two-dimensional, the corresponding scale and direction are also two-dimensional. The coordinates, scale, and direction of the at least one feature point and of the texture feature point corresponding to each feature point can therefore be converted from two dimensions to three dimensions according to a preset conversion algorithm; the embodiment of the present invention does not limit the specific algorithm or conversion manner.
c. Generate, according to all the feature points describing the same facial detail and the scales and directions of the texture feature points corresponding to those feature points, the current pose matrix corresponding to those feature points and texture feature points, where the current pose matrix is used to indicate the scale and direction of the vector indicated by all of those feature points.
Optionally, step c may also be implemented as follows:
generate, according to one feature point and the scale and direction of the texture feature point corresponding to it, the current pose matrix corresponding to that feature point and its texture feature point;
the current pose matrix is used to indicate the scale and direction of the vector indicated by that feature point; and
continue the above step for the next feature point until the pose matrices corresponding to all the feature points are generated.
Compared with processing all the feature points that describe the same detail at once, processing each feature point individually reduces distortion during image processing and increases the reliability of image processing.
503. Rotate the current pose matrix into a standard pose matrix to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
The embodiment of the present invention does not limit the specific manner of rotating the current pose matrix into the standard pose matrix.
It should be noted that steps 502 and 503 constitute the process of normalizing the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix; this process may also be implemented in other ways, and the embodiment of the present invention does not limit the specific manner.
By normalizing the acquired at least one feature point coordinate and at least one texture feature point coordinate of the face in the instant video, the embodiment of the present invention makes the acquired pose matrix insensitive to, for example, illumination changes and viewing-angle changes. Compared with traditional expression recognition, expression recognition in instant video therefore does not vary with changes in pose and scale, and is more accurate.
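Since the embodiment leaves the rotation method open, the following sketch shows one common realization: estimate a similarity transform (a Procrustes-style alignment) that maps the current point configuration onto a standard (reference) configuration, and apply it to the feature and texture point coordinates. The reference shape and the 2-D formulation are assumptions for illustration.

```python
import numpy as np

def align_to_standard(points, standard_points):
    """Map `points` onto `standard_points` with a similarity transform.

    points, standard_points: (N, 2) arrays of corresponding coordinates.
    Returns the input coordinates expressed "under the standard pose",
    i.e. with rotation, scale, and translation normalized away.
    """
    p = np.asarray(points, dtype=np.float64)
    q = np.asarray(standard_points, dtype=np.float64)
    p_c, q_c = p - p.mean(axis=0), q - q.mean(axis=0)
    # Optimal rotation from the SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    r = u @ vt
    if np.linalg.det(r) < 0:      # forbid reflections
        u[:, -1] *= -1
        r = u @ vt
    s = np.trace((p_c @ r).T @ q_c) / np.trace(p_c.T @ p_c)  # optimal scale
    return s * (p_c @ r) + q.mean(axis=0)
```

After this alignment, two faces making the same expression at different distances and head rotations yield nearly identical coordinates, which is what makes the subsequent feature vectors comparable.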
It should be noted that steps 501 to 503 constitute the process of acquiring the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix; this process may also be implemented in other ways, and the embodiment of the present invention does not limit the specific manner.
Because the acquired at least one feature point and at least one texture feature point are obtained under the standard pose matrix, the influence of external factors such as lighting and angle on the face in the instant video is removed, the acquired feature points and texture feature points are more comparable, and the expressions recognized in the instant video are more accurate.
504. Generate, according to the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix, the feature vector corresponding to the at least one feature point.
Since the pose matrix indicates the direction and scale of the feature points, the at least one feature point coordinate corresponding to the standard pose matrix, and the at least one texture feature point coordinate corresponding to the at least one feature point, can be obtained according to the standard pose matrix.
The embodiment of the present invention does not limit the specific manner of generating the feature vector corresponding to the at least one feature point according to the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
It should be noted that steps 501 to 504 constitute the process of acquiring the feature vector corresponding to the at least one feature point of the face in the instant video frame; this process may also be implemented in other ways, and the embodiment of the present invention does not limit the specific manner.
505. Input the feature vector corresponding to the at least one feature point into a preset expression model library for calculation, and obtain a recognition result.
Specifically, the feature vector is input into the preset expression model corresponding to each expression for calculation.
The preset expression model may be a regression equation, for example the logistic function:

    y = 1 / (1 + e^(-A·x))

where A is the regression coefficient, x is the feature vector, and y is the recognition result, with y ∈ (0, 1).
A result value y is calculated from the feature vector in the preset expression model corresponding to each expression, thereby obtaining the recognition result under at least one preset expression model.
It should be noted that this step implements the process of recognizing the feature vector corresponding to the at least one feature point and generating the recognition result; the process may also be implemented in other ways, and the embodiment of the present invention does not limit the specific manner.
By using the result of the logistic regression equation to recognize facial expressions in instant video, the computational complexity is reduced, so that faces are recognized more quickly during instant video, the occupation of system processes, processing resources, and storage resources is reduced, and the operating efficiency of the processor is improved.
506. If the recognition result is within a preset range, determine that the expression corresponding to the feature vector is one of a plurality of pre-stored expressions.
The current expression is determined to be one of the plurality of pre-stored expressions according to the y values included in the recognition results of the feature vector under the preset expression models corresponding to the respective expressions.
Specifically, if the difference between y and 1 is within the preset range, this indicates that the facial expression in the instant video is the expression indicated by that preset expression model;
if the difference between y and 0 is within the preset range, this indicates that the facial expression in the instant video is not the expression indicated by that preset expression model.
It should be noted that step 506 implements the process of determining, according to the recognition result, that the current expression is one of the plurality of pre-stored expressions; besides the above manner, this process may also be implemented in other ways, and the embodiment of the present invention does not limit the specific process.
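A hedged sketch of steps 505 and 506 together: each pre-stored expression has its own coefficient vector A, the logistic function scores the feature vector against every model, and a score whose difference from 1 is within the preset range selects the expression. The model container, labels, and threshold value are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recognize(feature_vec, models, margin=0.5):
    """Score a feature vector against every per-expression model.

    models: dict mapping an expression label to its coefficient
            vector A (one logistic model per pre-stored expression).
    Returns (label, scores); label is None when no score is close
    enough to 1, i.e. no pre-stored expression matches.
    """
    x = np.asarray(feature_vec, dtype=np.float64)
    scores = {label: sigmoid(A @ x) for label, A in models.items()}
    best = max(scores, key=scores.get)
    return (best if 1.0 - scores[best] <= margin else None), scores
```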
Optionally, in addition to the above process, after step 506 the method flow further includes:
507. Perform smoothing on the instant video.
Specifically, the number of frames n over which expressions are recognized during instant video is determined, and the total score of each expression acquired within the n frames is computed; the expression with the highest total score is the recognized expression for those n frames,
where n is an integer greater than or equal to 2.
Since the facial expression in instant video changes continuously, recognizing the facial expressions in two or more instant video frames to generate at least one recognition result, and then determining the facial expression in the instant video frames from that at least one recognition result, yields a more accurate result than generating a recognition result from a single frame and determining the facial expression from it. This further improves the reliability of expression recognition and the user experience.
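Under the assumption that each frame yields a per-expression score mapping like the one above, the smoothing of step 507 reduces to accumulating scores over a window of n frames and taking the maximum:

```python
from collections import defaultdict

def smooth_expression(frame_scores):
    """Pick the expression for a window of n recognized frames (n >= 2).

    frame_scores: list of per-frame dicts mapping an expression label
    to its score. The expression with the highest summed score across
    the window is taken as the recognized expression for these frames.
    """
    totals = defaultdict(float)
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] += score
    return max(totals, key=totals.get)
```

Accumulating over several frames suppresses single-frame misclassifications caused by blinks, motion blur, or momentary occlusion.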
Optionally, before step 501, the method flow further includes:
508. Establish an expression model corresponding to each expression.
Specifically, a model is trained for each expression separately: the preset expression to be established is taken as the positive sample, the other preset expressions are taken as negative samples, and training is carried out using the logistic regression equation indicated in step 505. The process may be:
taking the expression to be trained as the positive sample and the other expressions as negative samples, and setting the output y = 1 when the input is a positive sample and y = 0 when the input is a negative sample,
where the parameter A in the logistic regression equation may be obtained as follows:
the instant expressions of all users acquired in instant video are input into a preset optimization formula to generate the parameter A, and the preset optimization formula may be:
    A = argmin_A J(A),  J(A) = Σ_i ℓ(y_i, y_i′)
where J(A) denotes the objective used to determine the parameter A, y_i is the value predicted by the prediction function, and y_i′ is the corresponding true value.
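The exact form of J(A) is rendered as an image in the original filing and is not recoverable here; for a logistic regression model the usual choice is the cross-entropy loss minimized by gradient descent, which the sketch below assumes, although another loss (e.g. squared error between y_i and y_i′) would fit the surrounding text equally well. The one-vs-rest setup of step 508 labels the target expression's samples 1 and all other expressions 0.

```python
import numpy as np

def train_expression_model(X, y, lr=0.1, epochs=500):
    """Fit the coefficient vector A of one expression's logistic model.

    X: (n_samples, n_features) feature vectors.
    y: 0/1 labels, 1 for the expression being trained (positive
       samples), 0 for all other expressions (negative samples).
    Cross-entropy plus gradient descent is an *assumed* realization
    of the preset optimization formula.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    A = np.zeros(X.shape[1])
    for _ in range(epochs):
        pred = 1.0 / (1.0 + np.exp(-(X @ A)))  # predicted y_i per sample
        grad = X.T @ (pred - y) / len(y)       # gradient of mean cross-entropy
        A -= lr * grad
    return A
```

Training each pre-stored expression this way yields the preset expression model library queried in step 505.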
It should be noted that when the method described in steps 501 to 506 is performed, expression recognition can be implemented with pre-established expression models, so step 508 does not need to be performed every time steps 501 to 506 are executed.
Embodiments of the present invention provide an expression recognition method and an electronic device for instant video. By acquiring, in the instant video, feature points that describe the current expression of the face, the feature vectors obtained from those feature points represent the current expression more accurately; by then recognizing the feature vectors and obtaining the recognition result from them, the complexity of the algorithm for recognizing faces in instant video is reduced, so that the method provided by the embodiments of the present invention can run on a mobile terminal, meeting users' diverse needs and improving the user experience. In addition, since a texture feature point describes the region where its feature point is located, it can be used to uniquely determine that feature point, so that facial details are determined from both feature points and texture feature points; this keeps feature points in the instant video at the same positions as the actual feature points, ensures the recognition quality of the image details, and improves the reliability of expression recognition. In addition, processing each feature point individually, rather than all the feature points describing the same detail at once, reduces distortion during image processing and increases its reliability. In addition, normalizing the acquired feature point coordinates and texture feature point coordinates makes the acquired pose matrix insensitive to, for example, illumination changes and viewing-angle changes, so that, compared with traditional expression recognition, expression recognition in instant video does not vary with pose and scale and is more accurate. In addition, because the feature points and texture feature points are obtained under the standard pose matrix, the influence of external factors such as lighting and angle on the face in the instant video is removed, making them more comparable and the recognized expressions more accurate. Finally, using the calculation result of the logistic regression equation to recognize facial expressions in instant video reduces computational complexity, so that faces are recognized more quickly during instant video, the occupation of system processes, processing resources, and storage resources is reduced, and the operating efficiency of the processor is improved.
Embodiment 3
An embodiment of the present invention provides an electronic device 6. Referring to FIG. 6, the electronic device 6 includes:
an acquisition module 61, configured to acquire a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face;
a recognition module 62, configured to recognize the feature vector corresponding to the at least one feature point and generate a recognition result; and
a determination module 63, configured to determine, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.
Optionally,
the acquisition module 61 is further configured to acquire at least one feature point coordinate and at least one texture feature point coordinate under a standard pose matrix; and
the recognition module 62 is further configured to generate, according to the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix, the feature vector corresponding to the at least one feature point.
Optionally,
the acquisition module 61 is further configured to acquire the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame; and
the device further includes a processing module, configured to normalize the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
Optionally,
the acquisition module 61 is further configured to acquire, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point of the face in the instant video frame; and
the processing module is further configured to rotate the current pose matrix into the standard pose matrix and obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
Optionally, the electronic device 6 further includes:
a calculation module, configured to input the feature vector corresponding to the at least one feature point into a preset expression model library for calculation, and obtain the recognition result.
Optionally, the determination module 63 is specifically configured to:
if the recognition result is within a preset range, determine that the expression corresponding to the feature vector is one of the plurality of pre-stored expressions.
An embodiment of the present invention provides an electronic device. By acquiring, in the instant video, feature points that describe the current expression of the face, the electronic device makes the feature vectors obtained from those feature points represent the current expression more accurately; by then recognizing the feature vectors and obtaining the recognition result from them, the complexity of the algorithm for recognizing faces in instant video is reduced, so that the method provided by the embodiments of the present invention can run on a mobile terminal, meeting users' diverse needs and improving the user experience.
Embodiment 4
An embodiment of the present invention provides an electronic device 7. Referring to FIG. 7, the electronic device 7 includes a video input module 71, a video output module 72, a sending module 73, a receiving module 74, a memory 75, and a processor 76 connected to the video input module 71, the video output module 72, the sending module 73, the receiving module 74, and the memory 75, where the memory 75 stores a set of program code, and the processor 76 is configured to call the program code stored in the memory 75 to perform the following operations:
acquiring a feature vector corresponding to at least one feature point of a face in an instant video frame, where the feature point is used to describe the current expression of the face;
recognizing the feature vector corresponding to the at least one feature point to generate a recognition result; and
determining, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.
Optionally, the processor 76 is configured to call the program code stored in the memory 75 to perform the following operations:
acquiring at least one feature point coordinate and at least one texture feature point coordinate under a standard pose matrix; and
generating, according to the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix, the feature vector corresponding to the at least one feature point.
Optionally, the processor 76 is configured to call the program code stored in the memory 75 to perform the following operations:
acquiring the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame; and
normalizing the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
Optionally, the processor 76 is configured to call the program code stored in the memory 75 to perform the following operations:
acquiring, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point of the face in the instant video frame; and
rotating the current pose matrix into the standard pose matrix, and obtaining the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
Optionally, the processor 76 is configured to call the program code stored in the memory 75 to perform the following operation:
inputting the feature vector corresponding to the at least one feature point into a preset expression model library for calculation, and obtaining the recognition result.
Optionally, the processor 76 is configured to call the program code stored in the memory 75 to perform the following operation:
if the recognition result is within a preset range, determining that the expression corresponding to the feature vector is one of the plurality of pre-stored expressions.
An embodiment of the present invention provides an electronic device. By acquiring, in the instant video, feature points that describe the current expression of the face, the electronic device makes the feature vectors obtained from those feature points represent the current expression more accurately; by then recognizing the feature vectors and obtaining the recognition result from them, the complexity of the algorithm for recognizing faces in instant video is reduced, so that the method provided by the embodiments of the present invention can run on a mobile terminal, meeting users' diverse needs and improving the user experience.
It should be noted that when the electronic device provided by the above embodiments performs the expression recognition method in instant video, the division into the above functional modules is used only as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the electronic device may be divided into different functional modules to complete all or part of the functions described above. In addition, the electronic device provided by the above embodiments and the method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, and details are not repeated here.
A person of ordinary skill in the art can understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. An expression recognition method in instant video, wherein the method comprises:
acquiring a feature vector corresponding to at least one feature point of a face in an instant video frame, wherein the feature point is used to describe the current expression of the face;
recognizing the feature vector corresponding to the at least one feature point to generate a recognition result; and
determining, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.
2. The method according to claim 1, wherein the feature vector comprises feature point coordinates and texture feature point coordinates under a standard pose matrix, the texture feature points being used to uniquely determine the feature points.
3. The method according to claim 2, wherein acquiring the feature vector corresponding to the at least one feature point of the face in the instant video frame comprises:
acquiring the at least one feature point coordinate and at least one texture feature point coordinate under the standard pose matrix; and
generating, according to the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix, the feature vector corresponding to the at least one feature point.
4. The method according to claim 3, wherein acquiring the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix comprises:
acquiring the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame; and
normalizing the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
5. The method according to claim 4, wherein normalizing the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix comprises:
acquiring, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point of the face in the instant video frame; and
rotating the current pose matrix into a standard pose matrix, and obtaining the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
6. The method according to claim 1, wherein recognizing the feature vector corresponding to the at least one feature point comprises:
inputting the feature vector corresponding to the at least one feature point into a preset expression model library for calculation, and obtaining the recognition result.
7. An electronic device, wherein the electronic device comprises:
an acquisition module, configured to acquire a feature vector corresponding to at least one feature point of a face in an instant video frame, wherein the feature point is used to describe the current expression of the face;
a recognition module, configured to recognize the feature vector corresponding to the at least one feature point and generate a recognition result; and
a determination module, configured to determine, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.
8. The device according to claim 7, wherein:
the acquisition module is further configured to acquire the at least one feature point coordinate and at least one texture feature point coordinate under a standard pose matrix; and
the recognition module is further configured to generate, according to the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix, the feature vector corresponding to the at least one feature point.
9. The device according to claim 7, wherein:
the acquisition module is further configured to acquire the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame; and
the device further comprises a processing module, configured to normalize the at least one feature point to obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
10. The device according to claim 9, wherein:
the acquisition module is further configured to acquire, according to the at least one feature point coordinate and the at least one texture feature point coordinate of the face in the instant video frame, the current pose matrix corresponding to the at least one feature point and the at least one texture feature point of the face in the instant video frame; and
the processing module is further configured to rotate the current pose matrix into a standard pose matrix and obtain the at least one feature point coordinate and the at least one texture feature point coordinate under the standard pose matrix.
11. The device according to claim 7, wherein the device further comprises:
a calculation module, configured to input the feature vector corresponding to the at least one feature point into a preset expression model library for calculation, and obtain the recognition result.
12. An electronic device, comprising a video input module, a video output module, a sending module, a receiving module, a memory, and a processor connected to the video input module, the video output module, the sending module, the receiving module, and the memory, wherein the memory stores a set of program code, and the processor is configured to call the program code stored in the memory to perform the following operations:
acquiring a feature vector corresponding to at least one feature point of a face in an instant video frame, wherein the feature point is used to describe the current expression of the face;
recognizing the feature vector corresponding to the at least one feature point to generate a recognition result; and
determining, according to the recognition result, that the current expression is one of a plurality of pre-stored expressions.