CN101908149A - Method for identifying facial expressions from human face image sequence - Google Patents
- Publication number
- CN101908149A (application CN2010102185432A / CN201010218543)
- Authority
- CN
- China
- Prior art keywords
- image
- mfv
- expression
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a method for identifying facial expressions from a human face image sequence, and belongs to the technical field of facial expression analysis and recognition. The method comprises the following steps: first, a feature point tracking method is used to extract, for each frame of an expression image sequence, the normalized displacements of facial key points and the lengths of specific geometric features, and these data are combined into a feature column vector; second, all feature column vectors of the sequence are arranged in order to form a feature matrix, each feature matrix representing one facial expression image sequence; finally, the similarities between feature matrices are compared by canonical correlation analysis, so that the facial image sequence to be recognized is assigned to one of the six basic expressions of happiness, sadness, fear, disgust, surprise and anger. The invention applies canonical correlation analysis to facial expression recognition, makes effective use of the dynamic information generated as an expression unfolds, and achieves a high recognition rate with short CPU computation time.
Description
Technical Field
The invention relates to a method for identifying facial expressions from a facial image sequence, and belongs to the technical field of facial expression analysis and identification.
Background
With the rapid development of computer technology, automatic facial expression analysis and recognition will make facial expression a new channel for human-computer interaction and make the interaction process more natural and effective. Facial expression analysis and recognition involves three basic problems: finding and locating a human face in an image; extracting effective expression features from the detected facial image or facial image sequence; and designing a suitable classification method to identify the expression type. In recent years, much research effort has been devoted to recognizing facial expressions from image sequences. Cohn et al., in "Feature-Point Tracking by Optical Flow Discriminates Subtle Differences in Facial Expression" (Proc. Int'l Conf. Automatic Face and Gesture Recognition, pp. 396-401 (1998)), propose an optical-flow-based method to identify subtle changes in facial expression. Lajevardi et al., in "Facial expression recognition from image sequences using optimized feature selection" (IVCNZ 2008, New Zealand, pp. 1-6 (2008)), disclose a method for recognizing facial expressions with a Naive Bayes (NB) classifier after an optimized feature selection process. Sun et al., in "LSVM algorithm for video sequence expression classification" (Journal of Computer-Aided Design & Computer Graphics, Vol. 21, No. 4 (2009)), disclose a method that extracts geometric expression features from video faces using a point-tracking Active Shape Model (ASM) and classifies the expressions with a Local Support Vector Machine (LSVM) classifier. The disadvantage of these methods is that features are extracted only from the peak expression frame, so the important temporal dynamic information contained in the course of an expression is ignored and the recognition accuracy is limited.
In addition, although the methods proposed in documents such as "Active and Dynamic Information Fusion for Facial Expression Understanding from Image Sequences" (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 5, May 2005), "Authentic Facial Expression Analysis" (Image and Vision Computing 24 (2006) 605-614) and "Facial Expression Recognition from Video Sequences: Temporal and Static Modeling" (Computer Vision and Image Understanding 91 (2003) 160-187) do use the temporal dynamic information contained in the course of an expression, they are computationally complex and costly.
An important piece of prior art used in the present invention is canonical correlation analysis (CCA).
Canonical correlation analysis is a classical tool in statistical analysis that can be used to measure linear relationships between two or more data sets. The canonical correlation coefficients are defined as the cosines of the principal angles θ_i between two d-dimensional linear subspaces L1 and L2:

cos θ_i = max u_i^T v_i over u_i ∈ L1, v_i ∈ L2,
subject to u_i^T u_i = v_i^T v_i = 1, u_i^T u_j = 0 and v_i^T v_j = 0 for j = 1, ..., i-1 (i = 1, ..., d),

where the parameter d represents the dimension of the linear subspaces.
In recent years, canonical correlation analysis has been successfully applied to fields such as image set matching and face or object recognition, so using it to solve the expression recognition problem is, in theory, a simple and effective approach. In the expression recognition problem, however, the facial images of different expressions of the same person differ only slightly, and even images of two opposite expressions are not very different, so simply applying canonical correlation analysis to expression recognition does not yield good results. To date, no documents or practical applications using canonical correlation analysis for facial expression recognition have been found.
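By way of illustration only, the principal-angle definition above can be computed numerically. The following minimal sketch (assuming numpy and orthonormal bases obtained by SVD, neither of which is specified in the patent) returns the d canonical correlation coefficients of two subspaces spanned by the columns of two matrices.

```python
import numpy as np

def canonical_correlations(A, B, d=10):
    """Cosines of the principal angles between the column spaces of A and B.

    A, B : (p, m) matrices whose columns span the two linear subspaces.
    d    : dimension of the subspaces (number of basis vectors retained).
    """
    # Orthonormal bases of the two column spaces (left singular vectors).
    Qa, _, _ = np.linalg.svd(A, full_matrices=False)
    Qb, _, _ = np.linalg.svd(B, full_matrices=False)
    Qa, Qb = Qa[:, :d], Qb[:, :d]
    # The singular values of Qa^T Qb are exactly cos(theta_1), ..., cos(theta_d).
    return np.clip(np.linalg.svd(Qa.T @ Qb, compute_uv=False), 0.0, 1.0)
```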
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for recognizing facial expressions from a human face image sequence. The invention uses a facial feature point tracking method to extract, for each frame of an expression image sequence, the normalized displacements of facial key points and the lengths of specific geometric features, and combines these data into a feature column vector; all feature column vectors of the sequence are arranged in order to form a feature matrix, and each feature matrix represents one facial expression image sequence; the similarities between feature matrices are then compared by canonical correlation analysis, so that the facial image sequence to be recognized is determined to be one of the six basic expressions (happiness, sadness, fear, disgust, surprise and anger).
The purpose of the invention is realized by the following technical scheme.
A method for recognizing facial expressions from a facial image sequence comprises the following specific operation steps:
step one, selecting an image sequence
Selecting image sequences representing the six basic expressions of happiness, sadness, fear, disgust, surprise and anger from a facial expression database, where the number of image sequences for each basic expression is greater than 20; and selecting m frames (m ≥ 10, m a positive integer) from each expression image sequence, where each expression image sequence starts from a neutral expression image and ends at a peak expression image.
Step two, identifying facial feature points
On the basis of the first step, identifying facial feature points; the method specifically comprises the following steps:
step 1: sequentially identifying 20 facial feature points in a first frame image in each expression image sequence; the 1 st characteristic point and the 2 nd characteristic point are respectively positioned at the eyebrow positions of the right eyebrow and the left eyebrow; the 3 rd and 4 th characteristic points are respectively positioned at the eyebrow tail positions of the right eyebrow and the left eyebrow; the 5 th and 6 th characteristic points are respectively positioned at the inner canthus positions of the right eye and the left eye; the 7 th and 8 th characteristic points are respectively positioned at the lowest points of the right eye and the left eye; the 9 th and 10 th characteristic points are respectively positioned at the external canthus positions of the right eye and the left eye; the 11 th and 12 th characteristic points are respectively positioned at the highest points of the right eye and the left eye; the 13 th characteristic point and the 14 th characteristic point are respectively positioned at the rightmost position of the nasal wing and the leftmost position of the nasal wing; the 15 th characteristic point is positioned at the tip of the nose; the 16 th characteristic point and the 17 th characteristic point are respectively positioned at the rightmost position of the mouth corner and the leftmost position of the mouth corner; the 18 th and 19 th characteristic points are respectively positioned at the highest point and the lowest point of the intersection of the lip center line and the lip contour line; the 20 th feature point is located at the lowest point where the face center line intersects the face contour line.
The methods for identifying the 20 facial feature points include, but are not limited to: firstly, manual marking; secondly, automatic localization of the 20 facial feature points with the Gabor-feature-based boosted classifier method proposed by Vukadinovic et al. in "Fully automatic facial feature point detection using Gabor feature based boosted classifiers" (Proc. IEEE Int'l Conf. on Systems, Man and Cybernetics, pp. 1692-1698 (2005)).
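For reference, the 20-point layout described in step 1 can be summarized as a simple lookup table; the structure below is an illustrative Python summary, not part of the patent, and simply restates the locations listed above with 1-based indices.

```python
# Illustrative summary of the 20 facial feature points defined in step 1
# (1-based indices as used in the text).
FEATURE_POINTS = {
    1: "right eyebrow, inner end (brow head)",   2: "left eyebrow, inner end (brow head)",
    3: "right eyebrow, outer end (brow tail)",   4: "left eyebrow, outer end (brow tail)",
    5: "right eye, inner corner",                6: "left eye, inner corner",
    7: "right eye, lowest point",                8: "left eye, lowest point",
    9: "right eye, outer corner",               10: "left eye, outer corner",
    11: "right eye, highest point",             12: "left eye, highest point",
    13: "right nose wing, outermost point",     14: "left nose wing, outermost point",
    15: "nose tip",
    16: "right mouth corner",                   17: "left mouth corner",
    18: "top of the lip contour on the lip center line",
    19: "bottom of the lip contour on the lip center line",
    20: "lowest point of the face contour on the face center line (chin)",
}
```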
Step 2: calculating the positions of the 21st and 22nd feature points in the first frame of each expression image sequence according to the statistical data on the spatial relationships between the eyes and cheeks and between the nose and cheeks proposed by Farkas in Anthropometry of the Head and Face (New York: Raven Press (1994)); the 21st and 22nd feature points are located at the cheekbone positions of the right and left cheeks, respectively.
Step 3: the particle filter tracking method based on factorized likelihoods proposed by Patras et al. in "Particle filtering with factorized likelihoods for tracking facial features" (Proc. Int'l Conf. Automatic Face & Gesture Recognition, pp. 97-102 (2004)) is adopted to track the 22 facial feature points through the subsequent frames of each expression image sequence, starting from the positions of the 22 feature points in the first frame of that sequence.
Step 4: adjusting the position and size of the face in each image by an affine transformation, so that the faces in the same image sequence have the same size and the same position. Specifically:
Firstly, the line connecting the two inner eye corner points in the first frame of each image sequence is kept horizontal; then, according to the positions of three points in the first frame of the sequence, namely the two inner eye corner points and the uppermost point of the philtrum, the 22 facial feature points in the remaining frames are mapped and normalized. After the affine transformation, the faces in all images of the same sequence are equal in size, and the positions of the three points (the two inner eye corners and the uppermost philtrum point) coincide with their positions in the first frame.
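As a minimal sketch of this alignment step (assuming numpy, 0-based indexing of the 22 tracked points, and that the reference philtrum point is supplied separately; none of these details are fixed by the patent), the affine map can be solved exactly from the three point correspondences and applied to all tracked points:

```python
import numpy as np

def align_to_first_frame(pts, philtrum, ref_pts, ref_philtrum):
    """Map the current frame's landmarks onto the first frame's geometry.

    pts, ref_pts           : (22, 2) landmark coordinates (current / first frame).
    philtrum, ref_philtrum : (2,) uppermost philtrum point (current / first frame).
    """
    src = np.vstack([pts[4], pts[5], philtrum])               # inner eye corners (points 5, 6) + philtrum
    dst = np.vstack([ref_pts[4], ref_pts[5], ref_philtrum])   # the same three anchors in the first frame
    src_h = np.hstack([src, np.ones((3, 1))])                 # homogeneous form, shape (3, 3)
    # Solve src_h @ A = dst exactly (three non-collinear correspondences determine the affine map).
    A = np.linalg.solve(src_h, dst)                           # A stacks the 2x2 linear part and the translation
    pts_h = np.hstack([pts, np.ones((22, 1))])
    return pts_h @ A                                          # (22, 2) aligned landmarks
```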
Step three, extracting facial expression characteristics
On the basis of the operation of the second step, facial expression features are extracted from each image in turn; specifically:
Step 1: establishing an Xb-Yb coordinate system that takes the lower-left corner of each image as the origin, with the Xb axis pointing horizontally to the right and the Yb axis pointing vertically upward; according to the pixel positions of the 22 feature points in each image, obtaining in turn their coordinates (xb_i, yb_i) in the Xb-Yb system and the coordinates (x_origin, y_origin) of the uppermost philtrum point of the face in each image, where i = 1-22 and i is a positive integer;
Step 2: establishing an X-Y coordinate system that takes the uppermost philtrum point of the face in each image as the origin, with the X axis pointing horizontally to the right and the Y axis pointing vertically upward; the coordinates (x_i, y_i) of the 22 feature points in the X-Y system of each image are obtained by formula 1 and formula 2:
x_i = xb_i - x_origin (1)
y_i = yb_i - y_origin (2)
Step 3: the abscissa displacement Δx'_i and ordinate displacement Δy'_i of the 22 feature points of each image are obtained by formula 3 and formula 4:
Δx'_i = x_i - x_i^(1) (3)
Δy'_i = y_i - y_i^(1) (4)
where x_i^(1) and y_i^(1) are, respectively, the abscissa and ordinate values of the corresponding feature point in the first frame of the image sequence to which the image belongs.
Step 4: the normalized abscissa displacement Δx_i and ordinate displacement Δy_i of the 22 feature points of each image are obtained in turn by formula 5 and formula 6:
Δx_i = Δx'_i / x_base (5)
Δy_i = Δy'_i / y_base (6)
where x_base = x_6 - x_5 and y_base = y_6 - y_5; x_5 and x_6 are the abscissa values of the 5th and 6th feature points of the image, and y_5 and y_6 are their ordinate values.
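A literal transcription of formulas 3-6 (assuming numpy and 0-based point indexing, which the patent does not specify) might look as follows; the division by x_base and y_base follows formulas 5 and 6 exactly as stated above.

```python
import numpy as np

def normalized_displacements(xy, xy_first):
    """xy, xy_first: (22, 2) landmark coordinates in the X-Y system (current / first frame)."""
    dx = xy[:, 0] - xy_first[:, 0]        # formula 3
    dy = xy[:, 1] - xy_first[:, 1]        # formula 4
    x_base = xy[5, 0] - xy[4, 0]          # x6 - x5 (inner eye corners)
    y_base = xy[5, 1] - xy[4, 1]          # y6 - y5
    return dx / x_base, dy / y_base       # formulas 5 and 6
```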
Step 5: obtaining the 10 geometric distance features mfv_1 to mfv_10 of each image; specifically:
The eye openness value mfv_1 is obtained according to formula 7:
mfv_1 = ((y_11 - y_7) + (y_12 - y_8)) / 2 (7)
where y_7, y_8, y_11 and y_12 are the ordinate values of the 7th, 8th, 11th and 12th feature points of the image.
The eye width value mfv_2 is obtained according to formula 8:
mfv_2 = ((x_5 - x_9) + (x_10 - x_6)) / 2 (8)
where x_5, x_6, x_9 and x_10 are the abscissa values of the 5th, 6th, 9th and 10th feature points of the image.
The eyebrow height value mfv_3 is obtained according to formula 9:
mfv_3 = (y_1 + y_2) / 2 (9)
where y_1 and y_2 are the ordinate values of the 1st and 2nd feature points of the image.
The brow-tail height value mfv_4 is obtained according to formula 10:
mfv_4 = (y_3 + y_4) / 2 (10)
where y_3 and y_4 are the ordinate values of the 3rd and 4th feature points of the image.
The eyebrow width value mfv_5 is obtained according to formula 11:
mfv_5 = ((x_1 - x_3) + (x_4 - x_2)) / 2 (11)
where x_1, x_2, x_3 and x_4 are the abscissa values of the 1st, 2nd, 3rd and 4th feature points of the image.
The mouth openness value mfv_6 is obtained according to formula 12:
mfv_6 = y_18 - y_19 (12)
where y_18 and y_19 are the ordinate values of the 18th and 19th feature points of the image.
The mouth width value mfv_7 is obtained according to formula 13:
mfv_7 = x_17 - x_16 (13)
where x_16 and x_17 are the abscissa values of the 16th and 17th feature points of the image.
The nose tip-mouth corner distance value mfv_8 is obtained according to formula 14:
mfv_8 = ((y_15 - y_16) + (y_15 - y_17)) / 2 (14)
where y_15, y_16 and y_17 are the ordinate values of the 15th, 16th and 17th feature points of the image.
The eye-cheek distance value mfv_9 is obtained according to formula 15:
mfv_9 = (((y_11 + y_7)/2 - y_21) + ((y_12 + y_8)/2 - y_22)) / 2 (15)
where y_21 and y_22 are the ordinate values of the 21st and 22nd feature points of the image.
The nose tip-chin distance value mfv_10 is obtained according to formula 16:
mfv_10 = y_15 - y_20 (16)
where y_15 and y_20 are the ordinate values of the 15th and 20th feature points of the image.
Step 6: the 10 geometric distance features mfv_1 to mfv_10 of each image are normalized by formula 17:
mfv_j = mfv'_j / mfv_j^(1) (17)
where j = 1-10 and j is a positive integer; mfv'_j is the un-normalized value obtained in step 5, and mfv_j^(1) is the corresponding geometric distance feature of the first frame of the image sequence to which the image belongs.
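For illustration, the ten distances and their per-sequence normalization (formulas 7-17) can be written compactly; the sketch below assumes a numpy array of the 22 points in the X-Y system, with 0-based row i-1 holding feature point i, and is not taken from the patent itself.

```python
import numpy as np

def geometric_features(xy):
    """xy: (22, 2) landmark coordinates; returns mfv_1..mfv_10 as a length-10 array."""
    x, y = xy[:, 0], xy[:, 1]
    p = lambda i: i - 1                                        # 1-based index from the text
    return np.array([
        ((y[p(11)] - y[p(7)]) + (y[p(12)] - y[p(8)])) / 2,     # (7)  eye openness
        ((x[p(5)] - x[p(9)]) + (x[p(10)] - x[p(6)])) / 2,      # (8)  eye width
        (y[p(1)] + y[p(2)]) / 2,                               # (9)  eyebrow height
        (y[p(3)] + y[p(4)]) / 2,                               # (10) brow-tail height
        ((x[p(1)] - x[p(3)]) + (x[p(4)] - x[p(2)])) / 2,       # (11) eyebrow width
        y[p(18)] - y[p(19)],                                   # (12) mouth openness
        x[p(17)] - x[p(16)],                                   # (13) mouth width
        ((y[p(15)] - y[p(16)]) + (y[p(15)] - y[p(17)])) / 2,   # (14) nose tip-mouth corner
        (((y[p(11)] + y[p(7)]) / 2 - y[p(21)])
         + ((y[p(12)] + y[p(8)]) / 2 - y[p(22)])) / 2,         # (15) eye-cheek distance
        y[p(15)] - y[p(20)],                                   # (16) nose tip-chin distance
    ])

def normalize_features(mfv, mfv_first):
    """Per-sequence normalization of formula 17: divide by the first-frame values."""
    return mfv / mfv_first
```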
Step 7: for each image, the 10 normalized geometric distance features mfv_1 to mfv_10 and the normalized abscissa displacements Δx_i and ordinate displacements Δy_i of the 22 feature points are combined into a 54-dimensional column vector z_k that represents the expression information of the facial image, where z_k ∈ R^54, 1 ≤ k ≤ m, and R denotes the real numbers.
Step 8: the feature matrix Z = {z_1, z_2, ..., z_m} ∈ R^(54×m) is used to represent an expression image sequence, where z_1 corresponds to the neutral expression and z_m to the peak expression.
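A short sketch of steps 7-8 (assuming numpy and the per-frame quantities described above): each frame's normalized displacements and distances are stacked into one 54-dimensional column, and the m columns form the 54 × m matrix Z.

```python
import numpy as np

def frame_feature_vector(dx, dy, mfv):
    """dx, dy: (22,) normalized displacements; mfv: (10,) normalized distances -> z_k in R^54."""
    return np.concatenate([dx, dy, mfv])

def sequence_feature_matrix(frames):
    """frames: list of (dx, dy, mfv) tuples, neutral frame first, peak frame last -> (54, m) matrix Z."""
    return np.column_stack([frame_feature_vector(*f) for f in frames])
```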
Step four, classifying the image sequence to be tested
On the basis of the third step, the image sequence to be tested is classified by canonical correlation analysis; the method specifically comprises the following steps:
step 1: randomly dividing the image sequence of each basic expression selected in the step one into 2 parts, wherein one part is used as training data, and the other part is used as test data; the number of the training data is Q, Q is more than or equal to 20, and Q is a positive integer; one piece of training data is an expression image sequence; a piece of test data is also a sequence of expression images.
Step 2: the training data obtained in step 1 are processed with the discriminative canonical correlation analysis method proposed by T.K. Kim et al. in "Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations" (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 6 (2007)) to obtain a transformation matrix T ∈ R^(54×n), where n < 54 and n is a positive integer; the feature matrix Z of every expression image sequence selected in step one (training data and test data alike) is then transformed with T to obtain Z' = T^T Z.
Step 3: an expression image sequence is selected at random from the test data of step 1, and the sum of the canonical correlation coefficients between its transformed feature matrix Z' and the transformed feature matrix Z' of each piece of training data is calculated.
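As an illustrative sketch of this similarity computation (assuming numpy; the learning of the transformation matrix T itself, i.e. Kim et al.'s discriminative method, is not shown here), the affinity between two sequences is the sum of the canonical correlations of their transformed feature matrices:

```python
import numpy as np

def cca_similarity(Z_test, Z_train, T, d=10):
    """Sum of canonical correlations between two transformed feature matrices.

    Z_test, Z_train : (54, m) feature matrices of two expression image sequences.
    T               : (54, n) transformation matrix learned from the training data.
    d               : subspace dimension used when comparing the sequences.
    """
    Zt, Zr = T.T @ Z_test, T.T @ Z_train                       # transformed sequences, shape (n, m)
    Qt, _, _ = np.linalg.svd(Zt, full_matrices=False)          # orthonormal basis of each column space
    Qr, _, _ = np.linalg.svd(Zr, full_matrices=False)
    sigma = np.linalg.svd(Qt[:, :d].T @ Qr[:, :d], compute_uv=False)
    return float(np.sum(sigma))                                # sum of cos(principal angles)
```

The test sequence can then be compared against every training sequence of each basic expression and the per-expression averages formed as described in step 4 below.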
Step 4: on the basis of the result of step 3, the average of the sums of canonical correlation coefficients between the expression image sequence and the training sequences of each basic expression is calculated; the expression corresponding to the minimum of the 6 average values is selected as the classification result.
Through the steps, the expression recognition of the image sequence to be tested can be completed.
Advantageous effects
Compared with existing recognition methods, the method of the present invention for recognizing facial expressions from a facial image sequence successfully applies canonical correlation analysis to facial expression recognition, makes effective use of the dynamic information generated during the course of an expression, and achieves a higher recognition rate with shorter CPU computation time.
Drawings
FIG. 1 shows 7 frames of a 15-frame image sequence according to an embodiment of the present invention;
FIG. 2 is a diagram of feature points in a face image and Xb-Yb and X-Y coordinate systems according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the general structural framework of the method of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
In this embodiment, 212 image sequences of 50 persons representing the six basic expressions of happiness, sadness, fear, disgust, surprise and anger are selected from the Cohn-Kanade facial expression database. 15 frames are selected from each expression image sequence, starting from the neutral expression image and ending at the peak expression image. FIG. 1 shows 7 frames of one 15-frame image sequence. The 22 feature points in each face image, together with the Xb-Yb and X-Y coordinate systems, are shown in FIG. 2. The image sequences of 35 persons are selected as the training set and the remaining sequences as the test set, so that the expression classification is person-independent. Each test was performed 5 times with randomly selected test and training sets, and the average result was calculated. In the learning and classification process, the parameter d representing the linear subspace dimension is set to 10, and the parameter n representing the dimension of the transformation matrix T is set to 20.
The general structural framework of the method of the invention is schematically illustrated in FIG. 3. Table 1 shows the confusion matrix of the recognition results obtained with the method of the present invention. The diagonal elements of the matrix are the percentages of facial expressions that are correctly classified, and the off-diagonal elements are the corresponding misclassification percentages. The average accuracy of the method is over 90%.
TABLE 1. Confusion matrix (%) of the recognition results of the method of the invention

| | Happiness | Sadness | Fear | Disgust | Surprise | Anger |
|---|---|---|---|---|---|---|
| Happiness | 95.2 | 0 | 2.6 | 0 | 2.2 | 0 |
| Sadness | 0 | 92.5 | 0 | 2.8 | 0 | 4.7 |
| Fear | 8.8 | 0 | 87.2 | 0 | 0 | 4.0 |
| Disgust | 0 | 1.5 | 7.4 | 91.1 | 0 | 0 |
| Surprise | 5.2 | 0 | 1.2 | 0 | 90.5 | 3.1 |
| Anger | 0 | 10.3 | 0 | 4.2 | 0 | 85.5 |
To illustrate the effect of the invention, experiments were also performed on the same data with the optimized feature selection method and the LSVM method; the results are shown in Table 2.
TABLE 2. Comparison of recognition rates (%)

| | Happiness | Sadness | Fear | Disgust | Surprise | Anger | Average accuracy |
|---|---|---|---|---|---|---|---|
| Method of the invention | 95.2 | 92.5 | 87.2 | 91.1 | 90.5 | 85.5 | 90.3 |
| Optimized feature selection | 76.3 | 67.6 | 55 | 100 | 78 | 100 | 79.5 |
| LSVM | 91.3 | 86.4 | 89.6 | 88.3 | 92.5 | 86.5 | 89.1 |
Tests show that the method has higher accuracy, and the operation steps of the method are simple and easy to implement.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications may be made or equivalents may be substituted for some of the features thereof without departing from the scope of the present invention, and such modifications and substitutions should also be considered as the protection scope of the present invention.
Claims (3)
1. A method of recognizing facial expressions from a sequence of facial images, characterized by: the specific operation steps are as follows:
step one, selecting an image sequence
Selecting image sequences representing the six basic expressions of happiness, sadness, fear, disgust, surprise and anger from a facial expression database, where the number of image sequences for each basic expression is greater than 20; selecting m frames from each expression image sequence, where m ≥ 10 and m is a positive integer; each expression image sequence starts from the neutral expression image and ends at the peak expression image;
step two, identifying facial feature points
On the basis of the first step, identifying facial feature points; the method specifically comprises the following steps:
step 1: sequentially identifying 20 facial feature points in a first frame image in each expression image sequence; the 1 st characteristic point and the 2 nd characteristic point are respectively positioned at the eyebrow positions of the right eyebrow and the left eyebrow; the 3 rd and 4 th characteristic points are respectively positioned at the eyebrow tail positions of the right eyebrow and the left eyebrow; the 5 th and 6 th characteristic points are respectively positioned at the inner canthus positions of the right eye and the left eye; the 7 th and 8 th characteristic points are respectively positioned at the lowest points of the right eye and the left eye; the 9 th and 10 th characteristic points are respectively positioned at the external canthus positions of the right eye and the left eye; the 11 th and 12 th characteristic points are respectively positioned at the highest points of the right eye and the left eye; the 13 th characteristic point and the 14 th characteristic point are respectively positioned at the rightmost position of the nasal wing and the leftmost position of the nasal wing; the 15 th characteristic point is positioned at the tip of the nose; the 16 th characteristic point and the 17 th characteristic point are respectively positioned at the rightmost position of the mouth corner and the leftmost position of the mouth corner; the 18 th and 19 th characteristic points are respectively positioned at the highest point and the lowest point of the intersection of the lip center line and the lip contour line; the 20 th characteristic point is positioned at the lowest point of the intersection of the face central line and the face contour line;
step 2: calculating the positions of the 21st and 22nd feature points in the first frame of each expression image sequence according to the statistical data on the spatial relationships between the eyes and cheeks and between the nose and cheeks proposed by Farkas in Anthropometry of the Head and Face; the 21st and 22nd feature points are located at the cheekbone positions of the right and left cheeks, respectively;
step 3: tracking the 22 facial feature points in the subsequent frames of each expression image sequence, starting from the positions of the 22 feature points in the first frame of that sequence, by the particle filter tracking method based on factorized likelihoods proposed by Patras et al. in "Particle filtering with factorized likelihoods for tracking facial features";
step 4: adjusting the position and size of the face in each image by an affine transformation, so that the faces in the same image sequence have the same size and their positions are kept consistent;
step three, extracting facial expression characteristics
On the basis of the operation of the second step, facial expression features are extracted from each image in turn; specifically:
step 1: establishing an Xb-Yb coordinate system that takes the lower-left corner of each image as the origin, with the Xb axis pointing horizontally to the right and the Yb axis pointing vertically upward; according to the pixel positions of the 22 feature points in each image, obtaining in turn their coordinates (xb_i, yb_i) in the Xb-Yb system and the coordinates (x_origin, y_origin) of the uppermost philtrum point of the face in each image, where i = 1-22 and i is a positive integer;
step 2: establishing an X-Y coordinate system that takes the uppermost philtrum point of the face in each image as the origin, with the X axis pointing horizontally to the right and the Y axis pointing vertically upward; the coordinates (x_i, y_i) of the 22 feature points in the X-Y system of each image are obtained by formula 1 and formula 2:
x_i = xb_i - x_origin (1)
y_i = yb_i - y_origin (2)
step 3: the abscissa displacement Δx'_i and ordinate displacement Δy'_i of the 22 feature points of each image are obtained by formula 3 and formula 4:
Δx'_i = x_i - x_i^(1) (3)
Δy'_i = y_i - y_i^(1) (4)
where x_i^(1) and y_i^(1) are, respectively, the abscissa and ordinate values of the corresponding feature point in the first frame of the image sequence to which the image belongs;
step 4: the normalized abscissa displacement Δx_i and ordinate displacement Δy_i of the 22 feature points of each image are obtained in turn by formula 5 and formula 6:
Δx_i = Δx'_i / x_base (5)
Δy_i = Δy'_i / y_base (6)
where x_base = x_6 - x_5 and y_base = y_6 - y_5; x_5 and x_6 are the abscissa values of the 5th and 6th feature points of the image, and y_5 and y_6 are their ordinate values;
step 5: obtaining the 10 geometric distance features mfv_1 to mfv_10 of each image; specifically:
the eye openness value mfv_1 is obtained according to formula 7:
mfv_1 = ((y_11 - y_7) + (y_12 - y_8)) / 2 (7)
where y_7, y_8, y_11 and y_12 are the ordinate values of the 7th, 8th, 11th and 12th feature points of the image;
the eye width value mfv_2 is obtained according to formula 8:
mfv_2 = ((x_5 - x_9) + (x_10 - x_6)) / 2 (8)
where x_5, x_6, x_9 and x_10 are the abscissa values of the 5th, 6th, 9th and 10th feature points of the image;
the eyebrow height value mfv_3 is obtained according to formula 9:
mfv_3 = (y_1 + y_2) / 2 (9)
where y_1 and y_2 are the ordinate values of the 1st and 2nd feature points of the image;
the brow-tail height value mfv_4 is obtained according to formula 10:
mfv_4 = (y_3 + y_4) / 2 (10)
where y_3 and y_4 are the ordinate values of the 3rd and 4th feature points of the image;
the eyebrow width value mfv_5 is obtained according to formula 11:
mfv_5 = ((x_1 - x_3) + (x_4 - x_2)) / 2 (11)
where x_1, x_2, x_3 and x_4 are the abscissa values of the 1st, 2nd, 3rd and 4th feature points of the image;
the mouth openness value mfv_6 is obtained according to formula 12:
mfv_6 = y_18 - y_19 (12)
where y_18 and y_19 are the ordinate values of the 18th and 19th feature points of the image;
the mouth width value mfv_7 is obtained according to formula 13:
mfv_7 = x_17 - x_16 (13)
where x_16 and x_17 are the abscissa values of the 16th and 17th feature points of the image;
the nose tip-mouth corner distance value mfv_8 is obtained according to formula 14:
mfv_8 = ((y_15 - y_16) + (y_15 - y_17)) / 2 (14)
where y_15, y_16 and y_17 are the ordinate values of the 15th, 16th and 17th feature points of the image;
the eye-cheek distance value mfv_9 is obtained according to formula 15:
mfv_9 = (((y_11 + y_7)/2 - y_21) + ((y_12 + y_8)/2 - y_22)) / 2 (15)
where y_21 and y_22 are the ordinate values of the 21st and 22nd feature points of the image;
the nose tip-chin distance value mfv_10 is obtained according to formula 16:
mfv_10 = y_15 - y_20 (16)
where y_15 and y_20 are the ordinate values of the 15th and 20th feature points of the image;
step 6: the 10 geometric distance features mfv_1 to mfv_10 of each image are normalized by formula 17:
mfv_j = mfv'_j / mfv_j^(1) (17)
where j = 1-10 and j is a positive integer; mfv'_j is the un-normalized value obtained in step 5, and mfv_j^(1) is the corresponding geometric distance feature of the first frame of the image sequence to which the image belongs;
step 7: for each image, the 10 normalized geometric distance features mfv_1 to mfv_10 and the normalized abscissa displacements Δx_i and ordinate displacements Δy_i of the 22 feature points are combined into a 54-dimensional column vector z_k that represents the expression information of the facial image, where z_k ∈ R^54, 1 ≤ k ≤ m, and R denotes the real numbers;
step 8: the feature matrix Z = {z_1, z_2, ..., z_m} ∈ R^(54×m) is used to represent an expression image sequence, where z_1 corresponds to the neutral expression and z_m to the peak expression;
Step four, classifying the image sequence to be tested
On the basis of the third step, the image sequence to be tested is classified by canonical correlation analysis; specifically:
step 1: randomly dividing the image sequences of each basic expression selected in step one into 2 parts, one part being used as training data and the other as test data; the number of pieces of training data is Q, where Q ≥ 20 and Q is a positive integer; one piece of training data is an expression image sequence, and one piece of test data is likewise an expression image sequence;
step 2: the training data obtained in step 1 are processed with the discriminative canonical correlation analysis method proposed by T.K. Kim et al. in "Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations" to obtain a transformation matrix T ∈ R^(54×n), where n < 54 and n is a positive integer; the feature matrix Z of every expression image sequence selected in step one is then transformed with T to obtain Z' = T^T Z;
step 3: an expression image sequence is selected at random from the test data of step 1, and the sum of the canonical correlation coefficients between its transformed feature matrix Z' and the transformed feature matrix Z' of each piece of training data is calculated;
step 4: on the basis of the result of step 3, the average of the sums of canonical correlation coefficients between the expression image sequence and the training sequences of each basic expression is calculated; the expression corresponding to the minimum of the 6 average values is selected as the classification result;
Through the above steps, the expression recognition of the image sequence to be tested is completed.
2. A method of recognizing facial expressions from a sequence of facial images as claimed in claim 1, wherein: in step 1 of step two, the methods for identifying the 20 facial feature points in the first frame of each expression image sequence include, but are not limited to: firstly, manual marking; secondly, automatic localization of the 20 facial feature points with the Gabor-feature-based boosted classifier method proposed by Vukadinovic et al. in "Fully automatic facial feature point detection using Gabor feature based boosted classifiers".
3. A method of recognizing facial expressions from a sequence of facial images as claimed in claim 1, wherein: in step 4 of step two, the position and size of the face in each image are adjusted by an affine transformation so that the faces in the same image sequence have the same size and their positions are kept consistent, specifically:
firstly, the line connecting the two inner eye corner points in the first frame of each image sequence is kept horizontal; then, according to the positions of three points in the first frame of the sequence, namely the two inner eye corner points and the uppermost point of the philtrum, the 22 facial feature points in the remaining frames are mapped and normalized; after the affine transformation, the faces in all images of the same sequence are equal in size, and the positions of the three points (the two inner eye corners and the uppermost philtrum point) coincide with their positions in the first frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102185432A CN101908149A (en) | 2010-07-06 | 2010-07-06 | Method for identifying facial expressions from human face image sequence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101908149A true CN101908149A (en) | 2010-12-08 |
Family
ID=43263604
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010102185432A Pending CN101908149A (en) | 2010-07-06 | 2010-07-06 | Method for identifying facial expressions from human face image sequence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101908149A (en) |
Cited By (45)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102945361B (en) * | 2012-10-17 | 2016-10-05 | 北京航空航天大学 | Feature based point vector and the facial expression recognizing method of texture deformation energy parameter |
CN102945361A (en) * | 2012-10-17 | 2013-02-27 | 北京航空航天大学 | Facial expression recognition method based on feature point vectors and texture deformation energy parameter |
CN103400145B (en) * | 2013-07-19 | 2016-08-10 | 北京理工大学 | Voice based on clue neutral net-vision merges emotion identification method |
CN103400145A (en) * | 2013-07-19 | 2013-11-20 | 北京理工大学 | Voice-vision fusion emotion recognition method based on hint nerve networks |
CN103679143A (en) * | 2013-12-03 | 2014-03-26 | 北京航空航天大学 | Method for capturing facial expressions in real time without supervising |
CN103679143B (en) * | 2013-12-03 | 2017-02-15 | 北京航空航天大学 | Method for capturing facial expressions in real time without supervising |
CN104866807A (en) * | 2014-02-24 | 2015-08-26 | 腾讯科技(深圳)有限公司 | Face positioning method and system |
CN104866807B (en) * | 2014-02-24 | 2019-09-13 | 腾讯科技(深圳)有限公司 | A kind of Face detection method and system |
CN105095827B (en) * | 2014-04-18 | 2019-05-17 | 汉王科技股份有限公司 | Facial expression recognition device and method |
CN103971137A (en) * | 2014-05-07 | 2014-08-06 | 上海电力学院 | Three-dimensional dynamic facial expression recognition method based on structural sparse feature study |
CN103996029A (en) * | 2014-05-23 | 2014-08-20 | 安庆师范学院 | Expression similarity measuring method and device |
CN103996029B (en) * | 2014-05-23 | 2017-12-05 | 安庆师范学院 | Expression method for measuring similarity and device |
US10339369B2 (en) | 2015-09-16 | 2019-07-02 | Intel Corporation | Facial expression recognition using relations determined by class-to-class comparisons |
WO2017045157A1 (en) * | 2015-09-16 | 2017-03-23 | Intel Corporation | Facial expression recognition using relations determined by class-to-class comparisons |
WO2017045404A1 (en) * | 2015-09-16 | 2017-03-23 | Intel Corporation | Facial expression recognition using relations determined by class-to-class comparisons |
CN106650555A (en) * | 2015-11-02 | 2017-05-10 | 苏宁云商集团股份有限公司 | Real person verifying method and system based on machine learning |
CN105559804A (en) * | 2015-12-23 | 2016-05-11 | 上海矽昌通信技术有限公司 | Mood manager system based on multiple monitoring |
CN105740688A (en) * | 2016-02-01 | 2016-07-06 | 腾讯科技(深圳)有限公司 | Unlocking method and device |
CN107203734A (en) * | 2016-03-17 | 2017-09-26 | 掌赢信息科技(上海)有限公司 | A kind of method and electronic equipment for obtaining mouth state |
CN108074203A (en) * | 2016-11-10 | 2018-05-25 | 中国移动通信集团公司 | A kind of teaching readjustment method and apparatus |
CN108108651A (en) * | 2016-11-25 | 2018-06-01 | 广东亿迅科技有限公司 | The non-wholwe-hearted driving detection method of driver and system based on video human face analysis |
CN106940792A (en) * | 2017-03-15 | 2017-07-11 | 中南林业科技大学 | The human face expression sequence truncation method of distinguished point based motion |
CN107341460B (en) * | 2017-06-26 | 2022-04-22 | 北京小米移动软件有限公司 | Face tracking method and device |
CN107341460A (en) * | 2017-06-26 | 2017-11-10 | 北京小米移动软件有限公司 | Face tracking method and device |
CN108875335A (en) * | 2017-10-23 | 2018-11-23 | 北京旷视科技有限公司 | The method and authenticating device and non-volatile memory medium of face unlock and typing expression and facial expressions and acts |
US10922533B2 (en) | 2017-10-23 | 2021-02-16 | Beijing Kuangshi Technology Co., Ltd. | Method for face-to-unlock, authentication device, and non-volatile storage medium |
CN108875335B (en) * | 2017-10-23 | 2020-10-09 | 北京旷视科技有限公司 | Method for unlocking human face and inputting expression and expression action, authentication equipment and nonvolatile storage medium |
CN109784123A (en) * | 2017-11-10 | 2019-05-21 | 浙江思考者科技有限公司 | The analysis and judgment method of real's expression shape change |
CN108540863B (en) * | 2018-03-29 | 2021-03-12 | 武汉斗鱼网络科技有限公司 | Bullet screen setting method, storage medium, equipment and system based on facial expressions |
CN108540863A (en) * | 2018-03-29 | 2018-09-14 | 武汉斗鱼网络科技有限公司 | Barrage setting method, storage medium, equipment and system based on human face expression |
CN109034099A (en) * | 2018-08-14 | 2018-12-18 | 华中师范大学 | A kind of expression recognition method and device |
CN109034099B (en) * | 2018-08-14 | 2021-07-13 | 华中师范大学 | Expression recognition method and device |
CN109063679A (en) * | 2018-08-24 | 2018-12-21 | 广州多益网络股份有限公司 | A kind of human face expression detection method, device, equipment, system and medium |
CN109255322A (en) * | 2018-09-03 | 2019-01-22 | 北京诚志重科海图科技有限公司 | A kind of human face in-vivo detection method and device |
CN109409273A (en) * | 2018-10-17 | 2019-03-01 | 中联云动力(北京)科技有限公司 | A kind of motion state detection appraisal procedure and system based on machine vision |
CN111382648A (en) * | 2018-12-30 | 2020-07-07 | 广州市百果园信息技术有限公司 | Method, device and equipment for detecting dynamic facial expression and storage medium |
CN109948569B (en) * | 2019-03-26 | 2022-04-22 | 重庆理工大学 | Three-dimensional mixed expression recognition method using particle filter framework |
CN109948569A (en) * | 2019-03-26 | 2019-06-28 | 重庆理工大学 | A kind of three-dimensional hybrid expression recognition method using particle filter frame |
CN110941332A (en) * | 2019-11-06 | 2020-03-31 | 北京百度网讯科技有限公司 | Expression driving method and device, electronic equipment and storage medium |
CN110909680A (en) * | 2019-11-22 | 2020-03-24 | 咪咕动漫有限公司 | Facial expression recognition method and device, electronic equipment and storage medium |
CN111626253A (en) * | 2020-06-02 | 2020-09-04 | 上海商汤智能科技有限公司 | Expression detection method and device, electronic equipment and storage medium |
CN111951930A (en) * | 2020-08-19 | 2020-11-17 | 陈霄 | Emotion identification system based on big data |
CN111951930B (en) * | 2020-08-19 | 2021-10-15 | 中食安泓(广东)健康产业有限公司 | Emotion identification system based on big data |
CN112132084A (en) * | 2020-09-29 | 2020-12-25 | 上海松鼠课堂人工智能科技有限公司 | Eye micro-expression analysis method and system based on deep learning |
CN112132084B (en) * | 2020-09-29 | 2021-07-09 | 上海松鼠课堂人工智能科技有限公司 | Eye micro-expression analysis method and system based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20101208 |