CN107832746A

CN107832746A - Expression recognition method and device

Info

Publication number: CN107832746A
Application number: CN201711249535.2A
Authority: CN
Inventors: 杨松
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2017-12-01
Filing date: 2017-12-01
Publication date: 2018-03-23

Abstract

The present disclosure relates to an expression recognition method and device. The method includes: obtaining multiple frames of images included in a video; performing feature extraction on each frame separately to obtain a face feature vector corresponding to each frame; synthesizing the face feature vectors corresponding to the frames to obtain a face feature vector corresponding to the video; and performing expression recognition according to the face feature vector corresponding to the video. The expression recognition method and device of the present disclosure perform expression recognition based on the multiple frames included in a video, which improves the accuracy of expression recognition.

Description

Expression recognition method and device

Technical field

The present disclosure relates to the technical field of image processing, and in particular to an expression recognition method and device.

Background

At present, human facial expressions are generally divided into 7 classes: anger, sadness, disgust, fear, surprise, happiness, and neutral. In the related art, facial expression recognition is typically performed on a single picture. However, a facial expression is often a continuous action, so recognition based on a single picture may result in low accuracy.

Summary of the invention

To overcome the problems in the related art, the present disclosure provides an expression recognition method and device.

According to a first aspect of the embodiments of the present disclosure, an expression recognition method is provided, including:

obtaining multiple frames of images included in a video;

performing feature extraction on each frame separately to obtain a face feature vector corresponding to each frame;

synthesizing the face feature vectors corresponding to the frames to obtain a face feature vector corresponding to the video; and

performing expression recognition according to the face feature vector corresponding to the video.

In a possible implementation, performing feature extraction on each frame separately to obtain the face feature vector corresponding to each frame includes:

obtaining a face region in the image;

performing feature extraction on the face region to obtain a global feature vector of the face region;

performing feature extraction on the face region to obtain multiple local feature vectors of the face region; and

obtaining the face feature vector corresponding to the image according to the global feature vector and the multiple local feature vectors of the face region.

In a possible implementation, performing feature extraction on the face region to obtain the multiple local feature vectors of the face region includes:

determining key point positions in the face region;

cropping multiple local regions from the face region according to the key point positions; and

performing feature extraction on the multiple local regions separately to obtain the multiple local feature vectors.

In a possible implementation, obtaining the face feature vector corresponding to the image according to the global feature vector and the multiple local feature vectors of the face region includes:

concatenating the global feature vector and the multiple local feature vectors of the face region to obtain a concatenated feature vector; and

performing dimensionality reduction on the concatenated feature vector to obtain the face feature vector corresponding to the image.

In a possible implementation, synthesizing the face feature vectors corresponding to the frames to obtain the face feature vector corresponding to the video includes:

synthesizing the face feature vectors corresponding to the frames by means of a recurrent neural network to obtain the face feature vector corresponding to the video.

According to a second aspect of the embodiments of the present disclosure, an expression recognition device is provided, including:

an acquisition module, configured to obtain multiple frames of images included in a video;

an extraction module, configured to perform feature extraction on each frame separately to obtain a face feature vector corresponding to each frame;

a synthesis module, configured to synthesize the face feature vectors corresponding to the frames to obtain a face feature vector corresponding to the video; and

a recognition module, configured to perform expression recognition according to the face feature vector corresponding to the video.

In a possible implementation, the extraction module includes:

an acquisition submodule, configured to obtain a face region in the image;

a first extraction submodule, configured to perform feature extraction on the face region to obtain a global feature vector of the face region;

a second extraction submodule, configured to perform feature extraction on the face region to obtain multiple local feature vectors of the face region; and

a processing submodule, configured to obtain the face feature vector corresponding to the image according to the global feature vector and the multiple local feature vectors of the face region.

In a possible implementation, the second extraction submodule includes:

a determination submodule, configured to determine key point positions in the face region;

a cropping submodule, configured to crop multiple local regions from the face region according to the key point positions; and

an extraction submodule, configured to perform feature extraction on the multiple local regions separately to obtain the multiple local feature vectors.

In a possible implementation, the processing submodule includes:

a concatenation submodule, configured to concatenate the global feature vector and the multiple local feature vectors of the face region to obtain a concatenated feature vector; and

a dimensionality reduction submodule, configured to perform dimensionality reduction on the concatenated feature vector to obtain the face feature vector corresponding to the image.

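In a possible implementation, the synthesis module is configured to synthesize the face feature vectors corresponding to the frames by means of a recurrent neural network to obtain the face feature vector corresponding to the video.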

According to a third aspect of the embodiments of the present disclosure, an expression recognition device is provided, including:

a processor; and

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the method described above.

According to a fourth aspect of the embodiments of the present disclosure, a non-volatile computer-readable storage medium is provided, on which computer program instructions are stored, the computer program instructions implementing the method described above when executed by a processor.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects. The expression recognition method and device of the present disclosure obtain multiple frames of images included in a video, perform feature extraction on each frame separately to obtain a face feature vector corresponding to each frame, synthesize the face feature vectors corresponding to the frames to obtain a face feature vector corresponding to the video, and perform expression recognition according to the face feature vector corresponding to the video. Expression recognition is thus based on the multiple frames included in the video, which improves the accuracy of expression recognition.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.

Fig. 1 is a flowchart of an expression recognition method according to an exemplary embodiment.

Fig. 2 is a schematic flowchart of step S12 of an expression recognition method according to an exemplary embodiment.

Fig. 3 is a schematic diagram of key point positions in a face region according to an exemplary embodiment.

Fig. 4 is a block diagram of an expression recognition device according to an exemplary embodiment.

Fig. 5 is a schematic block diagram of an expression recognition device according to an exemplary embodiment.

Fig. 6 is a block diagram of a device 800 for expression recognition according to an exemplary embodiment.

Detailed description

Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

Fig. 1 is a flowchart of an expression recognition method according to an exemplary embodiment. The method is applicable to terminal devices such as smartphones, smart TVs, tablet computers, or PCs (personal computers), which is not limited by the present disclosure. As shown in Fig. 1, the expression recognition method may include steps S11 to S14.

In step S11, multiple frames of images included in a video are obtained.

Each frame may be a still picture. In other words, each frame is equivalent to one picture.

In a possible implementation, obtaining the multiple frames included in the video (step S11) may include obtaining every frame included in the video. This can improve the accuracy of expression recognition. For example, if video A includes 12 frames in total, the terminal device obtains all 12 frames of video A.

In a possible implementation, obtaining the multiple frames included in the video (step S11) may include obtaining one or more frames at intervals of a first frame number, thereby obtaining the multiple frames included in the video. This can improve the speed of expression recognition. The first frame number may be a preset value, which is not limited by the present disclosure.

As an example of this implementation, video A includes 12 frames in total. If one frame is obtained at intervals of 2 frames, the terminal device obtains the 1st, 4th, 7th, and 10th frames of video A.

As another example of this implementation, video A includes 12 frames in total. If two frames are obtained at intervals of 2 frames, the terminal device obtains the 1st, 2nd, 5th, 6th, 9th, and 10th frames of video A.
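The interval sampling in the two examples above can be sketched as follows. This is a minimal illustration, not part of the disclosure: the function name `sample_frame_indices` and its parameters are hypothetical, and frames are identified by 1-based index as in the text.

```python
def sample_frame_indices(total_frames, interval, frames_per_group=1):
    """Return the 1-based indices of the frames to keep.

    Keeps `frames_per_group` consecutive frames, skips `interval` frames,
    and repeats until the end of the video.
    """
    if total_frames < 0 or interval < 0 or frames_per_group < 1:
        raise ValueError("invalid sampling parameters")
    period = frames_per_group + interval
    return [
        i for i in range(1, total_frames + 1)
        if (i - 1) % period < frames_per_group
    ]


# The two examples from the text: a 12-frame video and an interval of 2 frames.
one_per_group = sample_frame_indices(12, interval=2, frames_per_group=1)
two_per_group = sample_frame_indices(12, interval=2, frames_per_group=2)
```

With an interval of 0 the function keeps every frame, which corresponds to the first implementation of step S11.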

In step S12, feature extraction is performed on each frame separately to obtain a face feature vector corresponding to each frame.

In a possible implementation, performing feature extraction on each frame separately to obtain the face feature vector corresponding to each frame (step S12) may include performing feature extraction on each frame by means of a CNN (Convolutional Neural Network) to obtain the face feature vector corresponding to each frame.
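As a rough illustration of how a convolutional extractor turns an image into a feature vector, the sketch below applies one convolution layer, a ReLU, and global average pooling. It is not the network of the disclosure, which does not specify an architecture: the kernels are hand-written stand-ins for learned weights, and all names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Valid (no padding) 2D cross-correlation of a grayscale image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def extract_features(image, kernels):
    """One conv layer + ReLU + global average pooling: one value per kernel."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        activated = [max(0.0, v) for row in fmap for v in row]
        features.append(sum(activated) / len(activated))
    return features


# Hypothetical hand-written kernels; a real CNN would learn these.
KERNELS = [
    [[1, 0, -1], [1, 0, -1], [1, 0, -1]],   # responds to vertical edges
    [[1, 1, 1], [0, 0, 0], [-1, -1, -1]],   # responds to horizontal edges
]

# A 4x4 toy "frame" with a vertical edge between columns 1 and 2.
frame = [[1, 1, 0, 0]] * 4
feature_vector = extract_features(frame, KERNELS)
```

The toy frame contains only a vertical edge, so the first feature is positive and the second is zero. A trained CNN stacks many such layers.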

In step S13, the face feature vectors corresponding to the frames are synthesized to obtain a face feature vector corresponding to the video.

In a possible implementation, synthesizing the face feature vectors corresponding to the frames to obtain the face feature vector corresponding to the video (step S13) may include synthesizing the face feature vectors corresponding to the frames by means of an RNN (Recurrent Neural Network) to obtain the face feature vector corresponding to the video. The face feature vector corresponding to the video is the vector obtained by synthesizing the face feature vectors corresponding to the individual frames.
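One way to picture the synthesis step is a plain Elman-style recurrent network whose final hidden state serves as the video-level vector. The sketch below makes that assumption. The disclosure does not state the RNN variant, the hidden size, or how the output is read out, and the weights here are random and untrained.

```python
import math
import random


def make_rnn(input_size, hidden_size, seed=0):
    """Create randomly initialized (untrained) Elman RNN weights."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_in": rand_matrix(hidden_size, input_size),
        "w_rec": rand_matrix(hidden_size, hidden_size),
        "bias": [0.0] * hidden_size,
    }


def synthesize_video_vector(frame_vectors, rnn):
    """Run the RNN over the per-frame vectors; return the final hidden state."""
    hidden = [0.0] * len(rnn["bias"])
    for x in frame_vectors:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(rnn["w_in"][k], x))
                + sum(w * hj for w, hj in zip(rnn["w_rec"][k], hidden))
                + rnn["bias"][k]
            )
            for k in range(len(hidden))
        ]
    return hidden


# Three hypothetical 4-dimensional per-frame face feature vectors.
frames = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.5, 0.1, 0.2], [0.3, 0.3, 0.3, 0.3]]
rnn = make_rnn(input_size=4, hidden_size=5)
video_vector = synthesize_video_vector(frames, rnn)
```

Because each hidden state depends on the previous one, the result reflects the order of the frames, which is what lets the video-level vector capture an expression as a continuous action.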

In step S14, expression recognition is performed according to the face feature vector corresponding to the video.

In a possible implementation, performing expression recognition according to the face feature vector corresponding to the video (step S14) may include inputting the face feature vector corresponding to the video into a Softmax function to obtain an operation result, and performing expression recognition according to the operation result.
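The Softmax step can be sketched as below. The sketch assumes the video-level vector already holds one score per expression class. The disclosure does not say whether a linear layer precedes the Softmax, and the class names are taken from the seven categories listed in the background section.

```python
import math

# Class names follow the seven categories listed in the background section.
EXPRESSIONS = ["angry", "sad", "disgusted", "afraid", "surprised", "happy", "neutral"]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognize_expression(video_scores):
    """Return the most probable expression and the full distribution."""
    probs = softmax(video_scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EXPRESSIONS[best], probs


# Hypothetical 7-dimensional score vector for one video.
label, probabilities = recognize_expression([0.2, -1.0, 0.1, 0.0, 0.5, 2.3, 1.1])
```

The probabilities sum to 1, and the recognized expression is the class with the largest probability.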

The expression recognition method of the present disclosure performs expression recognition based on the multiple frames included in a video, which improves the accuracy of expression recognition. In addition, combining a CNN and an RNN for expression recognition in video makes full use of the feature extraction and modeling capabilities of deep learning and, compared with recognition methods based on hand-crafted features or on a single image, effectively improves the accuracy of expression recognition.

Fig. 2 is a schematic flowchart of step S12 of an expression recognition method according to an exemplary embodiment. As shown in Fig. 2, performing feature extraction on each frame separately to obtain the face feature vector corresponding to each frame may include steps S121 to S124.

In step S121, a face region in the image is obtained.

In a possible implementation, the face region in the image is obtained using an AdaBoost detector or a Fast R-CNN detector.

In step S122, feature extraction is performed on the face region to obtain a global feature vector of the face region.

In a possible implementation, performing feature extraction on the face region to obtain the global feature vector of the face region (step S122) may include performing feature extraction on the face region by means of a CNN to obtain the global feature vector of the face region.

In step S123, feature extraction is performed on the face region to obtain multiple local feature vectors of the face region.

In a possible implementation, performing feature extraction on the face region to obtain the multiple local feature vectors of the face region (step S123) may include: determining key point positions in the face region; cropping multiple local regions from the face region according to the key point positions; and performing feature extraction on the multiple local regions separately to obtain the multiple local feature vectors.

A key point may be a preset point, and a key point position is the position of that key point within the face region. Multiple key points may be set for the face region, and the present disclosure does not limit their number. Fig. 3 is a schematic diagram of key point positions in a face region according to an exemplary embodiment. As shown in Fig. 3, the face region is provided with 94 key points.

In a possible implementation, the key point positions in the face region are determined using an AAM (Active Appearance Model), SDM (Supervised Descent Method), or a CNN.

In a possible implementation, after the key point positions in the face region are determined, the terminal device crops multiple local regions from the face region, each centered on the position of one key point. If the face region has N key points, N local regions can be cropped from it. As shown in Fig. 3, the face region has 94 key points, so 94 local regions can be cropped from the face region.
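Cropping one local region per key point can be sketched as follows. The patch size and the handling of key points near the border are not specified in the disclosure. This sketch assumes square patches and clamps out-of-range coordinates to the nearest edge pixel, and the function names are hypothetical.

```python
def crop_patch(image, center_row, center_col, half_size):
    """Crop a square patch of side 2*half_size+1 centered on a key point.

    Coordinates outside the image are clamped to the nearest edge pixel,
    so every patch has the same size.
    """
    height, width = len(image), len(image[0])
    patch = []
    for r in range(center_row - half_size, center_row + half_size + 1):
        rr = min(max(r, 0), height - 1)
        row = []
        for c in range(center_col - half_size, center_col + half_size + 1):
            cc = min(max(c, 0), width - 1)
            row.append(image[rr][cc])
        patch.append(row)
    return patch


def crop_local_regions(face_region, keypoints, half_size=1):
    """Return one patch per (row, col) key point: N key points give N patches."""
    return [crop_patch(face_region, r, c, half_size) for r, c in keypoints]


# A 5x5 toy face region whose pixel value encodes its position (10*row + col).
face = [[10 * r + c for c in range(5)] for r in range(5)]
patches = crop_local_regions(face, keypoints=[(2, 2), (0, 0)])
```

Keeping all patches the same size means a single extractor can process every local region.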

In a possible implementation, performing feature extraction on the multiple local regions separately to obtain the multiple local feature vectors may include performing feature extraction on the multiple local regions by means of a CNN to obtain the multiple local feature vectors.

In step S124, the face feature vector corresponding to the image is obtained according to the global feature vector and the multiple local feature vectors of the face region.

In a possible implementation, obtaining the face feature vector corresponding to the image according to the global feature vector and the multiple local feature vectors of the face region (step S124) may include: concatenating the global feature vector and the multiple local feature vectors of the face region to obtain a concatenated feature vector; and performing dimensionality reduction on the concatenated feature vector to obtain the face feature vector corresponding to the image.

The concatenated feature vector is obtained by joining the global feature vector and the multiple local feature vectors of the face region end to end. For example, if the global feature vector of the face region is L1 = (x1, x2, ..., xn) and the local feature vectors are L2 = (y1, y2, ..., ym) and L3 = (z1, z2, ..., zk), then the concatenated feature vector is L = (x1, x2, ..., xn, y1, y2, ..., ym, z1, z2, ..., zk).
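The concatenation in this example amounts to joining the vectors in order. A minimal sketch with made-up values, where the function name is hypothetical:

```python
def concatenate_features(global_vector, local_vectors):
    """Join the global vector and all local vectors into one flat vector."""
    joined = list(global_vector)
    for local in local_vectors:
        joined.extend(local)
    return joined


L1 = [0.1, 0.2, 0.3]        # global feature vector (n = 3)
L2 = [1.0, 2.0]             # first local feature vector (m = 2)
L3 = [7.0, 8.0, 9.0, 10.0]  # second local feature vector (k = 4)
L = concatenate_features(L1, [L2, L3])
```

The length of the result is n + m + k, which grows quickly when there are many key points.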

In a possible implementation, performing dimensionality reduction on the concatenated feature vector to obtain the face feature vector corresponding to the image may include performing PCA (Principal Component Analysis) dimensionality reduction on the concatenated feature vector to obtain the face feature vector corresponding to the image. This reduces the dimension of the face feature vector corresponding to the image and thereby simplifies computation.
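PCA dimensionality reduction can be sketched in plain Python as below. This is only an illustration under stated assumptions: the principal axes are estimated from a small set of sample vectors using power iteration with deflation, whereas a practical system would more likely use a numerical library. The disclosure does not describe how the projection is fitted, and all names and data here are hypothetical.

```python
import math
import random


def pca_fit(samples, n_components, iterations=200, seed=0):
    """Estimate the mean and top principal axes by power iteration."""
    rng = random.Random(seed)
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    axes = []
    for _ in range(n_components):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _step in range(iterations):
            nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm == 0.0:
                break
            vec = [v / norm for v in nxt]
        eigenvalue = sum(
            vec[a] * sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)
        )
        axes.append(vec)
        # Deflate: remove the found component so the next axis can emerge.
        cov = [
            [cov[a][b] - eigenvalue * vec[a] * vec[b] for b in range(d)]
            for a in range(d)
        ]
    return mean, axes


def pca_transform(vector, mean, axes):
    """Project one concatenated vector onto the learned axes."""
    centered = [v - m for v, m in zip(vector, mean)]
    return [sum(c * a for c, a in zip(centered, axis)) for axis in axes]


# Hypothetical 3-dimensional concatenated vectors that vary along one direction.
concatenated = [[0.0, 0.0, 5.0], [2.0, 1.0, 5.0], [4.0, 2.0, 5.0], [6.0, 3.0, 5.0]]
mean, axes = pca_fit(concatenated, n_components=1)
reduced = pca_transform(concatenated[-1], mean, axes)
```

Here each 3-dimensional vector is reduced to a single coordinate along the direction of greatest variance. The sign of that coordinate depends on the random starting vector.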

Video expression recognition currently has a wide range of applications. For example, live-streaming applications can analyze a streamer's expression in real time, driver assistance systems can monitor a driver's expression in real time, and TVs can use a camera to capture the expressions of viewers watching a program. The expression recognition method of the present disclosure performs expression recognition based on the multiple frames included in a video, which improves the accuracy of expression recognition.

Fig. 4 is a block diagram of an expression recognition device according to an exemplary embodiment. Referring to Fig. 4:

The expression recognition device includes: an acquisition module 41, configured to obtain multiple frames of images included in a video; an extraction module 42, configured to perform feature extraction on each frame separately to obtain a face feature vector corresponding to each frame; a synthesis module 43, configured to synthesize the face feature vectors corresponding to the frames to obtain a face feature vector corresponding to the video; and a recognition module 44, configured to perform expression recognition according to the face feature vector corresponding to the video.
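The four-module structure can be pictured as a small pipeline object. The sketch below is illustrative only: the class name and field names are hypothetical, and the lambdas are toy stand-ins (for instance, averaging replaces the recurrent network) so that the example runs end to end.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class ExpressionRecognizer:
    """Wires the four modules of Fig. 4 together; each is a plain callable."""
    acquire: Callable[[object], Sequence[object]]      # acquisition module 41
    extract: Callable[[object], Vector]                # extraction module 42
    synthesize: Callable[[Sequence[Vector]], Vector]   # synthesis module 43
    recognize: Callable[[Vector], str]                 # recognition module 44

    def __call__(self, video: object) -> str:
        frames = self.acquire(video)
        frame_vectors = [self.extract(frame) for frame in frames]
        video_vector = self.synthesize(frame_vectors)
        return self.recognize(video_vector)


# Toy stand-ins so the sketch runs end to end; none of them is a real model.
recognizer = ExpressionRecognizer(
    acquire=lambda video: video[::3],
    extract=lambda frame: [float(frame), 1.0],
    synthesize=lambda vectors: [sum(col) / len(vectors) for col in zip(*vectors)],
    recognize=lambda vector: "happy" if vector[0] > 5 else "neutral",
)
result = recognizer(list(range(12)))
```

Keeping each module behind a callable interface mirrors the block diagram: any module can be swapped without touching the others.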

Fig. 5 is a schematic block diagram of an expression recognition device according to an exemplary embodiment. Referring to Fig. 5:

In a possible implementation, the extraction module 42 includes: an acquisition submodule 421, configured to obtain a face region in the image; a first extraction submodule 422, configured to perform feature extraction on the face region to obtain a global feature vector of the face region; a second extraction submodule 423, configured to perform feature extraction on the face region to obtain multiple local feature vectors of the face region; and a processing submodule 424, configured to obtain the face feature vector corresponding to the image according to the global feature vector and the multiple local feature vectors of the face region.

In a possible implementation, the second extraction submodule 423 includes: a determination submodule, configured to determine key point positions in the face region; a cropping submodule, configured to crop multiple local regions from the face region according to the key point positions; and an extraction submodule, configured to perform feature extraction on the multiple local regions separately to obtain the multiple local feature vectors.

In a possible implementation, the processing submodule 424 includes: a concatenation submodule, configured to concatenate the global feature vector and the multiple local feature vectors of the face region to obtain a concatenated feature vector; and a dimensionality reduction submodule, configured to perform dimensionality reduction on the concatenated feature vector to obtain the face feature vector corresponding to the image.

In a possible implementation, the synthesis module 43 is configured to synthesize the face feature vectors corresponding to the frames by means of a recurrent neural network to obtain the face feature vector corresponding to the video.

With regard to the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.

The expression recognition device of the present disclosure performs expression recognition based on the multiple frames included in a video, which improves the accuracy of expression recognition.

Fig. 6 is a block diagram of a device 800 for expression recognition according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.

Referring to Fig. 6, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or part of the steps of the method described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phone book data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component 806 supplies power to the various components of the device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the device 800 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor component 814 can also detect a change in position of the device 800 or of one of its components, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the method described above.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions, which can be executed by the processor 820 of the device 800 to complete the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An expression recognition method, characterized by including:

obtaining multiple frames of images included in a video;

performing feature extraction on each frame separately to obtain a face feature vector corresponding to each frame;

synthesizing the face feature vectors corresponding to the frames to obtain a face feature vector corresponding to the video; and

performing expression recognition according to the face feature vector corresponding to the video.
2. The expression recognition method according to claim 1, characterized in that performing feature extraction on each frame separately to obtain the face feature vector corresponding to each frame includes:

obtaining a face region in the image;

performing feature extraction on the face region to obtain a global feature vector of the face region;

performing feature extraction on the face region to obtain multiple local feature vectors of the face region; and

obtaining the face feature vector corresponding to the image according to the global feature vector and the multiple local feature vectors of the face region.
3. The expression recognition method according to claim 2, characterized in that performing feature extraction on the face region to obtain the multiple local feature vectors of the face region comprises:

determining key point positions in the face region;

cropping multiple local regions from the face region according to the key point positions; and

performing feature extraction on each of the multiple local regions to obtain the multiple local feature vectors.
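The cropping step of claim 3 can be illustrated with a minimal sketch. It assumes the key point positions have already been determined by some detector and that the face region is a grayscale image stored as a list of rows. The function name `crop_local_regions`, the square window, and the `half_size` parameter are assumptions made for illustration only; the claim does not fix the shape or size of the local regions.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale face region, one inner list per row


def crop_local_regions(
    face: Image,
    keypoints: Dict[str, Tuple[int, int]],
    half_size: int,
) -> Dict[str, Image]:
    """Cut a square window around each (row, col) key point.

    Windows are clamped to the face region, so a key point near the
    border yields a smaller patch instead of an out-of-range access.
    """
    height = len(face)
    width = len(face[0])
    regions: Dict[str, Image] = {}
    for name, (row, col) in keypoints.items():
        top = max(0, row - half_size)
        bottom = min(height, row + half_size + 1)
        left = max(0, col - half_size)
        right = min(width, col + half_size + 1)
        regions[name] = [line[left:right] for line in face[top:bottom]]
    return regions
```

Each returned patch would then be passed to a feature extractor to obtain one local feature vector per region.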
4. The expression recognition method according to claim 2, characterized in that obtaining the face feature vector corresponding to the image according to the global feature vector and the multiple local feature vectors of the face region comprises:

concatenating the global feature vector and the multiple local feature vectors of the face region to obtain a concatenated feature vector; and

performing dimensionality reduction on the concatenated feature vector to obtain the face feature vector corresponding to the image.
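The concatenation and dimensionality reduction of claim 4 can be sketched as follows. The claim does not name a reduction technique, so a linear projection with a given matrix is used here purely as an assumption; in practice the matrix might come from a learned layer or from a method such as principal component analysis. The names `splice` and `reduce_dimension` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def splice(global_vector: Sequence[float],
           local_vectors: Sequence[Sequence[float]]) -> Vector:
    """Concatenate the global vector with every local vector, in order."""
    spliced = list(global_vector)
    for local in local_vectors:
        spliced.extend(local)
    return spliced


def reduce_dimension(vector: Sequence[float],
                     projection: Sequence[Sequence[float]]) -> Vector:
    """Assumed reduction: multiply by a projection matrix.

    Each row of `projection` produces one output component, so the
    result has len(projection) dimensions.
    """
    for row in projection:
        if len(row) != len(vector):
            raise ValueError("projection row length must match the vector")
    return [sum(w * x for w, x in zip(row, vector)) for row in projection]
```

With a two-row projection, a five-dimensional concatenated vector is reduced to two dimensions; the reduced vector serves as the face feature vector of the image.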
5. The expression recognition method according to claim 1, characterized in that synthesizing the face feature vectors corresponding to the frame images to obtain the face feature vector corresponding to the video comprises:

synthesizing the face feature vectors corresponding to the frame images by means of a recurrent neural network to obtain the face feature vector corresponding to the video.
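One way to picture the synthesis of claim 5 is a plain (Elman-style) recurrent cell that consumes the per-frame vectors in temporal order and uses its final hidden state as the video-level vector. This is a sketch under stated assumptions: the claim does not specify the network architecture, the `tanh` update rule and the use of the last hidden state are illustrative choices, and the weights `w_in`, `w_rec`, and `bias` are hypothetical values that would normally be learned.

```python
import math
from typing import List, Sequence

Vector = List[float]


def rnn_synthesize(
    frame_vectors: Sequence[Sequence[float]],
    w_in: Sequence[Sequence[float]],
    w_rec: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Vector:
    """Fold per-frame vectors into one vector with a simple recurrent cell.

    Update rule (an assumption): h_t = tanh(W_in x_t + W_rec h_{t-1} + b),
    starting from h_0 = 0. The final hidden state is returned.
    """
    hidden = [0.0] * len(bias)
    for x in frame_vectors:
        # The comprehension reads the previous `hidden`; the name is
        # rebound only after the new state is fully computed.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden
```

Because each hidden state depends on the previous one, the result is sensitive to frame order, which is what lets the video-level vector reflect an expression as a continuous action rather than as an unordered set of frames.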
6. An expression recognition apparatus, characterized by comprising:

an acquisition module, configured to obtain multiple frames of images included in a video;

an extraction module, configured to perform feature extraction on each frame image separately to obtain a face feature vector corresponding to each frame image;

a synthesis module, configured to synthesize the face feature vectors corresponding to the frame images to obtain a face feature vector corresponding to the video; and

a recognition module, configured to perform expression recognition according to the face feature vector corresponding to the video.
7. The expression recognition apparatus according to claim 6, characterized in that the extraction module comprises:

an acquisition sub-module, configured to obtain a face region in the image;

a first extraction sub-module, configured to perform feature extraction on the face region to obtain a global feature vector of the face region;

a second extraction sub-module, configured to perform feature extraction on the face region to obtain multiple local feature vectors of the face region; and

a processing sub-module, configured to obtain the face feature vector corresponding to the image according to the global feature vector and the multiple local feature vectors of the face region.
8. The expression recognition apparatus according to claim 7, characterized in that the second extraction sub-module comprises:

a determination sub-module, configured to determine key point positions in the face region;

a cropping sub-module, configured to crop multiple local regions from the face region according to the key point positions; and

an extraction sub-module, configured to perform feature extraction on each of the multiple local regions to obtain the multiple local feature vectors.
9. The expression recognition apparatus according to claim 7, characterized in that the processing sub-module comprises:

a concatenation sub-module, configured to concatenate the global feature vector and the multiple local feature vectors of the face region to obtain a concatenated feature vector; and

a dimensionality reduction sub-module, configured to perform dimensionality reduction on the concatenated feature vector to obtain the face feature vector corresponding to the image.
10. The expression recognition apparatus according to claim 6, characterized in that the synthesis module is configured to:

synthesize the face feature vectors corresponding to the frame images by means of a recurrent neural network to obtain the face feature vector corresponding to the video.
11. An expression recognition apparatus, characterized by comprising:

a processor; and

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the method according to any one of claims 1 to 5.
12. A non-volatile computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 5.