CN112800998B - Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA - Google Patents

Info

Publication number
CN112800998B
Authority
CN
China
Prior art keywords
expression
emotion
electroencephalogram
feature vector
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110159085.8A
Other languages
Chinese (zh)
Other versions
CN112800998A (en
Inventor
卢官明
朱清扬
卢峻禾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110159085.8A priority Critical patent/CN112800998B/en
Publication of CN112800998A publication Critical patent/CN112800998A/en
Application granted granted Critical
Publication of CN112800998B publication Critical patent/CN112800998B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Optimization (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a multi-modal emotion recognition method and system that fuse an attention mechanism with discriminative multiset canonical correlation analysis (DMCCA). The method comprises the following steps: extracting electroencephalogram signal features, peripheral physiological signal features and expression features from the preprocessed electroencephalogram signals, peripheral physiological signals and facial expression videos, respectively; extracting discriminative electroencephalogram emotion features, peripheral physiological emotion features and expression emotion features by using an attention mechanism; applying the DMCCA method to the electroencephalogram emotion features, the peripheral physiological emotion features and the expression emotion features to obtain electroencephalogram-peripheral physiological-expression multi-modal emotion features; and classifying the multi-modal emotion features by using a classifier. The attention mechanism selectively focuses on the emotion-discriminative features in each modality, and, combined with DMCCA, the correlation and complementarity among the emotion features of different modalities are fully exploited, so that the accuracy and robustness of emotion recognition can be effectively improved.

Description

Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA
Technical Field
The invention relates to the technical field of emotion recognition and artificial intelligence, and in particular to a multi-modal emotion recognition method and system that fuse an attention mechanism with discriminative multiset canonical correlation analysis (DMCCA).
Background
Human emotion is a psychological and physiological state that accompanies human consciousness, and it plays an important role in interpersonal communication. With the continuous progress of technologies such as artificial intelligence, people increasingly value a more intelligent and humanized human-computer interaction (HCI) experience. Expectations of machine intelligence keep rising: machines are expected to perceive, understand and even express emotion, so as to realize humanized human-computer interaction and better serve human beings. Emotion recognition, a branch of affective computing, is a basic and core technology for human-computer emotional interaction. It has become a research hotspot in computer science, cognitive science, artificial intelligence and related fields, and has attracted wide attention from both academia and industry. For example, in clinical care, if the emotional state of a patient, especially a patient who has difficulty expressing emotion, can be known, care measures can be adapted accordingly to improve the quality of care. In addition, there is growing interest in the psychological and behavioral monitoring of patients with mental disorders, in friendly human-machine interaction for emotional robots, and the like.
In the past, many studies on emotion recognition focused on recognizing human emotional states from a single modality, such as speech-based emotion recognition and facial-expression-based emotion recognition. The emotion information conveyed by speech alone or by facial expression alone is incomplete and is easily affected by external factors. For example, facial expression recognition is sensitive to occlusion and illumination changes, while speech-based emotion recognition is sensitive to environmental noise and to voice differences between subjects. In addition, people sometimes force a smile, put on an act or remain silent in order to hide their real emotions; in such cases the facial expression or body posture is deceptive to some extent, and speech-based emotion recognition fails when the person is silent. Single-modality emotion recognition therefore has inherent limitations. Accordingly, more and more researchers are turning to emotion recognition based on multi-modal information fusion, in the hope that the complementarity among the modalities can be used to build a robust emotion recognition model with higher recognition accuracy.
Currently, the information fusion strategies commonly used in multi-modal emotion recognition research are decision-level fusion and feature-level fusion. Decision-level fusion usually takes the recognition result obtained separately for each modality and then makes a final decision according to a rule, such as the mean rule, the sum rule, the max rule or majority voting, to obtain the final recognition result. Decision-level fusion takes the differences between modalities into account by weighing their different contributions to emotion recognition, but it ignores the correlation between modalities. The performance of multi-modal emotion recognition based on decision-level fusion depends not only on the recognition rate of each single modality but also on the decision-level fusion algorithm. Feature-level fusion combines the emotion features of several modalities into a fused feature vector. It exploits the complementarity of the emotion features of different modalities, but how to determine the weights of these features so as to reflect their different contributions to emotion classification is the key to multi-modal feature fusion and remains an open and challenging problem.
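The decision-level rules named above can be illustrated with a minimal NumPy sketch. The function name `fuse_decisions` and the example probabilities are hypothetical and serve only to show how the mean, sum, max and majority-voting rules can disagree on the same inputs.

```python
import numpy as np

def fuse_decisions(probs, rule="mean"):
    """Fuse per-modality class probabilities with a simple decision-level rule.

    probs: array of shape (n_modalities, n_classes); each row is one
           modality's class-probability estimate for the same sample.
    Returns the index of the winning class.
    """
    probs = np.asarray(probs, dtype=float)
    if rule == "mean":
        return int(np.argmax(probs.mean(axis=0)))
    if rule == "sum":
        return int(np.argmax(probs.sum(axis=0)))
    if rule == "max":
        return int(np.argmax(probs.max(axis=0)))
    if rule == "vote":
        votes = np.argmax(probs, axis=1)  # each modality votes for its top class
        return int(np.argmax(np.bincount(votes, minlength=probs.shape[1])))
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical outputs of three single-modality classifiers for two classes.
example = [[0.60, 0.40],   # electroencephalogram
           [0.55, 0.45],   # peripheral physiological signals
           [0.10, 0.90]]   # facial expression
```

In this made-up example the mean, sum and max rules pick class 1 because one modality is very confident, whereas majority voting picks class 0. None of these rules models the correlation between modalities, which is the limitation the text points out.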
Disclosure of Invention
The purpose of the invention is as follows: in view of the low accuracy and poor robustness of single-modality emotion recognition and the shortcomings of existing multi-modal emotion feature fusion methods, the invention aims to provide a multi-modal emotion recognition method and system that fuse an attention mechanism with discriminative multiset canonical correlation analysis (DMCCA).
The technical scheme is as follows: the invention adopts the following technical scheme for realizing the aim of the invention:
a multi-modal emotion recognition method fusing an attention mechanism and DMCCA comprises the following steps:
(1) extracting electroencephalogram signal feature vectors and expression feature vectors from the preprocessed electroencephalogram signals and facial expression videos, respectively, by using separately trained neural network models, and extracting peripheral physiological signal feature vectors from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and their statistical features;
(2) mapping the electroencephalogram signal feature vector, the peripheral physiological signal feature vector and the expression feature vector into several groups of feature vectors through respective linear transformation matrices, determining the importance weights of the different feature vector groups with respective attention mechanism modules, and forming, by weighted fusion, a discriminative electroencephalogram emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector of the same dimension;
(3) applying the discriminative multiset canonical correlation analysis (DMCCA) method to the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector: determining a projection matrix for each emotion feature vector by maximizing the correlation among the emotion features of different modalities for samples of the same class, projecting each emotion feature vector into a common subspace, and obtaining the electroencephalogram-peripheral physiological-expression multi-modal emotion feature vector by additive fusion;
(4) classifying the multi-modal emotion feature vector with a classifier to obtain the emotion category.
Further, in step (2), the specific steps of extracting the discriminative electroencephalogram emotion features, peripheral physiological emotion features and expression emotion features by using the attention mechanism modules comprise:
(2.1) representing the electroencephalogram signal features extracted in step (1) in matrix form
Figure BDA0002935593770000031
and mapping them, through a linear transformation matrix $W^{(1)}$, into $M_1$ groups of feature vectors
Figure BDA0002935593770000032
where $4 \le M_1 \le 16$, the dimension of each group of feature vectors is $N$, and $16 \le N \le 64$; letting
Figure BDA0002935593770000033
the linear transformation is expressed as:
$E^{(1)} = (F^{(1)})^{T} W^{(1)}$
wherein the superscript (1) denotes the electroencephalogram modality and $T$ denotes transposition;
determining the importance weights of the different feature vector groups by using a first attention mechanism module, and forming the discriminative electroencephalogram emotion feature vector by weighted fusion, wherein the weight of the r-th group of electroencephalogram signal feature vectors
Figure BDA0002935593770000034
and the electroencephalogram emotion feature vector $x^{(1)}$ are expressed as:
Figure BDA0002935593770000035
Figure BDA0002935593770000036
wherein $r = 1, 2, \ldots, M_1$,
Figure BDA0002935593770000037
represents the r-th group of electroencephalogram signal feature vectors,
Figure BDA0002935593770000038
is a trainable linear transformation parameter vector, and exp(·) denotes the exponential function with the natural constant e as its base;
(2.2) representing the peripheral physiological signal features extracted in step (1) in matrix form
Figure BDA0002935593770000039
and mapping them, through a linear transformation matrix $W^{(2)}$, into $M_2$ groups of feature vectors
Figure BDA00029355937700000310
where $4 \le M_2 \le 16$; letting
Figure BDA00029355937700000311
Figure BDA00029355937700000312
the linear transformation is expressed as:
$E^{(2)} = (F^{(2)})^{T} W^{(2)}$
wherein the superscript (2) denotes the peripheral physiological modality;
determining the importance weights of the different feature vector groups by using a second attention mechanism module, and forming the discriminative peripheral physiological emotion feature vector by weighted fusion, wherein the weight of the s-th group of peripheral physiological signal feature vectors
Figure BDA0002935593770000041
and the peripheral physiological emotion feature vector $x^{(2)}$ are expressed as:
Figure BDA0002935593770000042
Figure BDA0002935593770000043
wherein $s = 1, 2, \ldots, M_2$,
Figure BDA0002935593770000044
represents the s-th group of peripheral physiological signal feature vectors, and
Figure BDA0002935593770000045
is a trainable linear transformation parameter vector;
(2.3) representing the expression features extracted in step (1) in matrix form
Figure BDA0002935593770000046
and mapping them, through a linear transformation matrix $W^{(3)}$, into $M_3$ groups of feature vectors
Figure BDA0002935593770000047
where $4 \le M_3 \le 16$; letting
Figure BDA0002935593770000048
Figure BDA0002935593770000049
the linear transformation is expressed as:
$E^{(3)} = (F^{(3)})^{T} W^{(3)}$
wherein the superscript (3) denotes the expression modality;
determining the importance weights of the different feature vector groups by using a third attention mechanism module, and forming the discriminative expression emotion feature vector by weighted fusion, wherein the weight of the t-th group of expression feature vectors
Figure BDA00029355937700000410
and the expression emotion feature vector $x^{(3)}$ are expressed as:
Figure BDA00029355937700000411
Figure BDA00029355937700000412
wherein $t = 1, 2, \ldots, M_3$,
Figure BDA00029355937700000413
represents the t-th group of expression feature vectors, and
Figure BDA00029355937700000414
is a trainable linear transformation parameter vector.
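The weighted fusion in steps (2.1) to (2.3) can be sketched in NumPy as follows. The weight formulas themselves are equation images that did not survive in the text, so this sketch assumes the common softmax form: each group vector is scored by its inner product with the trainable parameter vector, and the scores are normalized with exp(·). The matrix layout (a K × M feature matrix and a K × N transformation matrix, so that the product has one N-dimensional row per group), the names `attention_fuse`, `F`, `W`, `u` and the random values are likewise assumptions for illustration, not trained parameters.

```python
import numpy as np

def attention_fuse(F, W, u):
    """Map features to M groups and fuse them with softmax attention weights.

    F: (K, M) feature matrix of one modality (assumed layout).
    W: (K, N) linear transformation matrix.
    u: (N,)  trainable parameter vector used to score each group.
    Returns (x, alpha): the fused N-dim vector and the M attention weights.
    """
    E = F.T @ W                      # E = F^T W, shape (M, N); row r is group r
    scores = E @ u                   # one scalar score per group (assumed form)
    scores = scores - scores.max()   # stabilise exp() without changing the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    x = alpha @ E                    # weighted sum of the M group vectors
    return x, alpha

rng = np.random.default_rng(0)
K, M, N = 32, 8, 32                  # illustrative sizes with 4 <= M <= 16, 16 <= N <= 64
F = rng.standard_normal((K, M))
W = rng.standard_normal((K, N)) * 0.1
u = rng.standard_normal(N) * 0.1
x, alpha = attention_fuse(F, W, u)
```

The same function would be applied once per modality with its own `W` and `u`, which is what gives the three emotion feature vectors the same dimension N.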
Further, step (3) specifically comprises the following sub-steps:
(3.1) obtaining the trained DMCCA projection matrices corresponding to the electroencephalogram emotion features, the peripheral physiological emotion features and the expression emotion features, respectively,
Figure BDA0002935593770000051
Figure BDA0002935593770000052
and
Figure BDA0002935593770000053
where $32 \le d \le 128$;
(3.2) using the projection matrices $\Omega$, $\Phi$ and $\Psi$ to project the electroencephalogram emotion feature vector $x^{(1)}$, the peripheral physiological emotion feature vector $x^{(2)}$ and the expression emotion feature vector $x^{(3)}$ extracted in step (2) into a $d$-dimensional common subspace, wherein the projection of $x^{(1)}$ is $\Omega^{T} x^{(1)}$, the projection of $x^{(2)}$ is $\Phi^{T} x^{(2)}$, and the projection of $x^{(3)}$ is $\Psi^{T} x^{(3)}$;
(3.3) fusing $\Omega^{T} x^{(1)}$, $\Phi^{T} x^{(2)}$ and $\Psi^{T} x^{(3)}$ to obtain the electroencephalogram-peripheral physiological-expression multi-modal emotion feature vector $\Omega^{T} x^{(1)} + \Phi^{T} x^{(2)} + \Psi^{T} x^{(3)}$.
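Sub-steps (3.2) and (3.3) amount to three matrix-vector products followed by a sum. The sketch below uses random stand-ins for the trained projection matrices and for the emotion feature vectors; the function name `fuse_in_common_subspace` and the sizes N = 32, d = 64 are illustrative assumptions.

```python
import numpy as np

def fuse_in_common_subspace(x1, x2, x3, Omega, Phi, Psi):
    """Project three N-dim emotion feature vectors to d dims and add them."""
    return Omega.T @ x1 + Phi.T @ x2 + Psi.T @ x3

rng = np.random.default_rng(0)
N, d = 32, 64                        # illustrative; the text requires 32 <= d <= 128
# Random stand-ins for the trained N x d projection matrices.
Omega, Phi, Psi = (rng.standard_normal((N, d)) for _ in range(3))
# Random stand-ins for the three emotion feature vectors from step (2).
x1, x2, x3 = (rng.standard_normal(N) for _ in range(3))
z = fuse_in_common_subspace(x1, x2, x3, Omega, Phi, Psi)
```

Additive fusion keeps the fused vector at d dimensions, whereas concatenating the three projections would give 3d. It is equivalent to multiplying the stacked vector of the three modalities by the stacked projection matrix.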
Further, the projection matrices $\Omega$, $\Phi$ and $\Psi$ in step (3.1) are obtained by training as follows:
(3.1.1) extracting training samples of every emotion category from the training sample set to generate 3 groups of emotion feature vectors
Figure BDA0002935593770000054
wherein
Figure BDA0002935593770000055
$M$ is the number of training samples, $N$ is the dimension of
Figure BDA0002935593770000056
$i = 1, 2, 3$ and $m = 1, 2, \ldots, M$; $i = 1$ denotes the electroencephalogram modality, $i = 2$ the peripheral physiological modality and $i = 3$ the expression modality;
Figure BDA0002935593770000057
represents the electroencephalogram emotion feature vector,
Figure BDA0002935593770000058
represents the peripheral physiological emotion feature vector, and
Figure BDA0002935593770000059
represents the expression emotion feature vector;
(3.1.2) computing the mean of the column vectors of $X^{(i)}$ and centering $X^{(i)}$;
(3.1.3) solving for a set of projection matrices $\Omega$, $\Phi$ and $\Psi$ based on the idea of discriminative multiset canonical correlation analysis (DMCCA), such that, in the common projection subspace, the linear correlation between samples of the same class is maximized, while within each modality the inter-class dispersion of the data is maximized and the intra-class dispersion is minimized; letting the projection vector of $X^{(i)}$ be
Figure BDA00029355937700000510
$i = 1, 2, 3$, the objective function of DMCCA is:
Figure BDA0002935593770000061
wherein
Figure BDA0002935593770000062
represents the intra-class dispersion matrix of $X^{(i)}$,
Figure BDA0002935593770000063
represents the inter-class dispersion matrix of $X^{(i)}$, cov(·,·) denotes the covariance, and $i, j \in \{1, 2, 3\}$;
constructing the following optimization model and solving it to obtain the projection matrices $\Omega$, $\Phi$ and $\Psi$:
Figure BDA0002935593770000064
Further, solving the optimization model of the DMCCA objective function by the method of Lagrange multipliers gives the following Lagrangian function:
Figure BDA0002935593770000065
wherein $\lambda$ is the Lagrange multiplier. Taking the partial derivatives of $L(w^{(1)}, w^{(2)}, w^{(3)})$ with respect to $w^{(1)}$, $w^{(2)}$ and $w^{(3)}$ and setting them to zero, i.e., letting
Figure BDA0002935593770000066
yields
Figure BDA0002935593770000067
Further simplifying the above equations gives the following generalized eigenvalue problem:
Figure BDA0002935593770000068
Solving this generalized eigenvalue problem and selecting the eigenvectors corresponding to the $d$ largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ gives the projection matrices
Figure BDA0002935593770000071
Figure BDA0002935593770000072
and
Figure BDA0002935593770000073
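The last step, picking the eigenvectors of the d largest eigenvalues of a generalized eigenvalue problem, can be sketched with NumPy alone. The actual left-hand and right-hand matrices are defined by equation images that are not recoverable here, so `A` and `B` below are random hypothetical stand-ins (A symmetric, B symmetric positive definite). Splitting each eigenvector into three stacked N-dimensional blocks, one per modality, is also an assumption borrowed from the usual multiset CCA formulation.

```python
import numpy as np

def top_generalized_eigvecs(A, B, d):
    """Solve A w = lambda B w (A symmetric, B symmetric positive definite)
    and return the d largest eigenvalues with their eigenvectors as columns."""
    L = np.linalg.cholesky(B)              # B = L L^T
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T                  # symmetric standard problem C v = lambda v
    vals, vecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:d]     # indices of the d largest eigenvalues
    Wmat = Linv.T @ vecs[:, order]         # back-transform: w = L^{-T} v
    return vals[order], Wmat

rng = np.random.default_rng(0)
N, d = 32, 64                              # illustrative sizes; d cannot exceed 3 * N
G = rng.standard_normal((3 * N, 3 * N))
A = (G + G.T) / 2                          # hypothetical stand-in for the left-hand matrix
H = rng.standard_normal((3 * N, 3 * N))
B = H @ H.T + 3 * N * np.eye(3 * N)        # hypothetical SPD stand-in for the right-hand matrix
lam, Wmat = top_generalized_eigvecs(A, B, d)
# Assumed block structure: one N x d projection matrix per modality.
Omega, Phi, Psi = Wmat[:N], Wmat[N:2 * N], Wmat[2 * N:]
```

The Cholesky reduction is one common way to turn the generalized problem into a standard symmetric one. A library routine that accepts both matrices directly would serve the same purpose.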
Based on the same inventive concept, the invention provides a multi-modal emotion recognition system fusing an attention mechanism and DMCCA, comprising:
a preliminary feature extraction module, configured to extract electroencephalogram signal feature vectors and expression feature vectors from the preprocessed electroencephalogram signals and facial expression videos, respectively, by using separately trained neural network models, and to extract peripheral physiological signal feature vectors from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and their statistical features;
a feature discriminability enhancement module, configured to map the electroencephalogram signal feature vector, the peripheral physiological signal feature vector and the expression feature vector into several groups of feature vectors through respective linear transformation matrices, to determine the importance weights of the different feature vector groups with respective attention mechanism modules, and to form, by weighted fusion, a discriminative electroencephalogram emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector of the same dimension;
a projection matrix determination module, configured to determine a projection matrix for each emotion feature vector by using the discriminative multiset canonical correlation analysis (DMCCA) method to maximize the correlation among the emotion features of different modalities for samples of the same class;
a feature fusion module, configured to project the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector into a common subspace through their respective projection matrices, and to obtain the electroencephalogram-peripheral physiological-expression multi-modal emotion feature vector by additive fusion;
and a classification and recognition module, configured to classify the multi-modal emotion feature vector with a classifier to obtain the emotion category.
Based on the same inventive concept, the invention further provides a multi-modal emotion recognition system fusing an attention mechanism and DMCCA, comprising at least one computing device, the computing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the above multi-modal emotion recognition method fusing an attention mechanism and DMCCA.
Beneficial effects: compared with the prior art, the invention has the following technical effects:
(1) The invention uses an attention mechanism to selectively focus on the salient features that play a key role in emotion recognition in each modality and adaptively learns emotion-discriminative features, which can effectively improve the accuracy and robustness of multi-modal emotion recognition.
(2) The invention uses discriminative multiset canonical correlation analysis, which introduces the class information of the samples. By maximizing the correlation among the emotion features of different modalities for samples of the same class, while maximizing the inter-class dispersion and minimizing the intra-class dispersion of the emotion features within each modality, it can mine the nonlinear correlations among different modalities, fully exploit the correlation and complementarity among the electroencephalogram, peripheral physiological and expression emotion features, and at the same time eliminate some ineffective redundant features, which can effectively improve the discriminative power and robustness of the feature representation.
(3) Compared with single-modality emotion recognition methods, the invention makes comprehensive use of the information of several modalities involved in emotional expression. It combines the characteristics of the different modalities and exploits their complementarity to mine multi-modal emotion features, which can effectively improve the accuracy and robustness of emotion recognition.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a block diagram of an embodiment of the present invention.
Detailed Description
For a more detailed understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawings and specific examples.
As shown in fig. 1 and fig. 2, a multi-modal emotion recognition method combining an attention mechanism and a DMCCA provided by an embodiment of the present invention mainly includes the following steps:
(1) extracting electroencephalogram signal feature vectors and expression feature vectors from the preprocessed electroencephalogram signals and facial expression videos by using the trained neural network models respectively, and extracting peripheral physiological signal feature vectors from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and statistical features thereof.
In this embodiment, the DEAP (Database for Emotion Analysis using Physiological signals) emotion database is used; in practice, other emotion databases containing electroencephalogram signals, peripheral physiological signals and facial expression videos may also be used. DEAP is a public multi-modal emotion database collected by Koelstra et al. at Queen Mary University of London, UK. The database contains the electroencephalogram and peripheral physiological signals recorded from 32 subjects while they watched 40 one-minute music video clips of different types used as evoking stimuli, together with the facial expression videos of the first 22 subjects recorded while they watched the clips. Each subject completed 40 trials and gave a Self-Assessment Manikin (SAM) rating immediately after each trial, for a total of 40 self-assessments on the SAM questionnaire. The SAM questionnaire contains rating scales for the subject's arousal, valence, dominance and liking with respect to the video. Arousal indicates how excited the person is, ranging from calm to excited, and is scored from 1 to 9; valence, also called pleasure, indicates how pleasant the person's mood is, ranging from negative to positive, and is likewise scored from 1 to 9; dominance ranges from submissive (or "without control") to dominant (or "in control"); liking indicates the subject's personal preference for the video. After each trial, the subject selects the scores that represent his or her emotional state, and these scores are used for the subsequent classification and recognition of emotion categories.
In the DEAP database, the physiological signals were sampled at 512 Hz and then downsampled to 128 Hz (the preprocessed, downsampled data are provided officially with the database). The physiological signal matrix of each subject is 40 × 40 × 8064 (40 music video clips, 40 physiological signal channels, 8064 sampling points). Of the 40 physiological signal channels, the first 32 carry electroencephalogram signals and the last 8 carry peripheral physiological signals. The 8064 sampling points correspond to 63 s of data at the 128 Hz sampling rate, and each signal segment includes 3 s of quiet time before the recording.
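The shapes and durations stated above can be checked with a short NumPy sketch. The array `data` is a zero-filled placeholder with the stated shape, not real DEAP data, and discarding the first 3 s of each trial is an assumed preprocessing choice that the text does not prescribe.

```python
import numpy as np

FS = 128                              # sampling rate after downsampling, in Hz
# Placeholder with the stated per-subject shape: (clips, channels, samples).
data = np.zeros((40, 40, 8064))

eeg = data[:, :32, :]                 # first 32 channels: electroencephalogram
peripheral = data[:, 32:, :]          # last 8 channels: peripheral physiological signals

duration_s = data.shape[2] / FS       # 8064 / 128 = 63.0 seconds per clip
lead_in = 3 * FS                      # number of samples in the 3 s before the recording
eeg_trial = eeg[:, :, lead_in:]       # assumed choice: keep only the remaining 60 s
```

With the lead-in removed, each trial has 7680 samples, which is 60 s at 128 Hz.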
In the embodiment of the invention, 880 samples with electroencephalogram signals, peripheral physiological signals and facial expressions are used as training samples, and classification recognition is respectively carried out on 4 dimensions of arousal degree, valence degree, dominance degree and preference degree.
The neural network model for extracting the electroencephalogram signal features may be a Long Short-Term Memory (LSTM) network or a Convolutional Neural Network (CNN), and the neural network model for extracting the expression features may be a 3D convolutional neural network, a CNN-LSTM, or the like. In this embodiment, a trained CNN model is used to extract features from the preprocessed electroencephalogram signals, giving a 256-dimensional electroencephalogram signal feature vector; a 128-dimensional peripheral physiological signal feature vector is extracted from the preprocessed peripheral physiological signals, such as electrocardiogram, respiration, electrooculogram and electromyogram signals, by extracting Low Level Descriptors (LLD) of the signal waveforms and their statistical features (including mean, standard deviation, power spectrum, median, maximum and minimum); and a 256-dimensional expression feature vector is extracted from the preprocessed facial expression video by using a trained CNN-LSTM model.
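As a rough illustration of the statistical features listed for the peripheral physiological signals, the sketch below computes the mean, standard deviation, mean spectral power, median, maximum and minimum of each channel. It is applied directly to a synthetic waveform rather than to low level descriptors, and the function names `channel_statistics` and `peripheral_features` are hypothetical. It therefore produces 48 values for 8 channels and does not reproduce the 128-dimensional vector of the embodiment, whose exact composition is not given.

```python
import numpy as np

def channel_statistics(signal):
    """Return six simple statistics of a 1-D signal: mean, standard deviation,
    mean spectral power, median, maximum and minimum."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / signal.size  # periodogram-style power
    return np.array([signal.mean(), signal.std(), spectrum.mean(),
                     np.median(signal), signal.max(), signal.min()])

def peripheral_features(trial):
    """Concatenate the statistics of every channel; trial has shape (channels, samples)."""
    return np.concatenate([channel_statistics(ch) for ch in trial])

rng = np.random.default_rng(0)
trial = rng.standard_normal((8, 60 * 128))   # synthetic stand-in for 8 peripheral channels
features = peripheral_features(trial)        # 8 channels x 6 statistics = 48 values
```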
(2) And respectively extracting the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector with discriminative power by using an attention mechanism module.
(3) Obtaining the electroencephalogram-peripheral physiological-expression multi-modal emotion feature vector by applying the discriminative multiset canonical correlation analysis (DMCCA) method to the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector.
(4) And classifying and identifying the multi-modal emotion feature vectors by using a classifier to obtain emotion categories.
Further, the specific steps of extracting discriminating electroencephalogram emotional characteristics, peripheral physiological emotional characteristics and expression emotional characteristics by using an attention mechanism module in the step (2) comprise:
(2.1) The electroencephalogram signal features extracted in step (1) are expressed in matrix form
Figure BDA0002935593770000101
and mapped by the linear transformation matrix $W^{(1)}$ into $M_1$ groups of feature vectors
Figure BDA0002935593770000102
where $4 \le M_1 \le 16$ and each group of feature vectors has dimension $N$, with $16 \le N \le 64$. Let
Figure BDA0002935593770000103
The linear transformation is expressed as:
$E^{(1)} = (F^{(1)})^{T} W^{(1)}$
where the superscript (1) denotes the electroencephalogram modality and $T$ denotes transposition.
The first attention mechanism module determines the importance weights of the different feature vector groups, and weighted fusion forms the discriminative electroencephalogram emotion feature vector. The weight of the r-th group of electroencephalogram signal feature vectors
Figure BDA0002935593770000104
and the electroencephalogram emotion feature vector $x^{(1)}$ are expressed as:
Figure BDA0002935593770000105
Figure BDA0002935593770000106
where $r = 1, 2, \ldots, M_1$;
Figure BDA0002935593770000107
denotes the r-th group of electroencephalogram signal feature vectors;
Figure BDA0002935593770000108
is a trainable linear transformation parameter vector; and exp(·) denotes the exponential function with base e. In this embodiment, $M_1 = 8$ and $N = 32$.
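The grouping and weighted fusion of sub-step (2.1) can be sketched in Python as follows. The formula images are not reproduced in this text, so the sketch assumes one common reading: the weight of group r is $\exp(u^{T}e_r) / \sum_k \exp(u^{T}e_k)$ and the fused vector is $\sum_r \alpha_r e_r$, where $e_r$ is the r-th row of $E = F^{T}W$. The name `attention_pool`, the symbol `u` for the trainable parameter vector, and the reshaping of the 256-dimensional vector into a 32 x 8 matrix `F` are assumptions, not details confirmed by the embodiment.

```python
import numpy as np

def attention_pool(F, W, u):
    """Fuse M groups of N-dimensional features into one N-dimensional vector.

    F: (K, M) feature matrix, W: (K, N) linear map, u: (N,) scoring vector.
    """
    E = F.T @ W                     # E = F^T W, shape (M, N); row r is group r
    scores = E @ u                  # one scalar score per group
    scores = scores - scores.max()  # shift for numerical stability; weights unchanged
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax importance weights
    x = alpha @ E                   # weighted sum of the group vectors, shape (N,)
    return x, alpha

rng = np.random.default_rng(0)
K, M1, N = 32, 8, 32                # K is an assumed reshaping: 256 = K * M1
F1 = rng.standard_normal(256).reshape(K, M1)   # stand-in for EEG features
W1 = rng.standard_normal((K, N)) * 0.1         # untrained, random for illustration
u1 = rng.standard_normal(N) * 0.1
x1, alpha1 = attention_pool(F1, W1, u1)
```

With $M_1 = 8$ and $N = 32$ as in this embodiment, `x1` is a 32-dimensional vector and `alpha1` holds eight non-negative weights that sum to one.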
To train the parameters of the linear transformation matrix $W^{(1)}$, a softmax classifier is connected after the first attention mechanism module: the electroencephalogram emotion feature vector $x^{(1)}$ output by the first attention mechanism module is fed to the $C$ output nodes of the softmax classifier, which, after the softmax function, output a probability distribution vector
Figure BDA0002935593770000111
where $c \in [1, C]$ and $C$ is the number of emotion categories.
Further, the parameters of the linear transformation matrix $W^{(1)}$ are trained with the cross-entropy loss function shown in the following equation.
Figure BDA0002935593770000112
Figure BDA0002935593770000113
where $x^{(1)}$ is the 32-dimensional electroencephalogram emotion feature vector;
Figure BDA0002935593770000114
denotes the probability distribution vector over emotion categories predicted by the softmax classification model;
Figure BDA0002935593770000115
denotes the true emotion category label of the m-th electroencephalogram sample; with one-hot encoding, if the true emotion category label of the m-th electroencephalogram sample is c, then
Figure BDA0002935593770000116
otherwise
Figure BDA0002935593770000117
represents the probability that the softmax classification model predicts the m-th electroencephalogram sample as class c; $\mathrm{Loss}^{(1)}$ denotes the loss function used when training the linear transformation matrix $W^{(1)}$. In this embodiment, $C = 2$ and $M = 880$.
Iterative training with the error back-propagation algorithm continues until the model parameters reach their optimal values. The electroencephalogram emotion feature vector $x^{(1)}$ can then be extracted from the electroencephalogram signal of a newly input test sample.
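The softmax classifier and cross-entropy training described above can be sketched in Python as follows. The loss is averaged over the M samples, which is an assumption because the equation image is not reproduced here. The names `V`, `b`, `cross_entropy` and `sgd_step` are hypothetical, and the update covers only the classifier weights; in the described method the error would also be propagated back to the linear transformation matrix and the attention parameters, which this sketch omits.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(X, labels, V, b):
    """X: (M, N) emotion feature vectors, labels: (M,) integers in [0, C),
    V: (N, C) classifier weights, b: (C,) biases."""
    probs = softmax(X @ V + b)
    picked = probs[np.arange(X.shape[0]), labels]  # probability of the true class
    return -np.log(picked).mean(), probs

def sgd_step(X, labels, V, b, lr=0.1):
    """One gradient-descent step on the classifier weights only."""
    loss, probs = cross_entropy(X, labels, V, b)
    onehot = np.eye(V.shape[1])[labels]
    grad_logits = (probs - onehot) / X.shape[0]
    return V - lr * (X.T @ grad_logits), b - lr * grad_logits.sum(axis=0), loss

# Toy usage with C = 2 classes and two 2-dimensional samples.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
y_toy = np.array([0, 1])
V0, b0 = np.zeros((2, 2)), np.zeros(2)
V1, b1, loss0 = sgd_step(X_toy, y_toy, V0, b0)
loss1, _ = cross_entropy(X_toy, y_toy, V1, b1)
```

Starting from zero weights, the two classes are equally likely, so the initial loss equals ln 2; one step on this toy data lowers it.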
(2.2) The peripheral physiological signal features extracted in step (1) are expressed in matrix form
Figure BDA0002935593770000118
and mapped by the linear transformation matrix $W^{(2)}$ into $M_2$ groups of feature vectors
Figure BDA0002935593770000119
where $4 \le M_2 \le 16$. Let
Figure BDA00029355937700001110
Figure BDA00029355937700001111
The linear transformation is expressed as:
$E^{(2)} = (F^{(2)})^{T} W^{(2)}$
where the superscript (2) denotes the peripheral physiological modality.
The second attention mechanism module determines the importance weights of the different feature vector groups, and weighted fusion forms the discriminative peripheral physiological emotion feature vector. The weight of the s-th group of peripheral physiological signal feature vectors
Figure BDA00029355937700001112
and the peripheral physiological emotion feature vector $x^{(2)}$ are expressed as:
Figure BDA00029355937700001113
Figure BDA0002935593770000121
where $s = 1, 2, \ldots, M_2$;
Figure BDA0002935593770000122
denotes the s-th group of peripheral physiological signal feature vectors; and
Figure BDA0002935593770000123
is a trainable linear transformation parameter vector. In this embodiment, $M_2 = 4$.
To train the parameters of the linear transformation matrix $W^{(2)}$, a softmax classifier is connected after the second attention mechanism module: the peripheral physiological emotion feature vector $x^{(2)}$ output by the second attention mechanism module is fed to the $C$ output nodes of the softmax classifier, which, after the softmax function, output a probability distribution vector
Figure BDA0002935593770000124
Further, the parameters of the linear transformation matrix $W^{(2)}$ are trained with the cross-entropy loss function shown in the following equation.
Figure BDA0002935593770000125
Figure BDA0002935593770000126
where $x^{(2)}$ is the 32-dimensional peripheral physiological emotion feature vector;
Figure BDA0002935593770000127
denotes the probability distribution vector over emotion categories predicted by the softmax classification model;
Figure BDA0002935593770000128
denotes the true emotion category label of the m-th peripheral physiological signal sample; with one-hot encoding, if the true emotion category label of the m-th peripheral physiological signal sample is c, then
Figure BDA0002935593770000129
otherwise
Figure BDA00029355937700001210
represents the probability that the softmax classification model predicts the m-th peripheral physiological signal sample as class c; $\mathrm{Loss}^{(2)}$ denotes the loss function used when training the linear transformation matrix $W^{(2)}$. In this embodiment, $C = 2$ and $M = 880$.
Iterative training with the error back-propagation algorithm continues until the model parameters reach their optimal values. The peripheral physiological emotion feature vector $x^{(2)}$ can then be extracted from the peripheral physiological signals of a newly input test sample.
(2.3) The expression features extracted in step (1) are expressed in matrix form
Figure BDA00029355937700001211
and mapped by the linear transformation matrix $W^{(3)}$ into $M_3$ groups of feature vectors
Figure BDA00029355937700001212
where $4 \le M_3 \le 16$. Let
Figure BDA00029355937700001213
Figure BDA00029355937700001214
The linear transformation is expressed as:
$E^{(3)} = (F^{(3)})^{T} W^{(3)}$
where the superscript (3) denotes the expression modality.
The third attention mechanism module determines the importance weights of the different feature vector groups, and weighted fusion forms the discriminative expression emotion feature vector. The weight of the t-th group of expression feature vectors
Figure BDA0002935593770000131
and the expression emotion feature vector $x^{(3)}$ are expressed as:
Figure BDA0002935593770000132
Figure BDA0002935593770000133
where $t = 1, 2, \ldots, M_3$;
Figure BDA0002935593770000134
denotes the t-th group of expression feature vectors; and
Figure BDA0002935593770000135
is a trainable linear transformation parameter vector. In this embodiment, $M_3 = 8$.
To train the parameters of the linear transformation matrix $W^{(3)}$, a softmax classifier is connected after the third attention mechanism module: the expression emotion feature vector $x^{(3)}$ output by the third attention mechanism module is fed to the $C$ output nodes of the softmax classifier, which, after the softmax function, output a probability distribution vector
Figure BDA0002935593770000136
Further, the parameters of the linear transformation matrix $W^{(3)}$ are trained with the cross-entropy loss function shown in the following equation.
Figure BDA0002935593770000137
Figure BDA0002935593770000138
where $x^{(3)}$ is the 32-dimensional expression emotion feature vector;
Figure BDA0002935593770000139
denotes the probability distribution vector over emotion categories predicted by the softmax classification model;
Figure BDA00029355937700001310
denotes the true emotion category label of the m-th expression video sample; with one-hot encoding, if the true emotion category label of the m-th expression video sample is c, then
Figure BDA00029355937700001311
otherwise
Figure BDA00029355937700001312
represents the probability that the softmax classification model predicts the m-th expression video sample as class c; $\mathrm{Loss}^{(3)}$ denotes the loss function used when training the linear transformation matrix $W^{(3)}$. In this embodiment, $C = 2$ and $M = 880$.
Iterative training with the error back-propagation algorithm continues until the model parameters reach their optimal values. The expression emotion feature vector $x^{(3)}$ can then be extracted from the expression video of a newly input test sample.
Further, step (3) specifically comprises the following sub-steps:
(3.1) Obtain the trained DMCCA projection matrices corresponding to the electroencephalogram, peripheral physiological and expression emotion features, respectively:
Figure BDA0002935593770000141
Figure BDA0002935593770000142
and
Figure BDA0002935593770000143
where $32 \le d \le 128$. In this embodiment, $d = 40$.
(3.2) Using the projection matrices $\Omega$, $\Phi$ and $\Psi$, respectively, project the electroencephalogram emotion feature vector $x^{(1)}$, the peripheral physiological emotion feature vector $x^{(2)}$ and the expression emotion feature vector $x^{(3)}$ extracted in step (2) into a d-dimensional common subspace. The projection of $x^{(1)}$ into the d-dimensional common subspace is $\Omega^{T}x^{(1)}$, the projection of $x^{(2)}$ is $\Phi^{T}x^{(2)}$, and the projection of $x^{(3)}$ is $\Psi^{T}x^{(3)}$.
(3.3) Fuse $\Omega^{T}x^{(1)}$, $\Phi^{T}x^{(2)}$ and $\Psi^{T}x^{(3)}$ to obtain the electroencephalogram-peripheral physiology-expression multi-modal emotion feature vector $\Omega^{T}x^{(1)} + \Phi^{T}x^{(2)} + \Psi^{T}x^{(3)}$.
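Sub-steps (3.2) and (3.3) amount to three matrix-vector products followed by a sum. A minimal Python sketch, assuming each projection matrix has shape N x d with N = 32 and d = 40 as in this embodiment (the function name `fuse` and the random stand-in matrices are hypothetical; trained matrices would come from the DMCCA training procedure):

```python
import numpy as np

def fuse(x1, x2, x3, Omega, Phi, Psi):
    """Project three N-dimensional emotion feature vectors into the
    d-dimensional common subspace and fuse them by addition."""
    return Omega.T @ x1 + Phi.T @ x2 + Psi.T @ x3

rng = np.random.default_rng(0)
N, d = 32, 40
# Random placeholders standing in for trained projection matrices and features.
Omega, Phi, Psi = (rng.standard_normal((N, d)) for _ in range(3))
x1, x2, x3 = (rng.standard_normal(N) for _ in range(3))
z = fuse(x1, x2, x3, Omega, Phi, Psi)   # multi-modal emotion feature vector
```

Additive fusion keeps the fused vector at d dimensions regardless of the number of modalities; it is equivalent to stacking the three projection matrices and applying them to the concatenated feature vector.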
Further, the projection matrices $\Omega$, $\Phi$ and $\Psi$ in step (3.1) are obtained by training in the following steps:
(3.1.1) From the samples of the $C$ emotion categories in the training sample set, generate 3 groups of emotion feature vectors
Figure BDA0002935593770000144
where
Figure BDA0002935593770000145
$M$ is the number of training samples (in this embodiment the sample set is small, so all samples participate in the calculation; for a large sample set, samples of each emotion category can be drawn at random), $i = 1, 2, 3$, and $m = 1, 2, \ldots, M$. Let $i = 1$ denote the electroencephalogram modality, $i = 2$ the peripheral physiological modality and $i = 3$ the expression modality;
Figure BDA0002935593770000146
denotes the electroencephalogram emotion feature vector,
Figure BDA0002935593770000147
denotes the peripheral physiological emotion feature vector, and
Figure BDA0002935593770000148
denotes the expression emotion feature vector. In this embodiment, $C = 2$, $M = 880$ and $N = 32$.
(3.1.2) Compute the mean of the column vectors of $X^{(i)}$
Figure BDA0002935593770000149
and centre $X^{(i)}$ to obtain
Figure BDA00029355937700001410
For convenience of description, the centred
Figure BDA00029355937700001411
is still denoted $X^{(i)}$ in the following; that is, it is assumed that
Figure BDA00029355937700001412
have all been centred.
(3.1.3) The idea of discriminative multi-set canonical correlation analysis (DMCCA) is to solve for a set of projection matrices $\Omega$, $\Phi$ and $\Psi$ that maximize the linear correlation of same-class samples in the common projection subspace while maximizing the inter-class dispersion and minimizing the intra-class dispersion of the data within each modality. Let the projection vector of $X^{(i)}$ be
Figure BDA0002935593770000151
$i = 1, 2, 3$; the objective function of DMCCA is:
Figure BDA0002935593770000152
where
Figure BDA0002935593770000153
denotes the intra-class dispersion matrix of $X^{(i)}$,
Figure BDA0002935593770000154
denotes the inter-class dispersion matrix of $X^{(i)}$, cov(·,·) denotes the covariance, and $i, j \in \{1, 2, 3\}$.
Solving the DMCCA objective function can be expressed as the following optimization model:
Figure BDA0002935593770000155
(3.1.4) Solving the optimization model of the DMCCA objective function with the Lagrange multiplier method yields the following Lagrangian function:
Figure BDA0002935593770000156
where $\lambda$ is the Lagrange multiplier. The partial derivatives of $L(w^{(1)}, w^{(2)}, w^{(3)})$ with respect to $w^{(1)}$, $w^{(2)}$ and $w^{(3)}$ are then computed and set to zero, that is, let
Figure BDA0002935593770000157
to obtain
Figure BDA0002935593770000158
Further simplifying the above equations gives the following generalized eigenvalue problem:
Figure BDA0002935593770000161
Solving the generalized eigenvalue problem above and selecting the eigenvectors corresponding to the d largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d$ gives the projection matrices
Figure BDA0002935593770000162
Figure BDA0002935593770000163
and
Figure BDA0002935593770000164
In this embodiment, $d = 40$.
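The training procedure of steps (3.1.1) to (3.1.4) can be illustrated with the Python sketch below. Because the objective and eigenvalue equations are not reproduced in this text, the block matrices used here are only one plausible form and should not be read as the exact DMCCA formulation: the left-hand matrix places the inter-class dispersion of each modality on the diagonal and the cross-modal covariances off the diagonal, and the right-hand matrix is block-diagonal with the intra-class dispersion matrices. The function names, the ridge term `reg`, the scaling by M, and the synthetic data are all assumptions.

```python
import numpy as np

def scatter_matrices(X, y):
    """Intra-class (Sw) and inter-class (Sb) dispersion of X (N, M), labels y (M,)."""
    mean_all = X.mean(axis=1, keepdims=True)
    N = X.shape[0]
    Sw, Sb = np.zeros((N, N)), np.zeros((N, N))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mc) @ (Xc - mc).T
        Sb += Xc.shape[1] * ((mc - mean_all) @ (mc - mean_all).T)
    return Sw, Sb

def dmcca_like(Xs, y, d, reg=1e-3):
    """Illustrative stand-in for DMCCA: returns one (N, d) projection per modality."""
    y = np.asarray(y)
    Xs = [X - X.mean(axis=1, keepdims=True) for X in Xs]  # step (3.1.2): centring
    N, M = Xs[0].shape
    k = len(Xs)
    A = np.zeros((k * N, k * N))
    B = np.zeros((k * N, k * N))
    for i in range(k):
        si = slice(i * N, (i + 1) * N)
        Sw, Sb = scatter_matrices(Xs[i], y)
        A[si, si] = Sb / M
        B[si, si] = Sw / M + reg * np.eye(N)   # ridge keeps B positive definite
        for j in range(k):
            if j != i:
                A[si, slice(j * N, (j + 1) * N)] = Xs[i] @ Xs[j].T / M
    # Solve A w = lambda B w by whitening with the Cholesky factor of B.
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    Cm = Linv @ A @ Linv.T
    vals, vecs = np.linalg.eigh((Cm + Cm.T) / 2)
    order = np.argsort(vals)[::-1][:d]          # indices of the d largest eigenvalues
    W_full = Linv.T @ vecs[:, order]
    return [W_full[i * N:(i + 1) * N] for i in range(k)], vals[order]

# Synthetic data with the sizes of this embodiment: M = 880, N = 32, C = 2, d = 40.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=880)
Xs_demo = [rng.standard_normal((32, 880)) for _ in range(3)]
(Omega, Phi, Psi), eigvals = dmcca_like(Xs_demo, labels, d=40)
```

The stacked problem has dimension 3N = 96, so d = 40 eigenvectors are available; each returned matrix is the slice of the stacked eigenvectors that belongs to one modality.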
Based on the same inventive concept, the multi-modal emotion recognition system integrating the attention mechanism and DMCCA provided by the embodiment of the invention comprises:
a preliminary feature extraction module, used for extracting the electroencephalogram signal feature vector and the expression feature vector from the preprocessed electroencephalogram signal and facial expression video, respectively, using respective trained neural network models, and for extracting the peripheral physiological signal feature vector from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and their statistical features;
a feature discrimination enhancement module, used for mapping the electroencephalogram signal feature vector, the peripheral physiological signal feature vector and the expression feature vector into several groups of feature vectors through respective linear transformation matrices, determining the importance weights of the different feature vector groups using respective attention mechanism modules, and forming, by weighted fusion, a discriminative electroencephalogram emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector of the same dimension;
a projection matrix determining module, used for determining the projection matrix of each emotion feature vector using the DMCCA method by maximizing the correlation among the emotion features of different modalities for samples of the same class;
a feature fusion module, used for projecting the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector into a common subspace through their respective projection matrices and fusing the projections by addition to obtain the electroencephalogram-peripheral physiology-expression multi-modal emotion feature vector;
and a classification and recognition module, used for classifying the multi-modal emotion feature vector with a classifier to obtain the emotion category.
For specific implementation of each module, reference is made to the above method embodiment, and details are not repeated. Those skilled in the art will appreciate that the modules in the embodiments may be adaptively changed and arranged in one or more systems different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components.
Based on the same inventive concept, the multi-modal emotion recognition system combining the attention mechanism and the DMCCA provided by the embodiment of the invention comprises at least one computing device, wherein the computing device comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, and the computer program realizes the multi-modal emotion recognition method combining the attention mechanism and the DMCCA when being loaded into the processor.
The technical scheme disclosed by the invention not only comprises the technical methods related in the above embodiments, but also comprises the technical scheme formed by arbitrarily combining the technical methods. Those skilled in the art can make certain improvements and modifications without departing from the principles of the invention, and such improvements and modifications are considered within the scope of the invention.

Claims (5)

1. A multi-modal emotion recognition method integrating an attention mechanism and DMCCA, characterized by comprising the following steps:
(1) extracting an electroencephalogram signal feature vector and an expression feature vector from the preprocessed electroencephalogram signal and facial expression video, respectively, using respective trained neural network models, and extracting a peripheral physiological signal feature vector from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and their statistical features;
(2) mapping the electroencephalogram signal feature vector, the peripheral physiological signal feature vector and the expression feature vector into several groups of feature vectors through respective linear transformation matrices, determining the importance weights of the different feature vector groups using respective attention mechanism modules, and forming, by weighted fusion, a discriminative electroencephalogram emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector of the same dimension;
(3) for the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector, determining a projection matrix for each emotion feature vector using the discriminative multi-set canonical correlation analysis (DMCCA) method by maximizing the correlation among the emotion features of different modalities for samples of the same class, projecting each emotion feature vector into a common subspace, and fusing the projections by addition to obtain the electroencephalogram-peripheral physiology-expression multi-modal emotion feature vector;
(4) classifying the multi-modal emotion feature vector with a classifier to obtain the emotion category;
step (2) comprises the following sub-steps:
(2.1) expressing the electroencephalogram signal features extracted in step (1) in matrix form
Figure FDA0003685990360000011
and mapping them through the linear transformation matrix $W^{(1)}$ into $M_1$ groups of feature vectors
Figure FDA0003685990360000012
wherein each group of feature vectors has dimension $N$, with $16 \le N \le 64$; letting
Figure FDA0003685990360000013
the linear transformation is expressed as:
$E^{(1)} = (F^{(1)})^{T} W^{(1)}$
wherein the superscript (1) denotes the electroencephalogram modality and $T$ denotes transposition;
determining the importance weights of the different feature vector groups using a first attention mechanism module, and forming the discriminative electroencephalogram emotion feature vector by weighted fusion, wherein the weight of the r-th group of electroencephalogram signal feature vectors
Figure FDA0003685990360000014
and the electroencephalogram emotion feature vector $x^{(1)}$ are expressed as:
Figure FDA0003685990360000021
Figure FDA0003685990360000022
wherein $r = 1, 2, \ldots, M_1$;
Figure FDA0003685990360000023
denotes the r-th group of electroencephalogram signal feature vectors;
Figure FDA0003685990360000024
is a trainable linear transformation parameter vector; and exp(·) denotes the exponential function with base e;
(2.2) expressing the peripheral physiological signal features extracted in step (1) in matrix form
Figure FDA0003685990360000025
and mapping them through the linear transformation matrix $W^{(2)}$ into $M_2$ groups of feature vectors
Figure FDA0003685990360000026
letting
Figure FDA0003685990360000027
Figure FDA0003685990360000028
the linear transformation is expressed as:
$E^{(2)} = (F^{(2)})^{T} W^{(2)}$
wherein the superscript (2) denotes the peripheral physiological modality;
determining the importance weights of the different feature vector groups using a second attention mechanism module, and forming the discriminative peripheral physiological emotion feature vector by weighted fusion, wherein the weight of the s-th group of peripheral physiological signal feature vectors
Figure FDA0003685990360000029
and the peripheral physiological emotion feature vector $x^{(2)}$ are expressed as:
Figure FDA00036859903600000210
Figure FDA00036859903600000211
wherein $s = 1, 2, \ldots, M_2$;
Figure FDA00036859903600000212
denotes the s-th group of peripheral physiological signal feature vectors; and
Figure FDA00036859903600000213
is a trainable linear transformation parameter vector;
(2.3) expressing the expression features extracted in step (1) in matrix form
Figure FDA00036859903600000214
and mapping them through the linear transformation matrix $W^{(3)}$ into $M_3$ groups of feature vectors
Figure FDA0003685990360000031
letting
Figure FDA0003685990360000032
Figure FDA0003685990360000033
the linear transformation is expressed as:
$E^{(3)} = (F^{(3)})^{T} W^{(3)}$
wherein the superscript (3) denotes the expression modality;
determining the importance weights of the different feature vector groups using a third attention mechanism module, and forming the discriminative expression emotion feature vector by weighted fusion, wherein the weight of the t-th group of expression feature vectors
Figure FDA0003685990360000034
and the expression emotion feature vector $x^{(3)}$ are expressed as:
Figure FDA0003685990360000035
Figure FDA0003685990360000036
wherein $t = 1, 2, \ldots, M_3$;
Figure FDA0003685990360000037
denotes the t-th group of expression feature vectors; and
Figure FDA0003685990360000038
is a trainable linear transformation parameter vector;
step (3) comprises the following sub-steps:
(3.1) obtaining the trained DMCCA projection matrices corresponding to the electroencephalogram, peripheral physiological and expression emotion features, respectively:
Figure FDA0003685990360000039
Figure FDA00036859903600000310
and
Figure FDA00036859903600000311
wherein $32 \le d \le 128$;
(3.2) using the projection matrices $\Omega$, $\Phi$ and $\Psi$, respectively, projecting the electroencephalogram emotion feature vector $x^{(1)}$, the peripheral physiological emotion feature vector $x^{(2)}$ and the expression emotion feature vector $x^{(3)}$ extracted in step (2) into a d-dimensional common subspace, wherein the projection of $x^{(1)}$ into the d-dimensional common subspace is $\Omega^{T}x^{(1)}$, the projection of $x^{(2)}$ is $\Phi^{T}x^{(2)}$, and the projection of $x^{(3)}$ is $\Psi^{T}x^{(3)}$;
(3.3) fusing $\Omega^{T}x^{(1)}$, $\Phi^{T}x^{(2)}$ and $\Psi^{T}x^{(3)}$ to obtain the electroencephalogram-peripheral physiology-expression multi-modal emotion feature vector $\Omega^{T}x^{(1)} + \Phi^{T}x^{(2)} + \Psi^{T}x^{(3)}$.
2. The multi-modal emotion recognition method integrating an attention mechanism and DMCCA according to claim 1, wherein the projection matrices $\Omega$, $\Phi$ and $\Psi$ in step (3.1) are obtained by training in the following steps:
(3.1.1) extracting training samples of each emotion category from the training sample set to generate 3 groups of emotion feature vectors
Figure FDA0003685990360000041
wherein
Figure FDA0003685990360000042
$M$ is the number of training samples, $i = 1, 2, 3$, and $m = 1, 2, \ldots, M$; letting $i = 1$ denote the electroencephalogram modality, $i = 2$ the peripheral physiological modality and $i = 3$ the expression modality,
Figure FDA0003685990360000043
denotes the electroencephalogram emotion feature vector,
Figure FDA0003685990360000044
denotes the peripheral physiological emotion feature vector, and
Figure FDA0003685990360000045
denotes the expression emotion feature vector;
(3.1.2) computing the mean of the column vectors of $X^{(i)}$ and centring $X^{(i)}$;
(3.1.3) based on the idea of discriminative multi-set canonical correlation analysis (DMCCA), solving for a set of projection matrices $\Omega$, $\Phi$ and $\Psi$ that maximize the linear correlation of same-class samples in the common projection subspace while maximizing the inter-class dispersion and minimizing the intra-class dispersion of the data within each modality; letting the projection vector of $X^{(i)}$ be
Figure FDA0003685990360000046
the objective function of DMCCA is:
Figure FDA0003685990360000047
wherein
Figure FDA0003685990360000048
denotes the intra-class dispersion matrix of $X^{(i)}$,
Figure FDA0003685990360000049
Figure FDA00036859903600000410
denotes the inter-class dispersion matrix of $X^{(i)}$, cov(·,·) denotes the covariance, and $i, j \in \{1, 2, 3\}$; the following optimization model is constructed and solved to obtain the projection matrices $\Omega$, $\Phi$ and $\Psi$:
Figure FDA00036859903600000411
3. The multi-modal emotion recognition method integrating an attention mechanism and DMCCA according to claim 2, wherein the constructed optimization model of the DMCCA objective function is solved using the Lagrange multiplier method, specifically: the optimization model is expressed as the following Lagrangian function:
Figure FDA0003685990360000051
wherein $\lambda$ is the Lagrange multiplier; the partial derivatives of $L(w^{(1)}, w^{(2)}, w^{(3)})$ with respect to $w^{(1)}$, $w^{(2)}$ and $w^{(3)}$ are then computed and set to zero, that is, let
Figure FDA0003685990360000052
to obtain
Figure FDA0003685990360000053
further simplifying the above equations gives the following generalized eigenvalue problem:
Figure FDA0003685990360000054
solving the generalized eigenvalue problem above and selecting the eigenvectors corresponding to the d largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d$ gives the projection matrices
Figure FDA0003685990360000055
Figure FDA0003685990360000056
and
Figure FDA0003685990360000057
4. A multi-modal emotion recognition system integrating an attention mechanism and DMCCA, characterized by comprising:
a preliminary feature extraction module, used for extracting an electroencephalogram signal feature vector and an expression feature vector from the preprocessed electroencephalogram signal and facial expression video, respectively, using respective trained neural network models, and for extracting a peripheral physiological signal feature vector from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and their statistical features;
a feature discrimination enhancement module, used for mapping the electroencephalogram signal feature vector, the peripheral physiological signal feature vector and the expression feature vector into several groups of feature vectors through respective linear transformation matrices, determining the importance weights of the different feature vector groups using respective attention mechanism modules, and forming, by weighted fusion, a discriminative electroencephalogram emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector of the same dimension;
a projection matrix determining module, used for determining the projection matrix of each emotion feature vector using the discriminative multi-set canonical correlation analysis (DMCCA) method by maximizing the correlation among the emotion features of different modalities for samples of the same class;
a feature fusion module, used for projecting the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector into a common subspace through their respective projection matrices and fusing the projections by addition to obtain the electroencephalogram-peripheral physiology-expression multi-modal emotion feature vector;
a classification and recognition module, used for classifying the multi-modal emotion feature vector with a classifier to obtain the emotion category;
the specific steps of generating the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector comprise:
expressing the extracted electroencephalogram signal features in matrix form
Figure FDA0003685990360000061
and mapping them through the linear transformation matrix $W^{(1)}$ into $M_1$ groups of feature vectors
Figure FDA0003685990360000062
wherein each group of feature vectors has dimension $N$, with $16 \le N \le 64$; letting
Figure FDA0003685990360000063
Figure FDA0003685990360000064
the linear transformation is expressed as:
$E^{(1)} = (F^{(1)})^{T} W^{(1)}$
wherein the superscript (1) denotes the electroencephalogram modality and $T$ denotes transposition;
determining the importance weights of the different feature vector groups using a first attention mechanism module, and forming the discriminative electroencephalogram emotion feature vector by weighted fusion, wherein the weight of the r-th group of electroencephalogram signal feature vectors
Figure FDA0003685990360000065
and the electroencephalogram emotion feature vector $x^{(1)}$ are expressed as:
Figure FDA0003685990360000066
Figure FDA0003685990360000067
wherein $r = 1, 2, \ldots, M_1$;
Figure FDA0003685990360000068
denotes the r-th group of electroencephalogram signal feature vectors;
Figure FDA0003685990360000069
is a trainable linear transformation parameter vector; and exp(·) denotes the exponential function with base e;
expressing the extracted peripheral physiological signal features in matrix form
Figure FDA0003685990360000071
and mapping them through the linear transformation matrix $W^{(2)}$ into $M_2$ groups of feature vectors
Figure FDA00036859903600000714
letting
Figure FDA0003685990360000073
Figure FDA0003685990360000074
the linear transformation is expressed as:
$E^{(2)} = (F^{(2)})^{T} W^{(2)}$
wherein the superscript (2) denotes the peripheral physiological modality;
determining the importance weights of the different feature vector groups using a second attention mechanism module, and forming the discriminative peripheral physiological emotion feature vector by weighted fusion, wherein the weight of the s-th group of peripheral physiological signal feature vectors
Figure FDA0003685990360000075
and the peripheral physiological emotion feature vector $x^{(2)}$ are expressed as:
Figure FDA0003685990360000076
Figure FDA0003685990360000077
wherein $s = 1, 2, \ldots, M_2$;
Figure FDA0003685990360000078
denotes the s-th group of peripheral physiological signal feature vectors; and
Figure FDA0003685990360000079
is a trainable linear transformation parameter vector;
representing the extracted expression features in matrix form
Figure FDA00036859903600000710
and mapping them, by a linear transformation matrix $W^{(3)}$, to $M_3$ groups of feature vectors
Figure FDA00036859903600000715
letting
Figure FDA00036859903600000712
the linear transformation being expressed as:
$E^{(3)} = (F^{(3)})^{T} W^{(3)}$
wherein the superscript (3) denotes the expression modality;
determining importance weights of the different feature vector groups by using a third attention mechanism module, and forming a discriminative expression emotion feature vector by weighted fusion, wherein the weight of the t-th group of expression feature vectors
Figure FDA00036859903600000713
and the expression emotion feature vector $x^{(3)}$ are expressed as:
Figure FDA0003685990360000081
Figure FDA0003685990360000082
wherein $t = 1, 2, \ldots, M_3$,
Figure FDA0003685990360000083
represents the t-th group of expression feature vectors, and
Figure FDA0003685990360000084
is a trainable linear transformation parameter vector;
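The three attention modules share one pattern: score each group feature vector, turn the scores into weights, and form a weighted sum. The exact weight formulas are in the formula images and are not reproduced in this text, so the following is only a sketch under an assumption: that each weight is a softmax over the inner products of a trainable parameter vector with the group feature vectors, which is consistent with the mention of exp(·) and a linear transformation parameter vector. The names `attention_fuse`, `groups` and `u` are hypothetical.

```python
import math

def attention_fuse(groups, u):
    """Fuse group feature vectors with softmax attention weights.

    groups: list of M vectors, each a list of N floats.
    u:      parameter vector of length N (trainable in practice, fixed here).
    Returns (weights, fused_vector).
    """
    # Linear score for each group: inner product of u with the group vector.
    scores = [sum(ui * ei for ui, ei in zip(u, e)) for e in groups]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Weighted sum of the group vectors gives the fused feature vector.
    dim = len(groups[0])
    fused = [sum(w * e[i] for w, e in zip(weights, groups)) for i in range(dim)]
    return weights, fused

groups = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
u = [0.5, -0.5]
weights, x = attention_fuse(groups, u)

# With a zero parameter vector every group receives the same weight,
# so the fused vector is the plain average of the groups.
uniform_weights, mean_vector = attention_fuse(groups, [0.0, 0.0])
```

The same function would apply to each modality in turn, with its own parameter vector and its own number of groups.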
the specific steps for generating the electroencephalogram-peripheral physiology-expression multi-modal emotion feature vector comprise:
acquiring the trained DMCCA projection matrices corresponding respectively to the electroencephalogram emotion features, the peripheral physiological emotion features and the expression emotion features,
Figure FDA0003685990360000085
Figure FDA0003685990360000086
and
Figure FDA0003685990360000087
wherein $32 \le d \le 128$;
projecting the electroencephalogram emotion feature vector $x^{(1)}$, the peripheral physiological emotion feature vector $x^{(2)}$ and the expression emotion feature vector $x^{(3)}$ into a d-dimensional common subspace by using the projection matrices $\Omega$, $\Phi$ and $\Psi$ respectively, wherein the projection of $x^{(1)}$ into the d-dimensional common subspace is $\Omega^{T} x^{(1)}$, the projection of $x^{(2)}$ is $\Phi^{T} x^{(2)}$, and the projection of $x^{(3)}$ is $\Psi^{T} x^{(3)}$;
fusing $\Omega^{T} x^{(1)}$, $\Phi^{T} x^{(2)}$ and $\Psi^{T} x^{(3)}$ to obtain the electroencephalogram-peripheral physiology-expression multi-modal emotion feature vector formed from $\Omega^{T} x^{(1)}$, $\Phi^{T} x^{(2)}$ and $\Psi^{T} x^{(3)}$.
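The projection and fusion steps can be sketched as follows. Each projection is the product of the transposed projection matrix with the modality's feature vector, which yields a d-dimensional vector per modality. The operator that combines the three projections is not legible in this text, so the sketch offers two common choices, element-wise sum and concatenation, purely as assumptions. The function names `project` and `fuse` and all numeric values are hypothetical, and d = 2 is used for readability even though the claim requires 32 ≤ d ≤ 128.

```python
def project(P, x):
    """Return P^T x, where P is an (n, d) matrix given as a list of rows."""
    if len(P) != len(x):
        raise ValueError("P must have one row per entry of x")
    d = len(P[0])
    return [sum(P[i][j] * x[i] for i in range(len(x))) for j in range(d)]

def fuse(projections, mode="sum"):
    """Combine d-dimensional projections by element-wise sum or concatenation.

    Which operator the patent uses is an assumption here; both are shown.
    """
    if mode == "sum":
        return [sum(vals) for vals in zip(*projections)]
    if mode == "concat":
        return [v for p in projections for v in p]
    raise ValueError("mode must be 'sum' or 'concat'")

# Toy projection matrices standing in for the trained Omega, Phi and Psi.
Omega = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 x 2
Phi = [[2.0, 0.0], [0.0, 2.0]]                 # 2 x 2
Psi = [[1.0, -1.0]]                            # 1 x 2

# Toy modality feature vectors x1, x2, x3 of different lengths.
x1, x2, x3 = [1.0, 2.0, 3.0], [0.5, 0.5], [4.0]

z1, z2, z3 = project(Omega, x1), project(Phi, x2), project(Psi, x3)
z_sum = fuse([z1, z2, z3], "sum")
z_cat = fuse([z1, z2, z3], "concat")
```

Because all three projections live in the same d-dimensional subspace, the sum keeps the dimension at d while concatenation produces a 3d-dimensional vector.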
5. A multi-modal emotion recognition system combining an attention mechanism and DMCCA, comprising at least one computing device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program when loaded into the processor implementing the multi-modal emotion recognition method combining an attention mechanism and DMCCA according to any of claims 1-3.
CN202110159085.8A 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA Active CN112800998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110159085.8A CN112800998B (en) 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Publications (2)

Publication Number Publication Date
CN112800998A CN112800998A (en) 2021-05-14
CN112800998B CN112800998B (en) 2022-07-29

Family

ID=75814276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110159085.8A Active CN112800998B (en) 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Country Status (1)

Country Link
CN (1) CN112800998B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297981B (en) * 2021-05-27 2023-04-07 西北工业大学 End-to-end electroencephalogram emotion recognition method based on attention mechanism
CN113326781B (en) * 2021-05-31 2022-09-02 合肥工业大学 Non-contact anxiety recognition method and device based on face video
CN113269173B (en) * 2021-07-20 2021-10-22 佛山市墨纳森智能科技有限公司 Method and device for establishing emotion recognition model and recognizing human emotion
CN113749656B (en) * 2021-08-20 2023-12-26 杭州回车电子科技有限公司 Emotion recognition method and device based on multidimensional physiological signals
CN113616209B (en) * 2021-08-25 2023-08-04 西南石油大学 Method for screening schizophrenic patients based on space-time attention mechanism
CN113729710A (en) * 2021-09-26 2021-12-03 华南师范大学 Real-time attention assessment method and system integrating multiple physiological modes
CN114091599A (en) * 2021-11-16 2022-02-25 上海交通大学 Method for recognizing emotion of intensive interaction deep neural network among modalities
CN114298189A (en) * 2021-12-20 2022-04-08 深圳市海清视讯科技有限公司 Fatigue driving detection method, device, equipment and storage medium
CN114947852B (en) * 2022-06-14 2023-01-10 华南师范大学 Multi-mode emotion recognition method, device, equipment and storage medium
CN117935339A (en) * 2024-03-19 2024-04-26 北京长河数智科技有限责任公司 Micro-expression recognition method based on multi-modal fusion
CN118332505B (en) * 2024-06-12 2024-08-20 临沂大学 Physiological signal data processing method, system and device based on multi-mode fusion

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510456A (en) * 2018-03-27 2018-09-07 华南理工大学 The sketch of depth convolutional neural networks based on perception loss simplifies method
CN109145983A (en) * 2018-08-21 2019-01-04 电子科技大学 A kind of real-time scene image, semantic dividing method based on lightweight network
CN109543502A (en) * 2018-09-27 2019-03-29 天津大学 A kind of semantic segmentation method based on the multiple dimensioned neural network of depth

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
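Research on an on-board SAR target recognition system based on deep learning neural networks; Yuan Qiuzhuang et al.; Aerospace Shanghai (《上海航天》); 2017-10-25 (No. 05); full text *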

Also Published As

Publication number Publication date
CN112800998A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112800998B (en) Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA
Abdullah et al. Multimodal emotion recognition using deep learning
Zhang et al. Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review
CN111134666B (en) Emotion recognition method of multi-channel electroencephalogram data and electronic device
CN108805087B (en) Time sequence semantic fusion association judgment subsystem based on multi-modal emotion recognition system
CN106886792B (en) Electroencephalogram emotion recognition method for constructing multi-classifier fusion model based on layering mechanism
Jinliang et al. EEG emotion recognition based on granger causality and capsnet neural network
Schels et al. Multi-modal classifier-fusion for the recognition of emotions
CN107766898A (en) The three classification mood probabilistic determination methods based on SVM
Rayatdoost et al. Subject-invariant EEG representation learning for emotion recognition
Chen et al. Patient emotion recognition in human computer interaction system based on machine learning method and interactive design theory
Hwang et al. Brain lateralisation feature extraction and ant colony optimisation‐bidirectional LSTM network model for emotion recognition
CN117935339A (en) Micro-expression recognition method based on multi-modal fusion
Xie et al. WT feature based emotion recognition from multi-channel physiological signals with decision fusion
Chen et al. Design and implementation of human-computer interaction systems based on transfer support vector machine and EEG signal for depression patients’ emotion recognition
Verma et al. Affective state recognition from hand gestures and facial expressions using Grassmann manifolds
CN116701996A (en) Multi-modal emotion analysis method, system, equipment and medium based on multiple loss functions
Li et al. EEG emotion recognition based on self-attention dynamic graph neural networks
Tang et al. Eye movement prediction based on adaptive BP neural network
Zhao et al. Multiscale Global Prompt Transformer for EEG-Based Driver Fatigue Recognition
CN117609863A (en) Long-time electroencephalogram emotion recognition method based on electroencephalogram micro state
Zhang et al. Evolutionary Ensemble Learning for EEG-based Cross-Subject Emotion Recognition
Li et al. Acoustic-articulatory emotion recognition using multiple features and parameter-optimized cascaded deep learning network
CN111709314A (en) Emotional distribution recognition method based on facial surface myoelectricity
Raoof et al. Domain-independent short-term calibration based hybrid approach for motor imagery electroencephalograph classification: a comprehensive review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant