CN103531208A - Astronautic stress emotion identification method based on short-term memory weight fusion
- Publication number: CN103531208A (application CN201310534910.3A)
- Authority: CN (China)
- Prior art keywords
- emotion
- short-term memory
- frame
- weight
- Prior art date: 2013-11-01
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a method for recognizing speech emotion in a spaceflight stress environment based on short-term memory weight fusion. The method comprises the following steps: extracting the prior probability of recall accuracy for each speech frame according to the short-term memory forgetting law of psychology, the prior probability of recall accuracy and an adjustment coefficient jointly forming the importance weight of the frame; computing the decision fusion weight of each speech segment from the importance weights of its frames; performing emotion recognition on each speech segment; and fusing the segment results according to the decision fusion weights to obtain the final recognition result. The method effectively increases the recognition accuracy of the system, and the system is more robust to noise.
Description
Technical field
The present invention relates to a speech emotion recognition method, and particularly to a speech emotion recognition method based on short-term memory weight fusion for use in a spaceflight stress environment.
Background art
In the particular environment of spaceflight, the emotional stability of personnel is of great importance and requires objective evaluation. Enclosed spaces, weightlessness, and long periods of monotonous operation easily induce stress emotions such as irritation in speech.
Traditional automatic speech emotion recognition technology mainly involves two levels: first, which features of the speech signal to use for emotion recognition, namely the problem of affective feature extraction; second, how to classify specific voice data, namely the problem of pattern recognition.
In the speech signal, the expression of emotion is not always obvious. Different emotions can share similar features, and different speakers show large individual differences; this complicates research on affective features, so much so that even the human ear has difficulty telling emotions apart. In automatic speech emotion recognition, a key problem is therefore to find a clear representation of emotion in feature space, that is, the extraction and optimization of affective features.
Emotion modeling is a basic problem in speech emotion recognition. Emotion can be modeled with either the categorical models or the dimensional-space models of psychological theory. Categorical models divide emotion into mutually exclusive classes and are discrete models. Dimensional-space models treat emotion as continuous variables in a multidimensional space and are continuous models. In recent years the Gaussian mixture model has achieved high recognition performance in speech emotion recognition. Because of its strong ability to fit data, it can better learn the "individual differences" present in large data types such as language identification, speaker identification, and speech emotion recognition. However, the Gaussian mixture model is limited by its training data: successfully building an emotion model requires sufficient training data.
Summary of the invention
Aiming at the demand for continuous speech emotion recognition in a stressful space environment, the present invention provides a recognition method that combines local emotion recognition with global emotion recognition, namely a speech emotion recognition method embedding the short-term memory forgetting curve. Its main technical steps are:
According to the short-term memory forgetting law of psychology, extract the prior probability of recall accuracy for each speech frame; the prior probability of recall accuracy, together with an adjustment coefficient, forms the importance weight of the frame; based on the importance weight of each frame, compute the decision fusion weight of each speech segment; perform emotion recognition on each speech segment; fuse the emotions according to the decision fusion weights to obtain the final recognition result.
(1) Extract the prior probability of recall accuracy from the short-term memory forgetting curve.
(1-1) Sample the short-term memory forgetting curve; the sampling interval is t, 0 < t < 18s;
(1-2) take the k-th sample value d(k) as the prior probability of recall accuracy.
(2) Obtain the importance weight of each time-domain frame from the prior probability of recall accuracy and a subjective adjustment coefficient.
(2-1) Set the subjective adjustment coefficient r; its reference value is 1, and it can be adjusted dynamically within (0, 1];
(2-2) obtain the importance weight of the time-domain frame multiplicatively: f(n) = r × d(k), where n is the frame index. Frames closer to the present have higher recall accuracy and therefore higher weight; frames beyond 18 seconds have recall accuracy zero, so their importance weight is also zero.
(3) Segment the continuous speech signal into emotion recognition units.
(3-1) Divide the continuous speech signal into units shorter than 18s; the reference unit duration is 2s to 9s. Natural pauses and syllable boundaries serve as dividing lines.
(3-2) Extract the features of each speech segment frame by frame, including acoustic parameters such as pitch, intensity, and formants.
(4) Obtain the fusion weight of each emotion recognition unit from the importance weight of every frame.
(4-1) For the m speech segments s(i), i = 1, 2, …, m, obtained in step (3), extract the importance weights f_i(n) of the frames corresponding to each segment's frame sequence;
(4-2) compute the fusion weight w(i) of each speech segment s(i).
(5) Perform speech emotion recognition on each emotion recognition unit.
(5-1) On the training data, build an emotion model of the speech segments with a Gaussian mixture model to obtain a probability distribution model;
(5-2) perform emotion recognition based on the Bayesian criterion on the input data to obtain the emotion vector e(i) = [e(i,1), e(i,2), …, e(i,p)] of each emotion recognition unit s(i), where p is the number of emotion categories;
(5-3) fuse the decisions of the emotion recognition units according to the fusion weights w(i); the final emotion output is E = w(1)×e(1) + w(2)×e(2) + … + w(m)×e(m).
The advantages and effects of the present invention are:
1. According to the short-term memory forgetting law of psychology, the present invention extracts the prior probability of recall accuracy for each speech frame; this prior, together with an adjustment coefficient, forms the importance weight of the frame; the decision fusion weight of each speech segment is computed from the frame weights, and recognition results are fused according to these weights. The fusion of emotions is thereby realized and the recognition of stress emotion is improved.
2. The present invention can effectively detect emotion types related to cognitive processes, such as irritation, fatigue, and confidence; it has important application prospects in stressful space environments and can provide timely early warning and monitoring of personnel's emotional stability.
Other advantages and effects of the present invention are described below.
Description of the drawings
Fig. 1: Flowchart of the emotion recognition system
Fig. 2: Feature construction method
Fig. 3: Forgetting curve of short-term memory
Fig. 4: Weight iteration curves of the Gaussian components
Fig. 5: Recognition results of the Gaussian mixture model before embedding the short-term memory forgetting weights
Fig. 6: Stress speech emotion recognition results of the method of the present invention
Embodiment
The technical solutions of the invention are further elaborated below with reference to the drawings and embodiments.
Fig. 1 is the block diagram of the speech emotion recognition system of the present invention. Its main modules include: segment feature extraction, emotion model training, frame importance weight calculation, local emotion recognition, and global emotion vector fusion. The implementation of the system is described in detail below.
One. Speech feature extraction
Feature vectors for recognition and modeling are generally constructed in two ways: as static statistical features or as short-time dynamic features. Dynamic features depend more strongly on phoneme information, so a change of text has a larger impact on the affective features. Emotional speech carries roughly three information sources: speaker information, semantic information, and emotion information. When constructing and selecting affective features, the features must reflect emotion information as much as possible, i.e., change markedly as the emotion changes, while remaining as insensitive as possible to semantic changes.
As shown in Fig. 2, the adopted speech features are static and global in nature. They include prosodic features and voice quality features, among them the first-order and second-order jitter of the pitch, whose computing formulas are given below.
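The jitter formulas appear only as images in the original publication. The following are the standard first-order and second-order jitter definitions over a pitch sequence F0(1), …, F0(M), restated here as an assumption about the exact form used and numbered (1) and (2) to fit the document's equation sequence:
Jitter1 = (1/(M-1)) × Σ_{i=1}^{M-1} |F0(i) - F0(i+1)| (1)
Jitter2 = (1/(M-2)) × Σ_{i=2}^{M-1} |2F0(i) - F0(i-1) - F0(i+1)| (2)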
When constructing the band-wise spectral energy features, the energy percentage in 650Hz-4kHz is not adopted: although this band covers the first formant and most of the second formant, its energy is strongly affected by changes of text content and varies mainly with the phoneme information.
Among the spectral energy features, the energy percentage above 4kHz is also adopted; an increase of energy in this band reflects a higher degree of arousal and can be used to distinguish, for example, sadness from anger.
Band-wise harmonic-to-noise ratio (HNR) features are also added to the feature construction (features 78 to 95). Because the HNR is affected by noise, especially in the high band, band-wise HNR features are constructed to describe more finely the signal changes brought by emotion. The spectrum is divided into the following bands: below 400Hz (containing the lower-frequency harmonic components), 400Hz-2000Hz (roughly covering the energy range of the first two formants), and 2000Hz-5000Hz (the higher-frequency harmonic components). The band above 5kHz is seriously affected by noise and is not applicable to corpora with lower sampling rates, so it is not adopted.
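As an illustration only (not part of the patent text), a minimal sketch of the band-wise spectral energy percentages described above; the band edges follow the text, while the function name, FFT framing, and NumPy dependency are assumptions:

```python
import numpy as np

def band_energy_percentages(frame, sr, bands=((0, 400), (400, 2000), (2000, 5000))):
    """Percentage of spectral energy in each band described in the text."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2       # power spectrum of one frame
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)  # bin index -> frequency in Hz
    total = spectrum.sum() + 1e-12                   # guard against silent frames
    return [spectrum[(freqs >= lo) & (freqs < hi)].sum() / total for lo, hi in bands]

# usage: one 25 ms frame at a 16 kHz sampling rate (random stand-in for real speech)
sr = 16000
frame = np.random.randn(int(0.025 * sr))
print(band_energy_percentages(frame, sr))
```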
Two. Frame weight calculation embedding the short-term memory forgetting curve
From the short-term memory forgetting curve h(τ), extract the prior probability p(τ) of recall accuracy; the curve is shown in Fig. 3. The forgetting curve is sampled with sampling interval t, 0 < t < 18s, and the k-th sample value d(k) is taken as the prior probability of recall accuracy:
p(τ) = d(τ/t) (3)
d(k) = h(kt) (4)
The importance weight of each time-domain frame is obtained as follows:
f(n) = r × d(k), 0 < k < 18s/t (5)
f(n) = 0, k > 18s/t (6)
where r is the subjective adjustment coefficient, 0 < r ≤ 1, and n is the frame index. Frames closer to the present have higher recall accuracy and therefore higher weight; beyond 18 seconds the recall accuracy approaches zero, so the corresponding importance weight is set to zero.
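A minimal sketch of equations (3) to (6), for illustration only; since the forgetting curve h(τ) of Fig. 3 is given only graphically, an exponential decay is assumed as a stand-in, and the function name and NumPy dependency are likewise assumptions:

```python
import numpy as np

def frame_weights(num_frames, frame_period, r=1.0, horizon=18.0):
    """Importance weights f(n) per eqs. (5)-(6): newer frames weigh more."""
    # stand-in for the short-term memory forgetting curve h(tau) of Fig. 3,
    # which the patent gives only graphically
    h = lambda tau: np.exp(-tau / 6.0)
    n = np.arange(num_frames)                   # frame index, 0 = oldest frame
    age = (num_frames - 1 - n) * frame_period   # time elapsed since each frame
    d = h(age)                                  # sampled prior of recall accuracy, eq. (4)
    f = r * d                                   # eq. (5), with 0 < r <= 1
    f[age >= horizon] = 0.0                     # eq. (6): zero weight beyond 18 s
    return f

# usage: weights for 20 s of speech at a 10 ms frame period (oldest frames get zero)
w = frame_weights(num_frames=2000, frame_period=0.01)
print(w[:3], w[-3:])
```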
The continuous speech signal s(τ) is segmented into emotion recognition units S(m), where m is the paragraph index:
S(m) = s(τ_i) … s(τ_j), τ_j - τ_i < 18s (7)
The reference unit duration is 2s to 9s. Natural pauses and syllable boundaries serve as dividing lines.
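An illustrative sketch of this segmentation rule, under stated assumptions: the patent specifies pause/syllable boundaries and the 2s-to-9s reference duration but not the pause detector, so a simple per-frame energy threshold is assumed here:

```python
import numpy as np

def segment_by_pauses(energy_db, frame_period, min_len=2.0, max_len=9.0, pause_db=-40.0):
    """Cut a continuous utterance into units at low-energy (pause) frames,
    keeping each unit between min_len and max_len seconds."""
    min_frames = int(min_len / frame_period)
    max_frames = int(max_len / frame_period)
    units, start = [], 0
    for i, e in enumerate(energy_db):
        length = i - start
        if (e < pause_db and length >= min_frames) or length >= max_frames:
            units.append((start, i))            # unit = frame range [start, i)
            start = i
    if len(energy_db) - start >= min_frames:    # keep a long-enough trailing unit
        units.append((start, len(energy_db)))
    return units

# usage: 30 s of per-frame log energy at a 10 ms frame period (random stand-in)
energy_db = -30.0 + 10.0 * np.random.randn(3000)
print(segment_by_pauses(energy_db, 0.01)[:5])
```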
The fusion weight of each emotion recognition unit is then obtained from the importance weights of its frames.
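The fusion-weight formula itself appears as an image in the original publication. A natural reconstruction, stated here purely as an assumption, normalizes the sum of each unit's frame weights over all units:
w(i) = Σ_n f_i(n) / Σ_{j=1}^{m} Σ_n f_j(n)
which makes the fusion weights sum to one, as required for the weighted decision fusion of Section Four.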
Three. Gaussian mixture model modeling
A Gaussian mixture model (GMM) is defined by the following formula:
p(X|λ) = Σ_{i=1}^{M} a_i b_i(X) (8)
where X is the D-dimensional feature vector of a speech sample and t is its sample index; b_i(X), i = 1, 2, …, M are the member densities; and a_i, i = 1, 2, …, M are the mixture weights. Each member density is a D-variate Gaussian function with mean vector U_i and covariance matrix Σ_i:
b_i(X) = (2π)^{-D/2} |Σ_i|^{-1/2} exp{-(1/2)(X - U_i)^T Σ_i^{-1} (X - U_i)} (9)
The mixture weights satisfy the condition:
Σ_{i=1}^{M} a_i = 1 (10)
The complete Gaussian mixture density is parameterized by the mean vectors, covariance matrices, and mixture weights of all member densities, collected together as:
λ_i = {a_i, U_i, Σ_i}, i = 1, 2, …, M (12)
According to the Bayesian decision criterion, GMM-based emotion recognition is obtained by maximum a posteriori probability:
k* = argmax_k p(X|λ_k) (13)
where k is the emotion class index.
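Purely as an illustration (scikit-learn and the random stand-in data are assumptions, not named by the patent), a minimal sketch of per-emotion GMM training and the maximum-likelihood decision of equation (13):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_emotion_gmms(features_by_class, n_components=8):
    """Fit one GMM per emotion class; features_by_class maps label -> (N, D) array."""
    models = {}
    for label, X in features_by_class.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                              init_params="kmeans", max_iter=100, random_state=0)
        gmm.fit(X)                              # parameters estimated by EM, as in the text
        models[label] = gmm
    return models

def classify(models, X):
    """Pick the class whose model yields the highest average log-likelihood, eq. (13)."""
    return max(models, key=lambda label: models[label].score(X))

# usage with random stand-in data: two classes of 24-dimensional features
rng = np.random.default_rng(0)
data = {"irritated": rng.normal(0.0, 1.0, (200, 24)),
        "neutral":   rng.normal(1.0, 1.0, (200, 24))}
models = train_emotion_gmms(data)
print(classify(models, rng.normal(0.0, 1.0, (50, 24))))
```

The K-means initialization (init_params="kmeans") mirrors the initialization described for Fig. 4 below.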
The parameters of the Gaussian mixture model can be estimated with the EM (Expectation-Maximization) algorithm. Its basic idea is to start from an initialized model λ and estimate a new model λ′ such that p(X|λ′) ≥ p(X|λ); the new model then becomes the initial model for the next iteration, and the process repeats until a convergence threshold is reached. This is similar to the Baum-Welch re-estimation algorithm used to estimate the parameters of Hidden Markov Models (HMM). In each EM iteration, re-estimation formulas for the mixture weights, the mean vectors, and the variance matrices guarantee a monotonic increase of the model's likelihood.
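The re-estimation formulas appear only as images in the original publication. The following are the standard GMM re-estimation equations they correspond to, restated as an assumption and numbered (14)-(17) to fit the document's sequence, with p(i|x_t, λ) the posterior probability of component i for sample x_t and T the number of samples:
p(i|x_t, λ) = a_i b_i(x_t) / Σ_{j=1}^{M} a_j b_j(x_t) (14)
a′_i = (1/T) Σ_{t=1}^{T} p(i|x_t, λ) (15)
U′_i = Σ_{t=1}^{T} p(i|x_t, λ) x_t / Σ_{t=1}^{T} p(i|x_t, λ) (16)
σ′_i² = Σ_{t=1}^{T} p(i|x_t, λ) x_t² / Σ_{t=1}^{T} p(i|x_t, λ) - U′_i² (17)
(the variance re-estimation is written for diagonal covariance matrices).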
The estimates of the weight, mean vector, and covariance matrix of each GMM component converge over the iterations. The number of mixtures in a Gaussian mixture model can only be bounded to a range in theory; its concrete value must be determined experimentally. The weight of each Gaussian component is estimated by the EM algorithm; during the EM iterations the covariance matrices must be prevented from becoming singular to guarantee convergence.
Taking the weights as an example, Fig. 4 shows the iteration curves of the EM algorithm and the convergence behavior of each iteration. The ordinate is the weight value of each Gaussian component, the abscissa is the number of EM iterations, and curves of different colors and shapes represent different Gaussian components. Some of the mixture weights are zero, indicating that the number of mixtures was set too high. The initial values are obtained by K-means clustering initialization, and the algorithm converges after about 35 iterations.
Four. Segment-wise recognition and fusion
The local emotion recognition results are fused into a global emotion, within the 18-second capacity of short-term memory. The output emotion vector of each emotion recognition unit s(i) is
e(i) = [e(i,1), e(i,2), …, e(i,p)] (18)
where p is the number of emotion categories. The decisions of the emotion recognition units are fused according to the fusion weights w(i), and the final emotion output is
E = w(1)×e(1) + w(2)×e(2) + … + w(m)×e(m) (19)
where E is the final emotion recognition result and m is the number of local speech segments.
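An illustrative sketch of the weighted decision fusion of equation (19); NumPy and the example numbers are assumptions:

```python
import numpy as np

def fuse_emotions(e, w):
    """Global emotion vector E = w(1)*e(1) + ... + w(m)*e(m), eq. (19)."""
    e, w = np.asarray(e), np.asarray(w)
    return w @ e                        # (m,) times (m, p) -> (p,)

# usage: three recognition units, four emotion classes
e = np.array([[0.7, 0.1, 0.1, 0.1],    # unit-level emotion vectors e(i)
              [0.2, 0.5, 0.2, 0.1],
              [0.6, 0.2, 0.1, 0.1]])
w = np.array([0.5, 0.3, 0.2])          # fusion weights w(i), summing to one
E = fuse_emotions(e, w)
print(E, E.argmax())                   # fused vector and index of the winning class
```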
Five. Verification of embedding the short-term memory forgetting-curve weights in continuous emotion recognition
Continuous emotional speech corpora were selected for the experiments. The data contain samples of five affective states: irritation, happiness, fatigue, confidence, and the neutral state. The sequence numbers from the recording site were retained between sentence samples, so adjacent sentences can be retrieved through adjacent sequence numbers. Each emotion corpus contains 1000 paragraph samples, 5000 in total. Fig. 5 shows the recognition rates of continuous speech emotion recognition; after embedding the short-term memory forgetting curve, the weight fusion method improves stress emotion recognition, as shown by the results in Fig. 6. The method can detect emotion types related to cognitive processes, such as irritation, fatigue, and confidence; it has important application prospects in stressful space environments and can provide timely early warning and monitoring of personnel's emotional stability.
The scope of protection claimed by the present invention is not limited to the description of this embodiment.
Claims (6)
1. An astronautic stress emotion recognition method based on short-term memory weight fusion, characterized by comprising the following steps:
Step 1: extract the prior probability of recall accuracy from the short-term memory forgetting curve;
Step 2: obtain the importance weight of each time-domain frame from the prior probability of recall accuracy and a subjective adjustment coefficient;
Step 3: segment the continuous speech signal into emotion recognition units;
Step 4: obtain the fusion weight of each emotion recognition unit from the importance weight of every frame;
Step 5: perform speech emotion recognition on each emotion recognition unit.
2. The astronautic stress emotion recognition method based on short-term memory weight fusion according to claim 1, characterized in that step 1 specifically comprises the following steps:
Step 1-1: sample the short-term memory forgetting curve; the sampling interval is t, 0 < t < 18s;
Step 1-2: take the k-th sample value d(k) as the prior probability of recall accuracy.
3. The astronautic stress emotion recognition method based on short-term memory weight fusion according to claim 1, characterized in that step 2 specifically comprises the following steps:
Step 2-1: set the subjective adjustment coefficient r; its reference value is 1, and it can be adjusted dynamically within (0, 1];
Step 2-2: obtain the importance weight of the time-domain frame multiplicatively: f(n) = r × d(k), where n is the frame index; frames closer to the present have higher recall accuracy and therefore higher weight; frames beyond 18 seconds have recall accuracy zero, so their importance weight is also zero.
4. The astronautic stress emotion recognition method based on short-term memory weight fusion according to claim 1, characterized in that step 3 specifically comprises the following steps:
Step 3-1: divide the continuous speech signal into units shorter than 18s, with a reference unit duration of 2s to 9s; natural pauses and syllable boundaries serve as dividing lines;
Step 3-2: extract the features of each speech segment frame by frame, including acoustic parameters such as pitch, intensity, and formants.
5. The astronautic stress emotion recognition method based on short-term memory weight fusion according to claim 1, characterized in that step 4 specifically comprises the following steps:
Step 4-1: for the m speech segments s(i), i = 1, 2, …, m, obtained in step 3 of claim 1, extract the importance weights f_i(n) of the frames corresponding to each segment's frame sequence.
6. The astronautic stress emotion recognition method based on short-term memory weight fusion according to claim 1, characterized in that step 5 specifically comprises the following steps:
Step 5-1: on the training data, build an emotion model of the speech segments with a Gaussian mixture model to obtain a probability distribution model;
Step 5-2: perform emotion recognition based on the Bayesian criterion on the input data to obtain the emotion vector e(i) = [e(i,1), e(i,2), …, e(i,p)] of each emotion recognition unit s(i), where p is the number of emotion categories;
Step 5-3: fuse the decisions of the emotion recognition units according to the fusion weights w(i); the final emotion output is E = w(1)×e(1) + w(2)×e(2) + … + w(m)×e(m).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310534910.3A CN103531208B (en) | 2013-11-01 | 2013-11-01 | A kind of space flight stress emotion identification method based on short term memory weight fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103531208A true CN103531208A (en) | 2014-01-22 |
CN103531208B CN103531208B (en) | 2016-08-03 |
Family
ID=49933160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310534910.3A Expired - Fee Related CN103531208B (en) | 2013-11-01 | 2013-11-01 | A kind of space flight stress emotion identification method based on short term memory weight fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103531208B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101346758A (en) * | 2006-06-23 | 2009-01-14 | 松下电器产业株式会社 | Emotion recognizer |
US20110141258A1 (en) * | 2007-02-16 | 2011-06-16 | Industrial Technology Research Institute | Emotion recognition method and system thereof |
CN101894550A (en) * | 2010-07-19 | 2010-11-24 | 东南大学 | Speech emotion classifying method for emotion-based characteristic optimization |
CN103021406A (en) * | 2012-12-18 | 2013-04-03 | 台州学院 | Robust speech emotion recognition method based on compressive sensing |
Non-Patent Citations (1)
Title |
---|
MARTIN WÖLLMER et al.: "Combining Long Short-Term Memory and Dynamic Bayesian Networks for Incremental Emotion-Sensitive Artificial Listening", IEEE Journal of Selected Topics in Signal Processing * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108766459A (en) * | 2018-06-13 | 2018-11-06 | 北京联合大学 | Target speaker method of estimation and system in a kind of mixing of multi-person speech |
CN108766459B (en) * | 2018-06-13 | 2020-07-17 | 北京联合大学 | Target speaker estimation method and system in multi-user voice mixing |
CN110334705A (en) * | 2019-06-25 | 2019-10-15 | 华中科技大学 | A kind of Language Identification of the scene text image of the global and local information of combination |
CN110334705B (en) * | 2019-06-25 | 2021-08-03 | 华中科技大学 | Language identification method of scene text image combining global and local information |
CN112002348A (en) * | 2020-09-07 | 2020-11-27 | 复旦大学 | Method and system for recognizing speech anger emotion of patient |
CN112002348B (en) * | 2020-09-07 | 2021-12-28 | 复旦大学 | Method and system for recognizing speech anger emotion of patient |
Also Published As
Publication number | Publication date |
---|---|
CN103531208B (en) | 2016-08-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C14 | Grant of patent or utility model | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160803; Termination date: 20201101 |