
CN108986788A - Noise-robust acoustic modeling method based on posterior knowledge supervision - Google Patents

Noise-robust acoustic modeling method based on posterior knowledge supervision

Info

Publication number
CN108986788A
CN108986788A CN201810576451.8A CN201810576451A CN108986788A CN 108986788 A CN108986788 A CN 108986788A CN 201810576451 A CN201810576451 A CN 201810576451A CN 108986788 A CN108986788 A CN 108986788A
Authority
CN
China
Prior art keywords
model
training
supervision
feature
teacher
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810576451.8A
Other languages
Chinese (zh)
Inventor
潘子春
李葵
李明
张引强
黄影
赵峰
吴立刚
徐海青
章爱武
陈是同
徐唯耀
秦浩
王文清
郑娟
秦婷
梁翀
浦正国
张天奇
余江斌
韩涛
杨维
张才俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
Anhui Jiyuan Software Co Ltd
Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
Anhui Jiyuan Software Co Ltd
Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, Anhui Jiyuan Software Co Ltd, Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201810576451.8A priority Critical patent/CN108986788A/en
Publication of CN108986788A publication Critical patent/CN108986788A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 Noise filtering
    • G10L 21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 2015/0638 Interactive procedures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a noise-robust acoustic modeling method based on posterior knowledge supervision, belonging to the field of voice human-computer interaction. The method comprises: obtaining the posterior probability distribution of clean speech by training a teacher model; and supervising the training of a student model using the posterior probability distribution of the clean speech as the reference, so that the student model approaches the posterior probability distribution of the teacher model as closely as possible. The teacher model is a model trained on clean speech, and the student model is a model trained on noisy speech. Acoustic models built with the exemplary modeling method of the present invention have stronger environmental robustness and show superior noise resistance.

Description

Noise-robust acoustic modeling method based on posterior knowledge supervision
Technical field
The invention belongs to the field of voice human-computer interaction, and specifically relates to a noise-robust acoustic modeling method based on posterior knowledge supervision.
Background technique
In recent years, with the development of technologies such as speech recognition, natural language processing and deep learning, and the continued deepening of market demand, the research, development and application of voice interaction products have become a new hot spot. On the other hand, owing to the complexity of practical application scenarios, voice interaction systems typically operate in low signal-to-noise-ratio environments. Because of insufficient resistance to noise interference, problems such as low speech recognition accuracy or confused human-computer interaction often arise during interaction, leading to a poor interactive experience for the served user and greatly limiting the market application and promotion of voice interaction products.
Related research shows that whether the acoustic model can extract complete phoneme information from noisy speech is the key to the noise robustness of a voice interaction system. The deficiency of acoustic models in noise robustness is mainly caused by environmental noise in the model construction stage, which leads to a mismatch between training data and test data; improving noise robustness therefore means reducing or eliminating the influence of such factors to the greatest extent. To date, many scholars in the field of speech recognition have carried out extensive research on the noise robustness of acoustic models and proposed a variety of improvement strategies, among which four methods have shown good application results: feature compensation, model compensation, robust feature extraction and speech enhancement.
Feature compensation and model compensation are noise robustness methods that optimize the acoustic model through adaptive algorithms. For example, Leggetter et al. use the maximum likelihood linear regression (MLLR) algorithm for model adaptation; Tran et al. apply adaptive processing to the input data of deep-neural-network (DNN) acoustic model training through a linearly decomposed network, enabling the acoustic model to better match the data structure of noisy speech, so that model robustness is improved.
Robust feature extraction refers to extracting noise-insensitive characteristic parameters from the corpus and constructing feature sequences with strong anti-noise capability, so as to improve the noise robustness of the acoustic model. Cepstral mean normalization (CMN) and mean-variance normalization (MVN) are the two most common robust feature extraction methods; in addition, some scholars combine perceptual linear prediction (PLP) features with relative spectral (RASTA) filtering to strengthen the robustness of the acoustic model against additive noise and linear filtering. Furthermore, Liu Changzheng et al. take MFCC features as the input of a CNN and adopt supervised learning to extract higher-level speech features; experiments show that these features have good temporal invariance in noisy environments.
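For illustration, the two normalization methods mentioned above can be sketched in a few lines of Python. The sketch below is a generic example under assumed tensor shapes (200 frames of 13 MFCCs, generated at random) and is not taken from any cited work:

import torch

def cmn(feats: torch.Tensor) -> torch.Tensor:
    # Cepstral mean normalization: remove the per-utterance mean of each coefficient.
    # feats: (num_frames, num_cepstra)
    return feats - feats.mean(dim=0, keepdim=True)

def mvn(feats: torch.Tensor) -> torch.Tensor:
    # Mean-variance normalization: zero mean and unit variance per coefficient.
    return (feats - feats.mean(dim=0, keepdim=True)) / (feats.std(dim=0, keepdim=True) + 1e-8)

utterance = torch.randn(200, 13)   # random stand-in for 200 frames of 13 MFCCs
normalized = mvn(utterance)        # CMN is the mean-removal step contained in MVN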
For speech enhancement, the most common approach at present is spectral subtraction combined with noise estimation to remove noise from the speech: assuming the noise information is known, the noise spectrum of the corpus is estimated, and the estimated noise spectrum is subtracted from the noisy speech spectrum to obtain the clean spectrum of the corpus, so that clean features can be extracted from the noisy speech for acoustic model training. Furthermore, Xu et al. propose combining spectral subtraction with a DNN: the features processed by spectral subtraction and the noise estimation parameters are fed into the DNN as basic samples, and the deep acoustic model obtained by noise-independent training shows better noise resistance than spectral subtraction alone.
Although the above four methods can effectively improve the environmental robustness of acoustic models, two problems remain in both theory and application. First, these methods simply supervise the denoising of noisy speech with clean speech, or fit the noisy speech to the clean speech, reducing the difference between the two; they do not sufficiently mine the tacit knowledge contained in the clean speech, and the refinement of its information is inadequate. Second, in these four classes of methods the acoustic feature extraction module and the subsequent training and recognition process are independent of each other, and the inner connection between modeling and feature characterization units is not considered, so the objective function of model training deviates from the overall performance indicator of the system; moreover, the extracted speech features contain partially redundant information that usually lacks noise robustness, so the entire acoustic network often fails to reach optimal performance.
Therefore, how to improve the noise robustness of voice interaction systems is an urgent problem at this stage.
Summary of the invention
In view of the above problems in the prior art, the purpose of the present invention is to provide a noise-robust acoustic modeling method based on posterior knowledge supervision, which can improve the noise robustness of acoustic models.
The technical scheme adopted by the invention is as follows:
Provided is a noise-robust acoustic modeling method based on posterior knowledge supervision, comprising:
obtaining the posterior probability distribution of clean speech by training a teacher model;
supervising the training of a student model using the posterior probability distribution of the clean speech as the reference, so that the student model approaches the posterior probability distribution of the teacher model as closely as possible;
wherein the teacher model is a model trained on clean speech, and the student model is a model trained on noisy speech.
Further, the training of the teacher model comprises:
extracting features X_t from the clean speech;
performing frame-by-frame forced alignment on the windowed features X_t and obtaining a hard label for each frame of speech data; windowing here means framing and applying a window function: the speech data are usually divided into frames according to preset parameters, and windowing facilitates subsequent feature alignment;
marking the start and end points of each hard label in the time dimension on the basis of the forced alignment;
feeding the start-end point annotation information and the hard label data into a DNN module as supervision information to carry out the modeling training of the acoustic model.
Further, the frame-by-frame forced alignment of the windowed features is performed by a GMM-HMM module.
Further, the modeling training of the acoustic model comprises:
taking the features X_t as model input and the hard phoneme label data as supervision information, and obtaining the frame-by-frame triphone posterior probability distributions of the data using the forward algorithm.
Further, the training of the student model comprises:
extracting preliminary features X_s from the noisy speech;
aligning the extracted phoneme features X_s in parallel with the soft labels of the teacher model to obtain the soft labels of the student model;
extracting high-level features on the basis of the preliminarily extracted acoustic features and reducing their dimensionality, so as to obtain feature sequences that characterize the invariance of the noisy speech;
inputting the high-level features into a DNN module to carry out the modeling training of the acoustic model.
Further, the high-level features are extracted by the local-connection and down-sampling modules of a CNN.
Further, the training process of the neural network module uses relative entropy minimization as the optimization criterion.
Further, the difference between the posterior probability distributions of the teacher model and the student model is quantified by relative entropy.
Further, the relative entropy of the teacher model and the student model is:

D_{KL}(P_t \| Q_s) = \sum_i P_t(ph_i \mid X_t) \log \frac{P_t(ph_i \mid X_t)}{Q_s(ph_i \mid X_s)}

wherein P_t is the posterior probability distribution of the teacher model, Q_s is the posterior probability distribution of the student model, i denotes the index within the triphone state set, ph_i is the i-th state in the triphone state set, X_t denotes the clean speech features used to train the teacher model, X_s denotes the noisy speech features used to train the student model, P_t(ph_i | X_t) denotes the posterior probability that feature X_t is recognized as the i-th triphone state, and Q_s(ph_i | X_s) denotes the posterior probability that feature X_s is recognized as the i-th triphone state.
Further, the posterior probability distribution relative entropy of the teacher model and the student model used during supervised training (omitting the term that is independent of the student model) is:

D(P_t \| Q_s) = -\sum_i P_t(ph_i \mid X_t) \log Q_s(ph_i \mid X_s)
Compared with the prior art, the invention has the following beneficial effects:
1. In the exemplary noise-robust acoustic modeling method based on posterior knowledge supervision, the model trained on clean speech serves as the teacher model and the model trained on noisy speech serves as the student model; the posterior probability distribution knowledge distilled from the teacher model supervises the training of the student model, indirectly achieving the goal of improving the environmental robustness of the acoustic model.
2. The exemplary method adopts an acoustic model training network structure that combines a CNN (convolutional neural network) with a DNN (deep neural network), in which the CNN module extracts invariance features of noisy speech and the DNN performs acoustic modeling; the parameters of the whole network are adjusted and optimized jointly by the CNN and DNN modules. The constructed model was verified and compared on the CHiME dataset in terms of speech recognition performance under different signal-to-noise ratios; the experimental results show that the model has stronger environmental robustness and superior noise resistance.
3. Compared with a DNN model, the CNN-DNN student model used in the exemplary method adds a convolutional-neural-network module for extracting high-level speech features, which can better capture the temporal invariance of noisy speech. In addition, the down-sampling (pooling) layers inside the CNN reject redundant information in the speech features and realize feature dimensionality reduction, which improves the efficiency of model training while improving the noise robustness of the acoustic model.
4. Compared with the traditional standard cross-entropy (CE) minimization criterion, the exemplary method substitutes probability vectors (soft labels) for 0-1 vectors (hard labels); the soft labels are a deeper refinement of the posterior probability distribution and contain richer useful information, which is more conducive to building a robust acoustic model.
Description of the drawings
Other features, objects and advantages of the application will become more apparent by reading the following detailed description of non-restrictive embodiments with reference to the accompanying drawings:
Fig. 1 is the flow chart of the embodiment of the present invention;
Fig. 2 is the flow chart of teacher model training in the embodiment of the present invention;
Fig. 3 is the structural schematic diagram of the GMM-HMM module;
Fig. 4 is the flow chart of student model training in the embodiment of the present invention.
Specific embodiment
The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention, rather than to limit it. It should also be noted that, for convenience of description, only the parts relevant to the invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features therein can be combined with each other. The application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
As shown in Fig. 1, an embodiment of the present invention provides a noise-robust acoustic modeling method based on posterior knowledge supervision, comprising:
S1: obtaining the posterior probability distribution of clean speech by training a teacher model;
S2: supervising the training of a student model using the posterior probability distribution of the clean speech as the reference, so that the student model approaches the posterior probability distribution of the teacher model as closely as possible;
wherein the teacher model is a model trained on clean speech, and the student model is a model trained on noisy speech.
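Before the formal derivation below, the teacher-student scheme of steps S1 and S2 can be illustrated with a minimal PyTorch sketch. The network sizes, feature dimension, number of triphone states and the random clean/noisy feature tensors are illustrative assumptions only; in the embodiment they come from the feature extraction, forced alignment and CNN-DNN modules described later.

import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, NUM_STATES = 40, 120          # assumed feature size and triphone-state count

# Teacher: trained on clean speech with hard (one-hot) frame labels.
teacher = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                        nn.Linear(256, NUM_STATES))
# Student: trained on parallel noisy speech, supervised by the teacher posteriors.
student = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                        nn.Linear(256, NUM_STATES))

# Toy parallel data: frame-level clean/noisy features plus hard labels (all assumed).
frames = 32
clean_feats = torch.randn(frames, FEAT_DIM)
noisy_feats = clean_feats + 0.3 * torch.randn(frames, FEAT_DIM)   # simulated noise
hard_labels = torch.randint(0, NUM_STATES, (frames,))

# Stage 1 (S1): train the teacher on clean speech with the standard cross entropy.
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-3)
for _ in range(10):
    opt_t.zero_grad()
    loss = F.cross_entropy(teacher(clean_feats), hard_labels)
    loss.backward()
    opt_t.step()

# Stage 2 (S2): freeze the teacher and use its posteriors (soft labels) as supervision.
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(10):
    with torch.no_grad():
        soft_labels = F.softmax(teacher(clean_feats), dim=-1)      # P_t(ph_i | X_t)
    opt_s.zero_grad()
    log_q = F.log_softmax(student(noisy_feats), dim=-1)            # log Q_s(ph_i | X_s)
    kd_loss = F.kl_div(log_q, soft_labels, reduction="batchmean")  # relative entropy
    kd_loss.backward()
    opt_s.step()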
To quantify the difference between the posterior probability distributions of the two models, this embodiment uses the KL divergence (relative entropy). For an acoustic model, the physical meaning of the KL divergence is as follows: in the same basic speech space, for each phoneme feature corresponding to the probability distribution P(x), if the feature is instead encoded with the probability distribution Q(x), the KL divergence is the average number of bits added to the coding length of each phoneme feature. In this embodiment, let P_t be the posterior probability distribution of the teacher model and Q_s be the posterior probability distribution of the student model; Q_s is equivalent to an approximate estimate of the posterior probability distribution P_t, so the relative entropy of the two can be expressed as:

D_{KL}(P_t \| Q_s) = \sum_i P_t(ph_i \mid X_t) \log \frac{P_t(ph_i \mid X_t)}{Q_s(ph_i \mid X_s)}    (1)

wherein i denotes the index within the triphone state set, ph_i is the i-th state in the triphone state set, X_t denotes the clean speech features used to train the teacher model, X_s denotes the noisy speech features used to train the student model, P_t(ph_i | X_t) denotes the posterior probability that feature X_t is recognized as the i-th triphone state, and Q_s(ph_i | X_s) denotes the posterior probability that feature X_s is recognized as the i-th triphone state. By expansion, formula (1) can be rewritten in the following form:

D_{KL}(P_t \| Q_s) = \sum_i P_t(ph_i \mid X_t) \log P_t(ph_i \mid X_t) - \sum_i P_t(ph_i \mid X_t) \log Q_s(ph_i \mid X_s)    (2)

It can be observed that the calculation of the first term, \sum_i P_t(ph_i \mid X_t) \log P_t(ph_i \mid X_t), is independent of the modeling process of the student model and can be ignored during actual supervised training, so the posterior probability distribution relative entropy of the two models can be expressed as:

D(P_t \| Q_s) = -\sum_i P_t(ph_i \mid X_t) \log Q_s(ph_i \mid X_s)    (3)
In form, the above formula resembles the calculation of the standard cross entropy (CE). The difference is that the standard cross entropy analyzes the difference between the empirical probability distribution of the training data and the posterior probability distribution of the model, and the empirical probability distribution is usually described by hard 0-1 label vectors, whereas the relative entropy of the teacher model and the student model compares the difference between the posterior probability distributions of the two models, which is equivalent to replacing the "hard labels" with "soft labels".
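The simplification from formula (1) to formula (3) can also be checked numerically. The short sketch below uses random stand-in posteriors (an assumption made purely for illustration) to verify that the relative entropy and the soft-label cross entropy differ only by the teacher entropy, which is constant with respect to the student model:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_states = 6   # illustrative size of the triphone state set

# Random frame-level posteriors standing in for P_t(ph_i | X_t) and Q_s(ph_i | X_s).
p_t = F.softmax(torch.randn(num_states), dim=-1)   # teacher posterior (soft label)
q_s = F.softmax(torch.randn(num_states), dim=-1)   # student posterior

kl = torch.sum(p_t * torch.log(p_t / q_s))              # formula (1)
soft_ce = -torch.sum(p_t * torch.log(q_s))              # formula (3)
teacher_entropy = -torch.sum(p_t * torch.log(p_t))      # the term dropped in formula (2)

# KL = soft cross entropy - teacher entropy; the entropy term does not depend on the
# student, so minimizing formula (1) and formula (3) drives the student the same way.
assert torch.allclose(kl, soft_ce - teacher_entropy, atol=1e-6)
print(float(kl), float(soft_ce), float(teacher_entropy))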
The teacher model is built on a hybrid model of GMM-HMM and a neural network; its training steps are shown in Fig. 2:
First, features X_t are extracted from the clean speech. The GMM-HMM module performs frame-by-frame forced alignment on the windowed features X_t and obtains the hard label of each frame of speech data, i.e. a 0-1 decision on the triphone states of each frame: the observation probability of a phoneme state the frame belongs to is set to 1, otherwise it is set to 0, yielding the triphone state observation probability distribution of each frame of data, for example [1 1 0 1 0 0]. On the basis of the forced alignment, the start and end points of each hard label are marked in the time dimension, and this annotation information together with the hard label data is fed into the neural network module as supervision information for the modeling training of the acoustic model. The structure of the GMM-HMM module is shown in Fig. 3. The windowing mentioned above means framing and applying a window function: the speech data are usually divided into frames according to preset parameters, and windowing facilitates subsequent feature alignment.
The neural network module is trained with the features X_t as model input and the hard phoneme label data as supervision information, and the frame-by-frame triphone posterior probability distributions (soft labels) of the data are obtained using the forward algorithm. The difference between a hard label and a soft label is that the soft label is the triphone state posterior probability distribution of each frame of data rather than a simple 0-1 decision, so the soft label of each frame of data takes a form similar to [0.2 0.15 0.3 0.1 0.1 0.1], where each value denotes the posterior probability that the frame belongs to the corresponding triphone state.
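The contrast between hard and soft labels can be made concrete with the following sketch, which shows the single-frame, single-state case; the state inventory size, the aligned state index and the small teacher network are illustrative assumptions rather than the configuration of the embodiment:

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_STATES, FEAT_DIM = 6, 40           # assumed triphone-state and feature sizes

# Hard label: the forced alignment assigns this frame to one triphone state, so the
# observation probability of that state is 1 and all others are 0.
aligned_state = 2                       # assumed index from the GMM-HMM forced alignment
hard_label = F.one_hot(torch.tensor(aligned_state), NUM_STATES).float()
# e.g. tensor([0., 0., 1., 0., 0., 0.])

# Soft label: the posterior distribution over all triphone states produced by a forward
# pass of an already trained teacher network for the same frame.
teacher = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(),
                        nn.Linear(128, NUM_STATES))
frame_feat = torch.randn(FEAT_DIM)      # X_t for one frame (random stand-in)
with torch.no_grad():
    soft_label = F.softmax(teacher(frame_feat), dim=-1)
# e.g. something like tensor([0.20, 0.15, 0.30, 0.10, 0.10, 0.15]); every entry is the
# posterior probability that the frame belongs to the corresponding triphone state.
print(hard_label, soft_label)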
The student model is built by combining a CNN with a DNN; the training process of the student model is shown in Fig. 4:
The training of the student model first extracts preliminary features X_s from the noisy speech, and the extracted phoneme features X_s are aligned in parallel with the soft labels of the teacher model to obtain the soft labels of the student model. On the basis of the preliminary feature extraction, the local-connection and down-sampling modules of the CNN are used to extract high-level features on top of preliminarily extracted acoustic features such as MFCC and FBANK, and to reduce the dimensionality of the high-level features, so as to obtain feature sequences that characterize the invariance of the noisy speech. On the other hand, considering that DNNs have powerful classification ability and have surpassed traditional models such as GMM in acoustic modeling performance, the high-level features are finally fed into the DNN layers for acoustic modeling, and the training of the entire model network uses relative entropy minimization (formula 3) as the optimization criterion. The dimensionality reduction of the high-level features mentioned above refers to reducing the dimensionality of the feature maps through pooling layers and condensing the important features with local summaries.
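A minimal PyTorch sketch of a CNN-DNN student network of this kind is given below. The 11-frame by 40-dimensional input map, the layer sizes and the number of triphone states are illustrative assumptions, not the patented configuration; the convolution and pooling layers stand for the local-connection and down-sampling modules, the fully connected layers stand for the DNN acoustic model, and the loss is the relative-entropy criterion of formula (3) against teacher soft labels (here random stand-ins):

import torch
import torch.nn as nn
import torch.nn.functional as F

class CnnDnnStudent(nn.Module):
    # CNN front end for high-level feature extraction and dimensionality reduction
    # (local connection + pooling), followed by DNN layers for triphone-state
    # classification. All sizes are illustrative assumptions.
    def __init__(self, num_states: int = 120):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # down-sampling / pooling layer
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # further feature-map reduction
        )
        self.dnn = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 2 * 10, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, num_states),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, context_frames=11, feat_dim=40)
        return self.dnn(self.conv(x))

# One relative-entropy (formula 3) training step against teacher soft labels.
student = CnnDnnStudent()
noisy_batch = torch.randn(8, 1, 11, 40)                 # X_s, random stand-in
teacher_post = F.softmax(torch.randn(8, 120), dim=-1)   # P_t(ph_i | X_t), assumed given
log_q = F.log_softmax(student(noisy_batch), dim=-1)
loss = -(teacher_post * log_q).sum(dim=-1).mean()       # -sum_i P_t log Q_s
loss.backward()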
The noise-robust acoustic modeling method based on posterior knowledge supervision of this embodiment, in a manner similar to a teacher instructing a pupil, uses the posterior probability distribution (soft labels) of the teacher model as supervision information to guide the training of the student model, and designs a student model based on a CNN-DNN hybrid network that refines the high-level features of noisy speech, thereby improving the noise resistance of the acoustic model. The student model built in this embodiment was verified on the CHiME noisy dataset; the experimental results show that, under the supervision of three teacher models, the word error rate of the student model dropped by 5.21%, 6.35% and 7.83% on average compared with the baseline model, demonstrating that the proposed posterior knowledge supervision method effectively improves the robustness of the acoustic model.
The above description is only the preferred embodiment of the application and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example technical solutions formed by mutually replacing the above features with (but not limited to) technical features with similar functions disclosed herein.
Except for the technical features described in the specification, the remaining technical features are known to those skilled in the art; in order to highlight the innovative features of the invention, the remaining technical features are not described in detail here.

Claims (10)

1. A noise-robust acoustic modeling method based on posterior knowledge supervision, characterized by comprising:
obtaining the posterior probability distribution of clean speech by training a teacher model;
supervising the training of a student model using the posterior probability distribution of the clean speech as the reference, so that the student model approaches the posterior probability distribution of the teacher model as closely as possible;
wherein the teacher model is a model trained on clean speech, and the student model is a model trained on noisy speech.
2. The noise-robust acoustic modeling method based on posterior knowledge supervision according to claim 1, characterized in that the training of the teacher model comprises:
extracting features X_t from the clean speech;
performing frame-by-frame forced alignment on the windowed features X_t and obtaining a hard label for each frame of speech data;
marking the start and end points of each hard label in the time dimension on the basis of the forced alignment;
feeding the start-end point annotation information and the hard label data into a DNN module as supervision information to carry out the modeling training of the acoustic model.
3. The noise-robust acoustic modeling method based on posterior knowledge supervision according to claim 2, characterized in that the frame-by-frame forced alignment of the windowed features is performed by a GMM-HMM module.
4. The noise-robust acoustic modeling method based on posterior knowledge supervision according to claim 2, characterized in that the modeling training of the acoustic model comprises:
taking the features X_t as model input and the hard phoneme label data as supervision information, and obtaining the frame-by-frame triphone posterior probability distributions of the data using the forward algorithm.
5. The noise-robust acoustic modeling method based on posterior knowledge supervision according to claim 1, characterized in that the training of the student model comprises:
extracting preliminary features X_s from the noisy speech;
aligning the extracted phoneme features X_s in parallel with the soft labels of the teacher model to obtain the soft labels of the student model;
extracting high-level features on the basis of the preliminarily extracted acoustic features and reducing their dimensionality, so as to obtain feature sequences that characterize the invariance of the noisy speech;
inputting the high-level features into a DNN module to carry out the modeling training of the acoustic model.
6. The noise-robust acoustic modeling method based on posterior knowledge supervision according to claim 5, characterized in that the high-level features are extracted by the local-connection and down-sampling modules of a CNN.
7. The noise-robust acoustic modeling method based on posterior knowledge supervision according to claim 5, characterized in that the training process of the neural network module uses relative entropy minimization as the optimization criterion.
8. The noise-robust acoustic modeling method based on posterior knowledge supervision according to claim 7, characterized in that the difference between the posterior probability distributions of the teacher model and the student model is quantified by relative entropy.
9. The noise-robust acoustic modeling method based on posterior knowledge supervision according to claim 8, characterized in that the relative entropy of the teacher model and the student model is:

D_{KL}(P_t \| Q_s) = \sum_i P_t(ph_i \mid X_t) \log \frac{P_t(ph_i \mid X_t)}{Q_s(ph_i \mid X_s)}

wherein P_t is the posterior probability distribution of the teacher model, Q_s is the posterior probability distribution of the student model, i denotes the index within the triphone state set, ph_i is the i-th state in the triphone state set, X_t denotes the clean speech features used to train the teacher model, X_s denotes the noisy speech features used to train the student model, P_t(ph_i | X_t) denotes the posterior probability that feature X_t is recognized as the i-th triphone state, and Q_s(ph_i | X_s) denotes the posterior probability that feature X_s is recognized as the i-th triphone state.
10. The noise-robust acoustic modeling method based on posterior knowledge supervision according to claim 9, characterized in that the posterior probability distribution relative entropy of the teacher model and the student model used during supervised training is:

D(P_t \| Q_s) = -\sum_i P_t(ph_i \mid X_t) \log Q_s(ph_i \mid X_s)
CN201810576451.8A 2018-06-06 2018-06-06 A kind of noise robust acoustic modeling method based on aposterior knowledge supervision Pending CN108986788A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810576451.8A CN108986788A (en) 2018-06-06 2018-06-06 A kind of noise robust acoustic modeling method based on aposterior knowledge supervision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810576451.8A CN108986788A (en) 2018-06-06 2018-06-06 A kind of noise robust acoustic modeling method based on aposterior knowledge supervision

Publications (1)

Publication Number Publication Date
CN108986788A true CN108986788A (en) 2018-12-11

Family

ID=64540863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810576451.8A Pending CN108986788A (en) 2018-06-06 2018-06-06 A kind of noise robust acoustic modeling method based on aposterior knowledge supervision

Country Status (1)

Country Link
CN (1) CN108986788A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110246487A (en) * 2019-06-13 2019-09-17 苏州思必驰信息科技有限公司 Optimization method and system for single pass speech recognition modeling
CN110610715A (en) * 2019-07-29 2019-12-24 西安工程大学 Noise reduction method based on CNN-DNN hybrid neural network
CN110634476A (en) * 2019-10-09 2019-12-31 深圳大学 Method and system for rapidly building robust acoustic model
CN111599373A (en) * 2020-04-07 2020-08-28 云知声智能科技股份有限公司 Compression method of noise reduction model
CN112291424A (en) * 2020-10-29 2021-01-29 上海观安信息技术股份有限公司 Fraud number identification method and device, computer equipment and storage medium
CN113380268A (en) * 2021-08-12 2021-09-10 北京世纪好未来教育科技有限公司 Model training method and device and speech signal processing method and device
WO2023279693A1 (en) * 2021-07-09 2023-01-12 平安科技(深圳)有限公司 Knowledge distillation method and apparatus, and terminal device and medium
US11907845B2 (en) 2020-08-17 2024-02-20 International Business Machines Corporation Training teacher machine learning models using lossless and lossy branches

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101710490A (en) * 2009-11-20 2010-05-19 安徽科大讯飞信息科技股份有限公司 Method and device for compensating noise for voice assessment
CN104392718A (en) * 2014-11-26 2015-03-04 河海大学 Robust voice recognition method based on acoustic model array
CN104992705A (en) * 2015-05-20 2015-10-21 普强信息技术(北京)有限公司 English oral automatic grading method and system
CN105609100A (en) * 2014-10-31 2016-05-25 中国科学院声学研究所 Acoustic model training and constructing method, acoustic model and speech recognition system
US20170263240A1 (en) * 2012-11-29 2017-09-14 Sony Interactive Entertainment Inc. Combining auditory attention cues with phoneme posterior scores for phone/vowel/syllable boundary detection

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101710490A (en) * 2009-11-20 2010-05-19 安徽科大讯飞信息科技股份有限公司 Method and device for compensating noise for voice assessment
US20170263240A1 (en) * 2012-11-29 2017-09-14 Sony Interactive Entertainment Inc. Combining auditory attention cues with phoneme posterior scores for phone/vowel/syllable boundary detection
CN105609100A (en) * 2014-10-31 2016-05-25 中国科学院声学研究所 Acoustic model training and constructing method, acoustic model and speech recognition system
CN104392718A (en) * 2014-11-26 2015-03-04 河海大学 Robust voice recognition method based on acoustic model array
CN104992705A (en) * 2015-05-20 2015-10-21 普强信息技术(北京)有限公司 English oral automatic grading method and system

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110246487A (en) * 2019-06-13 2019-09-17 苏州思必驰信息科技有限公司 Optimization method and system for single pass speech recognition modeling
CN110246487B (en) * 2019-06-13 2021-06-22 思必驰科技股份有限公司 Optimization method and system for single-channel speech recognition model
CN110610715B (en) * 2019-07-29 2022-02-22 西安工程大学 Noise reduction method based on CNN-DNN hybrid neural network
CN110610715A (en) * 2019-07-29 2019-12-24 西安工程大学 Noise reduction method based on CNN-DNN hybrid neural network
CN110634476A (en) * 2019-10-09 2019-12-31 深圳大学 Method and system for rapidly building robust acoustic model
CN110634476B (en) * 2019-10-09 2022-06-14 深圳大学 Method and system for rapidly building robust acoustic model
CN111599373A (en) * 2020-04-07 2020-08-28 云知声智能科技股份有限公司 Compression method of noise reduction model
CN111599373B (en) * 2020-04-07 2023-04-18 云知声智能科技股份有限公司 Compression method of noise reduction model
US11907845B2 (en) 2020-08-17 2024-02-20 International Business Machines Corporation Training teacher machine learning models using lossless and lossy branches
CN112291424B (en) * 2020-10-29 2021-09-14 上海观安信息技术股份有限公司 Fraud number identification method and device, computer equipment and storage medium
CN112291424A (en) * 2020-10-29 2021-01-29 上海观安信息技术股份有限公司 Fraud number identification method and device, computer equipment and storage medium
WO2023279693A1 (en) * 2021-07-09 2023-01-12 平安科技(深圳)有限公司 Knowledge distillation method and apparatus, and terminal device and medium
CN113380268A (en) * 2021-08-12 2021-09-10 北京世纪好未来教育科技有限公司 Model training method and device and speech signal processing method and device

Similar Documents

Publication Publication Date Title
CN108986788A (en) A kind of noise robust acoustic modeling method based on aposterior knowledge supervision
CN104036774B (en) Tibetan dialect recognition methods and system
WO2018054361A1 (en) Environment self-adaptive method of speech recognition, speech recognition device, and household appliance
CN113488058B (en) Voiceprint recognition method based on short voice
CN108694949B (en) Speaker identification method and device based on reordering supervectors and residual error network
CN109616105A (en) A kind of noisy speech recognition methods based on transfer learning
CN108806667A (en) The method for synchronously recognizing of voice and mood based on neural network
CN100440315C (en) Speaker recognition method based on MFCC linear emotion compensation
CN110211594B (en) Speaker identification method based on twin network model and KNN algorithm
CN103811009A (en) Smart phone customer service system based on speech analysis
CN103730114A (en) Mobile equipment voiceprint recognition method based on joint factor analysis model
CN101246685A (en) Pronunciation quality evaluation method of computer auxiliary language learning system
CN108922541A (en) Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model
CN104123933A (en) Self-adaptive non-parallel training based voice conversion method
CN109243460A (en) A method of automatically generating news or interrogation record based on the local dialect
JPH075892A (en) Voice recognition method
Marchi et al. Generalised discriminative transform via curriculum learning for speaker recognition
CN107039036A (en) A kind of high-quality method for distinguishing speek person based on autocoding depth confidence network
CN109637526A (en) The adaptive approach of DNN acoustic model based on personal identification feature
KR20190112682A (en) Data mining apparatus, method and system for speech recognition using the same
CN110047504A (en) Method for distinguishing speek person under identity vector x-vector linear transformation
CN100570712C (en) Based on anchor model space projection ordinal number quick method for identifying speaker relatively
CN101178895A (en) Model self-adapting method based on generating parameter listen-feel error minimize
CN118173092A (en) Online customer service platform based on AI voice interaction
CN105845131A (en) Far-talking voice recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20181211