WO2018166316A1 - Speaker cold symptom recognition method fusing multiple end-to-end neural network structures - Google Patents
Speaker cold symptom recognition method fusing multiple end-to-end neural network structures
- Publication number
- WO2018166316A1 (PCT application PCT/CN2018/076272)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network
- neural network
- input
- speech
- speaker
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Definitions
- The invention relates to the field of speech processing technology and proposes a speaker cold symptom recognition method that fuses multiple end-to-end deep learning structures.
- Speaker recognition, also known as voiceprint recognition, is the technique of automatically identifying a speaker by applying pattern recognition to the speaker-specific information contained in speech.
- Current speaker recognition technology achieves good performance under experimental conditions, but in practice speech is affected by environmental noise and by the speaker's state of health, which reduces the robustness of existing speaker recognition techniques.
- A cold speech recognition method can improve this robustness: by first classifying incoming speech to determine whether it is cold-affected, and only then performing speaker recognition, the downstream recognizer can account for the speaker's condition.
- Speech feature extraction extracts the speaker's speech features and vocal tract features.
- The mainstream feature parameters, including MFCC, LPCC, and CQCC, are each based on a single feature; they carry too little information to characterize a speaker's cold symptoms, which limits recognition accuracy, and a large amount of prior knowledge about the target class of speech is also needed.
- In speech recognition algorithms, methods based on vocal tract and speech production models appeared earliest, but the complexity of these models kept them from achieving good practical results; model matching techniques such as dynamic time warping, hidden Markov models, and vector quantization later began to deliver good recognition performance.
- Studying feature extraction and pattern classification separately is the common approach in recognition research, but it suffers from mismatches between features and models, difficult training, and features that are hard to find; the classical recognition framework has all of these problems.
- To address this, the present invention proposes a speaker cold symptom recognition method that fuses multiple end-to-end deep learning structures: four different end-to-end deep learning networks are constructed and then fused to perform speaker cold symptom recognition.
- The four end-to-end deep learning structures are: 1. input: raw speech; network: multi-layer convolutional neural network plus long short-term memory (LSTM) network; 2. input: speech spectrum; network: multi-layer convolutional neural network plus LSTM network; 3. input: speech spectrum; network: multi-layer convolutional neural network plus fully connected network; 4. input: Mel-frequency cepstral coefficients (MFCC) or constant-Q cepstral coefficients (CQCC); network: LSTM network.
- The beneficial effects of the invention are: given the uncertainty of traditional hand-crafted features, the representations obtained by neural network training express the characteristics of a speaker's cold symptoms better, and the inputs are comparatively simple, requiring little feature processing. Because speech carries temporal information, classification through long short-term memory networks yields better results. By unifying feature learning and pattern classification, the whole speaker cold symptom recognition process becomes simpler and faster, and it has broad application prospects.
- FIG. 1 shows the flow of extracting Mel-frequency cepstral coefficients (MFCC) from speech.
- FIG. 2 shows the flow of extracting constant-Q cepstral coefficients (CQCC) from speech.
- FIG. 3 shows the first end-to-end neural network: the input is raw speech and the network is CNN+LSTM.
- FIG. 4 shows the second end-to-end neural network: the input is the speech spectrum and the network is CNN+LSTM.
- FIG. 5 shows the third end-to-end neural network: the input is the speech spectrum and the network is CNN plus a fully connected network.
- FIG. 6 shows the fourth end-to-end neural network: the input is the MFCC or CQCC features and the network is LSTM.
- Step 1: Construct an end-to-end neural network whose input is raw speech and whose network is CNN+LSTM.
- Specifically, the input speech is divided into segments of equal size, e.g. 40 ms, and mean-normalized. The corresponding convolutional neural network consists of 8 modules, each composed of a one-dimensional convolution layer, a ReLU activation layer, and a one-dimensional max pooling layer; every convolution kernel has size 32, every pooling kernel has size 2, and the pooling stride is 2. A long short-term memory network then performs the classification, as sketched below.
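- For concreteness, the following PyTorch sketch assembles this Step 1 network under stated assumptions: the patent fixes the module count (8), the convolution kernel size (32), and the pooling size and stride (2), while the channel widths, padding, LSTM hidden size, and the two-class (cold / not cold) output head are choices made here so the example runs.

```python
import torch
import torch.nn as nn

class RawWaveCNNLSTM(nn.Module):
    """Step 1 sketch: raw speech -> 8 x (Conv1d + ReLU + MaxPool1d) -> LSTM."""
    def __init__(self, n_classes: int = 2, channels: int = 32, hidden: int = 128):
        super().__init__()
        blocks, in_ch = [], 1
        for _ in range(8):                                   # 8 modules per the patent
            blocks += [
                nn.Conv1d(in_ch, channels, kernel_size=32, padding=16),  # kernel size 32
                nn.ReLU(),
                nn.MaxPool1d(kernel_size=2, stride=2),       # pool size 2, stride 2
            ]
            in_ch = channels
        self.cnn = nn.Sequential(*blocks)
        self.lstm = nn.LSTM(input_size=channels, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, wave: torch.Tensor) -> torch.Tensor:
        # wave: (batch, samples), e.g. mean-normalized 40 ms segments concatenated
        x = self.cnn(wave.unsqueeze(1))                      # -> (batch, channels, frames)
        out, _ = self.lstm(x.transpose(1, 2))                # frames become a sequence
        return self.head(out[:, -1])                         # classify from the last state

logits = RawWaveCNNLSTM()(torch.randn(16, 16000))            # e.g. 16 one-second clips at 16 kHz
```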
- Step 2: Construct an end-to-end neural network whose input is the speech spectrum and whose network is CNN+LSTM. Specifically, the input speech is divided into segments of equal size and a fast Fourier transform is applied to obtain the spectrum of each segment.
- The convolutional neural network consists of 6 modules, each composed of a two-dimensional convolution layer, a ReLU activation layer, and a two-dimensional max pooling layer. The first convolution layer uses a 7×7 kernel, the second a 5×5 kernel, and the remaining four layers use 3×3 kernels; all max pooling layers use 3×3 kernels with a pooling stride of 2. An LSTM network finally performs the classification, as in the sketch that follows.
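- A corresponding PyTorch sketch of the Step 2 network. The kernel schedule (7×7, 5×5, then four 3×3 convolutions; 3×3 pools with stride 2) follows the patent; the channel widths, padding, frequency averaging before the LSTM, and the input spectrogram size are assumptions made here for a runnable example.

```python
import torch
import torch.nn as nn

class SpecCNNLSTM(nn.Module):
    """Step 2 sketch: speech spectrum -> 6 x (Conv2d + ReLU + MaxPool2d) -> LSTM."""
    def __init__(self, n_classes: int = 2, hidden: int = 128):
        super().__init__()
        kernels = [7, 5, 3, 3, 3, 3]              # per-layer kernel sizes from the patent
        chans   = [16, 32, 32, 64, 64, 64]        # assumed channel widths
        blocks, in_ch = [], 1
        for k, c in zip(kernels, chans):
            blocks += [
                nn.Conv2d(in_ch, c, kernel_size=k, padding=k // 2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 3x3 pool, stride 2
            ]
            in_ch = c
        self.cnn = nn.Sequential(*blocks)
        self.lstm = nn.LSTM(input_size=chans[-1], hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, freq_bins, time_frames), e.g. |FFT| magnitudes of the segments
        x = self.cnn(spec.unsqueeze(1))            # -> (batch, 64, freq', time')
        x = x.mean(dim=2).transpose(1, 2)          # average over frequency -> (batch, time', 64)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])

logits = SpecCNNLSTM()(torch.randn(8, 257, 128))   # 8 spectrograms, 257 bins x 128 frames
```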
- Step 3: Construct an end-to-end neural network whose input is the speech spectrum and whose network is CNN plus a fully connected network. Specifically, the input speech is divided into segments of equal size and a fast Fourier transform is applied to obtain the spectrum of each segment.
- The convolutional trunk is the same as in Step 2: 6 modules, each composed of a two-dimensional convolution layer, a ReLU activation layer, and a two-dimensional max pooling layer, with a 7×7 kernel in the first convolution layer, a 5×5 kernel in the second, 3×3 kernels in the remaining four, and 3×3 pooling kernels with stride 2 throughout. The output then passes through a fully connected layer and is finally classified by Softmax, as sketched below.
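- Step 3 differs from Step 2 only in the classifier head, so this sketch reuses the convolutional trunk from the previous example and swaps the LSTM for a fully connected layer and Softmax. The hidden width of 256 is an assumption; during training, `nn.CrossEntropyLoss` would apply the softmax implicitly.

```python
import torch
import torch.nn as nn

class SpecCNNFC(nn.Module):
    """Step 3 sketch: the Step 2 convolutional trunk with a fully connected head."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.cnn = SpecCNNLSTM().cnn               # same 6-module trunk as in Step 2
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),         # the fully connected layer
            nn.Linear(256, n_classes),
        )

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        return self.fc(self.cnn(spec.unsqueeze(1)))

probs = torch.softmax(SpecCNNFC()(torch.randn(8, 257, 128)), dim=-1)  # Softmax classification
```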
- Step 4: Construct an end-to-end neural network whose input is MFCC or CQCC features and whose network is LSTM.
- The MFCC features are obtained by pre-emphasizing the speech, windowing it into frames, applying a fast Fourier transform, filtering with a Mel-scale triangular filter bank, taking the logarithm, and applying a discrete cosine transform; the CQCC features are obtained by applying a constant-Q transform to the speech, computing the energy spectral density, taking the logarithm, and applying a cosine transform.
- The MFCC or CQCC features extracted from the speech serve as the input to the neural network, and a long short-term memory network finally performs the classification. A feature-extraction sketch follows.
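- A hedged sketch of the Step 4 feature path using librosa. `librosa.effects.preemphasis` plus `librosa.feature.mfcc` cover the pre-emphasis / framing / FFT / Mel filter bank / log / DCT chain; librosa has no single-call CQCC routine, so the constant-Q transform + log + DCT below is only an approximation (reference CQCC implementations also resample the log energies uniformly before the DCT). The resulting coefficient-by-frame matrices would feed an LSTM classifier like the one in the Step 1 sketch.

```python
import librosa
import numpy as np
from scipy.fftpack import dct

def mfcc_features(y: np.ndarray, sr: int, n_mfcc: int = 13) -> np.ndarray:
    y = librosa.effects.preemphasis(y)                       # pre-emphasis
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # framing/FFT/Mel/log/DCT

def cqcc_like_features(y: np.ndarray, sr: int, n_coef: int = 13) -> np.ndarray:
    power = np.abs(librosa.cqt(y, sr=sr)) ** 2               # constant-Q energy spectrum
    log_power = np.log(power + 1e-10)                        # logarithm
    return dct(log_power, type=2, axis=0, norm="ortho")[:n_coef]  # cosine transform

y = np.random.randn(32000).astype(np.float32)                # stand-in for a 2 s utterance
feats = mfcc_features(y, sr=16000)                           # (13, frames), LSTM-ready
```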
- Step 5: Fuse the above four networks to perform speaker cold speech recognition, for example as in the sketch below.
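- The patent does not spell out the fusion rule for Step 5; averaging the four networks' class posteriors at the score level is one plausible reading, sketched here. The equal weights, and the label convention of 1 for cold speech, are assumptions; the weights could instead be tuned on held-out data.

```python
import torch

def fuse_predictions(logits_list: list[torch.Tensor]) -> torch.Tensor:
    """Average the softmax posteriors of the four end-to-end networks."""
    probs = [torch.softmax(logits, dim=-1) for logits in logits_list]
    return torch.stack(probs).mean(dim=0)        # (batch, n_classes)

# fused = fuse_predictions([net1_logits, net2_logits, net3_logits, net4_logits])
# decision = fused.argmax(dim=-1)                # 1 = cold speech, 0 = normal (assumed)
```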
Abstract
A speaker cold symptom recognition method fusing multiple end-to-end deep learning structures, composed of four end-to-end neural networks. When the input is raw speech or the speech spectrum, a convolutional neural network extracts the optimal features, followed by a long short-term memory network or a fully connected network for classification; when the input is Mel-frequency cepstral coefficients (MFCC) or constant-Q cepstral coefficients (CQCC), classification is performed directly by a long short-term memory network. Finally these systems are fused together. The whole pipeline unifies feature extraction and model classification, making the entire speaker cold symptom recognition process simpler and faster.
Description
Technical Field
The invention relates to the field of speech processing technology and proposes a speaker cold symptom recognition method that fuses multiple end-to-end deep learning structures.
Background Art
1. Speaker recognition, also known as voiceprint recognition, is the technique of automatically identifying a speaker by applying pattern recognition to the speaker-specific information contained in speech. Current speaker recognition technology achieves good performance under experimental conditions, but in practice speech is affected by environmental noise and by the speaker's state of health, which reduces the robustness of existing speaker recognition techniques. A cold speech recognition method classifies existing speech to determine whether it is cold-affected; by judging in advance whether speech is cold speech and only then performing speaker recognition, the robustness of speaker recognition can be improved.
2. In speech technology research, researchers always hope to find features that represent the target class, describing the characteristics of the target speech that clearly distinguish it from normal speech. Speech feature extraction extracts the speaker's speech features and vocal tract features. At present, the mainstream feature parameters, including MFCC, LPCC, and CQCC, are each based on a single feature; they carry too little information to characterize a speaker's cold symptoms, which limits recognition accuracy, and a large amount of knowledge for distinguishing the target class of speech is also needed. Among speech recognition algorithms, methods based on vocal tract and speech production models appeared earliest, but the complexity of these models kept them from achieving good practical results; model matching techniques such as dynamic time warping, hidden Markov models, and vector quantization later began to deliver good recognition performance. Studying feature extraction and pattern classification separately is the common approach in recognition research, but it suffers from mismatches between features and models, difficult training, and features that are hard to find; the classical recognition framework has all of these problems.
3. In recent years, with the development of deep learning, deep neural networks have shown enormous power in image and speech recognition, and a series of neural network architectures have been proposed, such as autoencoder networks, convolutional neural networks, and recurrent neural networks. Many researchers have found that learning from speech with neural networks yields hidden structural features that describe speech better. The end-to-end recognition approach handles feature learning and feature recognition jointly with as little prior knowledge as possible and achieves good recognition results.
Summary of the Invention:
Since existing recognition techniques all study features and pattern classification separately, with problems of feature-model mismatch, difficult training, and features that are hard to find, the present invention proposes a speaker cold symptom recognition method that fuses multiple end-to-end deep learning structures: we construct four different end-to-end deep learning networks and finally fuse the four end-to-end neural network structures to perform speaker cold symptom recognition.
The four end-to-end deep learning structures are: 1. input: raw speech; network: multi-layer convolutional neural network plus long short-term memory network; 2. input: speech spectrum; network: multi-layer convolutional neural network plus long short-term memory network; 3. input: speech spectrum; network: multi-layer convolutional neural network plus fully connected network; 4. input: Mel-frequency cepstral coefficients or constant-Q cepstral coefficients; network: long short-term memory network.
The beneficial effects of the invention are: given the uncertainty of traditional features, the output obtained by neural network training expresses the characteristics of a speaker's cold symptoms better, and the input is comparatively simple, requiring little feature processing. Because speech carries temporal information, classification through long short-term memory networks yields better results. By unifying feature learning and pattern classification, the whole speaker cold symptom recognition process becomes simpler and faster, and it has broad application prospects.
Brief Description of the Drawings
FIG. 1 is the flow of extracting Mel-frequency cepstral coefficients (MFCC) from speech.
FIG. 2 is the flow of extracting constant-Q cepstral coefficients (CQCC) from speech.
FIG. 3 is the first end-to-end neural network: the input is raw speech and the network is CNN+LSTM.
FIG. 4 is the second end-to-end neural network: the input is the speech spectrum and the network is CNN+LSTM.
FIG. 5 is the third end-to-end neural network: the input is the speech spectrum and the network is CNN plus a fully connected network.
FIG. 6 is the fourth end-to-end neural network: the input is the MFCC or CQCC features and the network is LSTM.
Detailed Description:
To make the technical solution and advantages of the invention clearer, the technical solution of the invention is described clearly and completely below with reference to the accompanying drawings:
Step 1: Construct an end-to-end neural network whose input is raw speech and whose network is CNN+LSTM. Specifically, the input speech is divided into segments of equal size, e.g. 40 ms, and mean-normalized; the corresponding convolutional neural network consists of 8 modules, each composed of a one-dimensional convolution layer, a ReLU activation layer, and a one-dimensional max pooling layer, where every convolution kernel has size 32, every pooling kernel has size 2, and the pooling stride is 2. A long short-term memory network then performs the classification.
Step 2: Construct an end-to-end neural network whose input is the speech spectrum and whose network is CNN+LSTM. Specifically, the input speech is divided into segments of equal size and a fast Fourier transform is applied to obtain the spectrogram of each segment; the convolutional neural network consists of 6 modules, each composed of a two-dimensional convolution layer, a ReLU activation layer, and a two-dimensional max pooling layer, where the first convolution layer uses a 7×7 kernel, the second a 5×5 kernel, and the remaining four layers 3×3 kernels, and all max pooling layers use 3×3 kernels with a pooling stride of 2. An LSTM network finally performs the classification.
Step 3: Construct an end-to-end neural network whose input is the speech spectrum and whose network is CNN plus a fully connected network. Specifically, the input speech is divided into segments of equal size and a fast Fourier transform is applied to obtain the spectrogram of each segment; the convolutional neural network consists of 6 modules, each composed of a two-dimensional convolution layer, a ReLU activation layer, and a two-dimensional max pooling layer, where the first convolution layer uses a 7×7 kernel, the second a 5×5 kernel, and the remaining four layers 3×3 kernels, and all max pooling layers use 3×3 kernels with a pooling stride of 2. The output then passes through a fully connected layer and is finally classified by Softmax.
Step 4: Construct an end-to-end neural network whose input is MFCC or CQCC features and whose network is LSTM. The MFCC features are obtained by pre-emphasizing the speech, windowing it into frames, applying a fast Fourier transform, filtering with a Mel-scale triangular filter bank, taking the logarithm, and applying a discrete cosine transform; the CQCC features are obtained by applying a constant-Q transform to the speech, computing the energy spectral density, taking the logarithm, and applying a cosine transform. The MFCC or CQCC features extracted from the speech serve as the input to the neural network, and a long short-term memory network finally performs the classification.
Step 5: Fuse the above four networks to perform speaker cold speech recognition.
Claims (5)
- 1. A speaker cold symptom recognition method fusing multiple end-to-end deep learning structures, comprising: S1, constructing an end-to-end neural network whose input is raw speech and whose network is a convolutional neural network plus a long short-term memory network; S2, constructing an end-to-end neural network whose input is the speech spectrum and whose network is a convolutional neural network plus a long short-term memory network; S3, constructing an end-to-end neural network whose input is the speech spectrum and whose network is a convolutional neural network plus a fully connected network; S4, constructing an end-to-end neural network whose input is the speech MFCC/CQCC features and whose network is a long short-term memory network; S5, fusing the above four end-to-end neural networks to perform speaker cold symptom recognition.
- 2. The speaker cold symptom recognition method fusing multiple end-to-end deep learning structures according to claim 1, further characterized in that the end-to-end neural network of S1, whose input is raw speech and whose network is CNN+LSTM, is specifically as follows: the input speech is divided into segments of equal size, e.g. 40 ms, and mean-normalized; the corresponding convolutional neural network consists of 8 modules, each composed of a one-dimensional convolution layer, a ReLU activation layer, and a one-dimensional max pooling layer, where every convolution kernel has size 32, every pooling kernel has size 2, and the pooling stride is 2; a long short-term memory network then performs the classification.
- 3. The speaker cold symptom recognition method fusing multiple end-to-end deep learning structures according to claim 1, further characterized in that the end-to-end neural network of S2, whose input is the speech spectrum and whose network is CNN+LSTM, is specifically as follows: the input speech is divided into segments of equal size and a fast Fourier transform is applied to obtain the spectrogram of each segment; the convolutional neural network consists of 6 modules, each composed of a two-dimensional convolution layer, a ReLU activation layer, and a two-dimensional max pooling layer, where the first convolution layer uses a 7×7 kernel, the second a 5×5 kernel, and the remaining four layers 3×3 kernels, and all max pooling layers use 3×3 kernels with a pooling stride of 2; an LSTM network finally performs the classification.
- 4. The speaker cold symptom recognition method fusing multiple end-to-end deep learning structures according to claim 1, further characterized in that the end-to-end neural network of S3, whose input is the speech spectrum and whose network is CNN plus a fully connected network, is specifically as follows: the input speech is divided into segments of equal size and a fast Fourier transform is applied to obtain the spectrogram of each segment; the convolutional neural network consists of 6 modules, each composed of a two-dimensional convolution layer, a ReLU activation layer, and a two-dimensional max pooling layer, where the first convolution layer uses a 7×7 kernel, the second a 5×5 kernel, and the remaining four layers 3×3 kernels, and all max pooling layers use 3×3 kernels with a pooling stride of 2; the output then passes through a fully connected layer and is finally classified by Softmax.
- 5. The speaker cold symptom recognition method fusing multiple end-to-end deep learning structures according to claim 1, further characterized in that the MFCC features of S4 are obtained by pre-emphasizing the speech, windowing it into frames, applying a fast Fourier transform, filtering with a Mel-scale triangular filter bank, taking the logarithm, and applying a discrete cosine transform, while the CQCC features are obtained by applying a constant-Q transform to the speech, computing the energy spectral density, taking the logarithm, and applying a cosine transform; the MFCC or CQCC features extracted from the speech serve as the input to the neural network, and a long short-term memory network finally performs the classification.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710146957.0 | 2017-03-13 | ||
CN201710146957.0A CN107068167A (zh) | 2017-03-13 | 2017-03-13 | Speaker cold symptom recognition method fusing multiple end-to-end neural network structures
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018166316A1 (zh) | 2018-09-20 |
Family
ID=59621946
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/076272 WO2018166316A1 (zh) | 2017-03-13 | 2018-02-11 | Speaker cold symptom recognition method fusing multiple end-to-end neural network structures
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107068167A (zh) |
WO (1) | WO2018166316A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10692502B2 (en) * | 2017-03-03 | 2020-06-23 | Pindrop Security, Inc. | Method and apparatus for detecting spoofing conditions |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN107068167A (zh) * | 2017-03-13 | 2017-08-18 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Speaker cold symptom recognition method fusing multiple end-to-end neural network structures |
- CN108053841A (zh) * | 2017-10-23 | 2018-05-18 | 平安科技(深圳)有限公司 | Method for disease prediction using speech, and application server |
- CN109960910B (zh) * | 2017-12-14 | 2021-06-08 | Oppo广东移动通信有限公司 | Speech processing method, apparatus, storage medium, and terminal device |
- CN109086892B (zh) * | 2018-06-15 | 2022-02-18 | 中山大学 | Visual question reasoning model and system based on general dependency trees |
- CN109192226A (zh) * | 2018-06-26 | 2019-01-11 | 深圳大学 | Signal processing method and apparatus |
- CN108899051B (zh) * | 2018-06-26 | 2020-06-16 | 北京大学深圳研究生院 | Speech emotion recognition model and recognition method based on joint feature representation |
- CN109256118B (zh) * | 2018-10-22 | 2021-06-25 | 江苏师范大学 | End-to-end Chinese dialect recognition system and method based on a generative auditory model |
- CN109282837B (zh) * | 2018-10-24 | 2021-06-22 | 福州大学 | Demodulation method for interleaved spectra of fiber Bragg gratings based on an LSTM network |
- CN111028859A (zh) * | 2019-12-15 | 2020-04-17 | 中北大学 | Hybrid neural network vehicle type recognition method based on audio feature fusion |
- CN116110437B (zh) * | 2023-04-14 | 2023-06-13 | 天津大学 | Pathological voice quality assessment method based on fusion of speech features and speaker features |
-
2017
- 2017-03-13 CN CN201710146957.0A patent/CN107068167A/zh active Pending
-
2018
- 2018-02-11 WO PCT/CN2018/076272 patent/WO2018166316A1/zh active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5214743A (en) * | 1989-10-25 | 1993-05-25 | Hitachi, Ltd. | Information processing apparatus |
- CN105139864A (zh) * | 2015-08-17 | 2015-12-09 | 北京天诚盛业科技有限公司 | Speech recognition method and apparatus |
- CN106328122A (zh) * | 2016-08-19 | 2017-01-11 | 深圳市唯特视科技有限公司 | Speech recognition method using a long short-term memory recurrent neural network |
- CN107068167A (zh) * | 2017-03-13 | 2017-08-18 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Speaker cold symptom recognition method fusing multiple end-to-end neural network structures |
Non-Patent Citations (1)
Title |
---|
- SAINATH, TARA N. et al.: "Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks", ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2015 IEEE INTERNATIONAL CONFERENCE ON, 6 August 2015 (2015-08-06), XP033187628, ISSN: 2379-190X *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10692502B2 (en) * | 2017-03-03 | 2020-06-23 | Pindrop Security, Inc. | Method and apparatus for detecting spoofing conditions |
US11488605B2 (en) | 2017-03-03 | 2022-11-01 | Pindrop Security, Inc. | Method and apparatus for detecting spoofing conditions |
Also Published As
Publication number | Publication date |
---|---|
CN107068167A (zh) | 2017-08-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018166316A1 (zh) | Speaker cold symptom recognition method fusing multiple end-to-end neural network structures | |
CN112509564B (zh) | End-to-end speech recognition method based on connectionist temporal classification and a self-attention mechanism | |
CN105023573B (zh) | Speech syllable/vowel/phoneme boundary detection using auditory attention cues | |
WO2018227780A1 (zh) | Speech recognition method and apparatus, computer device, and storage medium | |
CN110459225B (zh) | Speaker identification system based on CNN fused features | |
WO2019232829A1 (zh) | Voiceprint recognition method and apparatus, computer device, and storage medium | |
CN107331384A (zh) | Speech recognition method and apparatus, computer device, and storage medium | |
CN108847244A (zh) | Voiceprint recognition method and system based on MFCC and an improved BP neural network | |
CN110299142B (zh) | Voiceprint recognition method and apparatus based on network fusion | |
WO2019023877A1 (zh) | Specific sound recognition method, device, and storage medium | |
CN111048097B (zh) | Siamese network voiceprint recognition method based on 3D convolution | |
CN108922541A (zh) | Multi-dimensional feature parameter voiceprint recognition method based on DTW and GMM models | |
CN108922513A (zh) | Speech discrimination method and apparatus, computer device, and storage medium | |
CN114863937A (zh) | Hybrid birdsong recognition method based on deep transfer learning and XGBoost | |
CN108877812B (zh) | Voiceprint recognition method, apparatus, and storage medium | |
CN116842460A (zh) | Cough-related disease recognition method and system based on an attention mechanism and a residual neural network | |
CN111862978A (zh) | Voice wake-up method and system based on improved MFCC coefficients | |
CN111243621A (zh) | Construction method of a GRU-SVM deep learning model for synthetic speech detection | |
CN111785262B (zh) | Speaker age and gender classification method based on residual networks and fused features | |
CN113571095B (zh) | Speech emotion recognition method and system based on nested deep neural networks | |
CN113539238B (zh) | End-to-end language identification and classification method based on a dilated convolutional neural network | |
CN115472168B (zh) | Short-duration speech voiceprint recognition method, system, and device coupling BGCC and PWPE features | |
Li et al. | Audio similarity detection algorithm based on Siamese LSTM network | |
Hu et al. | Speaker Recognition Based on 3DCNN-LSTM. | |
CN113963718A (zh) | Speech conversation segmentation method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18768185; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 18768185; Country of ref document: EP; Kind code of ref document: A1 |