WO2020087716A1 - Cochlear implant auditory scene recognition method - Google Patents
Cochlear implant auditory scene recognition method
- Publication number
- WO2020087716A1 (application PCT/CN2018/123296)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- scene
- scene recognition
- sound signal
- recognition method
- auditory
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 32
- 210000003477 cochlea Anatomy 0.000 title abstract 3
- 230000005236 sound signal Effects 0.000 claims abstract description 24
- 238000000605 extraction Methods 0.000 claims abstract description 14
- 238000009432 framing Methods 0.000 claims abstract 2
- 239000007943 implant Substances 0.000 claims description 14
- 230000008569 process Effects 0.000 claims description 10
- 230000006870 function Effects 0.000 claims description 7
- 238000011176 pooling Methods 0.000 claims description 7
- 238000007781 pre-processing Methods 0.000 claims description 6
- 230000004913 activation Effects 0.000 claims description 4
- 230000037433 frameshift Effects 0.000 claims description 3
- 239000011159 matrix material Substances 0.000 claims description 3
- 230000004044 response Effects 0.000 claims description 3
- 230000003595 spectral effect Effects 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 2
- 230000000694 effects Effects 0.000 abstract description 8
- 230000000638 stimulation Effects 0.000 abstract 1
- 238000013527 convolutional neural network Methods 0.000 description 10
- 238000005457 optimization Methods 0.000 description 4
- 230000004936 stimulating effect Effects 0.000 description 2
- 206010011891 Deafness neurosensory Diseases 0.000 description 1
- 208000009966 Sensorineural Hearing Loss Diseases 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000002146 bilateral effect Effects 0.000 description 1
- 210000000860 cochlear nerve Anatomy 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 231100000879 sensorineural hearing loss Toxicity 0.000 description 1
- 208000023573 sensorineural hearing loss disease Diseases 0.000 description 1
- 235000013311 vegetables Nutrition 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Definitions
- the invention relates to an auditory scene recognition method, and in particular to a cochlear implant auditory scene recognition method.
- Cochlear implants are currently the only method and device recognized worldwide as effective in restoring hearing to patients with bilateral severe or profound sensorineural hearing loss.
- The existing cochlear implant operates as follows: sound is first captured by a microphone and converted into an electrical signal, which undergoes dedicated digital processing and is then encoded according to a given strategy and transmitted into the body through a transmitting coil worn behind the ear; after the receiving coil of the implant senses the signal, a decoding chip decodes it, so that the stimulation electrodes of the implant generate currents that stimulate the auditory nerve and produce hearing. Because of the constraints of the use environment, the captured sound is inevitably mixed with environmental noise, so the sound signal must be optimized by suitable algorithms.
- the technical problem to be solved by the present invention is to provide a cochlear implant auditory scene recognition method, which can recognize different auditory scenes.
- The present invention provides a cochlear implant auditory scene recognition method comprising the following steps: (A) a preprocessing module frames and windows the sound signal; (B) a feature extraction module extracts features from the preprocessed sound signal; (C) a scene recognition module performs a CNN operation on the extracted features, obtains a probability value for each preset scene, and determines the scene with the largest probability as the final scene.
- In step A, the windowing uses a Hamming window or a Hanning window.
- In step B, the feature vectors are extracted using MFCC, FBank, or spectrogram features.
- In step C, the CNN includes an input layer, intermediate layers, and an output layer, wherein the input layer is a two-dimensional data matrix composed of the sound signal features, the intermediate layers include convolutional layers, pooling layers, and a fully connected layer, the fully connected layer consists of one-dimensional data, and there is one fewer pooling layer than convolutional layers.
- pooling process uses Maxpooling or Meanpooling.
- The activation function uses ReLU, sigmoid, tanh, or Logistic, where the ReLU formula is f(x) = max(0, x).
- The cochlear implant auditory scene recognition method of the present invention can recognize different auditory scenes through CNN processing and provide guidance to the signal processing modules of the speech processor, such as subsequent speech enhancement and speech strategies, so that the signal processing of the speech processor better matches the auditory scene and outputs a stimulation signal that better corresponds to the actual auditory scene. This improves the clarity and intelligibility of the patient's speech signal in noisy environments, improves listening in music scenes, and further improves the quality of life of cochlear implant recipients.
- FIG. 1 is a schematic flowchart of a method for recognizing an auditory scene of a cochlear implant of the present invention.
- FIG. 2 is a schematic flow chart of the CNN processing sound signals of the present invention.
- FIG. 3 is a flowchart of a specific embodiment of CNN processing sound signals according to the present invention.
- The present invention provides a cochlear implant auditory scene recognition method for recognizing different auditory scenes, such as classrooms, streets, concert halls, shopping malls, train stations, and markets.
- The cochlear implant auditory scene recognition method includes three steps: preprocessing, feature extraction, and scene recognition.
- The preprocessing module frames the sound signal and applies a window.
- The purpose of preprocessing is to use a window function to smoothly divide the sampled sound signal into frames; different frame lengths and window functions affect the system output.
- the purpose of windowing is to reduce the leakage in the frequency domain of the signal and reduce the amplitude of the side lobes.
- the windowing process can also use other window functions such as Hanning window, and the frame length and frame shift can also be changed and set according to the needs of the system.
- The feature extraction module performs feature extraction on the preprocessed sound signal, where the feature extraction uses MFCC (Mel-frequency cepstral coefficients), FBank (Mel-scale filter bank), or spectrogram features.
- MFCC: Mel-Frequency Cepstrum Coefficient
- FBank: Mel-scale Filter Bank
- Spectrogram
- The feature extraction method using FBank is as follows: apply an FFT to each preprocessed frame, X[i,k] = FFT[x_i(m)]; compute the spectral line energy of each frame, E[i,k] = |X[i,k]|^2; compute the Mel filter energies, S(i,m) = Σ_k E[i,k]·H_m(k), where H_m(k) is the frequency response of the Mel filter and the number of Mel filters m is 40 here; and take the logarithm, FBank = log[S(i,m)].
- The scene recognition module performs a CNN (convolutional neural network) operation on the extracted features, obtains a probability value for each preset scene, and determines the scene with the largest probability as the final scene, thereby providing guidance to the signal processing modules of the speech processor, such as subsequent speech enhancement and speech strategies, so that the signal processing of the speech processor better matches the auditory scene.
- CNN: Convolutional Neural Network
- The CNN includes an input layer, intermediate layers, and an output layer, where the input layer is a two-dimensional data matrix composed of the sound signal features. The intermediate layers include convolutional layers, pooling layers, and a fully connected layer: the convolutional layers perform convolution, the pooling layers perform pooling, and the fully connected layer, which consists of one-dimensional data, also serves to reduce dimensionality. Convolution and pooling occur in pairs, that is, there is one fewer pooling layer than convolutional layers.
- The CNN processes the sound signal as follows: the sound signal enters the first convolutional layer from the input layer and, after convolution, outputs feature group C_1; feature group C_1 enters the first pooling layer and, after pooling, outputs feature group S_1; feature group S_1 enters the second convolutional layer and, after convolution, outputs feature group C_2, which then enters the second pooling layer and, after pooling, outputs feature group S_2; and so on, until the N-th convolutional layer outputs the final feature group. The final feature group is processed by the fully connected layer to obtain the classification result, that is, the probability value of each preset scene, and the output layer finally determines the preset scene with the highest probability as the final scene, where N is greater than or equal to 2.
- the pooling process uses Maxpooling or Meanpooling.
- The activation function uses ReLU (Rectified Linear Unit), whose formula is f(x) = max(0, x).
- the activation function can also use sigmoid, tanh or logistic.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Prostheses (AREA)
- Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
Abstract
A cochlear implant auditory scene recognition method, comprising the following steps: (A) a preprocessing module frames and windows the sound signal; (B) a feature extraction module extracts features from the preprocessed sound signal; (C) a scene recognition module performs a CNN operation on the extracted features, obtains a probability value for each preset scene, determines the scene with the largest probability as the final scene, and outputs it. Through CNN processing, the method can recognize different auditory scenes and provide guidance to the signal processing modules of the speech processor, such as subsequent speech enhancement and speech strategies, so that the signal processing of the speech processor better matches the auditory scene and outputs a stimulation signal that better corresponds to the actual auditory scene, improving the clarity and intelligibility of the patient's speech signal in noisy environments, improving listening in music scenes, and further improving the quality of life of cochlear implant recipients.
Description
The present invention relates to an auditory scene recognition method, and in particular to a cochlear implant auditory scene recognition method.
Cochlear implants are currently the only method and device recognized worldwide as effective in restoring hearing to patients with bilateral severe or profound sensorineural hearing loss. The existing cochlear implant operates as follows: sound is first captured by a microphone and converted into an electrical signal, which undergoes dedicated digital processing and is then encoded according to a given strategy and transmitted into the body through a transmitting coil worn behind the ear; after the receiving coil of the implant senses the signal, a decoding chip decodes it, so that the stimulation electrodes of the implant generate currents that stimulate the auditory nerve and produce hearing. Because of the constraints of the use environment, the captured sound is inevitably mixed with environmental noise, so the sound signal must be optimized by suitable algorithms. Given the variety of use environments, however, if only a single optimization algorithm is used, the optimized signal can deviate from the actual situation and the best listening effect cannot be achieved. An auditory scene recognition method is therefore needed so that different optimization algorithms can be used for different scenes to achieve the best listening effect.
Summary of the Invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is to provide a cochlear implant auditory scene recognition method that can recognize different auditory scenes.
To achieve the above object, the present invention provides a cochlear implant auditory scene recognition method comprising the following steps: (A) a preprocessing module frames and windows the sound signal; (B) a feature extraction module extracts features from the preprocessed sound signal; (C) a scene recognition module performs a CNN operation on the extracted features, obtains a probability value for each preset scene, and determines the scene with the largest probability as the final scene.
In step A, the windowing uses a Hamming window or a Hanning window.
In step B, the feature vectors are extracted using MFCC, FBank, or spectrogram features.
Further, the FBank feature extraction method is: apply an FFT to each frame of the preprocessed sound signal, X[i,k] = FFT[x_i(m)]; compute the spectral line energy of each frame after the FFT, E[i,k] = |X[i,k]|^2; compute the Mel filter energies, S(i,m) = Σ_k E[i,k]·H_m(k), where H_m(k) is the frequency response of the Mel filter and the number m of Mel filters is taken as 40 here; and take the logarithm, FBank = log[S(i,m)].
In step C, the CNN includes an input layer, intermediate layers, and an output layer, wherein the input layer is a two-dimensional data matrix composed of the sound signal features, the intermediate layers include convolutional layers, pooling layers, and a fully connected layer, the fully connected layer consists of one-dimensional data, and there is one fewer pooling layer than convolutional layers.
Further, the pooling uses Maxpooling or Meanpooling.
Still further, the activation function uses ReLU, sigmoid, tanh, or Logistic, where the ReLU formula is f(x) = max(0, x).
Through CNN processing, the cochlear implant auditory scene recognition method of the present invention can recognize different auditory scenes and provide guidance to the signal processing modules of the speech processor, such as subsequent speech enhancement and speech strategies, so that the signal processing of the speech processor better matches the auditory scene and outputs a stimulation signal that better corresponds to the actual auditory scene, improving the clarity and intelligibility of the patient's speech signal in noisy environments, improving listening in music scenes, and further improving the quality of life of cochlear implant recipients.
The concept, specific structure, and technical effects of the present invention are further described below with reference to the accompanying drawings, so that the object, features, and effects of the present invention can be fully understood.
FIG. 1 is a schematic flowchart of the cochlear implant auditory scene recognition method of the present invention.
FIG. 2 is a schematic flowchart of the CNN processing of sound signals according to the present invention.
FIG. 3 is a flowchart of a specific embodiment of the CNN processing of sound signals according to the present invention.
The present invention provides a cochlear implant auditory scene recognition method for recognizing different auditory scenes, such as classrooms, streets, concert halls, shopping malls, train stations, and markets.
As shown in FIG. 1, the cochlear implant auditory scene recognition method includes three steps: preprocessing, feature extraction, and scene recognition.
Preprocessing: the preprocessing module frames and windows the sound signal. The purpose of preprocessing is to use a window function to smoothly divide the sampled sound signal into frames; different frame lengths and window functions affect the system output. The purpose of windowing is to reduce spectral leakage in the frequency domain and to lower the side-lobe amplitude.
Take a system sampling frequency of 16 kHz as an example.
The windowing uses a Hamming window with window length N = 256 and a frame shift of half the window length, that is, 128.
Hamming window: w(n) = 0.54 - 0.46·cos(2πn/(N-1)), 0 ≤ n ≤ N-1.
Other window functions such as a Hanning window may also be used for the windowing, and the frame length and frame shift may be set and changed according to system needs.
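As an illustration of this preprocessing step, the following minimal NumPy sketch frames a 16 kHz signal with a 256-sample Hamming window and a 128-sample frame shift, matching the example parameters above; it is only a sketch of the described step, not an implementation taken from the patent.

```python
import numpy as np

def frame_and_window(x: np.ndarray, frame_len: int = 256, frame_shift: int = 128) -> np.ndarray:
    """Split the sampled signal x into overlapping frames and apply a Hamming window.

    frame_len=256 and frame_shift=128 correspond to the 16 kHz example above
    (16 ms frames with 50% overlap)."""
    window = np.hamming(frame_len)        # w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
    n_frames = 1 + max(0, (len(x) - frame_len) // frame_shift)
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        start = i * frame_shift
        frames[i] = x[start:start + frame_len] * window
    return frames

# Example: one second of a test signal at 16 kHz -> 124 windowed frames of 256 samples.
signal = np.random.randn(16000)
print(frame_and_window(signal).shape)   # (124, 256)
```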
Feature extraction: the feature extraction module extracts features from the preprocessed sound signal, where the feature extraction uses MFCC (Mel-Frequency Cepstrum Coefficient), FBank (Mel-scale Filter Bank), or spectrogram features.
The feature extraction method using FBank is as follows:
Apply an FFT to each frame of the preprocessed sound signal: X[i,k] = FFT[x_i(m)].
Compute the spectral line energy of each frame after the FFT: E[i,k] = |X[i,k]|^2.
Compute the Mel filter energies: S(i,m) = Σ_k E[i,k]·H_m(k), where H_m(k) is the frequency response of the Mel filter and the number of Mel filters is taken as 40 here.
Take the logarithm: FBank = log[S(i,m)].
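These four steps can be written as a short NumPy sketch. The triangular Mel filterbank construction below is the standard textbook form and is an assumption here; the patent only states that H_m(k) is the frequency response of the Mel filter and that 40 filters are used.

```python
import numpy as np

def mel_filterbank(n_mels: int = 40, n_fft: int = 256, sample_rate: int = 16000) -> np.ndarray:
    """Triangular Mel filterbank H_m(k) (standard construction, assumed; not specified in the patent)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    H = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            H[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            H[m - 1, k] = (right - k) / max(right - center, 1)
    return H

def fbank_features(frames: np.ndarray, n_mels: int = 40) -> np.ndarray:
    """FFT per frame, spectral line energy, Mel filter energies, then logarithm."""
    n_fft = frames.shape[1]
    X = np.fft.rfft(frames, n=n_fft, axis=1)        # X[i, k]: FFT of frame i
    E = np.abs(X) ** 2                              # spectral line energy E[i, k]
    H = mel_filterbank(n_mels, n_fft)               # H_m(k), 40 Mel filters
    S = E @ H.T                                     # S(i, m) = sum_k E[i, k] * H_m(k)
    return np.log(S + 1e-10)                        # FBank = log[S(i, m)]; small offset avoids log(0)

# Example: 124 windowed frames of 256 samples -> a 124 x 40 FBank feature matrix.
print(fbank_features(np.random.randn(124, 256)).shape)   # (124, 40)
```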
Scene recognition: the scene recognition module performs a CNN (Convolutional Neural Network) operation on the extracted features, obtains a probability value for each preset scene, and determines the scene with the largest probability as the final scene, thereby providing guidance to the signal processing modules of the speech processor, such as subsequent speech enhancement and speech strategies, so that the signal processing of the speech processor better matches the auditory scene.
As shown in FIG. 2, the CNN includes an input layer, intermediate layers, and an output layer. The input layer is a two-dimensional data matrix composed of the sound signal features. The intermediate layers include convolutional layers, pooling layers, and a fully connected layer: the convolutional layers perform convolution, the pooling layers perform pooling, and the fully connected layer, which consists of one-dimensional data, also serves to reduce dimensionality. Convolution and pooling occur in pairs, that is, there is one fewer pooling layer than convolutional layers. The CNN processes the sound signal as follows: the sound signal enters the first convolutional layer from the input layer and, after convolution, outputs feature group C_1; feature group C_1 enters the first pooling layer and, after pooling, outputs feature group S_1; feature group S_1 enters the second convolutional layer and, after convolution, outputs feature group C_2, which then enters the second pooling layer and, after pooling, outputs feature group S_2; and so on, until the N-th convolutional layer outputs the final feature group. The final feature group is processed by the fully connected layer to obtain the classification result for each preset scene, that is, the probability value of each preset scene, and the output layer finally determines the preset scene with the highest probability as the final scene, where N is greater than or equal to 2.
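A minimal PyTorch sketch of a CNN with the structure just described (paired convolution and pooling layers, one fewer pooling layer than convolutional layers, a fully connected layer for dimensionality reduction, and a softmax over the preset scenes) follows. The channel counts, kernel sizes, the fixed 124 x 40 FBank input, and the choice of six scenes are illustrative assumptions, not the concrete configuration of FIG. 3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneCNN(nn.Module):
    """Sketch of the described CNN: N = 3 convolutional layers, N - 1 = 2 pooling layers,
    one fully connected layer, and a softmax giving the probability of each preset scene."""

    def __init__(self, n_scenes: int = 6):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)                  # Maxpooling; Meanpooling would be nn.AvgPool2d(2)
        self.fc = nn.Linear(32 * 31 * 10, n_scenes)  # assumes a 124 x 40 feature matrix at the input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(F.relu(self.conv1(x)))   # feature group C_1 -> S_1
        x = self.pool(F.relu(self.conv2(x)))   # feature group C_2 -> S_2
        x = F.relu(self.conv3(x))              # final feature group (no pooling after the last convolution)
        x = x.flatten(1)                       # one-dimensional data for the fully connected layer
        return F.softmax(self.fc(x), dim=1)    # probability value of each preset scene

# Example: one 124-frame x 40-band FBank feature matrix -> scene probabilities.
model = SceneCNN(n_scenes=6)
probs = model(torch.randn(1, 1, 124, 40))
print(probs.shape, int(probs.argmax(dim=1)))   # torch.Size([1, 6]) and the index of the most probable scene
```

The index returned by argmax corresponds to the preset scene that would be judged the final scene.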
As shown in FIG. 3, a CNN framework parameter configuration is described by way of example in the table below.
The pooling uses Maxpooling or Meanpooling.
The activation function uses ReLU (Rectified Linear Units), with the formula f(x) = max(0, x).
The activation function may also be sigmoid, tanh, or Logistic.
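To make these operations concrete, the small NumPy example below applies ReLU, 2x2 Maxpooling, and 2x2 Meanpooling to an arbitrary 4 x 4 feature map; the values are illustrative only.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """ReLU: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def pool2x2(x: np.ndarray, mode: str = "max") -> np.ndarray:
    """Non-overlapping 2x2 pooling: Maxpooling keeps the largest value of each block,
    Meanpooling keeps the block average."""
    h, w = x.shape
    blocks = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

feature_map = np.array([[ 1., -2.,  3.,  0.],
                        [-1.,  5., -4.,  2.],
                        [ 0.,  1.,  2., -3.],
                        [ 4., -1.,  0.,  6.]])

activated = relu(feature_map)          # negative values are clipped to 0
print(pool2x2(activated, "max"))       # [[5. 3.] [4. 6.]]
print(pool2x2(activated, "mean"))      # [[1.5  1.25] [1.25 2.  ]]
```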
The preferred embodiments of the present invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without inventive effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation according to the concept of the present invention shall fall within the scope of protection determined by the claims.
Claims (8)
- A cochlear implant auditory scene recognition method, comprising the following steps: (A) a preprocessing module frames and windows the sound signal; (B) a feature extraction module extracts features from the preprocessed sound signal; (C) a scene recognition module performs a CNN operation on the extracted features, obtains a probability value for each preset scene, and determines the scene with the largest probability as the final scene.
- The cochlear implant auditory scene recognition method according to claim 1, wherein in step A the windowing uses a Hamming window or a Hanning window.
- The cochlear implant auditory scene recognition method according to claim 1, wherein in step B the feature vectors are extracted using MFCC, FBank, or spectrogram features.
- The cochlear implant auditory scene recognition method according to claim 1, wherein in step C the CNN includes an input layer, intermediate layers, and an output layer, the input layer is a two-dimensional data matrix composed of the sound signal features, the intermediate layers include convolutional layers, pooling layers, and a fully connected layer, the fully connected layer consists of one-dimensional data, and there is one fewer pooling layer than convolutional layers.
- The cochlear implant auditory scene recognition method according to claim 6, wherein the pooling uses Maxpooling or Meanpooling.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811276582.0 | 2018-10-30 | ||
CN201811276582.0A CN109448702A (zh) | 2018-10-30 | 2018-10-30 | Cochlear implant auditory scene recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020087716A1 true WO2020087716A1 (zh) | 2020-05-07 |
Family
ID=65549467
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/123296 WO2020087716A1 (zh) | 2018-10-30 | 2018-12-25 | Cochlear implant auditory scene recognition method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109448702A (zh) |
WO (1) | WO2020087716A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109859768A (zh) * | 2019-03-12 | 2019-06-07 | 上海力声特医学科技有限公司 | Cochlear implant speech enhancement method |
CN110796027B (zh) * | 2019-10-10 | 2023-10-17 | 天津大学 | Sound scene recognition method based on a compact convolutional neural network model |
CN111491245B (zh) * | 2020-03-13 | 2022-03-04 | 天津大学 | Digital hearing aid sound field recognition algorithm based on a recurrent neural network and implementation method |
CN113160844A (zh) * | 2021-04-27 | 2021-07-23 | 山东省计算中心(国家超级计算济南中心) | Speech enhancement method and system based on noise background classification |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101477798A (zh) * | 2009-02-17 | 2009-07-08 | 北京邮电大学 | Method for analyzing and extracting audio data of preset scenes |
CN103456301A (zh) * | 2012-05-28 | 2013-12-18 | 中兴通讯股份有限公司 | Scene recognition method and device based on ambient sound, and mobile terminal |
CN107103901A (zh) * | 2017-04-03 | 2017-08-29 | 浙江诺尔康神经电子科技股份有限公司 | Cochlear implant sound scene recognition system and method |
CN108231067A (zh) * | 2018-01-13 | 2018-06-29 | 福州大学 | Sound scene recognition method based on convolutional neural network and random forest classification |
CN108520757A (zh) * | 2018-03-31 | 2018-09-11 | 华南理工大学 | Automatic classification method for music-suitable scenes based on auditory characteristics |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102016214745B4 (de) * | 2016-08-09 | 2018-06-14 | Carl Von Ossietzky Universität Oldenburg | Method for stimulating an implanted electrode arrangement of a hearing prosthesis |
CN106682574A (zh) * | 2016-11-18 | 2017-05-17 | 哈尔滨工程大学 | Underwater multi-target recognition method using a one-dimensional deep convolutional network |
CN108550375A (zh) * | 2018-03-14 | 2018-09-18 | 鲁东大学 | Emotion recognition method, device, and computer equipment based on speech signals |
2018
- 2018-10-30 CN CN201811276582.0A patent/CN109448702A/zh active Pending
- 2018-12-25 WO PCT/CN2018/123296 patent/WO2020087716A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN109448702A (zh) | 2019-03-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18938809; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 18938809; Country of ref document: EP; Kind code of ref document: A1 |