Jiang et al., 2021 - Google Patents
A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain
- Document ID
- 10760158430632576896
- Authors
- Jiang T
- Liu H
- Zhou Y
- Gan L
- Publication year
- 2021
- Publication venue
- International Conference on Communications and Networking in China
Snippet
This paper presents a novel end-to-end multi-channel speech enhancement method built on complex time-domain operations. To that end, the Hilbert transform is used in the time domain to construct a complex analytic signal that serves as the training input to the neural network. The …
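The core idea in the snippet, turning real-valued microphone waveforms into a complex time-domain analytic signal via the Hilbert transform, can be illustrated with a short sketch. This is a minimal example assuming Python with NumPy and SciPy; the function name, array shapes, and the stacking of real and imaginary parts as network input channels are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch (not the paper's exact pipeline): build a complex
# time-domain analytic signal from multi-channel audio via the Hilbert
# transform. Shapes and the real/imag stacking are illustrative
# assumptions for feeding a real-valued network.
import numpy as np
from scipy.signal import hilbert


def analytic_signal_features(x: np.ndarray) -> np.ndarray:
    """x: real waveforms, shape (channels, samples).

    Returns features of shape (2 * channels, samples): the real and
    imaginary parts of the analytic signal, stacked along the channel
    axis, one common way to encode complex inputs for a real network.
    """
    z = hilbert(x, axis=-1)  # analytic signal: x + j * H{x}
    return np.concatenate([z.real, z.imag], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mics = rng.standard_normal((4, 16000))  # toy data: 4 channels, 1 s @ 16 kHz
    feats = analytic_signal_features(mics)
    print(feats.shape)  # (8, 16000)
```

A natively complex-valued network could instead consume `z` directly; the real/imaginary stacking above is just one conventional encoding.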
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
              - G10L2021/02166—Microphone arrays; Beamforming

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
          - G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
            - G10L21/013—Adapting to target pitch

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210089967A1 (en) | | Data training in multi-sensor setups |
Schädler et al. | | Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition |
Erdogan et al. | | Deep recurrent networks for separation and recognition of single-channel speech in nonstationary background audio |
Grais et al. | | Multi-resolution fully convolutional neural networks for monaural audio source separation |
CN110428852A (en) | | Speech separating method, device, medium and equipment |
Sainath et al. | | Raw multichannel processing using deep neural networks |
Nugraha et al. | | Deep neural network based multichannel audio source separation |
Sun et al. | | Joint constraint algorithm based on deep neural network with dual outputs for single-channel speech separation |
Chen et al. | | A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation |
Muñoz-Montoro et al. | | Ambisonics domain singing voice separation combining deep neural network and direction aware multichannel NMF |
Vinitha George et al. | | A novel U-Net with dense block for drum signal separation from polyphonic music signal mixture |
Jiang et al. | | A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain |
CN114446316B (en) | | Audio separation method, training method, device and equipment of audio separation model |
Joseph et al. | | Cycle GAN-Based Audio Source Separation Using Time–Frequency Masking |
Koteswararao et al. | | Single channel source separation using time–frequency non-negative matrix factorization and sigmoid base normalization deep neural networks |
Čmejla et al. | | Independent vector analysis exploiting pre-learned banks of relative transfer functions for assumed target’s positions |
Li et al. | | MAF-Net: multidimensional attention fusion network for multichannel speech separation |
Wang et al. | | Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments |
Wang et al. | | A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments |
Watanabe et al. | | DNN-based frequency component prediction for frequency-domain audio source separation |
Jiang et al. | | Dual-Channel Speech Enhancement Using Neural Network Adaptive Beamforming |
He et al. | | Mask-based blind source separation and MVDR beamforming in ASR |
Liu et al. | | A new neural beamformer for multi-channel speech separation |
Tomassetti et al. | | Neural beamforming for speech enhancement: preliminary results |
Prasanna Kumar et al. | | Supervised and unsupervised separation of convolutive speech mixtures using f0 and formant frequencies |