
Jiang et al., 2021 - Google Patents

A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain

Document ID: 10760158430632576896
Authors: Jiang T, Liu H, Zhou Y, Gan L
Publication year: 2021
Publication venue: International Conference on Communications and Networking in China


Snippet

This paper presents a novel end-to-end multi-channel speech enhancement method using complex time-domain operations. To that end, the Hilbert transform is used in the time domain to construct a complex analytic signal as the training input of the neural network. The …
Continue reading at bura.brunel.ac.uk (PDF)
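The key construction mentioned in the snippet, forming a complex analytic signal from a real waveform via the Hilbert transform, can be sketched with the standard FFT method. This is a generic illustration of the technique, not the paper's implementation; the function name `analytic_signal` is our own.

```python
import numpy as np

def analytic_signal(x):
    """Construct the analytic signal x_a(t) = x(t) + j*H{x}(t), where H is
    the Hilbert transform, by zeroing negative frequencies in the spectrum
    and doubling the positive ones (the standard FFT-based construction)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0           # DC component kept as-is
    if n % 2 == 0:
        weights[n // 2] = 1.0  # Nyquist bin kept as-is for even lengths
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

# For a pure cosine, the analytic signal is the complex exponential:
# real part = original signal, imaginary part = its Hilbert transform (a sine).
t = np.linspace(0, 1, 256, endpoint=False)
x = np.cos(2 * np.pi * 10 * t)
xa = analytic_signal(x)
```

The complex array `xa` is the kind of complex time-domain input the snippet describes feeding to the network; its real part recovers the original waveform and its magnitude gives the instantaneous envelope.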

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L19/00 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/008 Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G10L17/00 Speaker identification or verification

Similar Documents

US20210089967A1 (en) Data training in multi-sensor setups
Schädler et al. Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition
Erdogan et al. Deep recurrent networks for separation and recognition of single-channel speech in nonstationary background audio
Grais et al. Multi-resolution fully convolutional neural networks for monaural audio source separation
CN110428852A (en) Speech separating method, device, medium and equipment
Sainath et al. Raw multichannel processing using deep neural networks
Nugraha et al. Deep neural network based multichannel audio source separation
Sun et al. Joint constraint algorithm based on deep neural network with dual outputs for single-channel speech separation
Chen et al. A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation
Muñoz-Montoro et al. Ambisonics domain singing voice separation combining deep neural network and direction aware multichannel nmf
Vinitha George et al. A novel U-Net with dense block for drum signal separation from polyphonic music signal mixture
Jiang et al. A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain
CN114446316B (en) Audio separation method, training method, device and equipment of audio separation model
Joseph et al. Cycle GAN-Based Audio Source Separation Using Time–Frequency Masking
Koteswararao et al. Single channel source separation using time–frequency non-negative matrix factorization and sigmoid base normalization deep neural networks
Čmejla et al. Independent vector analysis exploiting pre-learned banks of relative transfer functions for assumed target’s positions
Li et al. MAF-Net: multidimensional attention fusion network for multichannel speech separation
Wang et al. Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments
Wang et al. A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments
Watanabe et al. DNN-based frequency component prediction for frequency-domain audio source separation
Jiang et al. Dual-Channel Speech Enhancement Using Neural Network Adaptive Beamforming
He et al. Mask-based blind source separation and MVDR beamforming in ASR
Liu et al. A new neural beamformer for multi-channel speech separation
Tomassetti et al. Neural beamforming for speech enhancement: preliminary results
Prasanna Kumar et al. Supervised and unsupervised separation of convolutive speech mixtures using f0 and formant frequencies