Nakamura et al., 2020 - Google Patents
Harmonic-temporal factor decomposition for unsupervised monaural separation of harmonic soundsNakamura et al., 2020
View PDF- Document ID
- 8527282070498557560
- Author
- Nakamura T
- Kameoka H
- Publication year
- Publication venue
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
External Links
Snippet
We address the problem of separating a monaural mixture of harmonic sounds into the audio signals of individual semitones in an unsupervised manner. Unsupervised monaural audio source separation has thus far been mainly addressed by two approaches: one rooted …
- 238000000926 separation method 0 title abstract description 62
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/471—General musical sound synthesis principles, i.e. sound category-independent synthesis methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H7/00—Instruments in which the tones are synthesised from a data store, e.g. computer organs
- G10H7/08—Instruments in which the tones are synthesised from a data store, e.g. computer organs by calculating functions or polynomial approximations to evaluate amplitudes at successive sample points of a tone waveform
- G10H7/10—Instruments in which the tones are synthesised from a data store, e.g. computer organs by calculating functions or polynomial approximations to evaluate amplitudes at successive sample points of a tone waveform using coefficients or parameters stored in a memory, e.g. Fourier coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/131—Mathematical functions for musical analysis, processing, synthesis or composition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H5/00—Instruments in which the tones are generated by means of electronic generators
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Rafii et al. | An overview of lead and accompaniment separation in music | |
FitzGerald et al. | Extended nonnegative tensor factorisation models for musical sound source separation | |
Klapuri | Signal processing methods for the automatic transcription of music | |
Virtanen | Sound source separation in monaural music signals | |
Fuentes et al. | Harmonic adaptive latent component analysis of audio and application to music transcription | |
Ikemiya et al. | Singing voice separation and vocal F0 estimation based on mutual combination of robust principal component analysis and subharmonic summation | |
Hayes et al. | A review of differentiable digital signal processing for music and speech synthesis | |
CN105684079B (en) | For enhancing the method and system for having noise cancellation signal of input | |
Tachibana et al. | Harmonic/percussive sound separation based on anisotropic smoothness of spectrograms | |
US9779706B2 (en) | Context-dependent piano music transcription with convolutional sparse coding | |
Chien et al. | Bayesian factorization and learning for monaural source separation | |
Carabias-Orti et al. | Constrained non-negative sparse coding using learnt instrument templates for realtime music transcription | |
Schulze-Forster et al. | Unsupervised music source separation using differentiable parametric source models | |
Cogliati et al. | Context-dependent piano music transcription with convolutional sparse coding | |
Wu et al. | DDSP-based singing vocoders: A new subtractive-based synthesizer and a comprehensive evaluation | |
Magron et al. | Phase recovery in NMF for audio source separation: an insightful benchmark | |
Rodriguez-Serrano et al. | Online score-informed source separation with adaptive instrument models | |
Durrieu et al. | Main instrument separation from stereophonic audio signals using a source/filter model | |
Nakano et al. | Nonnegative matrix factorization with Markov-chained bases for modeling time-varying patterns in music spectrograms | |
Nakamura et al. | Harmonic-temporal factor decomposition for unsupervised monaural separation of harmonic sounds | |
Li et al. | A music cognition–guided framework for multi-pitch estimation | |
Park et al. | Separation of instrument sounds using non-negative matrix factorization with spectral envelope constraints | |
JP2012027196A (en) | Signal analyzing device, method, and program | |
Rodríguez-Serrano et al. | Monophonic constrained non-negative sparse coding using instrument models for audio separation and transcription of monophonic source-based polyphonic mixtures | |
Kameoka et al. | Nonnegative matrix factorization with basis clustering using cepstral distance regularization |