Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria

Author:

Tuomas Virtanen

IEEE Transactions on Audio, Speech, and Language Processing, Volume 15, Issue 3

Pages 1066 - 1074

https://doi.org/10.1109/TASL.2006.885253

Published: 01 March 2007

Abstract

An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term that is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables a better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements.
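
The estimation procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the abstract does not specify the reconstruction error measure, the penalty weights, or the exact update rules. The sketch therefore assumes a squared Euclidean reconstruction error, hypothetical weights `alpha` (temporal continuity) and `beta` (sparseness), and heuristic multiplicative updates obtained by splitting the gradient into its positive and negative parts. The function name `nmf_tc_sparse` and the unit-norm scaling of the spectra are also assumptions made for illustration.

```python
import numpy as np


def nmf_tc_sparse(X, n_components, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Factorize a nonnegative spectrogram X (freq x frames) as B @ G.

    Illustrative cost (an assumption, not taken from the paper):
        0.5 * ||X - B G||_F^2
        + alpha * sum_k sum_t (G[k, t] - G[k, t-1])**2   # temporal continuity
        + beta  * sum_k sum_t G[k, t]                    # sparseness
    Assumes X has at least two frames.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    n_freq, n_frames = X.shape

    # Random nonnegative initialization, as described in the abstract.
    B = rng.random((n_freq, n_components)) + eps    # component spectra
    G = rng.random((n_components, n_frames)) + eps  # time-varying gains

    # Number of adjacent frames for each frame (1 at the ends, 2 inside).
    n_neigh = np.full(n_frames, 2.0)
    n_neigh[0] = n_neigh[-1] = 1.0

    for _ in range(n_iter):
        # Spectra update: standard multiplicative rule for the Euclidean cost.
        B *= (X @ G.T) / (B @ (G @ G.T) + eps)

        # Heuristic: fix the scale ambiguity by giving each spectrum unit
        # norm and moving the scale into the gains (B @ G is unchanged).
        norms = np.linalg.norm(B, axis=0) + eps
        B /= norms
        G *= norms[:, None]

        # Sum of the gains in the adjacent frames, per component and frame.
        neigh_sum = np.zeros_like(G)
        neigh_sum[:, 1:] += G[:, :-1]
        neigh_sum[:, :-1] += G[:, 1:]

        # Gains update: negative gradient terms go in the numerator,
        # positive gradient terms in the denominator.
        num = B.T @ X + 2.0 * alpha * neigh_sum
        den = (B.T @ B) @ G + 2.0 * alpha * n_neigh * G + beta + eps
        G *= num / den

    return B, G
```

Each column of `B` is a fixed component spectrum and each row of `G` is its time-varying gain, so the spectrogram of component `k` is `np.outer(B[:, k], G[k])`. Because the updates only multiply by nonnegative ratios, both factors stay nonnegative. Grouping components into sound sources is a separate step that this sketch does not cover.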

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Computing methodologies
  1. Machine learning

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

IEEE Transactions on Audio, Speech, and Language Processing Volume 15, Issue 3

March 2007

374 pages

ISSN:1558-7916

Publisher

IEEE Press

Publication History

Published: 01 March 2007

Qualifiers

Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 158
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 29 Sep 2024

Citations

Cited By

Yu S, Yang C (2024). MAVAR-SE: Multi-scale Audio-Visual Association Representation Network for End-to-End Speaker Extraction. MultiMedia Modeling, pp. 227-238. Online publication date: 29-Jan-2024.
https://dl.acm.org/doi/10.1007/978-3-031-53308-2_17
Mo S, Morgado P, Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (2023). A unified audio-visual learning framework for localization, separation, and recognition. Proceedings of the 40th International Conference on Machine Learning, pp. 25006-25017. Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3619449
Hsu Y, Bai M (2023). Learning-based robust speaker counting and separation with the aid of spatial coherence. EURASIP Journal on Audio, Speech, and Music Processing, 2023:1. Online publication date: 20-Sep-2023.
https://dl.acm.org/doi/10.1186/s13636-023-00298-3
Ji Y, Ma S, Xu X, Li X, Shen H (2023). Self-Supervised Fine-Grained Cycle-Separation Network (FSCN) for Visual-Audio Separation. IEEE Transactions on Multimedia, 25, pp. 5864-5876. Online publication date: 1-Jan-2023.
https://dl.acm.org/doi/10.1109/TMM.2022.3200282
Mokrý O, Magron P, Oberlin T, Févotte C (2023). Algorithms for audio inpainting based on probabilistic nonnegative matrix factorization. Signal Processing, 206:C. Online publication date: 1-May-2023.
https://dl.acm.org/doi/10.1016/j.sigpro.2022.108905
Agrawal J, Gupta M, Garg H (2023). A review on speech separation in cocktail party environment: challenges and approaches. Multimedia Tools and Applications, 82:20, pp. 31035-31067. Online publication date: 23-Feb-2023.
https://dl.acm.org/doi/10.1007/s11042-023-14649-x
Lai W, Wang S (2022). RPCA-DRNN technique for monaural singing voice separation. EURASIP Journal on Audio, Speech, and Music Processing, 2022:1. Online publication date: 5-Feb-2022.
https://dl.acm.org/doi/10.1186/s13636-022-00236-9
Lan C, Ye H, Liu Y, Guo X, Han Y (2022). Research on a Hyperplane Decomposition NMF Algorithm Applied to Speech Signal Separation. Proceedings of the 2022 4th International Conference on Video, Signal and Image Processing, pp. 152-157. Online publication date: 25-Nov-2022.
https://dl.acm.org/doi/10.1145/3577164.3577188
Hao Y, Wu J, Huang X, Zhang Z, Liu F, Wu Q (2022). Speaker extraction network with attention mechanism for speech dialogue system. Service Oriented Computing and Applications, 16:2, pp. 111-119. Online publication date: 1-Jun-2022.
https://dl.acm.org/doi/10.1007/s11761-022-00340-w
Ke S, Wang Z, Hu R, Wang X (2022). Single-channel Multi-speakers Speech Separation Based on Isolated Speech Segments. Neural Processing Letters, 55:1, pp. 385-400. Online publication date: 10-Jun-2022.
https://dl.acm.org/doi/10.1007/s11063-022-10887-6
