
DOI: 10.1145/3309772.3309780

Separating musical sources with convolutional sparse coding

Published: 07 January 2019

Abstract

The problem of separating multiple vocal and instrumental tracks from a single audio waveform is solved naturally by the human auditory cortex but has yet to be solved effectively in computation. In this paper, we demonstrate a neurally inspired approach to separating bass, drums, vocals, and other instruments from sparse encodings of phase-rich Fourier and constant-Q representations of stereo musical data. Our sparse encodings are generated from learned features that are both spectrally and temporally convolutional, similar to the hemispheric lateralization of the human auditory cortex. We find that learning from neurally inspired constant-Q representations provides better separation than learning from Fourier spectrograms, due to the presence of structure that is convolutional in log-frequency, which aids in differentiating instruments.
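
The paper itself includes no code, but its core pipeline (sparse inference against a convolutional dictionary, followed by per-source reconstruction from each source's activations) can be sketched briefly. The Python below is a minimal, hypothetical illustration, not the authors' PetaVision implementation: it runs ISTA-style inference on a 1-D toy signal with a fixed, hand-built two-atom dictionary, whereas the paper learns its features and encodes 2-D stereo time-frequency (Fourier or constant-Q) images. The names (soft_threshold, encode) and all parameter values are illustrative assumptions.

    import numpy as np

    def soft_threshold(z, t):
        # Proximal operator of the L1 penalty: shrink values toward zero by t.
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def encode(x, dictionary, lam=0.05, step=0.01, n_iter=300):
        # ISTA inference for the model x ~= sum_k conv(a_k, d_k), minimizing
        # 0.5 * ||x - sum_k conv(a_k, d_k)||^2 + lam * sum_k ||a_k||_1.
        n_atoms = dictionary.shape[0]
        a = np.zeros((n_atoms, x.size))
        for _ in range(n_iter):
            recon = sum(np.convolve(a[k], dictionary[k], mode="same")
                        for k in range(n_atoms))
            residual = x - recon
            for k in range(n_atoms):
                # Gradient step (correlate residual with atom k), then shrink.
                grad = np.convolve(residual, dictionary[k][::-1], mode="same")
                a[k] = soft_threshold(a[k] + step * grad, step * lam)
        return a

    rng = np.random.default_rng(0)
    n = 33  # odd atom length keeps mode="same" convolution and its adjoint aligned
    atom_tonal = np.sin(2 * np.pi * 5 * np.arange(n) / n) * np.hanning(n)
    atom_noisy = rng.standard_normal(n) * np.hanning(n)
    D = np.stack([atom_tonal / np.linalg.norm(atom_tonal),
                  atom_noisy / np.linalg.norm(atom_noisy)])

    # Toy mixture: each "source" fires its own atom at a different time.
    spikes = np.zeros((2, 512))
    spikes[0, 100] = 1.0
    spikes[1, 300] = 1.0
    x = sum(np.convolve(spikes[k], D[k], mode="same") for k in range(2))

    a = encode(x, D)
    # Separate by reconstructing each source from its own atom's activations.
    estimates = [np.convolve(a[k], D[k], mode="same") for k in range(2)]

In the paper's setting the dictionary is learned rather than fixed, and the mapping from sparse activations to the four stems (bass, drums, vocals, other) comes from the trained model rather than the one-atom-per-source shortcut used in this toy.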



    Published In

    APPIS '19: Proceedings of the 2nd International Conference on Applications of Intelligent Systems
    January 2019
    208 pages
    ISBN:9781450360852
    DOI:10.1145/3309772
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. constant-Q transform
    2. convolutional sparse coding
    3. source separation

    Qualifiers

    • Research-article

    Conference

    APPIS 2019

