
DOI: 10.1145/3309772.3309780

Separating musical sources with convolutional sparse coding

Published: 07 January 2019

Abstract

The problem of separating multiple vocal and instrumental tracks from a single audio waveform is solved naturally by the human auditory cortex but has yet to be solved effectively in computation. In this paper, we demonstrate a neurally inspired approach to separating bass, drums, vocals, and other instruments from sparse encodings of phase-rich Fourier and constant-Q representations of stereo musical data. Our sparse encodings are generated from learned features that are both spectrally and temporally convolutional, similar to the hemispheric lateralization of the human auditory cortex. We find that learning from neurally inspired constant-Q representations provides better separation than learning from Fourier spectrograms, due to the presence of structure that is convolutional in log-frequency, which aids in differentiating instruments.
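
The paper itself includes no code, but its core pipeline (sparse inference against a convolutional dictionary, followed by per-source reconstruction from each source's activations) can be sketched briefly. The Python below is a minimal, hypothetical illustration, not the authors' PetaVision implementation: it runs ISTA-style inference on a 1-D toy signal with a fixed, hand-built two-atom dictionary, whereas the paper learns its features and encodes 2-D stereo time-frequency (Fourier or constant-Q) images. The names (soft_threshold, encode) and all parameter values are illustrative assumptions.

    import numpy as np

    def soft_threshold(z, t):
        # Proximal operator of the L1 penalty: shrink values toward zero by t.
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def encode(x, dictionary, lam=0.05, step=0.01, n_iter=300):
        # ISTA inference for the model x ~= sum_k conv(a_k, d_k), minimizing
        # 0.5 * ||x - sum_k conv(a_k, d_k)||^2 + lam * sum_k ||a_k||_1.
        n_atoms = dictionary.shape[0]
        a = np.zeros((n_atoms, x.size))
        for _ in range(n_iter):
            recon = sum(np.convolve(a[k], dictionary[k], mode="same")
                        for k in range(n_atoms))
            residual = x - recon
            for k in range(n_atoms):
                # Gradient step (correlate residual with atom k), then shrink.
                grad = np.convolve(residual, dictionary[k][::-1], mode="same")
                a[k] = soft_threshold(a[k] + step * grad, step * lam)
        return a

    rng = np.random.default_rng(0)
    n = 33  # odd atom length keeps mode="same" convolution and its adjoint aligned
    atom_tonal = np.sin(2 * np.pi * 5 * np.arange(n) / n) * np.hanning(n)
    atom_noisy = rng.standard_normal(n) * np.hanning(n)
    D = np.stack([atom_tonal / np.linalg.norm(atom_tonal),
                  atom_noisy / np.linalg.norm(atom_noisy)])

    # Toy mixture: each "source" fires its own atom at a different time.
    spikes = np.zeros((2, 512))
    spikes[0, 100] = 1.0
    spikes[1, 300] = 1.0
    x = sum(np.convolve(spikes[k], D[k], mode="same") for k in range(2))

    a = encode(x, D)
    # Separate by reconstructing each source from its own atom's activations.
    estimates = [np.convolve(a[k], D[k], mode="same") for k in range(2)]

In the paper's setting the dictionary is learned rather than fixed, and the mapping from sparse activations to the four stems (bass, drums, vocals, other) comes from the trained model rather than the one-atom-per-source shortcut used in this toy.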



    Published In

    APPIS '19: Proceedings of the 2nd International Conference on Applications of Intelligent Systems
    January 2019
    208 pages
    ISBN:9781450360852
    DOI:10.1145/3309772
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. constant-Q transform
    2. convolutional sparse coding
    3. source separation

    Qualifiers

    • Research-article

    Conference

    APPIS 2019

