Abstract
Many methods exist for pitched-instrument source separation; the core difficulty is separating harmonics that overlap in time-frequency. To improve results, we propose a selection function that picks the best harmonic separation result from among existing methods. Our strategy rests on the observation that a good separation result usually has a low total amplitude fluctuation. For source harmonics that overlap in a frequency band, each method produces a separation result; from the harmonics separated by each method, we estimate the total amplitude fluctuation of each group of overlapping harmonics. The selection function then maps each band index to a method index by choosing the method with the minimum total amplitude fluctuation. Experiments are conducted on mixtures from the University of Iowa Musical Instrument Sample Database, comparing three advanced separation techniques: common amplitude modulation (CAM), harmonic bandwidth companding (HBW-comp), and ideal binary mask (IBM) filtering. Experimental results indicate that the proposed selection function significantly boosts separation performance.
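The per-band selection described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the data layout (one amplitude-envelope matrix per method per band) and the fluctuation measure (summed absolute frame-to-frame envelope differences) are assumptions made for the sketch.

```python
import numpy as np

def total_amplitude_fluctuation(envelopes):
    """Assumed fluctuation measure: summed absolute frame-to-frame
    differences of the separated harmonics' amplitude envelopes.

    envelopes: array of shape (num_harmonics, num_frames) holding the
    envelopes of the overlapping harmonics separated by one method
    in one frequency band.
    """
    return float(np.sum(np.abs(np.diff(envelopes, axis=1))))

def select_methods(band_envelopes):
    """Map each band index to a method index.

    band_envelopes: list over bands; each entry is a list over methods
    (e.g. CAM, HBW-comp, IBM) of (num_harmonics, num_frames) arrays.
    Returns, per band, the index of the method whose separation result
    has the minimum total amplitude fluctuation.
    """
    return [int(np.argmin([total_amplitude_fluctuation(e) for e in methods]))
            for methods in band_envelopes]
```

A smooth, slowly varying envelope scores near zero, while a result contaminated by interfering harmonics tends to beat and fluctuate, so the argmin favors the cleaner separation.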
References
Koteswararao, Y.V., Rao, C.B.R.: Multichannel speech separation using hybrid GOMF and enthalpy-based deep neural networks. Multimedia Syst. 27, 271–286 (2021)
Xie, L., Fu, Z., Feng, W., Luo, Y.: Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news. Multimedia Syst. 17, 101–112 (2011)
Rafii, Z., Liutkus, A., Stöter, F.R., Mimilakis, S.I., FitzGerald, D., Pardo, B.: An overview of lead and accompaniment separation in music. IEEE/ACM Trans. Audio Speech Lang. Process. 26(8) (2018)
Li, Y., Woodruff, J.: Monaural musical sound separation based on pitch and common amplitude modulation. IEEE Trans. Audio Speech Lang. Process. 17(7), 1361–1371 (2009)
Zivanovic, M.: Harmonic bandwidth companding for separation of overlapping harmonics in pitched signals. IEEE/ACM Trans. Audio Speech Lang. Process. 23(5), 898–908 (2015)
Hu, G., Wang, D.L.: Monaural speech segregation based on pitch tracking and amplitude modulation. IEEE Trans. Neural Networks 15(5), 1135–1150 (2004)
Stöter, F.R., Liutkus, A., Badeau, R., Edler, B., Magron, P.: Common fate model for unison source separation. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (2016)
Pishdadian, F., Pardo, B.: Multi-resolution common fate transform. IEEE/ACM Trans. Audio Speech Lang. Process. 27(2), 342–354 (2019)
Tachibana, H., Ono, N., Sagayama, S.: Singing voice enhancement in monaural music signals based on two-stage harmonic/percussive sound separation on multiple resolution spectrograms. IEEE/ACM Trans. Audio Speech Lang. Process. 22(1), 228–237 (2014)
Moore, B.C.J.: An Introduction to the Psychology of Hearing. Academic Press (1997)
The University of Iowa Musical Instrument Sample Database. [Online]. Available: http://theremin.music.uiowa.edu/
Vincent, E., Gribonval, R., Fevotte, C.: Performance measurement in blind audio source separation. IEEE Trans. Audio Speech Lang. Process. 14(4), 1462–1469 (2006)
Wang, D.L., Brown, G.J.: Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Wiley/IEEE Press, Hoboken (2006)
Li, Y., Wang, D.L.: Separation of singing voice from music accompaniment for monaural recordings. IEEE Trans. Audio Speech Lang. Process. 15(4), 1475–1487 (2007)
Serra, X.: Musical sound modeling with sinusoids plus noise. In: Musical Signal Processing (1997)
McAulay, R., Quatieri, T.: Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. Acoust. Speech Signal Process. 34(4), 744–754 (1986)
Fevotte, C., Godsill, S.J.: A Bayesian approach for blind separation of sparse sources. IEEE Trans. Audio Speech Lang. Process. 14(6), 2174–2188 (2006)
Ozerov, A., Philippe, P., Bimbot, F., Gribonval, R.: Adaptation of Bayesian models for single-channel source separation and its application to voice/music separation in popular songs. IEEE Trans. Audio Speech Lang. Process. 15(5), 1564–1578 (2007)
Casey, M.A., Westner, A.: Separation of mixed audio sources by independent subspace analysis. In: Proceedings of International Computer Music Conference (2000)
Virtanen, T.: Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria. IEEE Trans. Audio Speech Lang. Process. 15(3), 1066–1074 (2007)
Abdallah, S.A., Plumbley, M.D.: Unsupervised analysis of polyphonic music by sparse coding. IEEE Trans. Neural Networks 17(1), 179–196 (2006)
Huang, P.S., Kim, M., Hasegawa-Johnson, M., Smaragdis, P.: Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Trans. Audio Speech Lang. Process. 23 (2015)
Chandna, P., Miron, M., Janer, J., Gómez, E.: Monoaural audio source separation using deep convolutional neural networks. In: 13th International Conference on Latent Variable Analysis and Signal Separation (2017)
Hershey, J.R., Chen, Z., Roux, J.L., Watanabe, S.: Deep clustering: discriminative embeddings for segmentation and separation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2016)
Luo, Y., Chen, Z., Hershey, J.R., Roux, J.L., Mesgarani, N.: Deep clustering and conventional networks for music separation: stronger together. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2017)
Grais, E.M., Roma, G., Simpson, A.J.R., Plumbley, M.D.: Two-stage single-channel audio source separation using deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 25(9), 1773–1783 (2017)
Every, M.R., Szymanski, J.E.: Separation of synchronous pitched notes by spectral filtering of harmonics. IEEE Trans. Audio Speech Lang. Process. 14(5), 1845–1856 (2006)
Virtanen, T., Klapuri, A.: Separation of harmonic sounds using multipitch analysis and iterative parameter estimation. In: Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 83–86 (2001)
Bay, M., Beauchamp, J.W.: Harmonic source separation using prestored spectra. In: Independent Component Analysis and Blind Signal Separation, pp. 561–568 (2006)
Duan, Z., Zhang, Y., Zhang, C., Shi, Z.: Unsupervised single-channel music source separation by average harmonic structure modeling. IEEE Trans. Audio Speech Lang. Process. 16(4), 766–778 (2008)
Gong, Y., Shu, X., Tang, J.: Recovering overlapping partials for monaural perfect harmonic musical sound separation using modified common amplitude modulation. In: Pacific Rim Conference on Multimedia, pp. 903–912 (2017)
Jensen, K.: Timbre models of musical sounds. Ph.D. dissertation, University of Copenhagen (1999)
Communicated by X. Yang.
Cite this article
Gong, Y., Dai, L. & Tang, J. A selection function for pitched instrument source separation. Multimedia Systems 28, 311–319 (2022). https://doi.org/10.1007/s00530-021-00836-z