Abstract
Speech separation plays an important role in speech-related systems because it can denoise, extract, and enhance speech signals. In recent years, many methods have been proposed to separate the human voice from noise and other sounds. To separate speech from a complicated signal, we propose a more powerful method that uses a variational autoencoder (VAE) model followed by post-processing with a bandpass filter. This combination can be used to extract the original human speech from a mixture that contains not only high-frequency noise but also many other sounds. Our approach can be flexibly applied to new background sounds.
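The bandpass post-processing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the filter design or the cutoff frequencies, so everything specific here is an assumption: the 300 to 3400 Hz passband (a common telephone speech band), the second-order biquad form, the 16 kHz sample rate, and the function names bandpass_biquad, rms, and tone. The synthetic sine tones only stand in for a VAE output; no VAE is implemented.

```python
import math


def bandpass_biquad(samples, low_hz, high_hz, sample_rate):
    """Apply a second-order (biquad) bandpass filter to a list of floats.

    Illustrative only: the band edges and filter order are assumptions,
    not the design used in the paper.
    """
    # Centre frequency (geometric mean of the edges) and quality factor.
    f0 = math.sqrt(low_hz * high_hz)
    q = f0 / (high_hz - low_hz)
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)

    # Bandpass coefficients with unit gain at the centre frequency,
    # normalised by a0. The b1 coefficient is zero for this form.
    a0 = 1.0 + alpha
    b0 = alpha / a0
    b2 = -alpha / a0
    a1 = -2.0 * math.cos(w0) / a0
    a2 = (1.0 - alpha) / a0

    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        # Direct-form I difference equation.
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def rms(samples):
    """Root-mean-square level of a sequence of floats."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def tone(freq_hz, sample_rate, seconds):
    """Unit-amplitude sine wave used as a stand-in test signal."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    fs = 16000  # assumed sample rate
    for f in (50.0, 1000.0, 7000.0):
        x = tone(f, fs, 1.0)
        y = bandpass_biquad(x, 300.0, 3400.0, fs)
        # Compare levels on the second half to skip the start-up transient.
        ratio = rms(y[fs // 2:]) / rms(x[fs // 2:])
        print(f"{f:7.1f} Hz gain: {ratio:.3f}")
```

A single biquad has gentle roll-off, so tones outside the band are attenuated rather than removed; a sharper response would need a higher-order design.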
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Do, H.D., Tran, S.T., Chau, D.T. (2020). A Variational Autoencoder Approach for Speech Signal Separation. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2020. Lecture Notes in Computer Science, vol 12496. Springer, Cham. https://doi.org/10.1007/978-3-030-63007-2_43
DOI: https://doi.org/10.1007/978-3-030-63007-2_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63006-5
Online ISBN: 978-3-030-63007-2
eBook Packages: Computer Science, Computer Science (R0)