Abstract
Ambient noise suppression in a reverberant room is usually performed by a microphone array. Adaptive beamforming, whose typical representative is the minimum variance distortionless response (MVDR) beamformer, is an effective method for noise suppression. However, the MVDR beamformer gives poor results in a real room because of its sensitivity to steering error and multipath wave propagation. In this paper we propose a noise suppression method based on the assumption that the positions of the speakers in the reverberant room are roughly known. Noise reduction is realized by two MVDR beamformers, each directed toward one of the speakers. Adaptation of the MVDR beamformers is controlled by a speaker activity detector whose decision is based on a power transfer model of multiple superdirective beamformers in a combined diffuse and coherent noise field. The proposed voice activity detector also provides residual noise reduction. The proposed method and its robustness to steering error were tested on a simulated room model as well as in a real room environment. The improvement of the restored speech signal was evaluated by Signal to Noise Ratio Enhancement (SNRE) and by the Perceptual Evaluation of Speech Quality (PESQ) measure.
Notes
Strictly speaking, it is not a beamformer, because it uses only one microphone (the fourth microphone), which has an omnidirectional characteristic.
In the experimental tests we used a small value of λ, λ = 0.25, which provides fast tracking of power changes.
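To illustrate why a small λ tracks power changes quickly, the following is a minimal sketch that assumes λ acts as the forgetting factor of a first-order recursive average, P[n] = λ·P[n−1] + (1−λ)·p[n]. This form of the recursion, the function name `track_power`, and the test signal are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def track_power(frame_powers, lam=0.25):
    """Recursively smooth a sequence of instantaneous frame powers.

    Assumed recursion (not confirmed by the source):
        P[n] = lam * P[n-1] + (1 - lam) * p[n]
    A small lam gives little weight to the past, so P follows changes quickly.
    """
    frame_powers = np.asarray(frame_powers, dtype=float)
    smoothed = np.empty(len(frame_powers), dtype=float)
    prev = frame_powers[0]  # initialise the estimate with the first frame
    for n, p in enumerate(frame_powers):
        prev = lam * prev + (1.0 - lam) * p
        smoothed[n] = prev
    return smoothed


# Hypothetical test signal: the power jumps from 0 to 1 at frame 5.
step = np.concatenate([np.zeros(5), np.ones(5)])
fast = track_power(step, lam=0.25)  # value reported in the note
slow = track_power(step, lam=0.9)   # larger value, shown for comparison only
```

Under this assumed recursion, the estimate with λ = 0.25 covers three quarters of the jump in the first frame after the change, whereas the estimate with λ = 0.9 covers only one tenth.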
In practice, there is one more hypothesis: both speakers speak simultaneously. In this case we assume that the louder speaker is active.
In this test case, SNRE is the ratio of the speech energy during the speech segment to the residual noise in the pause segment attenuated by (19).
References
Agnew J, Thornton MJ (2000) Just noticeable and objectionable group delays in digital hearing aids. J Am Acad Audiol 11(6):330–336
Air conditioner sounds https://www.soundsnap.com/tags/air_conditioner. Accessed: 2017-05-25
Allen JB, Berkley DA (1979) Image method for efficiently simulating small-room acoustics. J Acoust Soc Am 65(4):943–950
Bitzer J, Simmer KU (2001) Superdirective microphone arrays. Microphone arrays. Springer, Berlin, pp 19–38
Cabañas-Molero P et al. (2018) Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis. Multimed Tools Appl: 1–23. https://doi.org/10.1007/s11042-018-5944-2
Defatta DJ, Lucas JG, Hodgkiss WS (1988) Digital signal processing: a system design approach
Farhang-Boroujeny B (1998) Adaptive filters: theory and applications. John Wiley & Sons, Inc., New York
Frost LO III (1972) An algorithm for linearly constrained adaptive array processing. Proc IEEE 60:926–935
Griffiths L, Jim CW (1982) An alternative approach to linearly constrained adaptive beamforming. IEEE Trans Antennas Propag 30(1):27–34
Hoshuyama O, Sugiyama A, Hirano A (1999) A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters. IEEE Trans Signal Process 47:2677–2684
ITU-T (2001) Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Int Telecomm Union
ITU-T Test Signals for Telecommunication Systems http://www.itu.int/net/itu-t/sigdb/genaudio/Pseries.htm. Accessed: 2018-02-07
Jovičić TS, Šarić MZ, Turajlić RS (2005) Application of the maximum signal to interference criterion to the adaptive microphone array. Acoustics Research Letters Online (ARLO) 6(4):232–237
Marro C, Mahieux Y, Simmer KU (1998) Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering. IEEE Trans Speech Audio Process 6(3):240–259
McCowan AI, Bourlard H (2003) Microphone array post-filter based on noise field coherence. IEEE Trans Speech Audio Process 11(6)
Papp II, Šarić MZ, Jovičić TS, Teslić DN (2007) Adaptive microphone array for unknown desired speaker’s transfer function. JASA Express Lett 122(2):EL44–EL49
Parra L, Alvino C (2002) Geometric source separation: merging convolutive source separation with geometric beamforming. IEEE Trans Speech Audio Process 10(6):352–362
Parra L, Spence C (2000) Convolutive blind separation of non-stationary sources. IEEE Trans Speech Audio Process 8(3):320–327
Šarić MZ, Jovičić TS (2004) Adaptive microphone array based on pause detection. Acoust Res Lett Online (ARLO) 5(2):68–74
Šarić MZ, Simić PD, Jovičić TS (2011) A new post-filter algorithm combined with two-step adaptive beam former. Circ Syst Sign Process 30:483–500. https://doi.org/10.1007/s00034-010-9233-1
Simmer KU, Bitzer J, Marro C (2001) Post-filtering techniques. Microphone arrays. Springer, Berlin, pp 39–60
Spriet A, Moonen M, Wouters J (2002) A multi-channel subband generalized singular value decomposition approach to speech enhancement. Trans Emerg Telecomm Technol 13(2):149–158
Van Trees HL (2004) Optimum array processing: part IV of detection, estimation, and modulation theory. John Wiley & Sons
Wang L, Ding H, Fuliang Y (2010) Combining superdirective beamforming and frequency-domain blind source separation for highly reverberant signals. EURASIP J Audio, Speech Music Process 1(2010):797962
White G, Louie GJ (2005) The audio dictionary: revised and expanded. University of Washington Press
Wölfel M, McDonough J (2009) Distant speech recognition. John Wiley & Sons
Yan C, Xie H, Yang D, Yin J, Zhang Y, Dai Q (2018) Supervised hash coding with deep neural network for environment perception of intelligent vehicles. IEEE Trans Intell Transport Syst 19(1):284–295
Zelinski R (1988) A microphone array with adaptive post-filtering for noise reduction in reverberant rooms. Proc ICASSP88: 2578–2581
Acknowledgements
This research was supported by grants 178027, TR32032 and TR32035 from the Ministry of Education, Science and Technological Development of the Republic of Serbia.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Transfer of the acoustic power by diffuse noise field
The transfer of the diffuse component of the acoustic power from the acoustic source to the output of the beamformer is defined by a linear transfer factor (22),
where $P_{diff,k}$ is the total diffuse power at the output of beamformer $k$ and $P_s$ is the power of the acoustic source measured at a distance of 1 m. Taking into account the directivity of the microphone array, defined by the beam pattern $h_k(j, \phi, \theta)$, the diffuse power component is given by (23),
where $D_k(j)$ is the directivity factor and $P_{dif\_array}$ is the diffuse power component at the microphone array position. The diffuse power is uniformly distributed in the room and is equal to the direct-path power at the critical distance $d_c$ (24).
Substituting (24) and (23) into (22) gives the transfer factor.
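The chain of substitutions described above can be sketched as follows. The displayed forms are a reconstruction under stated assumptions and not the authors' printed equations: the symbol $T_{diff,k}$ for the transfer factor is hypothetical, the beamformer is assumed to have unit gain in the look direction, and the direct-path power is assumed to decay with the square of the distance from the source. The label (25) for the final result is likewise an assumption.

```latex
% (22) linear transfer factor from the source to the diffuse output power
%      (T_{diff,k} is a hypothetical symbol)
P_{diff,k} = T_{diff,k}\, P_s
% (23) diffuse power at the array, attenuated by the directivity factor
P_{diff,k} = \frac{P_{dif\_array}}{D_k(j)}
% (24) diffuse power equals the direct-path power at the critical distance
%      (inverse-square decay, with P_s referred to 1 m)
P_{dif\_array} = \frac{P_s}{d_c^{2}}
% (25) result of substituting (24) and (23) into (22)
T_{diff,k} = \frac{1}{D_k(j)\, d_c^{2}}
```

Under these assumptions the diffuse transfer factor decreases as either the directivity factor or the critical distance grows, which is consistent with a more directive beamformer picking up less reverberant energy.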
Cite this article
Šarić, Z., Subotić, M., Bilibajkić, R. et al. Bidirectional microphone array with adaptation controlled by voice activity detector based on multiple beamformers. Multimed Tools Appl 78, 15235–15254 (2019). https://doi.org/10.1007/s11042-018-6895-3
DOI: https://doi.org/10.1007/s11042-018-6895-3