research-article

Multiple Sound Source Localization With Steered Response Power Density and Hierarchical Grid Refinement

Published: 01 November 2018

Abstract

Estimating the direction of arrival (DOA) of sound sources is an important step in sound field analysis. Rigid spherical microphone arrays allow the calculation of a compact spherical harmonic representation of the sound field. The standard method for analyzing sound fields recorded with such arrays is the steered response power (SRP) map, in which the source DOA is estimated as the steering direction that maximizes the output power of a maximally directive beam. This approach is computationally costly because it requires steering the beam in all possible directions. This paper presents an extension to SRP called steered response power density (SRPD) and an associated signal-adaptive search method, called hierarchical grid refinement, that reduces the number of steering directions needed for DOA estimation. The proposed method can localize near-coherent as well as incoherent sources while jointly providing the number of prominent sources in the scene. It is shown to be robust to reverberation and additive white noise. An evaluation of the proposed method using simulations and real recordings under highly reverberant conditions is presented, along with a comparison with state-of-the-art methods.
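
To make the cost argument concrete, the following is a minimal sketch of a generic coarse-to-fine direction search. It is not the SRPD algorithm or the hierarchical grid refinement procedure of the paper: it handles a single source only, and `toy_power` is a hypothetical stand-in for the output power of a steered beam. The function names, the azimuth/elevation grid, and the parameters `levels`, `points`, and `sharpness` are all assumptions made for illustration.

```python
import math


def unit_vector(az, el):
    """Unit vector for azimuth az and elevation el (radians)."""
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def angular_error(az1, el1, az2, el2):
    """Angle in radians between two directions."""
    u, s = unit_vector(az1, el1), unit_vector(az2, el2)
    dot = sum(a * b for a, b in zip(u, s))
    return math.acos(max(-1.0, min(1.0, dot)))


def toy_power(az, el, src_az, src_el, sharpness=8):
    """Hypothetical beam output power: largest when steered at the source."""
    u, s = unit_vector(az, el), unit_vector(src_az, src_el)
    cos_angle = sum(a * b for a, b in zip(u, s))
    return ((1.0 + cos_angle) / 2.0) ** sharpness


def coarse_to_fine(power, levels=6, points=7):
    """Search for the steering direction with maximum power.

    Each level evaluates a points x points grid over the current window,
    then shrinks the window to one grid step around the best direction.
    Returns (azimuth, elevation, number_of_power_evaluations).
    """
    az_c, el_c = 0.0, 0.0                  # window centre
    az_half, el_half = math.pi, math.pi / 2  # window half-widths
    evaluations = 0
    for _ in range(levels):
        best = None
        for i in range(points):
            az = az_c - az_half + 2.0 * az_half * i / (points - 1)
            for j in range(points):
                el = el_c - el_half + 2.0 * el_half * j / (points - 1)
                el = max(-math.pi / 2, min(math.pi / 2, el))  # stay on sphere
                p = power(az, el)
                evaluations += 1
                if best is None or p > best[0]:
                    best = (p, az, el)
        _, az_c, el_c = best
        # New half-width equals one grid step of the level just searched.
        az_half /= (points - 1) / 2
        el_half /= (points - 1) / 2
    return az_c, el_c, evaluations


if __name__ == "__main__":
    src = (1.0, 0.4)  # assumed source direction (azimuth, elevation) in radians
    az, el, n = coarse_to_fine(lambda a, e: toy_power(a, e, *src))
    err_deg = math.degrees(angular_error(az, el, *src))
    print(f"estimate: az={az:.4f}, el={el:.4f}, error={err_deg:.3f} deg, "
          f"evaluations={n}")
```

With the default settings the search evaluates the power function `levels * points * points` times, whereas an exhaustive scan at the resolution of the final level would need a grid several orders of magnitude larger. A simple scheme like this relies on the power map having one dominant, smooth peak; handling several sources in reverberation is the harder problem the paper addresses.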


Cited By

  • (2021) Multi-Sound-Source Localization Using Machine Learning for Small Autonomous Unmanned Vehicles with a Self-Rotating Bi-Microphone Array. Journal of Intelligent and Robotic Systems, 103:3. DOI: 10.1007/s10846-021-01481-4. Online publication date: 27-Oct-2021

      Information & Contributors

      Information

      Published In

      IEEE/ACM Transactions on Audio, Speech and Language Processing  Volume 26, Issue 11
      November 2018
      302 pages
      ISSN:2329-9290
      EISSN:2329-9304

      Publisher

      IEEE Press

      Publication History

      Published: 01 November 2018
      Published in TASLP Volume 26, Issue 11


