
Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit

Journal of Signal Processing Systems

Abstract

Owing to the many cores in its architecture, the graphics processing unit (GPU) offers promise for parallel execution of the particle filter. A stage of the particle filter that is particularly challenging to parallelize is resampling. Parallel resampling algorithms exist in the literature, such as Metropolis resampling, which neither requires a collective operation, such as a cumulative sum over the weights, nor suffers from numerical instability. However, with a large number of particles, Metropolis resampling becomes slow because of non-coalesced accesses to the global memory of the GPU. In this article, we offer solutions to this problem. We introduce two implementation techniques, named Metropolis-C1 and Metropolis-C2, and compare them with the original Metropolis resampling on an NVIDIA Tesla K40 board. In the first scenario, where the two techniques achieve their fastest execution times, Metropolis-C1 is the fastest of the three but yields the worst quality, whereas Metropolis-C2 stays closer to the original Metropolis resampling in quality. In the second scenario, where all three algorithms yield similar quality, Metropolis-C1 and Metropolis-C2 become slower but remain faster than the original Metropolis resampling.
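To make the access pattern concrete, the CUDA sketch below implements plain Metropolis resampling: each thread runs an independent Metropolis chain over particle indices, and every iteration reads the weight of a randomly drawn candidate, so the threads of a warp touch scattered global-memory addresses and their reads cannot be coalesced. This is a minimal illustrative sketch under assumed names and parameters (the kernel name, the iteration count B, and the cuRAND seeding scheme are ours); it is not the article's Metropolis-C1 or Metropolis-C2 implementation.

#include <curand_kernel.h>

// Illustrative sketch of plain Metropolis resampling on the GPU; not the
// authors' Metropolis-C1/C2 kernels. Kernel name, B, and seeding are assumptions.
__global__ void metropolis_resample(const float* w,   // particle weights (need not be normalised)
                                    int* ancestor,     // output: ancestor index chosen for each particle
                                    int N,             // number of particles
                                    int B,             // number of Metropolis iterations per particle
                                    unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;

    curandState state;
    curand_init(seed, i, 0, &state);    // one RNG subsequence per thread

    int k = i;                          // start the chain at the thread's own particle
    for (int b = 0; b < B; ++b) {
        float u = curand_uniform(&state);      // uniform draw in (0, 1]
        int j = curand(&state) % N;            // random candidate index (small modulo bias ignored here)
        // Accept j with probability min(1, w[j]/w[k]); the read of w[j] is at a
        // random address per thread, i.e. the non-coalesced global-memory access.
        if (u * w[k] <= w[j]) {
            k = j;
        }
    }
    ancestor[i] = k;                    // chain state after B steps becomes the ancestor of particle i
}

Because the candidate index j differs arbitrarily from thread to thread, each warp's loads of w[j] are spread over many memory transactions; Metropolis-C1 and Metropolis-C2 are the article's techniques for addressing exactly this access pattern.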

Acknowledgement

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used in this work. This work was also supported in part by the Republic of Turkey Ministry of Development under grant BAP-08-11-DPT2002K120510. Furthermore, the first author was supported in part by The Scientific and Technological Research Council of Turkey (TUBITAK) under program 2214/A. Thanks are due to Prof. Fusun Ozguner for her support of the first author during his visit to Ohio State University. The authors are grateful to Mr. Lawrence Murray for his help with the Metropolis resampling implementation and the bias calculation.

Author information

Corresponding author

Correspondence to Özcan Dülger.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

ESM 1

(RAR 38 kb)

About this article

Cite this article

Dülger, Ö., Oğuztüzün, H. & Demirekler, M. Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit. J Sign Process Syst 90, 433–447 (2018). https://doi.org/10.1007/s11265-017-1254-6
