Abstract
Owing to many cores in its architecture, graphics processing unit (GPU) offers promise for parallel execution of the particle filter. A stage of the particle filter that is particularly challenging to parallelize is resampling. There are parallel resampling algorithms in the literature such as Metropolis resampling, which does not require a collective operation such as cumulative sum over weights and does not suffer from numerical instability. However, with large number of particles, Metropolis resampling becomes slow. This is because of the non-coalesced access problem on the global memory of the GPU. In this article, we offer solutions for this problem of Metropolis resampling. We introduce two implementation techniques, named Metropolis-C1 and Metropolis-C2, and compare them with the original Metropolis resampling on NVIDIA Tesla K40 board. In the first scenario where these two techniques achieve their fastest execution times, Metropolis-C1 is faster than the others, but yields the worst results in quality. However, Metropolis-C2 is closer to Metropolis resampling in quality. In the second scenario where all three algorithms yield similar quality, although Metropolis-C1 and Metropolis-C2 get slower, they are still faster than the original Metropolis resampling.
Similar content being viewed by others
References
Ristic, B., Arulampalam, S., & Gordon, N. (2004). Beyond the Kalman filter: particle filters for tracking applications. In A tutorial on particle filters (pp. 35–65). Boston-London: Artech House.
Hendeby, G., Hol, J. D., Karlsson, R., & Gustafsson, F. (2007). A graphics processing unit implementation of the particle filter. Signal Processing Conference, 2007 15th European, 1639–1643.
Hendeby, G., Karlsson, R., & Gustafsson, F. (2010). Particle filtering: the need for speed. EURASIP Journal on Advances in Signal processing, 2010(22), 22:1–22:9.
Gong, P., Basciftci, J. D., & Ozguner, F. (2012). A parallel resampling algorithm for particle filtering on shared-memory architectures. Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 I.E. 26th International, Shanghai, 1477–1483.
Hwang, K., & Sung, W. (2013). Load balanced resampling for real-time particle filtering on graphics processing units. IEEE Transactions on Signal Processing, 61(2), 411–419.
Wu, Y., Wang, J., & Cao, Y. H. (2015). Particle filter based on iterated importance density function and parallel resampling. Journal of Central South University, 22(9), 3427–3439.
Chao, M. A., Chu, C. Y., Chao, C. H., & Wu, A. Y. (2010). Efficient parallelized particle filter design on CUDA. Signal Processing Systems (SIPS), 2010 I.E. workshop on, San Francisco, CA, 299–304.
Chitchian, M., Simonetto, A., van Amesfoort, A. S., & Keviczky, T. (2013). Distributed computation particle filters on GPU architectures for real-time control applications. IEEE Transactions on Control Systems Technology, 21(6), 2224–2238.
Shabany, M. (2012). An efficient architecture for sequential Monte Carlo receivers in wireless flat-fading channels. Journal of Signal Processing Systems, 68(3), 303–315.
Pan, Y., Zheng, N., Tian, Q., Yan, X., & Huan, R. (2013). Hierarchical resampling algorithm and architecture for distributed particle filters. Journal of Signal Processing Systems, 71(3), 237–246.
Bolic, M., Djuric, P. M., & Hong, S. (2005). Resampling algorithms and architectures for distributed particle filters. IEEE Transactions on Signal Processing, 53(7), 2442–2450.
Balasingam, B., Bolić, M., Djurić, P. M., & Míguez, J. (2011). Efficient distributed resampling for particle filters. Acoustics, speech and signal processing (ICASSP), 2011 I.E. international Conference on, Prague, Czech Republic, 3772–3775.
Tian, Q., Pan, Y., Salcic, Z., & Huan, R. (2016). DART: distributed particle filter algorithm with resampling tree for ultimate real-time capability. Journal of Signal Processing Systems, 1–14. doi:10.1007/s11265-016-1110-0.
Hong, S., Chin, S. S., Djurić, P. M., & Bolić, M. (2006). Design and implementation of flexible resampling mechanism for high-speed parallel particle filters. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 44(1–2), 47–62.
Li, T., Bolic, M., & Djuric, P. M. (2015). Resampling methods for particle filtering: classification, implementation, and strategies. IEEE Signal Processing Magazine, 32(3), 70–86.
Murray, L. M., Lee, A., & Jacob, P. E. (2016). Parallel resampling in the particle filter. Journal of Computational and Graphical Statistics, 25(3), 789–805.
Murray, L. (2012). GPU acceleration of the particle filter: the Metropolis resampler, arXiv:1202.6163v1. https://arxiv.org/abs/1202.6163.
Liu, S., Mingas, G., & Bouganis, C. S. (2014). Parallel resampling for particle filters on FPGAs. Field-programmable technology (FPT), 2014 international Conference on, Shanghai, 191–198.
Aguilera, A. R., Salas, A. L., Perandrés, D. M., & Otaduy, M. A. (2015). A parallel resampling method for interactive deformation of volumetric models. Computers & Graphics, 53, 147–155.
Jacob, P. E., Murray, L. M., & Rubenthaler, S. (2015). Path storage in the particle filter. Statistics and Computing, 25(2), 487–496.
Cook, S. (2013). CUDA programming: a developer’s guide to parallel computing with GPUs. Waltham: Morgan Kaufmann.
NVIDIA. (2013). Tesla K40 GPU active accelerator: board specification. https://www.nvidia.com/content/PDF/kepler/Tesla-K40-Active-Board-Spec-BD-06949-001_v03.pdf.
NVIDIA. (2014). NVIDIA’s next generation CUDA compute architecture: Kepler GK110/210. http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf.
NVIDIA. (2016). CUDA C best practices guide. http://docs.nvidia.com/cuda/pdf/CUDA_C_Best_Practices_Guide.pdf.
NVIDIA. (2015). CURAND library: programming guide. http://docs.nvidia.com/cuda/pdf/CURAND_Library.pdf.
Li, T., Villarrubia, G., Sun, S., Corchado, J. M., & Bajo, J. (2015). Resampling methods for particle filtering: identical distribution, a new method, and comparable study. Frontiers of Information Technology & Electronic Engineering, 16(11), 969–984.
Bowman, K. O., & Shenton, L. R. (1988). Properties of estimators for the gamma distribution. New York: Marcel Dekker.
Ropella, K. M. (2007). Introduction to statistics for biomedical engineers. San Rafael: Morgan & Claypool Publisher.
Harris, M. (2007). Optimizing parallel reduction in CUDA, NVIDIA developer technology. http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/reduction/doc/reduction.pdf.
Arulampalam, M. S., Maskell, S., Gordon, N., & Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2), 174–188.
NVIDIA. (2017). NVCC. http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc.
NVIDIA. (2017). Kepler tuning guide. http://docs.nvidia.com/cuda/kepler-tuning-guide.
NVIDIA. (2017). Profiler. http://docs.nvidia.com/cuda/profiler-users-guide.
Acknowledgement
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used in this work. This work was also supported in part by Republic of Turkey Ministry of Development, Turkey under grant BAP-08-11-DPT2002K120510. Furthermore, the first author was supported in part by The Scientific and Technological Research Council of Turkey (TUBITAK) under program 2214/A. Thanks are due to Prof. Fusun Ozguner for her support for the first author during his visit to Ohio State. The authors are grateful to Mr. Lawrence Murray for his support in Metropolis resampling implementation and bias calculation.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Electronic supplementary material
ESM 1
(RAR 38 kb)
Rights and permissions
About this article
Cite this article
Dülger, Ö., Oğuztüzün, H. & Demirekler, M. Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit. J Sign Process Syst 90, 433–447 (2018). https://doi.org/10.1007/s11265-017-1254-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11265-017-1254-6