
Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit

Journal of Signal Processing Systems

Abstract

Owing to the many cores in its architecture, the graphics processing unit (GPU) offers promise for parallel execution of the particle filter. A stage of the particle filter that is particularly challenging to parallelize is resampling. Parallel resampling algorithms exist in the literature, such as Metropolis resampling, which neither requires a collective operation, such as a cumulative sum over the weights, nor suffers from numerical instability. However, with a large number of particles, Metropolis resampling becomes slow because of non-coalesced accesses to the global memory of the GPU. In this article, we offer solutions to this problem. We introduce two implementation techniques, named Metropolis-C1 and Metropolis-C2, and compare them with the original Metropolis resampling on an NVIDIA Tesla K40 board. In the first scenario, where the two techniques achieve their fastest execution times, Metropolis-C1 is the fastest of the three but yields the worst quality, whereas Metropolis-C2 stays closer to the original Metropolis resampling in quality. In the second scenario, where all three algorithms yield similar quality, Metropolis-C1 and Metropolis-C2 become slower but remain faster than the original Metropolis resampling.
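To make the access pattern concrete, the CUDA sketch below implements plain Metropolis resampling: each thread runs an independent Metropolis chain over particle indices, and every iteration reads the weight of a randomly drawn candidate, so the threads of a warp touch scattered global-memory addresses and their reads cannot be coalesced. This is a minimal illustrative sketch under assumed names and parameters (the kernel name, the iteration count B, and the cuRAND seeding scheme are ours); it is not the article's Metropolis-C1 or Metropolis-C2 implementation.

#include <curand_kernel.h>

// Illustrative sketch of plain Metropolis resampling on the GPU; not the
// authors' Metropolis-C1/C2 kernels. Kernel name, B, and seeding are assumptions.
__global__ void metropolis_resample(const float* w,   // particle weights (need not be normalised)
                                    int* ancestor,     // output: ancestor index chosen for each particle
                                    int N,             // number of particles
                                    int B,             // number of Metropolis iterations per particle
                                    unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;

    curandState state;
    curand_init(seed, i, 0, &state);    // one RNG subsequence per thread

    int k = i;                          // start the chain at the thread's own particle
    for (int b = 0; b < B; ++b) {
        float u = curand_uniform(&state);      // uniform draw in (0, 1]
        int j = curand(&state) % N;            // random candidate index (small modulo bias ignored here)
        // Accept j with probability min(1, w[j]/w[k]); the read of w[j] is at a
        // random address per thread, i.e. the non-coalesced global-memory access.
        if (u * w[k] <= w[j]) {
            k = j;
        }
    }
    ancestor[i] = k;                    // chain state after B steps becomes the ancestor of particle i
}

Because the candidate index j differs arbitrarily from thread to thread, each warp's loads of w[j] are spread over many memory transactions; Metropolis-C1 and Metropolis-C2 are the article's techniques for addressing exactly this access pattern.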

Acknowledgement

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used in this work. This work was also supported in part by the Republic of Turkey Ministry of Development under grant BAP-08-11-DPT2002K120510. Furthermore, the first author was supported in part by The Scientific and Technological Research Council of Turkey (TUBITAK) under program 2214/A. Thanks are due to Prof. Fusun Ozguner for her support of the first author during his visit to Ohio State University. The authors are grateful to Mr. Lawrence Murray for his help with the Metropolis resampling implementation and the bias calculation.

Author information

Corresponding author

Correspondence to Özcan Dülger.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

ESM 1

(RAR 38 kb)

About this article

Cite this article

Dülger, Ö., Oğuztüzün, H. & Demirekler, M. Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit. J Sign Process Syst 90, 433–447 (2018). https://doi.org/10.1007/s11265-017-1254-6
