Abstract
Sophisticated denoising algorithms are used to improve image quality in Magnetic Resonance Imaging. Better results generally require computationally expensive schemes. In this paper, we consider the Overcomplete Local Principal Component Analysis (OLPCA) method for image denoising and its main issues. In particular, we investigate the impact of the Singular Value Decomposition on the OLPCA algorithm and its high computational cost. Moreover, we propose a fine-to-coarse parallelization strategy to exploit a parallel hybrid architecture, and we implement multilevel parallel software that combines code using the NVIDIA cuBLAS library for Graphics Processing Units (GPUs) with the standard Message Passing Interface (MPI) library for cluster programming. Experimental results show reduced execution time, with a promising speedup with respect to the CPU version and our previous GPU versions.
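To make the role of the Singular Value Decomposition concrete, the following is a minimal NumPy sketch of a local PCA denoising step on a single patch matrix. It is not the cuBLAS/MPI implementation described in the paper. The function name `denoise_patch_matrix`, the hard threshold `tau` on the singular values, and the patch layout (rows are voxels of a local patch, columns are image volumes) are assumptions made for illustration.

```python
import numpy as np


def denoise_patch_matrix(X, tau):
    """Denoise one patch matrix by discarding weak principal components.

    X   : 2-D array; rows are voxels of a local patch, columns are
          image volumes (layout is an assumption for this sketch).
    tau : hypothetical threshold; singular values <= tau are zeroed.
    """
    X = np.asarray(X, dtype=float)
    # Centre each column so the SVD yields the principal components.
    mean = X.mean(axis=0, keepdims=True)
    Xc = X - mean
    # Thin SVD of the centred patch matrix: Xc = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Hard-threshold the spectrum: keep only the dominant components.
    s_kept = np.where(s > tau, s, 0.0)
    # Rebuild the patch from the retained components and restore the mean.
    return (U * s_kept) @ Vt + mean
```

In an overcomplete scheme, a step of this kind would be applied to many overlapping patches and the overlapping estimates combined. Each patch then needs its own small decomposition, which is plausibly where the computational cost discussed above concentrates.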
Cite this article
De Michele, P., Maiorano, F., Marcellino, L. et al. A GPU Implementation of OLPCA Method in Hybrid Environment. Int J Parallel Prog 46, 528–542 (2018). https://doi.org/10.1007/s10766-017-0505-2
DOI: https://doi.org/10.1007/s10766-017-0505-2