

Hybrid static–dynamic selection of implementation alternatives in heterogeneous environments

The Journal of Supercomputing

Abstract

With the emergence of heterogeneous architectures, developing parallel software has become an increasingly complex task. The ability to use multiple devices in a single application, such as CPUs, accelerators, or coprocessors, has made implementation and optimization challenging. The inherent complexity of the parallel algorithm, its multiple implementations, and the possible mappings onto the available processors are just some examples of how intricate these tasks can become. To alleviate these issues, this paper proposes a hybrid static–dynamic selector to better exploit the resources provided by heterogeneous systems. Specifically, the framework generates a decision tree at compile time, based on historical information, for selecting the implementation that performs best at run time. To evaluate the benefits of this approach, we analyze its performance with two use cases: general matrix–matrix multiplication and a medical image processing application. The experimental results demonstrate that the proposed selector enhances performance and reduces the effort needed to tune applications. Our solution improves overall application performance by 10 to 24% compared with another similar approach.
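
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: the profile data, the depth-one tree, and all names (`HISTORY`, `build_stump`, `select`, `matmul_cpu`, `matmul_gpu`) are hypothetical, and the "GPU" routine is only a placeholder. An offline phase fits a small decision tree on historical records, and the run-time phase walks that tree to pick an implementation for the current input size.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical historical profile: (matrix dimension n, fastest implementation).
# Illustrative values only; these are not measurements from the paper.
HISTORY: List[Tuple[int, str]] = [
    (64, "cpu"), (128, "cpu"), (256, "cpu"),
    (512, "gpu"), (1024, "gpu"), (2048, "gpu"),
]


def build_stump(history: List[Tuple[int, str]]) -> Tuple[float, str, str]:
    """Offline ("static") phase: fit a depth-one decision tree on n.

    Returns (threshold, label_if_below, label_otherwise), chosen to minimise
    the number of misclassified history records.
    """
    sizes = sorted({n for n, _ in history})
    labels = sorted({label for _, label in history})
    best = None
    # Candidate thresholds are midpoints between consecutive observed sizes.
    for lo, hi in zip(sizes, sizes[1:]):
        threshold = (lo + hi) / 2
        for below in labels:
            for above in labels:
                errors = sum(
                    (below if n < threshold else above) != label
                    for n, label in history
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    if best is None:
        raise ValueError("need at least two distinct sizes in the history")
    return best[1], best[2], best[3]


def matmul_cpu(a, b):
    """Plain nested-loop product, standing in for a CPU BLAS call."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matmul_gpu(a, b):
    """Placeholder for an accelerator-backed GEMM; reuses the CPU code here."""
    return matmul_cpu(a, b)


IMPLEMENTATIONS: Dict[str, Callable] = {"cpu": matmul_cpu, "gpu": matmul_gpu}
THRESHOLD, BELOW, ABOVE = build_stump(HISTORY)


def select(n: int) -> str:
    """Run-time ("dynamic") phase: walk the prebuilt tree for input size n."""
    return BELOW if n < THRESHOLD else ABOVE


def matmul(a, b):
    """Dispatch to the implementation the tree predicts to be fastest."""
    return IMPLEMENTATIONS[select(len(a))](a, b)
```

With the assumed profile above, the tree splits between sizes 256 and 512, so small products go to the CPU routine and large ones to the accelerator placeholder. A real selector would likely use more features than the matrix dimension and a deeper tree.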



Acknowledgements

This work has been partially supported by the EU Project ICT 644235 “RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications” and the Project TIN2016-79637-P “Towards Unification of HPC and Big Data Paradigms” from the Spanish “Ministerio de Economía y Competitividad”.

Author information

Corresponding author

Correspondence to Manuel F. Dolz.


About this article


Cite this article

del Rio Astorga, D., Dolz, M.F., Fernandez, J. et al. Hybrid static–dynamic selection of implementation alternatives in heterogeneous environments. J Supercomput 75, 4098–4113 (2019). https://doi.org/10.1007/s11227-017-2147-y


  • DOI: https://doi.org/10.1007/s11227-017-2147-y
