Parallelization of Divide-and-Conquer Applications on Intel Xeon Phi with an OpenMP Based Framework

Paweł Czarnul

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 431)

Abstract

The paper proposes an approach for efficient and flexible parallelization of divide-and-conquer computations using the modern Intel Xeon Phi accelerators. Many real-life problems follow the divide-and-conquer paradigm and consequently generate either balanced or imbalanced trees. The paper proposes an OpenMP multi-threaded implementation of a general framework that requires coding basic divide-and-conquer constructs such as data partitioning, computations and result integration. Mapping computations onto threads is handled by the underlying runtime layer. The paper presents performance results for a parallel adaptive quadrature integration resulting in an irregular and imbalanced tree. It is shown that speed-ups obtained reach around 90 for parallelization of an irregular adaptive integration code compared to maximum speed-ups of 98 for code without thread management at various levels of the divide-and-conquer tree. Results for various thread affinities are shown. The framework, for which the source code is presented, can be easily reused for any other divide-and-conquer application.
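
The three constructs named in the abstract (data partitioning, computation, and result integration) can be illustrated with a minimal sketch of adaptive quadrature written with OpenMP tasks. This is not the paper's framework or its source code: every identifier here (`integrand_fn`, `dac_node`, `integrate`, `spawn_depth`, `MAX_DEPTH`, `demo_square`, `demo_peak`) is hypothetical, the trapezoid rule is an assumed stand-in for whatever quadrature rule the paper uses, and limiting task creation to the top `spawn_depth` levels is only one plausible reading of "thread management at various levels of the divide-and-conquer tree".

```c
#include <assert.h>

/* All names below are hypothetical; this is not the paper's framework API. */
typedef double (*integrand_fn)(double);

#define MAX_DEPTH 40 /* safety cap on recursion depth (assumed value) */

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Computation on one piece of data: trapezoid estimate over [a, b]. */
static double trapezoid(integrand_fn f, double a, double b) {
    return 0.5 * (b - a) * (f(a) + f(b));
}

/* One node of the divide-and-conquer tree. */
static double dac_node(integrand_fn f, double a, double b, double tol,
                       int depth, int spawn_depth) {
    /* Partitioning: split the interval at its midpoint. */
    double mid = 0.5 * (a + b);
    double whole = trapezoid(f, a, b);
    double halves = trapezoid(f, a, mid) + trapezoid(f, mid, b);

    /* Leaf test: stop when refining barely changes the estimate.
       Smooth regions stop early and sharp regions recurse deeper,
       which is what makes the tree irregular and imbalanced. */
    if (abs_d(halves - whole) <= tol || depth >= MAX_DEPTH)
        return halves;

    double left = 0.0, right = 0.0;
    /* Above spawn_depth the subtrees become deferrable tasks; below it the
       if clause is false, so each task runs immediately on the same thread. */
    #pragma omp task shared(left) if(depth < spawn_depth)
    left = dac_node(f, a, mid, 0.5 * tol, depth + 1, spawn_depth);
    #pragma omp task shared(right) if(depth < spawn_depth)
    right = dac_node(f, mid, b, 0.5 * tol, depth + 1, spawn_depth);
    #pragma omp taskwait

    /* Result integration: merge the two partial results. */
    return left + right;
}

/* Entry point: one thread builds the tree, the OpenMP runtime maps the
   generated tasks onto the threads of the parallel region. */
double integrate(integrand_fn f, double a, double b, double tol,
                 int spawn_depth) {
    double result = 0.0;
    assert(a < b && tol > 0.0);
    #pragma omp parallel
    #pragma omp single
    result = dac_node(f, a, b, tol, 0, spawn_depth);
    return result;
}

/* Sample integrands: a smooth one and one with a sharp peak at x = 0. */
double demo_square(double x) { return x * x; }
double demo_peak(double x) { return 1.0 / (x * x + 0.01); }
```

Built with an OpenMP-enabled compiler (for example `gcc -fopenmp`), a call such as `integrate(demo_peak, -1.0, 1.0, 1e-6, 4)` should approximate 20·atan(10) ≈ 29.4226; without OpenMP support the pragmas are ignored and the same code runs sequentially. The peaked integrand forces much deeper recursion near x = 0 than elsewhere, which gives a small-scale picture of the imbalanced tree the abstract describes.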

Notes

1. http://top500.org

Acknowledgments

The work was supported by Intel within the grant “Intel Phi Parallel Processing Lab”. The author wishes to thank Intel, especially Marek Zmuda, Marek Piosik and Robert Bard, for providing Xeon Phi equipment, literature and the necessary support.

Author information

Authors and Affiliations

Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdańsk, Poland
Paweł Czarnul

Corresponding author

Correspondence to Paweł Czarnul.

Editor information

Editors and Affiliations

Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jerzy Świątek
Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Leszek Borzemski
Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Adam Grzech
Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Zofia Wilimowska

Cite this paper

Czarnul, P. (2016). Parallelization of Divide-and-Conquer Applications on Intel Xeon Phi with an OpenMP Based Framework. In: Świątek, J., Borzemski, L., Grzech, A., Wilimowska, Z. (eds) Information Systems Architecture and Technology: Proceedings of 36th International Conference on Information Systems Architecture and Technology – ISAT 2015 – Part III. Advances in Intelligent Systems and Computing, vol 431. Springer, Cham. https://doi.org/10.1007/978-3-319-28564-1_9

DOI: https://doi.org/10.1007/978-3-319-28564-1_9
Published: 24 February 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28562-7
Online ISBN: 978-3-319-28564-1
eBook Packages: Engineering, Engineering (R0)
