Abstract
The paper proposes an approach for efficient and flexible parallelization of divide-and-conquer computations using the modern Intel Xeon Phi accelerators. Many real-life problems follow the divide-and-conquer paradigm and consequently generate either balanced or imbalanced trees. The paper proposes an OpenMP multi-threaded implementation of a general framework that requires coding basic divide-and-conquer constructs such as data partitioning, computations and result integration. Mapping computations onto threads is handled by the underlying runtime layer. The paper presents performance results for a parallel adaptive quadrature integration resulting in an irregular and imbalanced tree. It is shown that speed-ups obtained reach around 90 for parallelization of an irregular adaptive integration code compared to maximum speed-ups of 98 for code without thread management at various levels of the divide-and-conquer tree. Results for various thread affinities are shown. The framework, for which the source code is presented, can be easily reused for any other divide-and-conquer application.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
References
Czarnul, P., Rosciszewski, P.: Optimization of execution time under power consumption constraints in a heterogeneous parallel system with gpus and cpus. In: Distributed Computing and Networking, pp. 66–80. Springer Berlin (2014), Volume 8314 of LNCS
Jeffers, J., Reinders, J.: Intel Xeon Phi Coprocessor High Performance Programming. Newnes, New South Wales (2013)
Rugina, R., Rinard, M.: Automatic parallelization of divide and conquer algorithms. In: Proceedings of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 72–83. PPoPP ’99, New York, NY, USA, ACM (1999)
Freisleben, B., Kielmann, T.: Automatic parallelization of divide-and-conquer algorithms. In: Parallel Processing: CONPAR 92—VAPP V, pp. 849–850. Springer, Berlin (1992), Volume 634 of Lecture Notes in Computer Science
Czarnul, P.: Programming, tuning and automatic parallelization of irregular divide-and-conquer applications in Dampvm/DAC. Int. J. High Perform. Comput. Appl. 17, 77–93 (2003)
Eriksson, M.V., Keßler, C.W., Chalabine, M.: Load balancing of irregular parallel divideand-conquer algorithms in group-spmd programming environments. In: ARCS Workshops, pp. 313–322. GI (2006), Volume 81 of LNI
Intel: Intel cilk plus language specification (2010) ver. 0.9. http://www.cilkplus.org/sites/default/files/open_specifications/cilk_plus_language_specification_0_9.pdf
Michaela, M., Byckling, M., Ilieva, N., Saarinen, S., Schliephake, M., Weinberg, V.: Best practice guide intel xeon phi v1.1, PRACE, 7 Capacities (2014)
Saule, E., Kaya, K., Çatalyürek, Ü.V.: Performance evaluation of sparse matrix multiplication kernels on intel xeon phi. CoRR abs/1302.1078 (2013)
Ramachandran, A., Vienne, J., der Wijngaart, R.A., Koesterke, L., Sharapov, I.: Performance evaluation of nas parallel benchmarks on intel xeon phi. In: IEEE ICPP, pp. 736–743 (2013)
Lima, J.V., Broquedis, F., Gautier, T., Raffin, B.: Preliminary experiments with xkaapi on Intel xeon phi coprocessor. In: Symposium on Computer Architecture and High Performance Computing
Eisenlor, J., Hudak, D., Tomko, K., Prince, T.: Dense linear algebra factorization in OpenMP and Cilk Plus on Intel MIC: development experiences and performance analysis. In: TACCIntel Highly Parallel Computing Symposium (2012)
Wu, Q., Yang, C., Tang, T., Xiao, L.: Mic acceleration of short-range molecular dynamics simulations. In: Proceedings of the First International Workshop on Code OptimiSation for MultI and Many Cores. COSMIC ’13, New York, NY, USA, ACM (2013) 2:1–2:8
Cramer, T., Schmidl, D., Klemm, M., Mey, D.: Openmp programming on intel xeon phi coprocessors: An early performance comparison. In: Proceedings of the Many-core Applications Research Community Symposium at RWTH Aachen University, pp. 38–44 (2012)
Schmidl, D., Cramer, T., Wienke, S., Terboven, C., Müller, M.: Assessing the performance of openmp programs on the intel xeon phi. In: Euro-Par 2013 Parallel Processing, pp. 547–559. Springer, Berlin (2013), Volume 8097 of LNCS
Reinders, J.: An overview of programming for intel xeon processors and intel xeon phi coprocessors, Intel. https://software.intel.com/sites/default/files/article/330164/an-overview-of-programming-for-intel-xeon-processors-and-intel-xeon-phi-coprocessors_1.pdf (2012)
Green, R.W.: Openmp* thread affinity control. compiler methodology for intelR mic architecture. https://software.intel.com/en-us/articles/openmp-thread-affinity-control (2012)
Acknowledgments
The work was supported by Intel within grant “Intel Phi Parallel Processing Lab”. The author wishes to thank Intel, especially Marek Zmuda, Marek Piosik and Robert Bard from Intel for provision of Xeon Phi equipment, literature and needed support.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Czarnul, P. (2016). Parallelization of Divide-and-Conquer Applications on Intel Xeon Phi with an OpenMP Based Framework. In: Świątek, J., Borzemski, L., Grzech, A., Wilimowska, Z. (eds) Information Systems Architecture and Technology: Proceedings of 36th International Conference on Information Systems Architecture and Technology – ISAT 2015 – Part III. Advances in Intelligent Systems and Computing, vol 431. Springer, Cham. https://doi.org/10.1007/978-3-319-28564-1_9
Download citation
DOI: https://doi.org/10.1007/978-3-319-28564-1_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28562-7
Online ISBN: 978-3-319-28564-1
eBook Packages: EngineeringEngineering (R0)