Abstract
Recent years have seen the emergence of two independent programming models challenging the traditional two-tier combination of message passing and thread-level work-sharing: partitioned global address space (PGAS) and task-based concurrency. In the PGAS programming model, synchronization and communication between processes are decoupled, offering significant potential for reducing communication overhead. At the same time, task-based programming makes it possible to exploit a large degree of shared-memory concurrency. The inherent lack of fine-grained synchronization in PGAS can be addressed through fine-grained task synchronization across process boundaries. In this work, we propose the use of task data dependencies describing the data-flow in the global address space to synchronize the execution of tasks created in parallel on multiple processes. We present a description of the global data dependencies, describe the interactions between the distributed scheduler instances required to handle them, and discuss our implementation in the context of the DASH PGAS framework. We evaluate our approach using the Blocked Cholesky Factorization and the LULESH proxy app, demonstrating its feasibility and scalability.
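To make the data-flow idea concrete, the following is a minimal, self-contained C++ sketch of task synchronization via data dependencies. It is emphatically not the DASH API: all names (Scheduler, Task, Dep, DepKind) are invented for illustration, and it runs within a single process, whereas the paper's contribution is matching such dependencies on global-address-space references across process boundaries. The sketch only shows the core matching rule: each IN dependency is ordered after the most recent OUT on the same location.

```cpp
// Single-process toy illustrating data-flow task dependencies:
// tasks declare IN/OUT dependencies on memory locations, and the
// execution order is derived by matching each dependency against
// the last preceding OUT (writer) on the same location.
// NOTE: hypothetical names, NOT the DASH tasking API. In the paper's
// setting the addresses are global memory references and matching
// spans process boundaries via cooperating scheduler instances.
#include <cstddef>
#include <cstdio>
#include <functional>
#include <unordered_map>
#include <vector>

enum class DepKind { In, Out };

struct Dep {
  DepKind kind;
  const void* addr;  // stands in for a global memory reference
};

struct Task {
  std::function<void()> body;
  int unresolved = 0;                   // incoming edges not yet released
  std::vector<std::size_t> successors;  // tasks that must wait for this one
};

class Scheduler {
 public:
  // Register a task. Every dependency orders the task after the last
  // writer of its address (read-after-write / write-after-write);
  // an OUT dependency additionally makes this task the new last writer.
  void create(std::function<void()> body, std::vector<Dep> deps) {
    std::size_t id = tasks_.size();
    tasks_.push_back({std::move(body)});
    for (const Dep& d : deps) {
      auto it = last_writer_.find(d.addr);
      if (it != last_writer_.end()) add_edge(it->second, id);
      if (d.kind == DepKind::Out) last_writer_[d.addr] = id;
    }
  }

  // Execute all tasks in data-flow order, releasing successors
  // as their predecessors complete.
  void run() {
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < tasks_.size(); ++i)
      if (tasks_[i].unresolved == 0) ready.push_back(i);
    while (!ready.empty()) {
      std::size_t id = ready.back();
      ready.pop_back();
      tasks_[id].body();
      for (std::size_t s : tasks_[id].successors)
        if (--tasks_[s].unresolved == 0) ready.push_back(s);
    }
  }

 private:
  void add_edge(std::size_t from, std::size_t to) {
    tasks_[from].successors.push_back(to);
    tasks_[to].unresolved++;
  }
  std::vector<Task> tasks_;
  std::unordered_map<const void*, std::size_t> last_writer_;
};

int main() {
  double a = 0.0, b = 0.0;
  Scheduler sched;
  sched.create([&] { a = 1.0; }, {{DepKind::Out, &a}});
  sched.create([&] { b = a + 1.0; }, {{DepKind::In, &a}, {DepKind::Out, &b}});
  sched.create([&] { std::printf("b = %g\n", b); }, {{DepKind::In, &b}});
  sched.run();  // runs the three tasks in data-flow order; prints b = 2
}
```

In the distributed setting described in the abstract, the last-writer information for remote memory cannot be looked up in a local table, which is why the distributed scheduler instances must exchange dependency information to resolve the graph across process boundaries.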
We gratefully acknowledge funding by the German Research Foundation (DFG) through the Priority Programme 1648 "Software for Exascale Computing" (SPPEXA) in the SmartDASH project and would like to thank all members of the DASH team. We also thank the members of the Innovative Computing Laboratory at the University of Tennessee, Knoxville, for their support on PaRSEC.
Joseph Schuchart is a doctoral student at the University of Stuttgart and the main author of this paper. He is the sole author of the design and implementation of the API and the distributed scheduler, and the lead author of the global task dependency design, the selection of the evaluation scenarios, and the interpretation of the experimental results.
References
Agullo, E., Aumage, O., Faverge, M., Furmento, N., Pruvost, F., Sergent, M., Thibault, S.: Achieving high performance on supercomputers with a sequential task-based programming model. IEEE Trans. Parallel Distrib. Syst. (2018). https://doi.org/10.1109/TPDS.2017.2766064
Amarasinghe, S., et al.: Exascale software study: software challenges in extreme scale systems. Technical report, DARPA IPTO, Air Force Research Labs (2009)
Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1–11, November 2012. https://doi.org/10.1109/SC.2012.71
Belli, R., Hoefler, T.: Notified access: extending remote memory access programming models for producer-consumer synchronization. In: IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2015)
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Lemarinier, P., Dongarra, J.: DAGuE: a generic distributed DAG engine for high performance computing, pp. 1151–1158. IEEE, Anchorage (2011)
Chamberlain, B.L., Callahan, D., Zima, H.P.: Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl. 21, 291–312 (2007)
Chapman, B.M., Eachempati, D., Chandrasekaran, S.: OpenMP. In: Balaji, P. (ed.) Programming Models for Parallel Computing, pp. 281–322. MIT Press, Cambridge (2015)
Charles, P., et al.: X10: an object-oriented approach to non-uniform cluster computing. In: ACM Sigplan Notices (2005)
Duran, A., et al.: OmpSs: a proposal for programming heterogeneous multi-core architectures. Parallel Process. Lett. (2011). https://doi.org/10.1142/S0129626411000151
Fürlinger, K., et al.: DASH: data structures and algorithms with support for hierarchical locality. In: Lopes, L., et al. (eds.) Euro-Par 2014. LNCS, vol. 8806, pp. 542–552. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-14313-2_46
Gómez-Iglesias, A., Pekurovsky, D., Hamidouche, K., Zhang, J., Vienne, J.: Porting scientific libraries to PGAS in XSEDE resources: practice and experience. In: Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, XSEDE 2015. ACM (2015)
Grossman, M., Kumar, V., Budimlic, Z., Sarkar, V.: Integrating asynchronous task parallelism with OpenSHMEM (2016). https://www.cs.rice.edu/~zoran/Publications_files/asyncshmem2016.pdf
Hoque, R., Herault, T., Bosilca, G., Dongarra, J.: Dynamic task discovery in PaRSEC: a data-flow task-based runtime. In: Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2017. ACM (2017). https://doi.org/10.1145/3148226.3148233
Kaiser, H., Heller, T., Adelstein-Lelbach, B., Serio, A., Fey, D.: HPX: a task based programming model in a global address space. In: PGAS 2014. ACM (2014). https://doi.org/10.1145/2676870.2676883
Kalé, L., Krishnan, S.: CHARM++: a portable concurrent object oriented system based on C++. In: Proceedings of OOPSLA 1993 (1993)
Karlin, I., Keasler, J., Neely, R.: LULESH 2.0 updates and changes. Technical report LLNL-TR-641973 (2013)
Kumar, V., Zheng, Y., Cavé, V., Budimlić, Z., Sarkar, V.: HabaneroUPC++: a compiler-free PGAS library. In: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, PGAS 2014. ACM (2014). https://doi.org/10.1145/2676870.2676879
Long, B.: Additional parallel features in Fortran. SIGPLAN Fortran Forum, 16–23, July 2016. https://doi.org/10.1145/2980025.2980027
Marjanović, V., Labarta, J., Ayguadé, E., Valero, M.: Overlapping communication and computation by using a hybrid MPI/SMPSs approach. In: Proceedings of the 24th ACM International Conference on Supercomputing, ICS 2010. ACM (2010). https://doi.org/10.1145/1810085.1810091
OpenMP Architecture Review Board: OpenMP Application Programming Interface, Version 4.5 (2015). http://www.openmp.org/mp-documents/openmp-4.5.pdf
Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multicore Processor Parallelism. O’Reilly & Associates, Sebastopol (2007)
Robison, A.D.: Composable parallel patterns with Intel Cilk Plus. Comput. Sci. Eng. (2013). https://doi.org/10.1109/MCSE.2013.21
Saraswat, V., et al.: The Asynchronous Partitioned Global Address Space Model (2017)
Schuchart, J., Kowalewski, R., Fuerlinger, K.: Recent experiences in using MPI-3 RMA in the DASH PGAS runtime. In: Proceedings of Workshops of HPC Asia, HPC Asia 2018. ACM (2018). https://doi.org/10.1145/3176364.3176367
Schuchart, J., Nachtmann, M., Gracia, J.: Patterns for OpenMP task data dependency overhead measurements. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 156–168. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_11
Schuchart, J., Tsugane, K., Gracia, J., Sato, M.: The impact of taskyield on the design of tasks communicating through MPI. In: de Supinski, B.R., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds.) IWOMP 2018. LNCS, vol. 11128, pp. 3–17. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98521-3_1
Shudler, S., Calotoiu, A., Hoefler, T., Wolf, F.: Isoefficiency in practice: configuring and understanding the performance of task-based applications. SIGPLAN Not., January 2017. https://doi.org/10.1145/3155284.3018770
Slaughter, E., Lee, W., Treichler, S., Bauer, M., Aiken, A.: Regent: a high-productivity programming language for HPC with logical regions. In: SC15: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12, November 2015. https://doi.org/10.1145/2807591.2807629
Tejedor, E., Farreras, M., Grove, D., Badia, R.M., Almasi, G., Labarta, J.: A high-productivity task-based programming model for clusters. Concurr. Comput. Pract. Exp. (2012). https://doi.org/10.1002/cpe.2831
Tillenius, M.: SuperGlue: a shared memory framework using data versioning for dependency-aware task-based parallelization. SIAM J. Sci. Comput. (2015). http://epubs.siam.org/doi/10.1137/140989716
Tsugane, K., Lee, J., Murai, H., Sato, M.: Multi-tasking execution in PGAS language XcalableMP and communication optimization on many-core clusters. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. ACM (2018). https://doi.org/10.1145/3149457.3154482
YarKhan, A.: Dynamic task execution on shared and distributed memory architectures. Ph.D. thesis, University of Tennessee (2012)
Yelick, K., et al.: Productivity and performance using partitioned global address space languages. In: Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, PASCO 2007. ACM (2007)
Zhou, H., Idrees, K., Gracia, J.: Leveraging MPI-3 shared-memory extensions for efficient PGAS runtime systems. In: Träff, J.L., Hunold, S., Versaci, F. (eds.) Euro-Par 2015. LNCS, vol. 9233, pp. 373–384. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48096-0_29
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Schuchart, J., Gracia, J. (2019). Global Task Data-Dependencies in PGAS Applications. In: Weiland, M., Juckeland, G., Trinitis, C., Sadayappan, P. (eds.) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol. 11501. Springer, Cham. https://doi.org/10.1007/978-3-030-20656-7_16
DOI: https://doi.org/10.1007/978-3-030-20656-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20655-0
Online ISBN: 978-3-030-20656-7