Global Task Data-Dependencies in PGAS Applications

Conference paper published in High Performance Computing (ISC High Performance 2019).

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11501)

Abstract

Recent years have seen the emergence of two independent programming models challenging the traditional two-tier combination of message passing and thread-level work-sharing: the partitioned global address space (PGAS) and task-based concurrency. In the PGAS programming model, synchronization and communication between processes are decoupled, providing significant potential for reducing communication overhead. At the same time, task-based programming makes it possible to exploit a large degree of shared-memory concurrency. The inherent lack of fine-grained synchronization in PGAS can be addressed through fine-grained task synchronization across process boundaries. In this work, we propose the use of task data dependencies describing the data flow in the global address space to synchronize the execution of tasks created in parallel on multiple processes. We describe the global data dependencies and the interactions between the distributed scheduler instances required to handle them, and we discuss our implementation in the context of the DASH PGAS framework. We evaluate our approach using a blocked Cholesky factorization and the LULESH proxy application, demonstrating its feasibility and scalability.
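To make the approach concrete, the following sketch illustrates how a data dependency on a location in the global address space could order two tasks created by different processes. It is a minimal, hypothetical example: dash::tasks::async, dash::tasks::in, dash::tasks::out, and dash::tasks::complete are names assumed for illustration based on the abstract, not a verified rendering of the DASH tasking API.

```cpp
#include <libdash.h>  // DASH PGAS framework (assumed header name)

int main(int argc, char** argv) {
  dash::init(&argc, &argv);

  // An array distributed across the global address space.
  dash::Array<double> data(1000);

  if (dash::myid() == 0) {
    // Producer task created on process 0: writes the first element
    // and declares an output dependency on it.
    dash::tasks::async([&]() {
      data.local[0] = 42.0;
    }, dash::tasks::out(data[0]));  // hypothetical dependency API
  }

  if (dash::myid() == 1) {
    // Consumer task created on process 1: the distributed scheduler
    // defers its execution until the producer's output dependency on
    // data[0] is satisfied, although the two tasks were created on
    // different processes.
    dash::tasks::async([&]() {
      double value = data[0];  // remote read through the PGAS layer
      (void)value;             // ... use value ...
    }, dash::tasks::in(data[0]));  // hypothetical dependency API
  }

  dash::tasks::complete();  // process tasks and wait for completion
  dash::finalize();
  return 0;
}
```

Under such a scheme, the scheduler instances must exchange dependency information across process boundaries: for example, process 1's scheduler learns that data[0] has a pending writer on process 0 and releases the consumer task only after being notified of the writer's completion.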

We gratefully acknowledge funding by the German Research Foundation (DFG) through the German Priority Programme 1648 "Software for Exascale Computing" (SPPEXA) in the SmartDASH project and would like to thank all members of the DASH team. We would like to thank the members of the Innovative Computing Lab at the University of Tennessee, Knoxville for their support on PaRSEC.

Joseph Schuchart is a doctoral student at the University of Stuttgart and the main author. He is solely responsible for the design and implementation of the API and the distributed scheduler, and he led the design of the global task dependencies, the selection of the evaluation scenarios, and the interpretation of the experimental results.



Author information

Correspondence to Joseph Schuchart.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Schuchart, J., Gracia, J. (2019). Global Task Data-Dependencies in PGAS Applications. In: Weiland, M., Juckeland, G., Trinitis, C., Sadayappan, P. (eds.) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol. 11501. Springer, Cham. https://doi.org/10.1007/978-3-030-20656-7_16

  • DOI: https://doi.org/10.1007/978-3-030-20656-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20655-0

  • Online ISBN: 978-3-030-20656-7

  • eBook Packages: Computer Science; Computer Science (R0)
