DOI: 10.1145/3217189.3217191
research-article

rmalloc() and rpipe(): a uGNI-based Distributed Remote Memory Allocator and Access Library for One-sided Messaging

Published: 12 June 2018

Abstract

Optimizing communication is essential for high-performance computing because synchronization bottlenecks inhibit the overall performance and scalability of parallel applications. Today's cutting-edge computing hardware, together with network interfaces such as Cray Aries and Gemini, provides extremely low-latency, high-bandwidth remote memory access (RMA) operations for optimized data movement. However, for data movement between two logical processing units to be efficient, software substrates must properly exploit the hardware resources of the underlying fabric. Overheads from coarse-grained synchronization and stalls during irregular access to remote memory regions may point to two adverse effects: under-utilization of resources in time and in space. We introduce "rmalloc", a uGNI-based distributed remote memory allocator that expands the utilization of RDMA-enabled memory, and "rpipe", a communication substrate that aims to mitigate synchronization bottlenecks. Our UNIX-inspired RMA programming model is simple to use and equally applicable to higher-level applications and to lower-level runtime systems that need efficient data movement. Our micro-benchmark results suggest that the default next-fit allocator of "rmalloc" outperforms MPI-3.0 RMA by 1.5X to 6X in most cases, while the other "rmalloc" variants (i.e., best-fit and worst-fit) reduce external fragmentation and perform comparably to or better than the default allocator for irregular RMA.
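The abstract contrasts next-fit placement (the rmalloc default) with best-fit and worst-fit variants. The sketch below is not rmalloc's code; it is a hypothetical, self-contained Python model of those three classic free-list policies over a single pre-registered region, managing byte offsets only. The names `RegionAllocator`, `alloc`, `release` and `fragmentation`, and the fragmentation measure used (1 minus the largest hole divided by the total free space), are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

POLICIES = ("next-fit", "best-fit", "worst-fit")


@dataclass
class Extent:
    offset: int  # start of a free hole, in bytes from the region base
    size: int    # length of the hole in bytes


@dataclass
class RegionAllocator:
    """Hypothetical free-list allocator for one pre-registered RDMA region.

    Only offsets are managed; registering the memory with the NIC
    (e.g. through uGNI) is assumed to happen elsewhere and is not modeled.
    """
    capacity: int
    policy: str = "next-fit"
    free: List[Extent] = field(default_factory=list, init=False)
    _live: Dict[int, int] = field(default_factory=dict, init=False)  # offset -> size
    _rover: int = field(default=0, init=False)  # next-fit: where the next search starts

    def __post_init__(self) -> None:
        if self.policy not in POLICIES:
            raise ValueError(f"unknown policy {self.policy!r}")
        self.free = [Extent(0, self.capacity)]

    def _pick(self, size: int) -> Optional[int]:
        """Index into self.free of the hole chosen by the policy, or None."""
        fits = [i for i, e in enumerate(self.free) if e.size >= size]
        if not fits:
            return None
        if self.policy == "best-fit":   # smallest hole that fits
            return min(fits, key=lambda i: self.free[i].size)
        if self.policy == "worst-fit":  # largest hole
            return max(fits, key=lambda i: self.free[i].size)
        # next-fit: first fitting hole at or after the rover, wrapping around
        return min(fits, key=lambda i: (self.free[i].offset - self._rover) % self.capacity)

    def alloc(self, size: int) -> Optional[int]:
        """Reserve `size` bytes; return their offset, or None if no hole fits."""
        if size <= 0:
            raise ValueError("size must be positive")
        i = self._pick(size)
        if i is None:
            return None
        hole = self.free[i]
        offset = hole.offset
        if hole.size == size:
            del self.free[i]     # exact fit consumes the whole hole
        else:
            hole.offset += size  # shrink the hole from the front
            hole.size -= size
        self._live[offset] = size
        self._rover = offset + size
        return offset

    def release(self, offset: int) -> None:
        """Return a block to the free list, merging adjacent holes."""
        size = self._live.pop(offset)  # KeyError on a bad or double free
        holes = sorted(self.free + [Extent(offset, size)], key=lambda e: e.offset)
        merged: List[Extent] = []
        for h in holes:
            if merged and merged[-1].offset + merged[-1].size == h.offset:
                merged[-1].size += h.size
            else:
                merged.append(h)
        self.free = merged

    def fragmentation(self) -> float:
        """1 - largest_hole / total_free; 0.0 means all free space is contiguous."""
        total = sum(e.size for e in self.free)
        if total == 0:
            return 0.0
        return 1.0 - max(e.size for e in self.free) / total


def demo(policy: str) -> Optional[int]:
    """Punch two holes into a 100-byte region, then place a 15-byte block."""
    a = RegionAllocator(capacity=100, policy=policy)
    offs = [a.alloc(n) for n in (10, 30, 10, 20, 10)]  # 0, 10, 40, 50, 70
    a.release(offs[1])  # hole [10, 40)
    a.release(offs[3])  # hole [50, 70)
    return a.alloc(15)  # next-fit -> 80, best-fit -> 50, worst-fit -> 10
```

With holes of 30, 20 and 20 bytes available, each policy places the same request differently: next-fit resumes at the tail hole where the previous allocation ended, best-fit takes the tightest 20-byte hole, and worst-fit splits the 30-byte hole. Next-fit keeps searches short at the cost of scattering small holes, while best-fit and worst-fit scan every hole to choose by size; how these trade-offs play out on real RMA workloads is what the paper's benchmarks measure, not this model.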

Published In

ROSS'18: Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers
June 2018
44 pages
ISBN:9781450358644
DOI:10.1145/3217189
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HPDC '18

Acceptance Rates

ROSS'18 Paper Acceptance Rate 5 of 7 submissions, 71%;
Overall Acceptance Rate 58 of 169 submissions, 34%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 81
  • Downloads (Last 12 months): 3
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 19 Nov 2024