Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3178487.3178535acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
poster

Graph partitioning applied to DAG scheduling to reduce NUMA effects

Published: 10 February 2018 Publication History

Abstract

The complexity of shared memory systems is becoming more relevant as the number of memory domains increases, with different access latencies and bandwidth rates depending on the proximity between the cores and the devices containing the data. In this context, techniques to manage and mitigate non-uniform memory access (NUMA) effects consist in migrating threads, memory pages or both and are typically applied by the system software.
We propose techniques at the runtime system level to reduce NUMA effects on parallel applications. We leverage runtime system metadata in terms of a task dependency graph. Our approach, based on graph partitioning methods, is able to provide parallel performance improvements of 1.12X on average with respect to the state-of-the-art.

References

[1]
Jairo Balart, Alejandro Duran, Marc Gonzàlez, Xavier Martorell, Eduard Ayguadé, and Jesús Labarta. 2004. Nanos Mercurium: A Research Compiler for OpenMP. In EWOMP. http://people.ac.upc.edu/aduran/papers/2004/mercurium_ewomp04.pdf
[2]
Mohammad Dashti, Alexandra Fedorova, Justin Funston, Fabien Gaud, Renaud Lachaize, Baptiste Lepers, Vivien Quema, and Mark Roth. 2013. Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems. SIGARCH Comput. Archit. News 41 (2013), 381--394.
[3]
Matthias Diener, Eduardo H. M. Cruz, Philippe O. A. Navaux, Anselm Busse, and Hans-Ulrich Heiß. 2014. kMAF: Automatic Kernel-Level Management of Thread and Data Affinity. In PACT. 277--288.
[4]
Andi Drebes, Antoniu Pop, Karine Heydemann, Albert Cohen, and Nathalie Drach. 2016. Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management. In PACT. 125--137.
[5]
Rabab al-Omairy, Guillermo Miranda, Hatem Ltaief, Rosa M. Badia, Xavier Martorell, Jesús Labarta, and David Keyes. 2015. Dense Matrix Computations on NUMA Architectures with Distance-Aware Work Stealing. Supercomput. Front. Innov. 2 (2015), 49--72.
[6]
François Pellegrini. 1994. Static Mapping by Dual Recursive Bipartitioning of Process Architecture Graphs. In SHPCC.
[7]
Xavier Teruel, Xavier Martorell, Alejandro Duran, Roger Ferrer, and Eduard Ayguadé. 2007. Support for OpenMP Tasks in Nanos V4. In CASCON.
[8]
Mustafa M. Tikir and Jeffrey K. Hollingsworth. 2008. Hardware Monitors for Dynamic Page Migration. J. Parallel Distrib. Comput. 68 (2008), 1186--1200.
[9]
Raul Vidal, Marc Casas, Miquel Moretó, Dimitrios Chasapis, Roger Ferrer, Xavier Martorell, Eduard Ayguadé, Jesús Labarta, and Mateo Valero. 2015. Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads. In IWOMP. 60--72.

Cited By

View all
  • (2023)Programming parallel dense matrix factorizations and inversion for new-generation NUMA architecturesJournal of Parallel and Distributed Computing10.1016/j.jpdc.2023.01.004175:C(51-65)Online publication date: 1-May-2023
  • (2022)TD-NUCA: Runtime Driven Management of NUCA Caches in Task Dataflow Programming ModelsSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00085(1-15)Online publication date: Nov-2022
  • (2020)Thread-Placement Learning2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)10.1109/ICDCS47774.2020.00050(877-887)Online publication date: Nov-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2018
442 pages
ISBN:9781450349826
DOI:10.1145/3178487
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 53, Issue 1
    PPoPP '18
    January 2018
    426 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/3200691
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 February 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. NUMA
  2. graph partitioning
  3. scheduling
  4. shared memory
  5. task-based programming model

Qualifiers

  • Poster

Funding Sources

  • Ministerio de Economía y Competitividad
  • Ministerio de Educación, Cultura y Deporte
  • European Research Council

Conference

PPoPP '18

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)25
  • Downloads (Last 6 weeks)4
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Programming parallel dense matrix factorizations and inversion for new-generation NUMA architecturesJournal of Parallel and Distributed Computing10.1016/j.jpdc.2023.01.004175:C(51-65)Online publication date: 1-May-2023
  • (2022)TD-NUCA: Runtime Driven Management of NUCA Caches in Task Dataflow Programming ModelsSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00085(1-15)Online publication date: Nov-2022
  • (2020)Thread-Placement Learning2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)10.1109/ICDCS47774.2020.00050(877-887)Online publication date: Nov-2020
  • (2021)HNGraph: Parallel Graph Processing in Hybrid Memory Based NUMA Systems2021 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/Cluster48925.2021.00063(388-397)Online publication date: Sep-2021
  • (2020)Black-box inter-application traffic monitoring for adaptive container placementProceedings of the 35th Annual ACM Symposium on Applied Computing10.1145/3341105.3374007(259-266)Online publication date: 30-Mar-2020

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media