Article

Using Dynamic Parallelism for Fine-Grained, Irregular Workloads: A Case Study of the N-Queens Problem

Authors:

Max Plauth,

Frank Feinbube,

Frank Schlegel,

Andreas Polze

CANDAR '15: Proceedings of the 2015 Third International Symposium on Computing and Networking (CANDAR)

Pages 404 - 407

https://doi.org/10.1109/CANDAR.2015.26

Published: 08 December 2015

Abstract

GPU compute devices have become very popular for general purpose computations. However, the SIMD-like hardware of graphics processors is currently not well suited for irregular workloads, like searching unbalanced trees. In order to mitigate this drawback, NVIDIA introduced an extension to GPU programming models called dynamic parallelism. This extension enables GPU programs to spawn new units of work directly on the GPU, allowing the refinement of subsequent work items based on intermediate results without any involvement of the main CPU. This work investigates methods for employing dynamic parallelism with the goal of improved workload distribution for tree search algorithms on modern GPU hardware. For the evaluation of the proposed approaches, a case study is conducted on the n-queens problem. Extensive benchmarks indicate that the benefits of improved resource utilization fail to outweigh high management overhead and runtime limitations due to the very fine level of granularity of the investigated problem. However, novel memory management concepts for passing parameters to child grids are presented. These general concepts are applicable to other, more coarse-grained problems that benefit from the use of dynamic parallelism.
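
The irregularity mentioned in the abstract can be made concrete with a small CPU-side sketch. It is not taken from the paper and uses no GPU features; it is a plain backtracking search that reports, for each placement of the first queen, how many search-tree nodes and solutions lie below it. The function names `count_subtree` and `first_row_profile` are hypothetical and chosen for illustration only.

```python
def count_subtree(n, row, cols, diag1, diag2):
    """Return (nodes_visited, solutions) for the subtree below a partial placement.

    cols, diag1 and diag2 are bitmasks of occupied columns and diagonals.
    """
    if row == n:
        return 1, 1  # all n queens placed: one leaf, one solution
    nodes, solutions = 1, 0
    for col in range(n):
        d1 = row + col            # index of the anti-diagonal
        d2 = row - col + n - 1    # index of the main diagonal
        if cols & (1 << col) or diag1 & (1 << d1) or diag2 & (1 << d2):
            continue  # square is attacked, prune this branch
        sub_nodes, sub_solutions = count_subtree(
            n, row + 1, cols | (1 << col), diag1 | (1 << d1), diag2 | (1 << d2)
        )
        nodes += sub_nodes
        solutions += sub_solutions
    return nodes, solutions


def first_row_profile(n):
    """Return one (nodes, solutions) pair per column of the first-row queen."""
    profile = []
    for col in range(n):
        # Row 0: anti-diagonal index is col, main-diagonal index is n - 1 - col.
        profile.append(
            count_subtree(n, 1, 1 << col, 1 << col, 1 << (n - 1 - col))
        )
    return profile


if __name__ == "__main__":
    for col, (nodes, solutions) in enumerate(first_row_profile(8)):
        print(f"first queen in column {col}: {nodes} nodes, {solutions} solutions")
```

If each first-row placement were handed to a separate worker, the workers would receive subtrees of different sizes, which is the kind of imbalance that motivates refining work on the fly. Whether such refinement pays off on a GPU is the question the paper evaluates; this sketch only shows the shape of the workload.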

Cited By

Jarząbek Ł, Czarnul P (2017) Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications. The Journal of Supercomputing 73:12 (5378-5401). DOI: 10.1007/s11227-017-2091-x. Online publication date: 1-Dec-2017
https://dl.acm.org/doi/10.1007/s11227-017-2091-x

Published In

CANDAR '15: Proceedings of the 2015 Third International Symposium on Computing and Networking (CANDAR)

December 2015

623 pages

ISBN: 9781467397971

Publisher

IEEE Computer Society

United States
