DOI: 10.5555/2388996.2389098

Hierarchical task mapping of cell-based AMR cosmology simulations

Published: 10 November 2012


Cosmology simulations are highly communication-intensive, so topology-aware task mapping is critical to their performance. To exploit the architectural properties of multiprocessor clusters (the performance gap between inter-node and intra-node communication, as well as the gap between inter-socket and intra-socket communication), we design and develop a hierarchical task mapping scheme for cell-based AMR (Adaptive Mesh Refinement) cosmology simulations, in particular the ART (Adaptive Refinement Tree) application. Our scheme consists of two parts: (1) an inter-node mapping that assigns application processes to nodes with the objective of minimizing network traffic among nodes, and (2) an intra-node mapping within each node that minimizes the maximum size of messages transmitted between CPU sockets. Experiments on production supercomputers with 3D torus and fat-tree topologies show that our scheme reduces application communication cost by up to 50%. More importantly, our scheme is generic and can be extended to many other applications.
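
The two-level idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes a symmetric matrix `comm` of bytes exchanged between process pairs, a fixed number of processes per node, and two sockets per node. The function names, the greedy grouping heuristic, and the exhaustive socket split are all hypothetical choices made for illustration, and "maximum size of messages" is read here as the largest single message crossing sockets.

```python
from itertools import combinations

def inter_node_traffic(comm, node_of):
    """Total bytes exchanged between processes placed on different nodes."""
    n = len(comm)
    return sum(comm[i][j] for i in range(n) for j in range(i + 1, n)
               if node_of[i] != node_of[j])

def greedy_inter_node_mapping(comm, procs_per_node):
    """Illustrative greedy heuristic (an assumption, not the paper's method):
    seed each node with the lowest-ranked free process, then repeatedly add
    the free process that exchanges the most data with the node's members."""
    free = set(range(len(comm)))
    node_of = [None] * len(comm)
    node = 0
    while free:
        members = [min(free)]
        free.remove(members[0])
        while len(members) < procs_per_node and free:
            # sorted() makes tie-breaking deterministic (lowest rank wins).
            best = max(sorted(free),
                       key=lambda p: sum(comm[p][m] for m in members))
            free.remove(best)
            members.append(best)
        for p in members:
            node_of[p] = node
        node += 1
    return node_of

def intra_node_mapping(comm, members):
    """Split one node's processes across two sockets by exhaustive search,
    minimizing the largest single message that crosses the socket boundary."""
    half = len(members) // 2
    best_split, best_cost = None, None
    for socket0 in combinations(members, half):
        socket1 = [p for p in members if p not in socket0]
        cost = max((comm[a][b] for a in socket0 for b in socket1), default=0)
        if best_cost is None or cost < best_cost:
            best_split, best_cost = (list(socket0), socket1), cost
    return best_split, best_cost

# Toy communication matrix: ranks 0 and 2 talk heavily, as do ranks 1 and 3.
comm = [[0, 1, 9, 1],
        [1, 0, 1, 9],
        [9, 1, 0, 1],
        [1, 9, 1, 0]]

block = [0, 0, 1, 1]                       # default rank-order placement
mapped = greedy_inter_node_mapping(comm, 2)
print(inter_node_traffic(comm, block), inter_node_traffic(comm, mapped))
print(intra_node_mapping(comm, [0, 1, 2, 3]))
```

On this toy input the rank-order placement sends both heavy pairs across the network, while the greedy grouping keeps each heavy pair on one node. The exhaustive socket split is only practical for the handful of processes that share a node; a production mapper would more likely rely on a graph partitioner.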




    Published In

    SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
    November 2012
    1161 pages



    IEEE Computer Society Press

    Washington, DC, United States



    • Research-article


    SC '12

    Acceptance Rates

    SC '12 Paper Acceptance Rate 100 of 461 submissions, 22%;
    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%


    Cited By

    • (2017) Topology mapping of irregular parallel applications on torus-connected supercomputers. The Journal of Supercomputing 73:4 (1691-1714). DOI: 10.1007/s11227-016-1876-7. Online publication date: 1-Apr-2017
    • (2016) Communication and cooling aware job allocation in data centers for communication-intensive workloads. Journal of Parallel and Distributed Computing 96:C (181-193). DOI: 10.1016/j.jpdc.2016.05.016. Online publication date: 1-Oct-2016
    • (2015) Hierarchical task mapping for parallel applications on supercomputers. The Journal of Supercomputing 71:5 (1776-1802). DOI: 10.1007/s11227-014-1324-5. Online publication date: 1-May-2015
    • (2013) 2HOT. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (1-12). DOI: 10.1145/2503210.2503220. Online publication date: 17-Nov-2013
