DOI: 10.5555/2388996.2389098
Research article

Hierarchical task mapping of cell-based AMR cosmology simulations

Published: 10 November 2012

Abstract

Cosmology simulations are highly communication-intensive, so topology-aware task mapping is critical for performance optimization. To exploit the architectural properties of multiprocessor clusters (the performance gap between inter-node and intra-node communication, as well as the gap between inter-socket and intra-socket communication), we design and develop a hierarchical task mapping scheme for cell-based AMR (Adaptive Mesh Refinement) cosmology simulations, in particular the ART application. Our scheme consists of two parts: (1) an inter-node mapping that assigns application processes to nodes with the objective of minimizing network traffic among nodes, and (2) an intra-node mapping within each node that minimizes the maximum size of messages transmitted between CPU sockets. Experiments on production supercomputers with 3D torus and fat-tree topologies show that our scheme can reduce application communication cost by up to 50%. More importantly, our scheme is generic and can be extended to many other applications.
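
The two objectives named in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it only evaluates the two cost measures (total inter-node traffic, and the largest inter-socket message within a node) for candidate placements. The communication table, the function names, and the placements are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: comm[(src, dst)] = bytes sent from process src to dst.
# Processes 0-1 and 2-3 talk heavily; 1-2 exchange only small messages.
Comm = Dict[Tuple[int, int], int]
comm: Comm = {
    (0, 1): 100, (1, 0): 100,
    (2, 3): 100, (3, 2): 100,
    (1, 2): 10,  (2, 1): 10,
}


def inter_node_traffic(comm: Comm, node_of: List[int]) -> int:
    """Total bytes exchanged between processes placed on different nodes."""
    return sum(size for (src, dst), size in comm.items()
               if node_of[src] != node_of[dst])


def max_inter_socket_message(comm: Comm, node_of: List[int],
                             socket_of: List[int]) -> int:
    """Largest message between processes on the same node but different sockets."""
    sizes = [size for (src, dst), size in comm.items()
             if node_of[src] == node_of[dst] and socket_of[src] != socket_of[dst]]
    return max(sizes, default=0)


# Inter-node level: two nodes, two processes each (assumed layout).
round_robin_nodes = [0, 1, 0, 1]   # splits both heavy pairs across nodes
blocked_nodes = [0, 0, 1, 1]       # keeps each heavy pair on one node
traffic_round_robin = inter_node_traffic(comm, round_robin_nodes)
traffic_blocked = inter_node_traffic(comm, blocked_nodes)

# Intra-node level: one node with two sockets, all four processes on it.
single_node = [0, 0, 0, 0]
alternating_sockets = [0, 1, 0, 1]  # heavy pairs straddle the socket boundary
paired_sockets = [0, 0, 1, 1]       # heavy pairs share a socket
max_msg_alternating = max_inter_socket_message(comm, single_node, alternating_sockets)
max_msg_paired = max_inter_socket_message(comm, single_node, paired_sockets)

print(traffic_round_robin, traffic_blocked)   # inter-node bytes per placement
print(max_msg_alternating, max_msg_paired)    # largest inter-socket message
```

In this toy case, keeping heavily communicating pairs together lowers both measures. A real mapper would search over placements (for example with graph partitioning) rather than compare two hand-picked ones.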


Published In

SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2012
1161 pages
ISBN: 9781467308045

Publisher

IEEE Computer Society Press, Washington, DC, United States

Acceptance Rates

SC '12 paper acceptance rate: 100 of 461 submissions (22%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)
