DOI:10.1145/2503210.2503273
research-article

A framework for hybrid parallel flow simulations with a trillion cells in complex geometries

Published: 17 November 2013

Abstract

waLBerla is a massively parallel software framework for simulating complex flows with the lattice Boltzmann method (LBM). Performance and scalability results are presented for SuperMUC, the world's fastest x86-based supercomputer, ranked number 6 on the Top500 list at the time of writing, and for JUQUEEN, a Blue Gene/Q system ranked number 5.
We reach resolutions of more than one trillion cells and perform up to 1.93 trillion cell updates per second using 1.8 million threads. The design and implementation of waLBerla are driven by a careful analysis of performance on current petascale supercomputers. Our fully distributed data structures and algorithms allow for efficient, massively parallel simulations on these machines. Elaborate node-level optimizations and vectorization with SIMD instructions yield highly optimized compute kernels for the single-relaxation-time (SRT) and two-relaxation-time (TRT) LBM. Excellent weak and strong scaling is achieved for a complex vascular geometry of the human coronary tree.
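To make the kernel terminology above concrete, the following is a minimal single-cell sketch of the SRT (BGK) and TRT collision operators in plain Python. It is not waLBerla's implementation: the D3Q19 stencil, the second-order equilibrium, the function names `collide_srt`, `collide_trt` and `odd_rate_from_magic`, and the "magic" value Λ = 3/16 are illustrative assumptions. A full cell update also streams the post-collision populations to neighbouring cells, which is omitted here. For scale, 1.93 × 10^12 cell updates per second over 1.8 × 10^6 threads works out to roughly 1.07 × 10^6 updates per second per thread.

```python
from itertools import product

# D3Q19 stencil (an assumption for this sketch): every c in {-1, 0, 1}^3
# with |c|^2 <= 2, i.e. 1 rest, 6 face and 12 edge directions.
VELOCITIES = [c for c in product((-1, 0, 1), repeat=3) if sum(x * x for x in c) <= 2]
# Lattice weights depend only on |c|^2: 1/3 (rest), 1/18 (face), 1/36 (edge).
WEIGHTS = [{0: 1 / 3, 1: 1 / 18, 2: 1 / 36}[sum(x * x for x in c)] for c in VELOCITIES]
# OPPOSITE[i] is the index of direction -c_i; TRT needs it to split f into even/odd parts.
OPPOSITE = [VELOCITIES.index(tuple(-x for x in c)) for c in VELOCITIES]


def equilibrium(rho, u):
    """Second-order equilibrium f_i^eq = w_i rho (1 + 3 c.u + 4.5 (c.u)^2 - 1.5 u.u)."""
    uu = sum(x * x for x in u)
    feq = []
    for c, w in zip(VELOCITIES, WEIGHTS):
        cu = sum(ci * ui for ci, ui in zip(c, u))
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * uu))
    return feq


def moments(f):
    """Density rho and velocity u of a single cell."""
    rho = sum(f)
    u = tuple(sum(fi * c[d] for fi, c in zip(f, VELOCITIES)) / rho for d in range(3))
    return rho, u


def collide_srt(f, omega):
    """SRT (BGK): relax every population toward equilibrium with one rate omega."""
    feq = equilibrium(*moments(f))
    return [fi - omega * (fi - fe) for fi, fe in zip(f, feq)]


def odd_rate_from_magic(omega_even, magic=3 / 16):
    """Odd rate from Lambda = (1/omega_even - 1/2) * (1/omega_odd - 1/2)."""
    return 1.0 / (magic / (1.0 / omega_even - 0.5) + 0.5)


def collide_trt(f, omega_even, omega_odd):
    """TRT: relax the symmetric and antisymmetric parts with separate rates."""
    feq = equilibrium(*moments(f))
    out = []
    for i, j in enumerate(OPPOSITE):
        f_sym, f_asym = 0.5 * (f[i] + f[j]), 0.5 * (f[i] - f[j])
        e_sym, e_asym = 0.5 * (feq[i] + feq[j]), 0.5 * (feq[i] - feq[j])
        out.append(f[i] - omega_even * (f_sym - e_sym) - omega_odd * (f_asym - e_asym))
    return out
```

With `omega_even == omega_odd`, TRT reduces exactly to SRT. The second rate is a free parameter; in the TRT literature it is commonly fixed through Λ so that bounce-back wall locations do not depend on the viscosity.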

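Cited By

  • (2024) Energy-Efficient Implementation of the Lattice Boltzmann Method. Energies 17:2(502). DOI:10.3390/en17020502. Online publication date: 19-Jan-2024
  • (2024) SunwayLB: Enabling Extreme-Scale Lattice Boltzmann Method Based Computing Fluid Dynamics Simulations on Advanced Heterogeneous Supercomputers. IEEE Transactions on Parallel and Distributed Systems 35:2(324-337). DOI:10.1109/TPDS.2023.3343706. Online publication date: Feb-2024
  • (2024) Flow and transport in the vadose zone: On the impact of partial saturation and Peclet number on non-Fickian, pre-asymptotic dispersion. Advances in Water Resources 191(104774). DOI:10.1016/j.advwatres.2024.104774. Online publication date: Sep-2024
  • (2024) On the importance of data encoding in quantum Boltzmann methods. Quantum Information Processing 23:1. DOI:10.1007/s11128-023-04216-6. Online publication date: 11-Jan-2024
  • (2023) Scalable Flow Simulations with the Lattice Boltzmann Method. Proceedings of the 20th ACM International Conference on Computing Frontiers, 297-303. DOI:10.1145/3587135.3592176. Online publication date: 9-May-2023
  • (2023) More Recent Advances in (Hyper)Graph Partitioning. ACM Computing Surveys 55:12(1-38). DOI:10.1145/3571808. Online publication date: 2-Mar-2023
  • (2023) Improving the Performance of Lattice Boltzmann Method with Pipelined Algorithm on A Heterogeneous Multi-zone Processor. Parallel and Distributed Computing, Applications and Technologies, 28-41. DOI:10.1007/978-3-031-29927-8_3. Online publication date: 8-Apr-2023
  • (2022) Performance Evaluation of Lattice Boltzmann Method for Fluid Simulation on A64FX Processor and Supercomputer Fugaku. International Conference on High Performance Computing in Asia-Pacific Region, 1-9. DOI:10.1145/3492805.3492811. Online publication date: 7-Jan-2022
  • (2022) Propagation Pattern for Moment Representation of the Lattice Boltzmann Method. IEEE Transactions on Parallel and Distributed Systems 33:3(642-653). DOI:10.1109/TPDS.2021.3098456. Online publication date: 1-Mar-2022
  • (2022) Exploring Spatial Indexing for Accelerated Feature Retrieval in HPC. 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 605-614. DOI:10.1109/CCGrid54584.2022.00070. Online publication date: May-2022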

Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
  • General Chair: William Gropp
  • Program Chair: Satoshi Matsuoka
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC13

Acceptance Rates

SC '13 paper acceptance rate: 91 of 449 submissions (20%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)
