Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Effective out-of-core parallel delaunay mesh refinement using off-the-shelf software

Published: 26 August 2011 Publication History

Abstract

We present three related out-of-core parallel mesh generation algorithms and their implementations for small size computational clusters. Computing out-of-core permits to solve larger problems than otherwise possible on the same hardware setup. Also, when using shared computing resources with high demand, a problem can take longer to compute in terms of wall-clock time when using an in-core algorithm on many nodes instead of using an out-of-core algorithm on few nodes. The difference is due to wait-in-queue delays that can grow exponentially to the number of requested nodes. In one specific case, using our best method and only 16 nodes it can take several times less wall-clock time to generate a 2 billion element mesh than to generate the same size mesh in-core with 121 nodes.
Although our best out-of-core method exhibits unavoidable overheads (could be as low as 19% in some cases) over the corresponding in-core method (for mesh sizes that fit completely in-core), this is a modest and expected performance penalty. We evaluated our methods on traditional clusters of workstations as well as presented preliminary performance evaluation on emerging BlueWaters supercomputer.

References

[1]
Aggarwal, A. and Vitter, Jeffrey, S. 1988. The input/output complexity of sorting and related problems. Comm. ACM 31, 9, 1116--1127.
[2]
Barker, K., Chernikov, A., Chrisochoides, N., and Pingali, K. 2004. A load balancing framework for adaptive and asynchronous applications. IEEE Trans. Parall. Distrib. Syst. 15, 2, 183--192.
[3]
Blelloch, G. E., Hardwick, J., Miller, G. L., and Talmor, D. 1999. Design and implementation of a practical parallel Delaunay algorithm. Algorithmica 24, 243--269.
[4]
Chernikov, A. and Chrisochoides, N. 2006a. Generalized delaunay mesh refinement: From scalar to parallel. In Proceedings of the 15th International Meshing Roundtable. Springer, Berlin, 563--580.
[5]
Chernikov, A. and Chrisochoides, N. 2006b. Parallel guaranteed quality delaunay uniform mesh refinement. SIAM J. Sci. Comput. 28, 1907--1926.
[6]
Chernikov, A. and Chrisochoides, N. 2008a. Algorithm 872: Parallel 2d constrained delaunay mesh generation. ACM Trans. Math. Softw. 34, 6--25.
[7]
Chernikov, A. and Chrisochoides, N. 2008b. Three-dimensional delaunay refinement for multi-core processors. In Proceedings of the 22nd ACM International Conference on Supercomputing. ACM, New York, 214--224.
[8]
Chernikov, A. N. and Chrisochoides, N. P. 2004. Practical and efficient point insertion scheduling method for parallel guaranteed quality Delaunay refinement. In Proceedings of the 18th International Conference on Supercomputing. ACM, New York, 48--57.
[9]
Chernikov, A. N. and Chrisochoides, N. P. 2009. Generalized two-dimensional Delaunay mesh refinement. SIAM J. Sci. Comput. 31, 5, 3387--3403.
[10]
Choi, J., Dongarra, J., Pozo, R., and Walker, D. 1992. Scalapack: A scalable linear algebra for distributed memory concurrent computers. In Proceedings of the 4th Symposium on the Frontiers of Massively Parallel Computation. IEEE, Los Alamitos, CA, 120--127.
[11]
Chrisochoides, N. 2005. Parallel mesh generation. In Numerical Solution of Partial Differential Equations on Parallel Computers 51, Eds. M. Bruaset, P. Bjorstad, A. Tveito.
[12]
D'Azevedo, E. F. and Dongarra, J. 2000. The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines. Concurrency Pract. Exp. 12, 15, 1481--1493.
[13]
Dehne, F., Dittrich, W., and Hutchinson, D. 1997. Efficient external memory algorithms by simulating coarse-grained parallel algorithms. In Proceedings of the 9th ACM Symposium on Parallel Algorithms and Architectures. ACM, New York, 106--115.
[14]
Demmel, J., Dongarra, J., Croz, J. D., Greenbaum, A., Hammarling, S., and Sorensen, D. 1987. Prospectus for the development of a linear algebra library for high-performance computers. Tech. rep. ANL/MCS-TM-97. Argonne, IL.
[15]
Dong, S., Lucor, D., and Karniadakis, G. E. 2004. Flow past a stationary and moving cylinder: DNS at Re = 10,000. In Proceedings of the Users Group Conference (DOD_UGC'04). IEEE, Los Alamitos, CA, 88--95.
[16]
Dongarra, J. J., Du Croz, J., Hammarling, S., and Hanson, R. J. 1988. An extended set of FORTRAN Basic Linear Algebra Subprograms. ACM Trans. Math. Softw. 14, 1, 1--17.
[17]
Foteinos, P. A., Chernikov, A. N., and Chrisochoides, N. P. 2010. Fully generalized two-dimensional constrained Delaunay mesh refinement. SIAM J. Sci. Comput. 32, 5, 2659--2686.
[18]
Karniadakis, G. and Orszag, S. 1993. Nodes, modes, and flow codes. Physics Today 46, 34--42.
[19]
Kot, A., Chernikov, A., and Chrisochoides, N. 2005. Out-of-core parallel delaunay mesh generation for shared memory machines. In Proceedings of the 17th IMACS World Congress Scientific Computation, Applied Mathematics and Simulation.
[20]
Kot, A., Chernikov, A., and Chrisochoides, N. 2006. Effective out-of-core parallel delaunay mesh refinement using off-the-shelf software. In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium. IEEE, Los Alamitos, CA, 104--114.
[21]
Linardakis, L. and Chrisochoides, N. 2008a. Algorithm 870: A static geometric medial axis domain decomposition in 2d euclidean space. ACM Trans. Math. Softw. 34, 4:1--4:28.
[22]
Linardakis, L. and Chrisochoides, N. 2008b. Graded delaunay decoupling method for parallel guaranteed quality planar mesh generation. SIAM J. Sci. Comput. 30, 1875--1891.
[23]
Nave, D., Chrisochoides, N., and Chew, L. P. 2002. Guaranteed: quality parallel Delaunay refinement for restricted polyhedral domains. In Proceedings of the 18th Annual Symposium on Computational Geometry (SCG'02). ACM, New York, 135--144.
[24]
Nodine, M. H. and Vitter, J. S. 1995. Greed sort: optimal deterministic sorting on parallel disks. J. ACM 42, 4, 919--933.
[25]
Ruppert, J. 1995. A Delaunay refinement algorithm for quality 2-dimensional mesh generation. J. Algorithms 18, 3, 548--585.
[26]
Salmon, J. and Warren, M. 1997. Parallel out-of-core methods for N-body simulation. In Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing. SIAM, Philadelphia, PA.
[27]
Shewchuk, J. R. 1996. Triangle: Engineering a 2D quality mesh generator and delaunay triangulator. In Applied Computational Geometry: Towards Geometric Engineering, M. C. Lin and D. Manocha, Eds., Lecture Notes in Computer Science Series, vol. 1148. Springer-Verlag, Berlin, 203--222.
[28]
Shewchuk, J. R. 1997. Delaunay refinement mesh generation. Ph.D. thesis, Carnegie Mellon University.
[29]
Shewchuk, J. R. 2002. Delaunay refinement algorithms for triangular mesh generation. Comput. Geom. Theory Appl. 22, 1--3, 21--74.
[30]
Toledo, S. and Gustavson, F. 1996. The design and implementation of solar, a portable library for scalable out-of-core linear algebra computations. In Proceedings of the 4th Annual Workshop on I/O in Parallel and Distributed Systems. ACM, New York, 28--40.
[31]
Tu, T. and O'Hallaron, D. R. 2004. A computational database system for generating unstructured hexahedral meshes with billions of elements. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC'04). IEEE, Los Alamitos, CA, 25--39.
[32]
Tu, T. and O'Hallaron, D. R. 2005. Extracting hexahedral mesh structures from balanced linear octrees. In Proceedings of the 13th International Meshing Roundtable. Springer, Berlin, 191--200.
[33]
Vitter, J. S. and Nodine, M. H. 1993. Large-scale sorting in uniform memory hierarchies. J. Parall. Distrib. Comput. 17, 107--114.
[34]
Vitter, J. S. and Shriver, E. A. M. 1993. Algorithms for parallel memory ii: Hierarchical multilevel memories. Algorithmica 12, 148--169.
[35]
Vitter, J. S., Shriver, E. A. M., and Z, E. A. M. S. 1994. Algorithms for parallel memory i: Two-level memories. Algorithmica 12, 110--147.
[36]
Walters, R. A. 2005. Coastal Ocean Models: Two useful finite element methods. Cont. Shelf Res. 25, 775--793.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Journal of Experimental Algorithmics
ACM Journal of Experimental Algorithmics  Volume 16, Issue
2011
411 pages
ISSN:1084-6654
EISSN:1084-6654
DOI:10.1145/1963190
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 August 2011
Published in JEA Volume 16

Author Tags

  1. Delaunay
  2. Effective computing
  3. off-the-shelf
  4. wall-clock time

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 216
    Total Downloads
  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media