DOI: 10.5555/1413370.1413379
SC Conference Proceedings · Research article

Adapting a message-driven parallel application to GPU-accelerated clusters

Published: 15 November 2008

Abstract

Graphics processing units (GPUs) have become an attractive option for accelerating scientific computations as a result of advances in the performance and flexibility of GPU hardware, and due to the availability of GPU software development tools targeting general purpose and scientific computation. However, effective use of GPUs in clusters presents a number of application development and system integration challenges. We describe strategies for the decomposition and scheduling of computation among CPU cores and GPUs, and techniques for overlapping communication and CPU computation with GPU kernel execution. We report the adaptation of these techniques to NAMD, a widely-used parallel molecular dynamics simulation package, and present performance results for a 64-core 64-GPU cluster.
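The overlap strategy described in the abstract, launching GPU kernels asynchronously, doing CPU-side work while they run, and collecting results on completion, can be sketched in miniature. This is an illustrative sketch only, not NAMD's implementation: NAMD uses CUDA streams and Charm++ messages, whereas here a worker thread stands in for the GPU, and all names (`gpu_kernel`, `cpu_work`, `timestep`, the patch data) are hypothetical.

```python
# Sketch of overlapping asynchronous "GPU" work with CPU computation.
# A ThreadPoolExecutor worker plays the role of the GPU; submit() is the
# asynchronous kernel launch, and Future.result() is the completion wait.
from concurrent.futures import ThreadPoolExecutor
import time

def gpu_kernel(patch):
    # Stand-in for an expensive non-bonded force kernel on one work unit.
    time.sleep(0.01)  # simulate kernel latency
    return sum(patch)

def cpu_work(patch):
    # Stand-in for work kept on the CPU (e.g. bonded terms, integration).
    return max(patch)

def timestep(patches):
    with ThreadPoolExecutor(max_workers=1) as gpu:
        # 1. Launch all "kernels" asynchronously; calls return immediately.
        futures = [gpu.submit(gpu_kernel, p) for p in patches]
        # 2. Overlap: the CPU computes while the "GPU" is busy.
        cpu_results = [cpu_work(p) for p in patches]
        # 3. Collect GPU results as they complete.
        gpu_results = [f.result() for f in futures]
    return gpu_results, cpu_results

print(timestep([[1, 2, 3], [4, 5, 6]]))  # -> ([6, 15], [3, 6])
```

In the real message-driven setting, step 3 would not block: completion would instead trigger a message to the object that needs the forces, which is what lets communication and CPU work hide the kernel's latency.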



Published In

SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing
November 2008, 739 pages
ISBN: 9781424428359
Publisher: IEEE Press

Acceptance Rates

SC '08 paper acceptance rate: 59 of 277 submissions (21%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)

Cited By

  • (2021) AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics. International Journal of High Performance Computing Applications 35(5):432-451, Sep 2021. doi:10.1177/10943420211006452
  • (2019) Efficient GPU tree walks for effective distributed n-body simulations. Proceedings of the ACM International Conference on Supercomputing, pp. 24-34, Jun 2019. doi:10.1145/3330345.3330348
  • (2018) Scalable molecular dynamics with NAMD on the Summit system. IBM Journal of Research and Development 62(6):4:1-4:9, Nov 2018. doi:10.1147/JRD.2018.2888986
  • (2018) What You Should Know About NAMD and Charm++ But Were Hoping to Ignore. Proceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity, pp. 1-6, Jul 2018. doi:10.1145/3219104.3219134
  • (2016) dCUDA. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12, Nov 2016. doi:10.5555/3014904.3014974
  • (2015) Scheduling independent tasks on multi-cores with GPU accelerators. Concurrency and Computation: Practice & Experience 27(6):1625-1638, Apr 2015. doi:10.1002/cpe.3359
  • (2014) A CPU. Proceedings of Workshop on General Purpose Processing Using GPUs, pp. 64-71, Mar 2014. doi:10.1145/2588768.2576787
  • (2014) Mapping to irregular torus topologies and other techniques for petascale biomolecular simulation. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 81-91, Nov 2014. doi:10.1109/SC.2014.12
  • (2014) Petascale Tcl with NAMD, VMD, and Swift/T. Proceedings of the First Workshop for High Performance Technical Computing in Dynamic Languages, pp. 6-17, Nov 2014. doi:10.1109/HPTCDL.2014.7
  • (2013) G-Charm. Proceedings of the 27th ACM International Conference on Supercomputing, pp. 349-358, Jun 2013. doi:10.1145/2464996.2465444
