DOI: 10.1145/1366230.1366243
research-article

A study of the effects of machine geometry and mapping on distributed transpose performance

Published: 05 May 2008

Abstract

This paper describes a parallel strategy to extend the scalability of a small 3D FFT on thousands of Blue Gene/L processors. The approach is to execute the intermediate phases of the 3D FFT on smaller processor subsets. Performance measurements of the standalone 3D FFT using two communication protocols, MPI and BG/L ADE, are presented. While the MPI-based and BG/L ADE-based implementations of the 3D FFT exhibited qualitatively similar behavior, the BG/L ADE-based version has a lower communication cost than the MPI-based version for small message sizes. Measurements also show that the proposed approach significantly improves the performance of Particle-Mesh-based N-body simulations at the limits of scalability.
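The "phases" of a 3D FFT mentioned in the abstract can be illustrated with a small serial sketch. It relies only on the standard fact that a 3D discrete Fourier transform separates into three passes of 1D transforms, one per axis. It is not the paper's implementation: it runs on a single process, uses a naive O(n^2) 1D DFT in place of an optimized FFT, and does not model processor subsets, MPI, BG/L ADE, or the data redistribution between phases. All function names are hypothetical.

```python
import cmath


def dft_1d(values):
    """Naive O(n^2) 1D DFT; a stand-in for an optimized 1D FFT routine."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transform_axis(grid, axis):
    """One phase: apply dft_1d to every line of the grid that runs along `axis`."""
    shape = (len(grid), len(grid[0]), len(grid[0][0]))
    out = [[[0j] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    a, b = [d for d in range(3) if d != axis]
    for i in range(shape[a]):
        for j in range(shape[b]):
            # Collect the (x, y, z) coordinates of one line along `axis`.
            coords = []
            for k in range(shape[axis]):
                c = [0, 0, 0]
                c[axis], c[a], c[b] = k, i, j
                coords.append(tuple(c))
            line = dft_1d([grid[x][y][z] for x, y, z in coords])
            for (x, y, z), value in zip(coords, line):
                out[x][y][z] = value
    return out


def dft_3d_by_phases(grid):
    """3D DFT as three successive phases of 1D transforms (z, then y, then x)."""
    for axis in (2, 1, 0):
        grid = transform_axis(grid, axis)
    return grid


def dft_3d_direct(grid):
    """Reference 3D DFT evaluated straight from the definition (for checking only)."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    return [[[
        sum(
            grid[x][y][z]
            * cmath.exp(-2j * cmath.pi * (kx * x / nx + ky * y / ny + kz * z / nz))
            for x in range(nx) for y in range(ny) for z in range(nz)
        )
        for kz in range(nz)] for ky in range(ny)] for kx in range(nx)]


def max_abs_difference(a, b):
    """Largest element-wise magnitude difference between two 3D grids."""
    return max(
        abs(a[x][y][z] - b[x][y][z])
        for x in range(len(a))
        for y in range(len(a[0]))
        for z in range(len(a[0][0]))
    )
```

In a distributed setting each phase needs complete lines of data along the axis being transformed, so the data is typically redistributed between phases. The sketch leaves that step out entirely and only checks that the phase-by-phase result agrees with the direct definition.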

Published In

CF '08: Proceedings of the 5th conference on Computing frontiers
May 2008
334 pages
ISBN:9781605580777
DOI:10.1145/1366230

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. blue gene
  2. fft

Qualifiers

  • Research-article

Conference

CF '08
CF '08: Computing Frontiers Conference
May 5 - 7, 2008
Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%
