Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization

Sadaf Alam & Jeffrey Vetter

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3514)

Included in the following conference series:

International Conference on Computational Science

Abstract

The Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, to exploit the hardware resources and high memory bandwidth of the multistreaming processor (MSP). A Cray X1 node is composed of four MSPs, each of which is in turn composed of four single-streaming processors (SSPs). Each SSP contains a superscalar processing unit and two vector processing units. Compiler vectorization provides loop-level parallelization and uses the vector processing hardware. Multistreaming code generation by the compiler permits a block of code to execute across the SSPs of an MSP. In this paper, we analyze the overall impact of loop-level compiler optimization on a scientific application, the Parallel Ocean Program (POP). POP has been extensively optimized for the X1 by instrumenting the code with X1 compiler directives. We compare and contrast the automatic and manual optimization schemes available on the X1 and analyze their impact on code performance and scalability. Our results show that adding compiler directives increases the average vector length, thereby improving single-node performance significantly. However, the optimized code scales at a slower rate as the local workload volume decreases and the communication costs increase.
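
As an illustration of the two loop transformations named in the abstract, the following C sketch splits a simple loop into four contiguous chunks, one per SSP of an MSP, and leaves a unit-stride inner loop of the kind a vectorizing compiler can map onto vector hardware. It is not taken from the paper or from POP: the function name `axpy_streamed`, the constant `SSPS_PER_MSP`, and the choice of an AXPY kernel are assumptions made for illustration. The chunks run one after another here rather than concurrently, and the partitioning is written out by hand only to make the idea visible; on the X1 the multistreamed code is generated by the compiler, and the scheme it actually uses may differ.

```c
#include <assert.h>
#include <stddef.h>

/* From the abstract: an MSP is composed of four SSPs. */
#define SSPS_PER_MSP 4

/* Hypothetical kernel: y[i] = a * x[i] + y[i].
 * The iteration space is divided into SSPS_PER_MSP contiguous chunks
 * (the "multistreamed" outer loop); each chunk is a unit-stride loop
 * with no loop-carried dependence (the "vectorizable" inner loop). */
static void axpy_streamed(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);

    /* Ceiling division so that every element belongs to some chunk. */
    size_t chunk = (n + SSPS_PER_MSP - 1) / SSPS_PER_MSP;

    for (size_t s = 0; s < SSPS_PER_MSP; ++s) {
        size_t lo = s * chunk;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;

        /* Empty when lo >= hi, which happens for trailing chunks
         * if n is smaller than the number of streams. */
        for (size_t i = lo; i < hi; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

With `n` elements, each inner loop covers roughly `n / 4` iterations, so a shrinking per-processor workload also shortens the loops available for vectorization, which is consistent with the abstract's remark that scaling slows as the local workload volume decreases.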

Author information

Authors and Affiliations

Computer Science and Mathematics Division, Oak Ridge National Laboratory,
Sadaf Alam & Jeffrey Vetter

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, Emory University, Atlanta, Georgia, USA
Vaidy S. Sunderam
Department of Mathematics and Computer Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Geert Dick van Albada
Faculty of Sciences, Section of Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Peter M. A. Sloot
Computer Science Department, University of Tennessee, Knoxville, TN 37996-3450, USA
Jack J. Dongarra

About this paper

Cite this paper

Alam, S., Vetter, J. (2005). Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J.J. (eds) Computational Science – ICCS 2005. ICCS 2005. Lecture Notes in Computer Science, vol 3514. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428831_38

DOI: https://doi.org/10.1007/11428831_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26032-5
Online ISBN: 978-3-540-32111-8
eBook Packages: Computer Science, Computer Science (R0)
