Abstract
Today, the challenge for software is to exploit the parallelism offered by multi-core architectures. This can be done by rewriting the application, by exploiting hardware capabilities directly, or by expecting the compiler and runtime tools to do the job for us. With the advent of multi-core architectures [1] [2], the problem has become increasingly relevant. Even today, there are few runtime tools that analyze the behavioral patterns of performance-critical applications and recompile them accordingly, so techniques such as OpenMP for shared-memory programs remain useful for exploiting the parallelism available in the machine. This work studies whether loop parallelization, both with and without applying loop transformations, is an effective way to run scientific programs efficiently on such multi-core architectures. We find the results encouraging and believe the approach could yield good results if implemented fully in a production compiler for multi-core architectures.
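To make the setting concrete, the sketch below is a minimal illustration, not the authors' benchmark code: the arrays, sizes, and the specific interchange example are assumptions introduced here. It shows the two cases the abstract refers to: a loop nest whose iterations are independent and can be distributed over threads directly with OpenMP, and a nest where a loop-carried dependence blocks parallelization as written until a loop interchange (one of the unimodular transformations discussed in the references) exposes a dependence-free outer loop.

/* Minimal sketch (assumed example, not the authors' code): loop
 * parallelization with OpenMP, without and with a loop transformation.
 * Array sizes N and M are illustrative. Compile with an OpenMP-capable
 * compiler, e.g.  gcc -fopenmp -O2 sketch.c
 */
#include <stdio.h>
#include <omp.h>

#define N 1024
#define M 1024

static double a[N][M], b[N][M];   /* statics are zero-initialized */

int main(void)
{
    /* Case 1: no cross-iteration dependences, so the outer loop's
     * iterations can be distributed over threads directly. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            b[i][j] = (double)(i + j);

    /* Case 2: the dependence a[i][j] <- a[i-1][j] is carried by the
     * i loop, so i cannot run in parallel as written.  Interchanging
     * the loops (legal for this nest) makes the dependence-free j loop
     * outermost, so the whole nest is covered by one parallel loop. */
    #pragma omp parallel for
    for (int j = 0; j < M; j++)
        for (int i = 1; i < N; i++)
            a[i][j] = a[i - 1][j] + b[i][j];

    printf("a[N-1][M-1] = %f\n", a[N - 1][M - 1]);
    return 0;
}

After the interchange, each thread owns a distinct set of columns j and walks the i-carried recurrence sequentially within its columns, which avoids repeatedly forking a parallel region around the inner loop.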
References
AMD Multi-core Products (2006), http://multicore.amd.com/en/products/
Multi-core from Intel Products and Platforms (2006), http://www.intel.com/products/processor/
OpenMP, http://www.openmp.org
Wolfe, M.J.: Techniques for Improving the Inherent Parallelism in Programs. Technical Report 78-929, Department of Computer Science, University of Illinois at Urbana-Champaign (July 1978)
Wolfe, M.: High Performance Compilers for Parallel Computing. Addison-Wesley, Reading (1996)
Banerjee, U.K.: Loop Transformations for Restructuring Compilers: The Foundations. Kluwer Academic Publishers, Norwell (1993)
Banerjee, U.K.: Loop Parallelization. Kluwer Academic Publishers, Norwell (1994)
Pthreads reference, https://computing.llnl.gov/tutorials/pthreads/
D'Hollander, E.H.: Partitioning and Labeling of Loops by Unimodular Transformations. IEEE Transactions on Parallel and Distributed Systems 3(4) (1992)
Sass, R., Mutka, M.: Enabling Unimodular Transformations. In: Supercomputing 1994, pp. 753–762 (November 1994)
Banerjee, U.: Unimodular Transformations of Double Loops. In: Advances in Languages and Compilers for Parallel Processing, pp. 192–219 (1991)
Prakash, S.R., Srikant, Y.N.: An Approach to Global Data Partitioning for Distributed Memory Machines. In: IPPS/SPDP (1999)
Prakash, S.R., Srikant, Y.N.: Communication Cost Estimation and Global Data Partitioning for Distributed Memory Machines. In: Fourth International Conference on High Performance Computing, Bangalore (1997)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Raghavendra, P. et al. (2010). A Study of Performance Scalability by Parallelizing Loop Iterations on Multi-core SMPs. In: Hsu, CH., Yang, L.T., Park, J.H., Yeo, SS. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13119-6_41
DOI: https://doi.org/10.1007/978-3-642-13119-6_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13118-9
Online ISBN: 978-3-642-13119-6