Abstract
By comparing automatically versus manually parallelized NAS Benchmarks, we identify code sections that differ, and we discuss opportunities for advancing auto-parallelizers. We find ten patterns that challenge current parallelization technology. We also measure the potential impact of advanced techniques that could perform the needed transformations automatically. While some of our findings are unsurprising and the corresponding improvements difficult to attain (compilers need to get better at identifying parallelism in outermost loops and in loops containing function calls), other opportunities are within reach and can make a difference. They include combining loops into parallel regions, avoiding load imbalance, and improving reduction parallelization.
Studying hand-optimized code is a necessary path for advancing the forefront of compiler research. Few recent papers have pursued this goal, however; the present work aims to fill this void.
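To make two of the named transformations concrete, the following minimal C/OpenMP sketch illustrates combining adjacent worksharing loops into a single parallel region and parallelizing a reduction. It is not taken from the NAS codes; the arrays, sizes, and loop bodies are hypothetical and serve only as an illustration.

#include <stdio.h>

#define N 1000000

static double a[N], b[N];

int main(void)
{
    double sum = 0.0;

    /* One parallel region encloses all three loops, avoiding the
     * overhead of opening and closing a region per loop -- the
     * "combining loops into parallel regions" opportunity.        */
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;

        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = a[i] + 1.0;

        /* Reduction parallelization: each thread accumulates a
         * private partial sum that OpenMP combines at the end.     */
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i] * b[i];
    }

    printf("sum = %e\n", sum);
    return 0;
}

Compiled with OpenMP support (e.g., gcc -fopenmp), the implicit barrier after each worksharing loop preserves the dependences between the loops while keeping the team of threads alive across all three.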
Acknowledgements
This work was supported by the National Science Foundation (NSF) under Awards Nos. 1931339, 2209639, and 1833846.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Barakhshan, P., Eigenmann, R. (2023). Learning from Automatically Versus Manually Parallelized NAS Benchmarks. In: Mendis, C., Rauchwerger, L. (eds) Languages and Compilers for Parallel Computing. LCPC 2022. Lecture Notes in Computer Science, vol 13829. Springer, Cham. https://doi.org/10.1007/978-3-031-31445-2_3
DOI: https://doi.org/10.1007/978-3-031-31445-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31444-5
Online ISBN: 978-3-031-31445-2
eBook Packages: Computer Science, Computer Science (R0)