DOI: 10.1145/3229710.3229748
research-article

Dynamic Tuning of OpenMP Memory Bound Applications in Multisocket Systems using MATE

Published: 13 August 2018

Abstract

The performance of OpenMP applications executed on multisocket multicore processors can be limited by the memory interface. In a multisocket environment, each multicore processor may suffer performance degradation in memory-bound parallel regions when its cores share the same Last Level Cache (LLC). In this case, using all available resources is not always the best choice in terms of execution time and/or efficiency. The best configuration for an application depends on the system's architecture, the input data, and the evolution of the data; hence, it can vary from execution to execution or even within a single execution. Finding a configuration that uses the available resources efficiently therefore requires an adequate methodology and tools. In this work we present the integration of a performance model for OpenMP memory-bound applications into a dynamic performance tuning tool called MATE. To achieve this integration, MATE was extended to support the measurement of hardware counters, and the performance model was adapted to determine the best number of threads for an application and to be implemented in MATE as a Tunlet.
The developed Tunlet has been evaluated on different multisocket architectures and memory-bound application benchmarks, showing that the proposed approach can be efficient.
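The abstract does not give the model's equations, so the following is only a hypothetical sketch of the general idea, not the authors' model or MATE's Tunlet API. It assumes a simple roofline-style bandwidth-saturation model: memory traffic is estimated as LLC misses times an assumed 64-byte cache line, each thread and each active socket are assumed to have a fixed peak bandwidth, and the selector returns the smallest thread count whose predicted time is within a tolerance of the best. It does not model LLC contention itself. All names (`RegionSample`, `memory_bytes`, `predict_time`, `choose_threads`) and all numbers are invented for illustration.

```python
from dataclasses import dataclass

# Assumption: 64-byte cache lines, typical of current x86 processors.
CACHE_LINE_BYTES = 64


@dataclass
class RegionSample:
    """Hypothetical measurements for one OpenMP parallel region."""
    llc_misses: int          # LLC misses read from a hardware counter
    compute_seconds: float   # single-thread time excluding memory stalls (assumed known)


def memory_bytes(sample: RegionSample) -> int:
    """Estimate DRAM traffic as one cache line transferred per LLC miss."""
    return sample.llc_misses * CACHE_LINE_BYTES


def predict_time(sample: RegionSample, threads: int, sockets: int,
                 socket_bw: float, thread_bw: float) -> float:
    """Roofline-style estimate: the region runs at the slower of compute and memory."""
    # Threads are assumed to be spread evenly across sockets, so a socket's
    # memory controller is in use as soon as one thread runs on it.
    active_sockets = min(threads, sockets)
    # Aggregate bandwidth grows per thread until the active sockets saturate.
    bandwidth = min(threads * thread_bw, active_sockets * socket_bw)
    compute = sample.compute_seconds / threads
    memory = memory_bytes(sample) / bandwidth
    return max(compute, memory)


def choose_threads(sample: RegionSample, sockets: int, cores_per_socket: int,
                   socket_bw: float, thread_bw: float,
                   tolerance: float = 0.05) -> int:
    """Return the smallest thread count predicted within `tolerance` of the best time."""
    max_threads = sockets * cores_per_socket
    times = {n: predict_time(sample, n, sockets, socket_bw, thread_bw)
             for n in range(1, max_threads + 1)}
    best = min(times.values())
    # Prefer fewer threads when extra threads buy (almost) nothing:
    # a negligible slowdown is traded for better efficiency.
    for n in range(1, max_threads + 1):
        if times[n] <= best * (1 + tolerance):
            return n
    return max_threads  # not reached: the fastest count always qualifies


if __name__ == "__main__":
    # Made-up example: 2 sockets x 8 cores, 40 GB/s per socket, 10 GB/s per thread.
    memory_bound = RegionSample(llc_misses=1_000_000_000, compute_seconds=8.0)
    compute_bound = RegionSample(llc_misses=0, compute_seconds=8.0)
    for name, s in [("memory-bound", memory_bound), ("compute-bound", compute_bound)]:
        print(name, choose_threads(s, sockets=2, cores_per_socket=8,
                                   socket_bw=40e9, thread_bw=10e9))
```

With these invented numbers the script prints `memory-bound 10` and `compute-bound 16`. The memory term stops shrinking once both sockets saturate at 8 threads (0.8 s), and by 10 threads the compute term reaches the same floor, so additional cores add nothing. The region without LLC misses keeps scaling and uses all 16 cores.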

Cited By

  • (2020) How to Evaluate Various Commonly Used Program Classification Methods? Advanced Computer Architecture, 233-248. DOI: 10.1007/978-981-15-8135-9_17. Online publication date: 5-Sep-2020.
  • (2019) Hardware Counters’ Space Reduction for Code Region Characterization. Euro-Par 2019: Parallel Processing, 74-86. DOI: 10.1007/978-3-030-29400-7_6. Online publication date: 13-Aug-2019.

Published In

ACM Other conferences
ICPP Workshops '18: Workshop Proceedings of the 47th International Conference on Parallel Processing
August 2018
409 pages
ISBN:9781450365239
DOI:10.1145/3229710
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dynamic tuning
  2. hardware counters
  3. parallel/distributed application
  4. performance analysis
  5. performance models

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP '18 Comp

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 24 Nov 2024
