DOI: 10.1145/3588195.3592984
Research article | Open access

Performance Optimization using Multimodal Modeling and Heterogeneous GNN

Published: 07 August 2023

Abstract

Growing heterogeneity and configurability in HPC architectures have made auto-tuning applications and runtime parameters on these systems very complex. Users are presented with a multitude of configuration options. Beyond application-specific solutions, a common approach is to use general-purpose search strategies, which often fail to identify the best configurations or take prohibitively long to converge. There is thus a need for an efficient, general-purpose tuning approach that can easily scale and adapt to various tuning tasks. We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks. In this paper, we analyze IR-based programming models to make task-specific performance optimizations. To this end, we propose the Multimodal Graph Neural Network and Autoencoder (MGA) tuner, a multimodal deep-learning-based approach that adapts heterogeneous graph neural networks and denoising autoencoders to model IR-based code representations, which serve as separate modalities. This approach is used as part of our pipeline to model a syntax-, semantics-, and structure-aware IR-based code representation for tuning parallel code regions/kernels. We experiment extensively on OpenMP and OpenCL code regions/kernels obtained from the PolyBench, Rodinia, STREAM, DataRaceBench, AMD SDK, NPB, NVIDIA SDK, Parboil, SHOC, LULESH, XSBench, RSBench, miniFE, miniAMR, and Quicksilver benchmarks and applications. We apply our multimodal learning techniques to the tasks of (i) optimizing the number of threads, the scheduling policy, and the chunk size in OpenMP loops and (ii) identifying the best device for heterogeneous device mapping of OpenCL kernels. Our experiments show that this multimodal learning-based approach outperforms the state of the art in almost all experiments.
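The MGA architecture described above combines two learned views of the same IR: a heterogeneous graph neural network over the program graph and a denoising autoencoder over a flat IR feature vector, with the two embeddings fused for the downstream tuning prediction. The sketch below is a minimal illustration of that idea, not the authors' released implementation; it assumes PyTorch and PyTorch Geometric, and the class names (IRHeteroGNN, DenoisingAE, MGATuner), the IR edge types, and all dimensions are hypothetical placeholders.

```python
# Minimal sketch of a multimodal GNN + denoising-autoencoder tuner.
# All names, edge types, and sizes are illustrative assumptions.
import torch
import torch.nn as nn
from torch_geometric.nn import HeteroConv, SAGEConv, global_mean_pool


class IRHeteroGNN(nn.Module):
    """Graph modality: message passing over a heterogeneous IR graph
    with distinct control- and data-flow edge types."""
    def __init__(self, hidden=64, num_layers=2):
        super().__init__()
        edge_types = [  # assumed IR relations, in PyG (src, relation, dst) form
            ('instr', 'control', 'instr'),
            ('instr', 'data', 'var'),
            ('var', 'data', 'instr'),
        ]
        self.layers = nn.ModuleList(
            HeteroConv({et: SAGEConv((-1, -1), hidden) for et in edge_types})
            for _ in range(num_layers)
        )

    def forward(self, x_dict, edge_index_dict, batch_dict):
        for conv in self.layers:
            x_dict = {k: h.relu() for k, h in conv(x_dict, edge_index_dict).items()}
        # Mean-pool each node type per graph, then sum the per-type summaries.
        pooled = [global_mean_pool(h, batch_dict[k]) for k, h in x_dict.items()]
        return torch.stack(pooled).sum(dim=0)


class DenoisingAE(nn.Module):
    """Feature modality: encode a noise-corrupted IR feature vector;
    the bottleneck activation is the modality embedding."""
    def __init__(self, in_dim=300, hidden=64, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, in_dim)

    def forward(self, x):
        z = self.enc(x + self.noise_std * torch.randn_like(x))
        return z, self.dec(z)  # embedding and reconstruction


class MGATuner(nn.Module):
    """Fuse the two modality embeddings and classify a configuration,
    e.g., one of num_classes (threads, schedule, chunk) settings."""
    def __init__(self, hidden=64, in_dim=300, num_classes=8):
        super().__init__()
        self.gnn = IRHeteroGNN(hidden)
        self.dae = DenoisingAE(in_dim, hidden)
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, num_classes))

    def forward(self, x_dict, edge_index_dict, batch_dict, ir_feats):
        g = self.gnn(x_dict, edge_index_dict, batch_dict)
        z, recon = self.dae(ir_feats)
        # Train with cross-entropy on the logits plus an MSE
        # reconstruction term on (recon, ir_feats).
        return self.head(torch.cat([g, z], dim=-1)), recon
```

For task (i), a configuration predicted by such a model would then be applied when launching the target region. A hypothetical usage example follows; the binary name and values are placeholders, and OMP_SCHEDULE only affects loops declared with schedule(runtime):

```python
# Apply a predicted OpenMP configuration: 16 threads, dynamic schedule,
# chunk size 64. Binary name and values are illustrative.
import os
import subprocess

env = dict(os.environ, OMP_NUM_THREADS="16", OMP_SCHEDULE="dynamic,64")
subprocess.run(["./kernel_benchmark"], env=env, check=True)
```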


Cited By

• (2024) MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations. In Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 156-167. https://doi.org/10.1145/3656019.3676895
• (2024) MUPPET. In Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores, 22-31. https://doi.org/10.1145/3649169.3649246
• (2024) An Exploration of Global Optimization Strategies for Autotuning OpenMP-based Codes. In 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 741-750. https://doi.org/10.1109/IPDPSW63119.2024.00138


Information

Published In
      HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
      August 2023
      350 pages
ISBN: 9798400701559
DOI: 10.1145/3588195
• General Chair: Ali R. Butt
• Program Chairs: Ningfang Mi, Kyle Chard
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 07 August 2023

      Author Tags

      1. OpenCL
      2. OpenMP
      3. auto-tuning
      4. heterogeneous graph neural networks
      5. multimodal learning

      Qualifiers

      • Research-article

      Conference

      HPDC '23

      Acceptance Rates

      Overall Acceptance Rate 166 of 966 submissions, 17%

Article Metrics

• Downloads (last 12 months): 717
• Downloads (last 6 weeks): 84
Reflects downloads up to 21 Nov 2024.
