DOI: 10.1145/3671016.3671380
research-article
Open access

Global-State Aware Automatic NUMA Balancing

Published: 24 July 2024

Abstract

Non-uniform memory access (NUMA) has become a standard architecture for modern servers. However, the NUMA effect (i.e., local memory accesses typically take less time than remote memory accesses) is unavoidable. To address this issue, Automatic NUMA Balancing (Auto-NUMA) was proposed. Nevertheless, Auto-NUMA can either improve or hurt an application's performance, depending on the application's characteristics, which are difficult for end users to know.
To tackle this problem, we propose Global-State Aware Automatic NUMA Balancing (GSA-Auto-NUMA). It introduces two techniques. First, GSA-Auto-NUMA identifies a set of key metrics to accurately assess the current state of a NUMA system. Second, GSA-Auto-NUMA uses these metrics to decide in real time whether to enable Auto-NUMA, through a five-step evaluation.
We implemented GSA-Auto-NUMA on both ARM and x86 platforms and validated its performance through experiments. The results show that, unlike Auto-NUMA, GSA-Auto-NUMA at the very least does not hurt performance, and it improves performance for most applications. Moreover, GSA-Auto-NUMA outperforms Auto-NUMA by up to 0.47× and 1.20× on ARM and x86 NUMA servers, respectively.
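
To make the idea of metric-driven, real-time switching concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not name the key metrics or the five evaluation steps, so the metric used here (the share of memory accesses served by a remote node), the moving-average window, the two thresholds, and the names `AutoNumaGovernor` and `apply_decision` are all assumptions made for illustration. The only real interface is the Linux sysctl file `/proc/sys/kernel/numa_balancing`, which turns Automatic NUMA Balancing on (1) or off (0).

```python
from collections import deque
from pathlib import Path

# Real Linux sysctl file that switches Automatic NUMA Balancing on (1) or off (0).
NUMA_BALANCING = Path("/proc/sys/kernel/numa_balancing")


class AutoNumaGovernor:
    """Toy governor: smooths one metric and decides whether Auto-NUMA should run.

    The metric (remote-access ratio), the window size, and both thresholds
    are hypothetical; the paper's actual metrics and steps are not given here.
    """

    def __init__(self, window=4, enable_above=0.30, disable_below=0.15):
        self.samples = deque(maxlen=window)  # most recent ratios only
        self.enable_above = enable_above
        self.disable_below = disable_below
        self.enabled = False

    def update(self, remote_accesses, total_accesses):
        """Record one sampling interval and return the new on/off decision."""
        ratio = remote_accesses / total_accesses if total_accesses else 0.0
        self.samples.append(ratio)
        average = sum(self.samples) / len(self.samples)
        # Hysteresis: two thresholds avoid flapping around a single cut-off.
        if average > self.enable_above:
            self.enabled = True
        elif average < self.disable_below:
            self.enabled = False
        return self.enabled


def apply_decision(enabled, path=NUMA_BALANCING):
    """Write the decision to the sysctl file (needs root on a NUMA-enabled kernel)."""
    Path(path).write_text("1\n" if enabled else "0\n")
```

In a real deployment the access counts would come from hardware counters (for example, uncore PMU events read through perf), and `apply_decision` would be called only when the decision changes.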



Published In

Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware
July 2024
518 pages
ISBN:9798400707056
DOI:10.1145/3671016
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Automatic NUMA Balancing (Auto-NUMA)
  2. Last Level Cache (LLC)
  3. Load Balancing
  4. Perf
  5. Uncore PMU

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Internetware 2024

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%
