Abstract
Process allocation is a challenging problem in high-performance computing, especially on heterogeneous architectures whose nodes differ in performance characteristics such as the number of cores and their frequencies, multithreading technologies, and cache memory. To improve application performance, it is necessary to determine which processing units are the most suitable for executing the application processes. In this paper, we present PAARes (Process Allocation based on the Available Resources), a strategy that automatically collects system information and distributes processes according to the processing capacity and the available resources of each node in a cluster. To demonstrate the efficiency and efficacy of the proposed strategy, the NAS (NASA Advanced Supercomputing) parallel benchmarks are run on homogeneous and heterogeneous clusters under both dedicated and non-dedicated environments.
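The abstract describes the allocation policy only at a high level. As a rough illustration of the idea, the following minimal Python sketch (a hypothetical simplification of ours, not the authors' implementation; the node model and the functions available_capacity and allocate are assumptions) distributes a given number of processes across cluster nodes in proportion to each node's available capacity, here modeled as idle cores times core frequency:

```python
# Hypothetical sketch: allocate processes proportionally to each node's
# available capacity. This is NOT the PAARes implementation; in the paper
# the system information is collected automatically.

def available_capacity(cores, freq_ghz, load_avg):
    """Approximate free capacity of a node: idle cores times frequency (GHz)."""
    idle_cores = max(cores - load_avg, 0.0)
    return idle_cores * freq_ghz

def allocate(nodes, nprocs):
    """Assign nprocs processes to nodes in proportion to their free capacity."""
    caps = {name: available_capacity(*spec) for name, spec in nodes.items()}
    total = sum(caps.values())
    shares = {n: nprocs * c / total for n, c in caps.items()}
    # Largest-remainder rounding so the slot counts sum to exactly nprocs.
    slots = {n: int(s) for n, s in shares.items()}
    remaining = nprocs - sum(slots.values())
    for n in sorted(shares, key=lambda n: shares[n] - slots[n], reverse=True)[:remaining]:
        slots[n] += 1
    return slots

# Example: a heterogeneous, non-dedicated cluster described as
# (cores, frequency in GHz, current load average).
nodes = {"node1": (8, 3.0, 1.5), "node2": (4, 2.0, 0.0), "node3": (16, 2.5, 8.0)}
print(allocate(nodes, 16))  # {'node1': 6, 'node2': 3, 'node3': 7}
```

The resulting per-node counts map naturally onto Open MPI slots (see the Notes below): for example, a hostfile line such as `node1 slots=6` reserves six allocation units on that node for `mpirun`.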
Availability of data and materials
Not applicable.
Notes
In Open MPI, a slot is an allocation unit for a process.
Communications where the sender, receiver, and message are well defined.
A process that generates load on the CPU by performing floating-point operations.
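The third note characterizes the artificial load used in the non-dedicated experiments only in words. A minimal sketch of such a load generator (ours and hypothetical; the function burn and the process count are assumptions, not the authors' harness) could look like this:

```python
# Hypothetical CPU load generator matching the note above: each worker
# process keeps one core busy with floating-point operations.
import math
import multiprocessing as mp
import time

def burn(duration_s):
    """Run a tight floating-point loop for duration_s seconds."""
    x = 1.000001
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        x = math.sqrt(x * x + 1.0)  # arbitrary floating-point work

if __name__ == "__main__":
    # Occupy two cores for 60 seconds to emulate external load on a node.
    workers = [mp.Process(target=burn, args=(60.0,)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Launching one or more such processes on a node lowers its available capacity, emulating the non-dedicated environments mentioned in the abstract.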
Funding
Not applicable.
Author information
Contributions
Dr. José Luis and Dr. Graciela wrote the main manuscript text, Dr. Miguel and Dr. Manuel performed the experiments, Dr. José Luis and Dr. Manuel prepared the figures, and Dr. Graciela and Dr. Miguel prepared the tables. All authors conceived the presented idea, discussed the results, and contributed to the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Quiroz-Fabián, J.L., Román-Alonso, G., Castro-García, M.A. et al. PAARes: an efficient process allocation based on the available resources of cluster nodes. J Supercomput 79, 10423–10441 (2023). https://doi.org/10.1007/s11227-023-05085-7