Adaptive Multi-Criteria Selection for Efficient Resource Allocation in Frugal Heterogeneous Hadoop Clusters
Figure 1. Workflow of the proposed AMS-ERA process for optimal job scheduling.

Figure 2. dAHP 3-level criteria for node selection.

Figure 3. Cluster configuration with 10 worker nodes placed in two racks with a master node.

Figure 4. Worker node profiling based on CPU, mem, and disk resource utilization for workload l3. Intra-node similarity groups the nodes into high-, medium-, and low-performance clusters; bubble size indicates the percentage of disk utilization. (a) Wordcount and (b) terasort.

Figure 5. A comparison of the execution times in seconds for (a) wordcount and (b) terasort jobs with workloads {l1, l2, l3, l4, l5} for Hadoop-Fair, FOG, IDaPS, and AMS-ERA.

Figure 6. Performance percentage of AMS-ERA execution times against Hadoop-Fair, FOG, and IDaPS for (a) wordcount jobs with workloads {l1, l2, l3, l4, l5} and (b) terasort.

Figure 7. A comparison of the local task allocation rate (percentage) for AMS-ERA, Hadoop-Fair, FOG, and IDaPS for (a) wordcount jobs with workloads {l1, l2, l3, l4, l5} and (b) terasort jobs.

Figure 8. A comparison of the percentage of CPU, mem, and disk resource utilization for (a) a wordcount workload l3 and (b) a terasort workload.
Abstract
1. Introduction
- We introduce the AMS-ERA approach to optimize resource allocation in frugal Hadoop clusters with Single-Board Computers (SBCs). By considering CPU, memory, and disk requirements for jobs, and aligning these with available resources, AMS-ERA enhances resource allocation to improve performance and efficiency.
- The proposed method involves profiling available resources in the cluster using K-means clustering and dynamically placing jobs based on a refined Analytical Hierarchy Process (AHP). This dynamic placement ensures optimal resource utilization and load balancing in heterogeneous clusters.
- We construct a heterogeneous 11-node Hadoop cluster using popular SBC devices to validate our approach. The work demonstrates that AMS-ERA achieves significant performance improvements compared to other scheduling strategies like Hadoop-Fair, FOG, and IDaPS using various IO-intensive and CPU-intensive Hadoop microbenchmarks such as terasort and wordcount.
2. Related Works
2.1. SBCs in Cloud and Edge Clusters
2.2. Hadoop YARN Scheduling Challenges in Resource-Constrained Clusters
3. Adaptive Multi-Criteria Selection for Efficient Resource Allocation
3.1. Motivation
3.2. Problem Definition
3.3. K-Means with Elbow Clustering
Algorithm 1: K-means clustering with elbow

1: Start: Obtain RM listing for n nodes
2: Apply Min–Max normalization to rescale the dataset
3: Each node's normalized (CPU, mem, disk) utilization is a datapoint
4: Determine the optimal number of centroids for K-means using Equations (5)–(7)
5: foreach datapoint, repeat until the centroids converge:
6: Assign the datapoint to the closest centroid
7: Recalculate each centroid as in Equation (6)
8: end for
9: return the resulting node clusters
10: end
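The steps above can be sketched in plain Python. This is an illustrative reconstruction, not the paper's implementation: the helper names (`min_max_normalize`, `elbow_k`), the convergence test, and the 0.5 relative-improvement threshold for the elbow are assumptions; the paper selects the number of centroids via Equations (5)–(7).

```python
import random

def min_max_normalize(data):
    """Min-Max rescale every feature (CPU, mem, disk utilization) to [0, 1]."""
    cols = list(zip(*data))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(row, lo, hi)) for row in data]

def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, iters=100, seed=1):
    """Lloyd's algorithm: assign each datapoint to the closest centroid,
    then recalculate each centroid, until the centroids stop moving."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: _dist2(p, centroids[i]))].append(p)
        new = [_mean(cl) if cl else centroids[i] for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

def wcss(centroids, clusters):
    """Within-cluster sum of squares, the quantity the elbow method tracks."""
    return sum(_dist2(p, c) for c, cl in zip(centroids, clusters) for p in cl)

def elbow_k(points, k_max=5, drop=0.5):
    """Pick the smallest k whose relative WCSS improvement falls below `drop`."""
    scores = [wcss(*kmeans(points, k)) for k in range(1, k_max + 1)]
    for k in range(1, k_max):
        if scores[k - 1] > 0 and (scores[k - 1] - scores[k]) / scores[k - 1] < drop:
            return k
    return k_max
```

With the eleven-node testbed, `points` would hold one (CPU, mem, disk) utilization triple per worker node, and the returned clusters correspond to the high-, medium-, and low-performance groups in Figure 4.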
3.4. Dynamic AHP-Based Job Scoring
3.5. Efficient Resource Allocation
Algorithm 2: AMS-ERA resource allocation

1: Start: Obtain RM listing for n nodes
2: Profile node resources and cluster the nodes using Algorithm 1
3: Construct the pairwise decision criteria matrix Cji
4: foreach job j in J
5: for m criteria (CPU, mem, disk)
6: Build the pairwise comparison matrix for criterion m
7: if the Consistency Index CI is acceptable then continue
8: else re-compute the pairwise comparison matrix
9: end if
10: determine M matrix
11: Compute the normalized scores for job j
12: end for
13: foreach scored job
14: Allocate the job to the best-matching available node
15: Update the available resource listing for the cluster
16: end for
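The consistency check at the heart of the dAHP scoring can be sketched as follows. This is a generic AHP recipe, not the paper's code: the column-normalized row-average weighting, the λmax estimate, and Saaty's CR < 0.1 acceptance rule are standard AHP conventions assumed here, since the exact threshold is not reproduced above.

```python
# Saaty's Random Index for matrices of order n (standard AHP constants).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41}

def ahp_weights(M):
    """Priority weights via column-normalized row averages, plus the
    Consistency Index CI = (lambda_max - n) / (n - 1)."""
    n = len(M)
    col = [sum(M[r][c] for r in range(n)) for c in range(n)]
    w = [sum(M[r][c] / col[c] for c in range(n)) / n for r in range(n)]
    Mw = [sum(M[r][c] * w[c] for c in range(n)) for r in range(n)]
    lam = sum(Mw[r] / w[r] for r in range(n)) / n   # lambda_max estimate
    ci = (lam - n) / (n - 1)
    return w, ci

def is_consistent(M, threshold=0.1):
    """Accept the pairwise matrix when CR = CI/RI stays below the threshold;
    otherwise Algorithm 2 re-computes the matrix."""
    n = len(M)
    _, ci = ahp_weights(M)
    return n <= 2 or ci / RI[n] < threshold
```

For a perfectly consistent matrix (such as the 3×3 criteria matrix in Table 3), λmax equals n and CI is zero, so the check passes trivially.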
4. Experimental Evaluation
4.1. Experiment Setup
4.2. Generating Job Workloads for Validation
- The Hadoop wordcount benchmark is a CPU-intensive task because it involves processing large volumes of text data to count the occurrences of each word. This process requires significant computational resources, particularly for tasks like tokenization, sorting, and aggregation, which are essential steps in the word-counting process. As a result, the benchmark primarily stresses the CPU’s processing capabilities rather than other system resources such as memory or disk I/O. These 10 jobs are posted to the cluster simultaneously.
- The Hadoop terasort benchmark is an IO-intensive task because it involves sorting a large volume of data. This process requires substantial input/output (IO) operations as it reads and writes data to and from storage extensively during the sorting process. The benchmark stresses the system’s IO subsystem, including disk read and write speeds, as well as network bandwidth if the data are distributed across multiple nodes in a cluster.
4.3. Node Clustering Based on Intra-Node Similarity Metrics
4.4. Workload Execution Time
4.5. Local Job Placement and Resource Utilization
4.6. Cost of Frugal Hadoop Cluster Setup
5. Conclusions and Future Work
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Awaysheh, F.M.; Tommasini, R.; Awad, A. Big Data Analytics from the Rich Cloud to the Frugal Edge. In Proceedings of the 2023 IEEE International Conference on Edge Computing and Communications (EDGE), Chicago, IL, USA, 2–8 July 2023; pp. 319–329.
- Qin, W. How to Unleash Frugal Innovation through Internet of Things and Artificial Intelligence: Moderating Role of Entrepreneurial Knowledge and Future Challenges. Technol. Forecast. Soc. Chang. 2024, 202, 123286.
- Neto, A.J.A.; Neto, J.A.C.; Moreno, E.D. The Development of a Low-Cost Big Data Cluster Using Apache Hadoop and Raspberry Pi. A Complete Guide. Comput. Electr. Eng. 2022, 104, 108403.
- Vanderbauwhede, W. Frugal Computing—On the Need for Low-Carbon and Sustainable Computing and the Path towards Zero-Carbon Computing. arXiv 2023, arXiv:2303.06642.
- Chandramouli, H.; Shwetha, K.S. Integrated Data, Task and Resource Management to Speed Up Processing Small Files in Hadoop Cluster. Int. J. Intell. Eng. Syst. 2024, 17, 572–584.
- Han, T.; Yu, W. A Review of Hadoop Resource Scheduling Research. In Proceedings of the 2023 8th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), Okinawa, Japan, 23–25 November 2023; pp. 26–30.
- Jeyaraj, R.; Paul, A. Optimizing MapReduce Task Scheduling on Virtualized Heterogeneous Environments Using Ant Colony Optimization. IEEE Access 2022, 10, 55842–55855.
- Saba, T.; Rehman, A.; Haseeb, K.; Alam, T.; Jeon, G. Cloud-Edge Load Balancing Distributed Protocol for IoE Services Using Swarm Intelligence. Clust. Comput. 2023, 26, 2921–2931.
- Guo, Z.; Fox, G. Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization. In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), Ottawa, ON, Canada, 13–16 May 2012; pp. 714–716.
- Bae, M.; Yeo, S.; Park, G.; Oh, S. Novel Data-placement Scheme for Improving the Data Locality of Hadoop in Heterogeneous Environments. Concurr. Comput. 2021, 33, e5752.
- Bawankule, K.L.; Dewang, R.K.; Singh, A.K. Historical Data Based Approach for Straggler Avoidance in a Heterogeneous Hadoop Cluster. J. Ambient Intell. Humaniz. Comput. 2021, 12, 9573–9589.
- Thakkar, H.K.; Sahoo, P.K.; Veeravalli, B. RENDA: Resource and Network Aware Data Placement Algorithm for Periodic Workloads in Cloud. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 2906–2920.
- Ghazali, R.; Adabi, S.; Rezaee, A.; Down, D.G.; Movaghar, A. CLQLMRS: Improving Cache Locality in MapReduce Job Scheduling Using Q-Learning. J. Cloud Comput. 2022, 11, 45.
- Ding, F.; Ma, M. Data Locality-Aware and QoS-Aware Dynamic Cloud Workflow Scheduling in Hadoop for Heterogeneous Environment. Int. J. Web Grid Serv. 2023, 19, 113–135.
- Postoaca, A.-V.; Negru, C.; Pop, F. Deadline-Aware Scheduling in Cloud-Fog-Edge Systems. In Proceedings of the 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, Australia, 11–14 May 2020; pp. 691–698.
- Vengadeswaran, S.; Balasundaram, S.R.; Dhavakumar, P. IDaPS—Improved Data-Locality Aware Data Placement Strategy Based on Markov Clustering to Enhance MapReduce Performance on Hadoop. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 101973.
- Adnan, A.; Tahir, Z.; Asis, M.A. Performance Evaluation of Single Board Computer for Hadoop Distributed File System (HDFS). In Proceedings of the 2019 International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia, 24–25 July 2019; pp. 624–627.
- Qureshi, B.; Koubaa, A. On Energy Efficiency and Performance Evaluation of Single Board Computer Based Clusters: A Hadoop Case Study. Electronics 2019, 8, 182.
- Fati, S.M.; Jaradat, A.K.; Abunadi, I.; Mohammed, A.S. Modelling Virtual Machine Workload in Heterogeneous Cloud Computing Platforms. J. Inf. Technol. Res. 2020, 13, 156–170.
- Sebbio, S.; Morabito, G.; Catalfamo, A.; Carnevale, L.; Fazio, M. Federated Learning on Raspberry Pi 4: A Comprehensive Power Consumption Analysis. In Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing, Taormina, Italy, 4–7 December 2023; ACM: New York, NY, USA, 2023; pp. 1–6.
- Shwe, T.; Aritsugi, M. Optimizing Data Processing: A Comparative Study of Big Data Platforms in Edge, Fog, and Cloud Layers. Appl. Sci. 2024, 14, 452.
- Raspberry Pi. Available online: https://www.raspberrypi.com/ (accessed on 7 May 2024).
- Lee, E.; Oh, H.; Park, D. Big Data Processing on Single Board Computer Clusters: Exploring Challenges and Possibilities. IEEE Access 2021, 9, 142551–142565.
- Lambropoulos, G.; Mitropoulos, S.; Douligeris, C.; Maglaras, L. Implementing Virtualization on Single-Board Computers: A Case Study on Edge Computing. Computers 2024, 13, 54.
- Mills, J.; Hu, J.; Min, G. Communication-Efficient Federated Learning for Wireless Edge Intelligence in IoT. IEEE Internet Things J. 2020, 7, 5986–5994.
- Krpic, Z.; Loina, L.; Galba, T. Evaluating Performance of SBC Clusters for HPC Workloads. In Proceedings of the 2022 International Conference on Smart Systems and Technologies (SST), Osijek, Croatia, 19–21 October 2022; pp. 173–178.
- Lim, S.; Park, D. Improving Hadoop Mapreduce Performance on Heterogeneous Single Board Computer Clusters. SSRN Preprint 2023.
- Srinivasan, K.; Chang, C.Y.; Huang, C.H.; Chang, M.H.; Sharma, A.; Ankur, A. An Efficient Implementation of Mobile Raspberry Pi Hadoop Clusters for Robust and Augmented Computing Performance. J. Inf. Process. Syst. 2018, 14, 989–1009.
- Fu, W.; Wang, L. Load Balancing Algorithms for Hadoop Cluster in Unbalanced Environment. Comput. Intell. Neurosci. 2022, 2022, 1545024.
- Yao, Y.; Gao, H.; Wang, J.; Sheng, B.; Mi, N. New Scheduling Algorithms for Improving Performance and Resource Utilization in Hadoop YARN Clusters. IEEE Trans. Cloud Comput. 2021, 9, 1158–1171.
- Javanmardi, A.K.; Yaghoubyan, S.H.; Bagherifard, K.; Nejatian, S.; Parvin, H. A Unit-Based, Cost-Efficient Scheduler for Heterogeneous Hadoop Systems. J. Supercomput. 2021, 77, 1–22.
- Ullah, I.; Khan, M.S.; Amir, M.; Kim, J.; Kim, S.M. LSTPD: Least Slack Time-Based Preemptive Deadline Constraint Scheduler for Hadoop Clusters. IEEE Access 2020, 8, 111751–111762.
- Zhou, R.; Li, Z.; Wu, C. An Efficient Online Placement Scheme for Cloud Container Clusters. IEEE J. Sel. Areas Commun. 2019, 37, 1046–1058.
- Zhou, Z.; Shojafar, M.; Alazab, M.; Abawajy, J.; Li, F. AFED-EF: An Energy-Efficient VM Allocation Algorithm for IoT Applications in a Cloud Data Center. IEEE Trans. Green Commun. Netw. 2021, 5, 658–669.
- Zhou, Z.; Abawajy, J.; Chowdhury, M.; Hu, Z.; Li, K.; Cheng, H.; Alelaiwi, A.A.; Li, F. Minimizing SLA Violation and Power Consumption in Cloud Data Centers Using Adaptive Energy-Aware Algorithms. Future Gener. Comput. Syst. 2018, 86, 836–850.
- Banerjee, P.; Roy, S.; Sinha, A.; Hassan, M.; Burje, S.; Agrawal, A.; Bairagi, A.K.; Alshathri, S.; El-Shafai, W. MTD-DHJS: Makespan-Optimized Task Scheduling Algorithm for Cloud Computing With Dynamic Computational Time Prediction. IEEE Access 2023, 11, 105578–105618.
- Zhang, L. Research on K-Means Clustering Algorithm Based on MapReduce Distributed Programming Framework. Procedia Comput. Sci. 2023, 228, 262–270.
- Postoaca, A.V.; Pop, F.; Prodan, R. H-Fair: Asymptotic Scheduling of Heavy Workloads in Heterogeneous Data Centers. In Proceedings of the 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Washington, DC, USA, 1–4 May 2018; pp. 366–369.
- Guo, T.; Bahsoon, R.; Chen, T.; Elhabbash, A.; Samreen, F.; Elkhatib, Y. Cloud Instance Selection Using Parallel K-Means and AHP. In Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing Companion, Auckland, New Zealand, 2–5 December 2019; ACM: New York, NY, USA, 2019; pp. 71–76.
- Odroid Xu4. Available online: https://www.hardkernel.com/shop/odroid-xu4-special-price/ (accessed on 7 May 2024).
- RockPro64. Available online: https://pine64.com/product/rockpro64-4gb-single-board-computer/ (accessed on 7 May 2024).
- Herodotou, H.; Lim, H.; Luo, G.; Borisov, N.; Dong, L.; Cetin, F.; Babu, S. Starfish: A Self-Tuning System for Big Data Analytics. In Proceedings of the CIDR 2011—5th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, 9–12 January 2011; pp. 261–272.
- Syakur, M.A.; Khotimah, B.K.; Rochman, E.M.S.; Satoto, B.D. Integration K-Means Clustering Method and Elbow Method For Identification of The Best Customer Profile Cluster. IOP Conf. Ser. Mater. Sci. Eng. 2018, 336, 012017.
- Kim, H.-J.; Baek, J.-W.; Chung, K. Associative Knowledge Graph Using Fuzzy Clustering and Min-Max Normalization in Video Contents. IEEE Access 2021, 9, 74802–74816.
- Singh, A.; Das, A.; Bera, U.K.; Lee, G.M. Prediction of Transportation Costs Using Trapezoidal Neutrosophic Fuzzy Analytic Hierarchy Process and Artificial Neural Networks. IEEE Access 2021, 9, 103497–103512.
| Category | Representative Works | Resource Awareness | Testbed/Evaluation Criteria |
|---|---|---|---|
| Task placement | [9]: Task scheduling based on network heterogeneity | net | X |
| | [12]: RENDA—Estimation of node performance for task placement | CPU, mem | Servers/Benchmarks |
| | [36]: Dynamic Heuristic Johnson Sequencing technique for task placement | X | Simulation |
| | [14]: DQ-DCWS—Optimization of workflow using dynamic programming | disk | Simulation |
| | [30]: HASTE—Resource management for improved task placement | CPU, mem | Servers/Benchmarks |
| Data locality | [10]: Varying block size for improved data locality | disk | Servers/Benchmarks |
| | [27]: Resource-aware task placement in heterogeneous SBC clusters | CPU, mem, disk | SBC cluster/Benchmarks |
| | [13]: CLQLMRS—Reinforcement learning improves data locality | disk, mem | Servers/Benchmarks |
| Load balancing | [11]: Historical data-based task placement in heterogeneous clusters | CPU, mem | Servers/Benchmarks |
| | [15]: Deadline-aware task scheduling based on available resources | X | Servers/Custom dataset |
| | [29]: Dynamic feedback fair scheduling with load balancing | X | Servers/Benchmarks |
| Improved parallelism | [16]: Markov clustering-based job scoring for improved task allocation | disk | Servers/Benchmarks |
| | [31]: Optimizing DAG workflows for the cost of task execution | disk | Simulation |
| | [32]: LSTPD—Deadline-constrained response times for MapReduce jobs | X | Servers/Benchmarks |
| Improved task selection | [37]: Task selection using K-means clustering technique | X | X |
| | [38]: H-Fair—Improved Fair scheduler for heavy workloads in Hadoop | X | Simulation |
| | [39]: Improved MapReduce workflow using K-means clustering | X | Servers/Custom dataset |
| Energy efficiency | [33]: Efficient online placement in cloud containers | X | Simulation |
| | [34]: AFED-EF—Classification of resources based on energy efficiency | CPU, mem, disk, net | Server/Real workload |
| | [35]: Energy-efficient scheduling based on resource constraints | CPU | Servers/Custom dataset |
SBC Device | CPU | Memory | Storage with Read MB/s | Price (USD) Incl. Storage |
---|---|---|---|---|
Raspberry Pi3 [22] | 1.4 GHz 64-bit quad-core ARM Cortex-A53 | 1 GB LPDDR3-SDRAM | 32 GB SD Card 120 MB/s | 38 |
Odroid Xu4 [40] | Exynos5 Octa ARM Cortex-A15 Quad 2 GHz and Cortex-A7 Quad 1.3 GHz | 2 GB DDR3 | 32 GB SD Card 120 MB/s | 56 |
RockPro64 [41] | 1.8 GHz Hexa Rockchip RK3399 ARM Cortex A72 and 1.4 GHz Quad Cortex-A53 | 4 GB LPDDR4-SDRAM | 64 GB SD Card 140 MB/s | 84 |
Raspberry Pi4 × 4 | 1.8 GHz Quad core ARM Cortex-A72 | 4 GB LPDDR4-SDRAM | 64 GB SD Card 140 MB/s | 59 |
Raspberry Pi4 × 8 | 1.8 GHz Quad core ARM Cortex-A72 | 8 GB LPDDR4-SDRAM | 128 GB SD Card 190 MB/s | 84 |
Raspberry Pi5 | 2.4 GHz Quad-core 64-bit ARM Cortex A76 | 8 GB LPDDR4X-SDRAM | 128 GB SD Card 190 MB/s | 98 |
| Symbol | Description |
|---|---|
| J | Set J of jobs consisting of k jobs |
| C | Cluster C consisting of x nodes |
| | CPU utilization of the ith node |
| | Memory utilization of the ith node |
| | Disk utilization of the ith node |
| | A data structure detailing the available resources in the cluster |
| | Optimal value for centroids in the K-means algorithm |
| | Centroid in the K-means clustering algorithm |
| Cji | Pairwise decision criteria matrix |
| CI | Consistency Index |
| λmax | Maximal eigenvalue |
| | Weight of the CPU, mem, and disk matrices |
| | Normalized scores |
| li | Workload i |
Cji | CPU | mem | disk |
---|---|---|---|
CPU | 1 | 1 | 2 |
mem | 1 | 1 | 2 |
disk | ½ | ½ | 1 |
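As a quick check on the table above, normalizing each column of Cji and averaging the rows reproduces the criteria weights used later in the M matrix (0.4 for CPU, 0.4 for mem, 0.2 for disk). The variable names below are illustrative:

```python
Cji = [[1.0, 1.0, 2.0],   # CPU row
       [1.0, 1.0, 2.0],   # mem row
       [0.5, 0.5, 1.0]]   # disk row
n = len(Cji)
col_sums = [sum(row[c] for row in Cji) for c in range(n)]   # [2.5, 2.5, 5.0]
# Column-normalized row averages give the priority weights.
weights = [sum(Cji[r][c] / col_sums[c] for c in range(n)) / n for r in range(n)]
print([round(w, 2) for w in weights])  # → [0.4, 0.4, 0.2]
```

Because the matrix is perfectly consistent (each row is a scalar multiple of the others), any AHP weighting scheme yields the same result here.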
CPU | ||||||
---|---|---|---|---|---|---|
1 | 0.25 | 0.33 | 0.33 | 0.25 | 0.25 | |
4 | 1 | 3 | 2 | 2 | 1 | |
3 | 0.33 | 1 | 1 | 1 | 0.5 | |
3 | 0.5 | 1 | 1 | 1 | 0.5 | |
4 | 0.5 | 1 | 1 | 1 | 0.5 | |
4 | 1 | 2 | 2 | 2 | 1 |
MEM | ||||||
---|---|---|---|---|---|---|
1 | 0.5 | 0.33 | 0.33 | 0.25 | 0.25 | |
2 | 1 | 0.5 | 0.5 | 0.33 | 0.33 | |
3 | 2 | 1 | 1 | 0.5 | 0.5 | |
3 | 2 | 1 | 1 | 0.5 | 0.5 | |
4 | 3 | 2 | 2 | 1 | 0.5 | |
4 | 3 | 2 | 2 | 2 | 1 |
DISK | ||||||
---|---|---|---|---|---|---|
1 | 1 | 0.5 | 0.5 | 0.33 | 0.33 | |
1 | 1 | 0.5 | 0.5 | 0.33 | 0.33 | |
2 | 2 | 1 | 1 | 0.5 | 0.5 | |
2 | 2 | 1 | 1 | 0.5 | 0.5 | |
3 | 3 | 2 | 2 | 1 | 1 | |
3 | 3 | 2 | 2 | 1 | 1 |
M | Weight | ||||||
---|---|---|---|---|---|---|---|
CPU | 0.4 | 0.051 | 0.278 | 0.130 | 0.138 | 0.146 | 0.258 |
mem | 0.4 | 0.056 | 0.088 | 0.152 | 0.152 | 0.244 | 0.307 |
disk | 0.2 | 0.082 | 0.082 | 0.149 | 0.149 | 0.270 | 0.270 |
score | - | 0.059 | 0.163 | 0.143 | 0.146 | 0.210 | 0.280 |
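The score row in the table above is the weight-blended combination of the three criteria rows, using the weights from the Weight column (0.4, 0.4, 0.2). A short check in Python (the column labels are omitted, as they are not recoverable here):

```python
# Criteria rows exactly as listed in the table (six alternatives/columns).
cpu  = [0.051, 0.278, 0.130, 0.138, 0.146, 0.258]
mem  = [0.056, 0.088, 0.152, 0.152, 0.244, 0.307]
disk = [0.082, 0.082, 0.149, 0.149, 0.270, 0.270]
# score_j = 0.4*cpu_j + 0.4*mem_j + 0.2*disk_j reproduces the table's score row.
score = [round(0.4 * c + 0.4 * m + 0.2 * d, 3) for c, m, d in zip(cpu, mem, disk)]
print(score)  # → [0.059, 0.163, 0.143, 0.146, 0.21, 0.28]
```

The last column scores highest (0.280), so it would be preferred by the dAHP ranking.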
Worker Node | Rack | SBC Device | CPU Cores | Memory | Storage with Read MB/s | Operating System |
---|---|---|---|---|---|---|
W1 | 1 | RaspberryPi3 | 4 (1.4 GHz) | 1 GB | 32 GB SD Card 120 MB/s | RaspberryPi OS Lite 11 |
W2 | 1 | RockPro64 | 6 (2 × 1.8 GHz) (4 × 1.4 GHz) | 4 GB | 64 GB SD Card 140 MB/s | armbian 23.1 Jammy Gnome |
W3 | 1 | RaspberryPi4 | 4 (1.8 GHz) | 4 GB | 64 GB SD Card 140 MB/s | RaspberryPi OS Lite 11 |
W4 | 1 | RockPro64 | 6 (2 × 1.8 GHz) (4 × 1.4 GHz) | 4 GB | 32 GB SD Card 120 MB/s | armbian 23.1 Jammy Gnome |
W5 | 1 | Odroid Xu4 | 8 (4 × 2.0 GHz) (4 × 1.3 GHz) | 2 GB | 64 GB SD Card 140 MB/s | Debian Bullseye 11 |
W6 | 2 | RaspberryPi5 | 4 (2.4 GHz) | 8 GB | 128 GB SD Card 190 MB/s | RaspberryPi OS Lite 11 |
W7 | 2 | Odroid Xu4 | 8 (4 × 2.0 GHz) (4 × 1.3 GHz) | 2 GB | 32 GB SD Card 120 MB/s | Debian Bullseye 11 |
W8 | 2 | RaspberryPi3 | 4 (1.4 GHz) | 1 GB | 64 GB SD Card 140 MB/s | RaspberryPi OS Lite 11 |
W9 | 2 | RaspberryPi5 | 4 (2.4 GHz) | 8 GB | 64 GB SD Card 140 MB/s | RaspberryPi OS Lite 11 |
W10 | 2 | RaspberryPi4 | 4 (1.8 GHz) | 4 GB | 128 GB SD Card 190 MB/s | RaspberryPi OS Lite 11 |
| Mapred-site.xml | Value |
|---|---|
| yarn.app.mapreduce.am.resource.mb | 852 |
| mapreduce.map.cpu.vcores | 2 |
| mapreduce.reduce.cpu.vcores | 1 |
| mapreduce.map.memory.mb | 852 |
| mapreduce.reduce.memory.mb | 852 |

| YARN-site.xml | Value |
|---|---|
| yarn.nodemanager.resource.memory-mb | 1024 |
| yarn.nodemanager.resource.cpu-vcores | 1 |
| yarn.scheduler.maximum-allocation-mb | 1024 |
| yarn.scheduler.maximum-allocation-vcores | 8 |
| yarn.nodemanager.vmem-pmem-ratio | 2.1 |
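For reference, these properties live in Hadoop's standard XML configuration files on each node. A fragment of mapred-site.xml with two of the listed values would look like the sketch below (file layout per stock Hadoop; only the values come from the table above):

```xml
<configuration>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>852</value>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>2</value>
  </property>
</configuration>
```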
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Qureshi, B. Adaptive Multi-Criteria Selection for Efficient Resource Allocation in Frugal Heterogeneous Hadoop Clusters. Electronics 2024, 13, 1836. https://doi.org/10.3390/electronics13101836