A New Data Layout Scheme for Energy-Efficient MapReduce Processing Tasks

  • Published in: Journal of Grid Computing

Abstract

Yet Another Resource Negotiator (YARN) is a framework to manage and allocate resource requests from applications that process big data stored in HDFS. However, dynamic power management methods are not efficient when YARN manages applications that process big data stored in the default HDFS data layout. In this paper, we propose a new data layout scheme that can be implemented for HDFS. A comparison between our proposal and the existing HDFS data layout scheme shows that the new data layout algorithm significantly reduces energy consumption at a slight cost in the mean response time of jobs.

References

  1. Ellison, B., Minas, L.: Energy Efficiency for Information Technology: How to Reduce Power Consumption in Servers and Data Centers. Intel Press (2009)

  2. Gandhi, A., Harchol-Balter, M., Kozuch, M.A.: Are Sleep States Effective in Data Centers? In: Proceedings of the 2012 International Green Computing Conference (IGCC), IGCC ’12, pp. 1–10. IEEE Computer Society, Washington (2012). https://doi.org/10.1109/IGCC.2012.6322260

  3. Shieh, W.Y., Pong, C.C.: Energy and transition-aware runtime task scheduling for multicore processors. J. Parallel Distrib. Comput. 73(9), 1225 (2013). https://doi.org/10.1016/j.jpdc.2013.05.003

  4. Maheshwari, N., Nanduri, R., Varma, V.: Dynamic energy efficient data placement and cluster reconfiguration algorithm for MapReduce framework. Futur. Gener. Comput. Syst. 28(1), 119 (2012). https://doi.org/10.1016/j.future.2011.07.001. http://www.sciencedirect.com/science/article/pii/S0167739X1100135X

  5. Liao, B., Yu, J., Zhang, T., Binglei, G., Hua, S., Ying, C.: Energy-efficient algorithms for distributed storage system based on block storage structure reconfiguration. J. Netw. Comput. Appl. 48(0), 71 (2015). https://doi.org/10.1016/j.jnca.2014.10.008. http://www.sciencedirect.com/science/article/pii/S1084804514002367

  6. Tran, X.T., Do, T.V., Chakka, R.: The impact of dynamic power management in computational clusters with multi-core processors. J. Sci. Ind. Res. (JSIR) 75, 339 (2016)

  7. Tang, Z., Qi, L., Cheng, Z., Li, K., Khan, S.U., Li, K.: An energy-efficient task scheduling algorithm in DVFS-enabled cloud environment. Journal of Grid Computing 14 (1), 55 (2016). https://doi.org/10.1007/s10723-015-9334-y

  8. Vavilapalli, V.K., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H., Seth, S., Saha, B., Curino, C., O’Malley, O., Radia, S., Reed, B., Baldeschwieler, E.: Apache Hadoop YARN: Yet Another Resource Negotiator. In: Proceedings of the 4th Annual Symposium on Cloud Computing. SOCC ’13, pp. 5:1–5:16. ACM, New York (2013). https://doi.org/10.1145/2523616.2523633

  9. Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop Distributed File System. In: Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). MSST ’10, pp. 1–10. IEEE Computer Society, Washington (2010). https://doi.org/10.1109/MSST.2010.5496972

  10. Yigitbasi, N., Datta, K., Jain, N., Willke, T.: Energy Efficient Scheduling of MapReduce Workloads on Heterogeneous Clusters. In: Proceedings of the 2nd International Workshop on Green Computing Middleware. GCM ’11, pp. 1:1–1:6. ACM, New York (2011). https://doi.org/10.1145/2088996.2088997

  11. Rasooli, A., Down, D.G.: Guidelines for selecting Hadoop schedulers based on system heterogeneity. Journal of Grid Computing 12(3) (2014). https://doi.org/10.1007/s10723-014-9299-2

  12. Goiri, Í., Le, K., Nguyen, T.D., Guitart, J., Torres, J., Bianchini, R.: GreenHadoop: Leveraging Green Energy in Data-processing Frameworks. In: Proceedings of the 7th ACM European Conference on Computer Systems. EuroSys ’12, pp. 57–70. ACM, New York (2012). https://doi.org/10.1145/2168836.2168843

  13. Mashayekhy, L., Nejad, M., Grosu, D., Lu, D., Shi, W.: Energy-Aware Scheduling of MapReduce Jobs. In: 2014 IEEE International Congress on Big Data (BigData Congress), pp. 32–39 (2014). https://doi.org/10.1109/BigData.Congress.2014.15

  14. Song, J., He, H., Wang, Z., Yu, G., Pierson, J.-M.: Modulo based data placement algorithm for energy consumption optimization of MapReduce system. Journal of Grid Computing (2016). https://doi.org/10.1007/s10723-016-9370-2

  15. Kaushik, R.T., Bhandarkar, M.: GreenHDFS: Towards an Energy-conserving, Storage-efficient, Hybrid Hadoop Compute Cluster. In: Proceedings of the 2010 International Conference on Power Aware Computing and Systems. HotPower’10, pp. 1–9. USENIX Association, Berkeley (2010). http://dl.acm.org/citation.cfm?id=1924920.1924927

  16. Leverich, J., Kozyrakis, C.: On the energy (in)efficiency of Hadoop clusters. SIGOPS Oper. Syst. Rev. 44(1), 61 (2010). https://doi.org/10.1145/1740390.1740405

  17. Lang, W., Patel, J.M.: Energy management for MapReduce clusters. Proc. VLDB Endow. 3(1-2), 129 (2010). https://doi.org/10.14778/1920841.1920862

  18. SPEC. Fujitsu PRIMERGY RX100 S8 (Intel Xeon E3-1265Lv3) (2013). https://www.spec.org/power_ssj2008/results/res2013q4/power_ssj2008-20131018-00643.html. Accessed 28 Feb 2017

  19. SPEC. Acer Incorporated Acer AR380 F2 (Intel Xeon E5-2665) (2012). http://www.spec.org/power_ssj2008/results/res2012q3/power_ssj2008-20120525-00479.html. Accessed 28 Feb 2017

  20. SPEC. Hitachi HA8000/RS110-HHM (Intel Xeon E5-2470) (2012). https://www.spec.org/power_ssj2008/results/res2012q3/power_ssj2008-20120724-00515.html. Accessed 28 Feb 2017

  21. SPEC. Fujitsu PRIMERGY TX100 S3p (Intel Xeon E3-1240v2) (2012). http://www.spec.org/power_ssj2008/results/res2012q3/power_ssj2008-20120726-00519.html. Accessed 28 Feb 2017

  22. SPEC. Acer Incorporated Acer AR380 F2 (Intel Xeon E5-2640) (2012). http://www.spec.org/power_ssj2008/results/res2012q3/power_ssj2008-20120525-00481.html. Accessed 28 Feb 2017

  23. Verma, A., Cherkasova, L., Campbell, R.H.: Orchestrating an ensemble of MapReduce jobs for minimizing their makespan. IEEE Trans. Dependable Secur. Comput. 10(5), 314 (2013). https://doi.org/10.1109/TDSC.2013.14

  24. Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A.D., Katz, R., Shenker, S., Stoica, I.: Mesos: A Platform for Fine-grained Resource Sharing in the Data Center. In: Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation. NSDI’11, pp. 295–308. USENIX Association, Berkeley (2011). http://dl.acm.org/citation.cfm?id=1972457.1972488

  25. Do, T.V., Vu, B.T., Do, N.H., Farkas, L., Rotter, C., Tarjanyi, T.: Building Block Components to Control a Data Rate in the Apache Hadoop Compute Platform. In: 2015 18th International Conference on Intelligence in Next Generation Networks, pp. 23–29 (2015). https://doi.org/10.1109/ICIN.2015.7073802

  26. Murthy, A.C., Vavilapalli, V.K., Eadline, D., Niemiec, J., Markham, J.: Apache Hadoop YARN: Moving Beyond MapReduce and Batch Processing with Apache Hadoop 2, 1st edn. Addison-Wesley Professional, Boston (2014)

  27. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107 (2008). https://doi.org/10.1145/1327452.1327492

Acknowledgements

The research has been partially supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.2-16-2017-00013).

This research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (2017R1A2B4009410).

The authors are grateful for the anonymous reviewers’ and guest editors’ comments, which helped to improve the quality of the paper.

Author information

Corresponding author

Correspondence to Tien Van Do.

Appendix A: The Operation of HDFS and YARN in a Computing Cluster

A.1 HDFS

In a shared cluster, the Hadoop Distributed File System (HDFS) [9] offers big data services for applications. The master NameNode (a JVM process) manages the file metadata, while DataNodes (JVM processes) store the data in the form of data blocks/chunks. To read or write a file, an HDFS client first requests the metadata from the NameNode and then opens TCP sessions directly to specific DataNodes to stream the data [25].
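For concreteness, the following is a minimal sketch of such a client read using the standard Hadoop FileSystem API; the NameNode URI and the file path are placeholders, not values from the paper.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // loads core-site.xml / hdfs-site.xml if present
        // The NameNode address and file path below are illustrative placeholders.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        // fs.open() asks the NameNode for the block locations of the file;
        // the returned stream then reads the blocks directly from the DataNodes.
        try (FSDataInputStream in = fs.open(new Path("/data/input.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

A write follows the same pattern through fs.create(), after which the data is streamed to a pipeline of DataNodes as described in Section A.4.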

A.2 Yet Another Resource Negotiator – YARN

The Hadoop YARN framework [8, 26] orchestrates resource management and allocation for applications with three core Java software components: a central scheduler, the ResourceManager (RM); a per-node NodeManager (NM); and an application-specific ApplicationMaster (AM).

  • The master ResourceManager (RM) maintains a global view of the cluster resources by processing the heartbeat messages from the NodeManagers and makes resource allocation decisions based on the negotiation with an application-specific ApplicationMaster.

  • A per-node NodeManager (denoted by NM_{ij} if it is hosted in server (i,j), as illustrated in Fig. 1) holds information related to the host (e.g. hostname, node label, rack position, available memory, CPUs). It is responsible for sending heartbeats to the RM with information about the available resources of the host and for launching containers for job processing. A container (a YarnChild Java process) runs a task.

  • An application-specific ApplicationMaster (AM), which itself runs in a container, initiates resource requests for the application and manages all aspects of the job during its runtime.

A.3 Processing Hadoop MapReduce Applications

A Hadoop MapReduce application [27] is composed of a set of Map tasks and Reduce tasks. Map tasks read and process data blocks stored in HDFS, while Reduce tasks process the Map outputs and produce the final results.
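As an illustration of these two task types, a Map and a Reduce class for the WordCount example discussed below can be sketched with the Hadoop MapReduce API roughly as follows (the class names are the usual textbook ones, not taken from the paper):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// A Map task reads one split of the input file and emits (word, 1) pairs.
class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}

// The Reduce task sums the counts emitted by the Map tasks for each word.
class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```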

In what follows (see also Fig. 13), the cooperation of the HDFS and YARN components is illustrated through the execution of a MapReduce WordCount job that runs on YARN to count the occurrences of words in a file consisting of two data blocks.

  • When the application is submitted, a job client (JVM process) is created and communicates with the ResourceManager (RM) to launch the job (action (1)): it asks for an application id and requests the launch of its MRAppMaster, the ApplicationMaster of the MapReduce framework (a driver sketch is given after this list).

  • The RM grants a resource capacity for the MRAppMaster and contacts the respective NM (action (2.1)), which in turn launches an MRAppMaster container on its host (action (2.2)).

  • The MRAppMaster registers itself with the RM (action (3.1)) and contacts the NameNode (NN) to obtain the input file’s metadata (the number of map splits and the locations of blocks) (action (3.2)).

  • The MRAppMaster constructs resource requests and sends them to the RM. Based on the available cluster resources and the scheduling mechanism, the RM responds to the MRAppMaster with an allocation (action (4)). An allocation can contain one or many granted containers. The MRAppMaster keeps issuing resource requests to the RM until all tasks (two Map tasks and one Reduce task) have been executed.

  • With a received allocation, the MRAppMaster asks the respective NMs to spawn containers (action (5.1)). Each contacted NodeManager, in turn, creates an isolated YarnChild JVM process (action (5.2)) for task execution.

  • Each Map task, based on its split’s metadata, opens a TCP/IP connection to a specific DataNode to read its data block (action (6)), performs the computation, and stores the Map output on the local disk. The Reduce task, in turn, reads the Map outputs and applies the reduce function over them to produce the final output, which is written back to HDFS without replication.

  • During runtime, the Map/Reduce tasks report their status directly to the MRAppMaster (action (7)).

  • After the Reduce task completes, the MRAppMaster deregisters itself from the RM (action (8)).
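The driver referred to in the first step above might look roughly as follows; it is a sketch modelled on the standard Hadoop WordCount example, with the input and output paths passed as command-line arguments:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");           // the job client of action (1)
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // the two-block input file in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory in HDFS
        // Submits the job to the RM and polls the job status until the MRAppMaster reports completion.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```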

Fig. 13 An example of MapReduce WordCount execution on a Hadoop computing cluster

A.4 The Default HDFS Data Layout

The NameNode daemon (which runs inside a Java Virtual Machine) manages the filesystem tree and the metadata of all files and directories, while the DataNodes store the blocks of files. Let DN_{ij} denote the DataNode daemon in server (i,j) (\(i = 1,\dots,K\), \(j = 1,\dots,M(i)\)), as shown in Fig. 1.

When a client submits a write request for a file (with information on its type, file size and block size), the NameNode registers the file in its filesystem tree. For each block of size BS megabytes, the NameNode returns a list of RF DataNodes, where RF is the replication factor.

For RF = 3, the selection of DataNodes is as follows (see Algorithm 2 and the code sketch below):

  • the first replica is stored on the local node if the writer runs on a node inside the cluster, or on a random node if the writer runs outside the cluster (i.e., the local node does not host a DataNode);

  • the second replica is stored on a random node in a different rack; and

  • the third replica is stored on another node in the same rack as the second one.

Algorithm 2 The default HDFS replica placement
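The placement rule of Algorithm 2 can be sketched as follows; the Node class and the helper method are illustrative assumptions, not part of HDFS’s actual block placement code, and the sketch assumes a cluster with at least two racks, each hosting at least two DataNodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Sketch of the default replica placement for RF = 3.
class DefaultPlacementSketch {
    static class Node {
        final String name;
        final int rack;
        Node(String name, int rack) { this.name = name; this.rack = rack; }
    }

    private final Random rng = new Random();

    List<Node> chooseTargets(Node writer, boolean writerInsideCluster, List<Node> allNodes) {
        List<Node> targets = new ArrayList<>();
        // First replica: the local node if the writer runs inside the cluster,
        // otherwise a random node of the cluster.
        Node first = writerInsideCluster ? writer : allNodes.get(rng.nextInt(allNodes.size()));
        targets.add(first);
        // Second replica: a random node in a different rack.
        Node second = pick(allNodes, n -> n.rack != first.rack);
        targets.add(second);
        // Third replica: another node in the same rack as the second one.
        Node third = pick(allNodes, n -> n.rack == second.rack && n != second);
        targets.add(third);
        return targets;
    }

    private Node pick(List<Node> nodes, Predicate<Node> ok) {
        List<Node> candidates = new ArrayList<>();
        for (Node n : nodes) {
            if (ok.test(n)) candidates.add(n);
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }
}
```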

After receiving the list of three DataNodes for a block, the client opens a TCP pipeline to the first DataNode in the list. The data is split into 4 KB pieces that are flushed from the local temporary file to the first DataNode, which forwards them to the second DataNode and then to the third. Since the next piece can be transferred before the acknowledgement of the previous one, the replicas of the same block are written roughly concurrently within a write time of t_{blk} seconds.

A.5 A Locality Relaxation Algorithm for Resource Allocation in RM

In YARN, resource allocation for an application is performed through a negotiation between the ApplicationMaster (AM) and the ResourceManager (RM) and is decided by the RM (action (4) in the WordCount example of Fig. 13). When the AM constructs a resource request (containing the request priority, the preferred node, the resource capacity per container, and the number of expected containers), data locality is taken into account to reduce the network cost. That is, if DataNode DN_{ij} holds one or more data splits, the data-local NodeManager NM_{ij} is preferred.
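As a rough illustration (not code from the paper), an AM could express such a data-local request with YARN’s AMRMClient as follows; the host and rack names are placeholders:

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LocalityRequestSketch {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
        amrmClient.init(new YarnConfiguration());
        amrmClient.start();
        // In a real AM, registerApplicationMaster(...) would be called here first.

        Resource capability = Resource.newInstance(1024, 1); // m = 1 GB RAM, n = 1 core per container
        Priority priority = Priority.newInstance(0);
        // Prefer the node NM_{ij} that hosts the data split and, failing that, its rack.
        String[] preferredNodes = { "nm-i-j.example.com" };
        String[] preferredRacks = { "/rack-i" };
        ContainerRequest request =
                new ContainerRequest(capability, preferredNodes, preferredRacks, priority);
        amrmClient.addContainerRequest(request); // delivered to the RM on the next allocate() heartbeat

        amrmClient.stop();
    }
}
```

By default such a request permits locality relaxation, which corresponds to the rack-local and off-rack fallbacks described next.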

The RM scheduler uses the information in a resource request and the available resources to assign tasks to servers as follows:

  • First, the RM tries to pick the data-local node NM_{ij}: it calculates the available resources of NM_{ij} and checks whether they meet the job’s resource requirement.

  • If this fails, it tries to start a container on another node in the same rack i, or on any node in the cluster (off-rack) if it is still unsuccessful.

  • If no node has enough free resources to start a container, the task must wait in the queue.

Let T_M and T_R denote the number of Map tasks and the number of Reduce tasks of a MapReduce job; let N_{ij} and RAM_{ij} be the total number of cores and the memory (GB) of NodeManager NM_{ij}; and let FreeN_{ij} (0 ≤ FreeN_{ij} ≤ N_{ij}) and FreeRAM_{ij} (0 ≤ FreeRAM_{ij} ≤ RAM_{ij}) be the available cores and the available RAM (GB) on NM_{ij} at the current instant, respectively. Assuming that a container requires a resource capacity of n cores and m GB RAM, the RM resource allocation is illustrated in Algorithm 3 and sketched in code below.

Algorithm 3 Resource allocation in the RM scheduler
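A minimal sketch of the allocation check in Algorithm 3 is given below, assuming the data-local node and the candidate nodes with their free resources are already known; NodeState and its field names are illustrative assumptions, not YARN classes.

```java
import java.util.List;
import java.util.Optional;

// Sketch of the locality-relaxed container assignment described above.
class LocalityRelaxedScheduler {
    static class NodeState {
        final String name;
        final int rack;          // rack index i of server (i,j)
        int freeCores;           // FreeN_{ij}
        int freeRamGb;           // FreeRAM_{ij}

        NodeState(String name, int rack, int freeCores, int freeRamGb) {
            this.name = name;
            this.rack = rack;
            this.freeCores = freeCores;
            this.freeRamGb = freeRamGb;
        }

        boolean fits(int n, int m) {
            return freeCores >= n && freeRamGb >= m;
        }
    }

    // Returns the node chosen for a container of (n cores, m GB RAM),
    // or Optional.empty() if the task must wait in the queue.
    Optional<NodeState> assign(NodeState dataLocal, List<NodeState> cluster, int n, int m) {
        // 1. Prefer the data-local node NM_{ij}.
        if (dataLocal.fits(n, m)) {
            return Optional.of(reserve(dataLocal, n, m));
        }
        // 2. Otherwise try another node in the same rack i.
        for (NodeState node : cluster) {
            if (node.rack == dataLocal.rack && node.fits(n, m)) {
                return Optional.of(reserve(node, n, m));
            }
        }
        // 3. Otherwise try any node in the cluster (off-rack).
        for (NodeState node : cluster) {
            if (node.fits(n, m)) {
                return Optional.of(reserve(node, n, m));
            }
        }
        // 4. No node has enough free resources: the task waits in the queue.
        return Optional.empty();
    }

    private NodeState reserve(NodeState node, int n, int m) {
        node.freeCores -= n;
        node.freeRamGb -= m;
        return node;
    }
}
```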

Cite this article

Tran, X.T., Do, T.V., Rotter, C. et al. A New Data Layout Scheme for Energy-Efficient MapReduce Processing Tasks. J Grid Computing 16, 285–298 (2018). https://doi.org/10.1007/s10723-018-9433-7
