Hardware Recommendations For Hadoop
4/17/14, 10:48 AM
Overview
Hadoop is a software framework that supports large-scale distributed data analysis on commodity servers. Hortonworks is a major contributor to open source initiatives (Apache Hadoop, HDFS, Pig, Hive, HBase, ZooKeeper) and has extensive experience managing production-level Hadoop clusters. Hortonworks recommends following the design principles that drive large, hyper-scale deployments. For a Hadoop or HBase cluster, it is critical to accurately predict the size, type, frequency, and latency of the analysis jobs to be run. When starting with Hadoop or HBase, begin small and gain experience by measuring actual workloads during a pilot project. This way you can easily scale the pilot environment without making any significant changes to the existing servers, software, deployment strategies, or network connectivity.
http://docs.hortonworks.com/HDP2Alpha/Hardware_Recommendations_for_Hadoop.htm
Page 1 of 12
Hadoop and HBase clusters have two types of machines: masters (the HDFS NameNode, the MapReduce JobTracker, and the HBase Master) and slaves (the HDFS DataNodes, the MapReduce TaskTrackers, and the HBase RegionServers). The DataNodes, TaskTrackers, and HBase RegionServers are co-located or co-deployed for optimal data locality. In addition, HBase requires a separate component, ZooKeeper, to manage the HBase cluster.

Hortonworks recommends separating master and slave nodes for the following reasons: task workloads on the slave nodes should be isolated from the masters, and slave nodes are frequently decommissioned for maintenance.

For evaluation purposes, you can also choose to deploy Hadoop using a single-node installation (all the master and slave processes reside on the same machine). Setting up a small cluster of two nodes is a very straightforward task: one node acts as both NameNode and JobTracker, and the other acts as DataNode and TaskTracker. Clusters of three or more machines typically use a dedicated NameNode/JobTracker, and all the other nodes act as slave nodes.

Typically, a medium-to-large Hadoop cluster consists of a two- or three-level architecture built with rack-mounted servers. Each rack of servers is interconnected using a 1 Gigabit Ethernet (GbE) switch. Each rack-level switch is connected to a cluster-level switch (typically a larger, higher port-density 10 GbE switch). These cluster-level switches may also interconnect with other cluster-level switches or even uplink to another level of switching infrastructure.
One way to quickly deploy a Hadoop cluster is to opt for cloud trials or to use virtual infrastructure. Hortonworks makes the distribution available through the Hortonworks Data Platform (HDP). HDP can be easily installed in public and private clouds using Whirr, Microsoft Azure, and Amazon Web Services. For more details, contact the Hortonworks Support Team. Note, however, that cloud services and virtual infrastructures are not architected for Hadoop; Hadoop and HBase deployments in this case might experience poor performance due to virtualization and a suboptimal I/O architecture.
Furthermore, Amdahl's Law shows how resource requirements can change in grossly nonlinear ways with changing demands: a change that might be expected to reduce computation cost by 50% may instead cause a 10% change or a 90% change in net performance, depending on how much of the workload that computation actually represents.
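The nonlinear effect is easy to see by evaluating the formula directly. A minimal sketch (the function name is ours, not from the document):

```python
def amdahl_speedup(p, s):
    """Amdahl's Law: overall speedup when a fraction p of the
    workload is accelerated by a factor s; the remaining (1 - p)
    is unchanged."""
    return 1.0 / ((1.0 - p) + p / s)

# Halving computation cost (s = 2) has very different net effects
# depending on how much of the job is actually compute-bound:
print(amdahl_speedup(0.10, 2))  # compute is 10% of the job -> ~1.05x
print(amdahl_speedup(0.90, 2))  # compute is 90% of the job -> ~1.82x
```

The same hardware change can therefore be nearly invisible or nearly transformative, which is why measuring real workloads during a pilot matters.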
drives (typically eight to twelve SATA LFF drives) per server. At the time of this publication, typical capacity in production environments is around 2 TB per drive. Based on our experience, highly I/O-intensive environments have started using 12 x 2 TB SATA drives. The optimal balance between cost and performance is generally achieved with 7,200 RPM SATA drives. If your current or predicted storage is experiencing a high growth rate, you should also consider using 3 TB disks. SFF disks are being adopted in some configurations for better disk bandwidth. However, we recommend that you monitor your cluster for potential disk failures, because more disks increase the rate of disk failures. If you do have a large number of disks per server, you can use two disk controllers so that the I/O load can be shared across multiple cores. Hortonworks strongly recommends using either SATA or SAS interconnects only.

Once you have set up an HDFS cluster using a low-cost, reliable storage option, you will observe that old data stays on the cluster indefinitely and that storage demands grow quickly. With 12-drive systems, you typically get 24 TB or 36 TB per node. Using this much storage capacity in a node is only practical with Hadoop release 1.0.0 or later, because failures are handled gracefully, allowing machines to continue serving from their remaining disks.

It is important to note that Hadoop is storage intensive and seek efficient, but does not require fast and expensive hard drives. If your workload pattern is not I/O intensive, it is safe to add only four or six disks per node. Note that power costs are proportional to the number of disks, not to terabytes. We therefore recommend that you add disks for storage and not for seeks.

RAID vs. JBOD

Using RAID on Hadoop slave machines is not recommended, because Hadoop orchestrates data redundancy across all the slave nodes.
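The per-node and cluster capacities discussed above translate into usable HDFS space only after accounting for block replication. A rough sketch (the 25% temporary-space reserve is our assumption, not a figure from the document; tune it for your workload):

```python
def usable_hdfs_tb(drives_per_node, tb_per_drive, nodes,
                   replication=3, temp_reserve=0.25):
    """Rough usable HDFS capacity for a cluster.

    replication:  HDFS block replication factor (default 3).
    temp_reserve: fraction of raw disk held back for MapReduce
                  spill/temporary space (an assumed figure).
    """
    raw_tb = drives_per_node * tb_per_drive * nodes
    return raw_tb * (1 - temp_reserve) / replication

# A 12 x 2 TB node gives 24 TB raw; across a 20-node cluster:
print(usable_hdfs_tb(12, 2, 20))  # 480 TB raw -> 120.0 TB usable
```

This is why storage demand "grows quickly": every terabyte of user data consumes roughly three terabytes of raw disk plus working space.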
However, it is strongly recommended to use RAID for Hadoop master machines (especially the NameNode server). As a final consideration, we strongly recommend purchasing disk drives with good MTBF numbers, because the slave nodes in Hadoop suffer routine probabilistic failures. Your slave nodes do not need expensive support contracts that offer services like replacement of disks within two hours or less. Hadoop is designed to tolerate slave-node disk failure, and you should therefore treat maintenance activity for the slave nodes as an ongoing task rather than an emergency. It is good to be able to swap out disks without taking the server out of the rack, though switching servers off briefly is an inexpensive operation in a Hadoop cluster. Using SSDs for master nodes would increase your costs for bulk storage at present; as the cost of these disks decreases, they could present opportunities in the future.

Memory sizing

In a Hadoop cluster, it is critical to provide sufficient memory to keep the processors busy without swapping and without incurring excessive costs for non-standard motherboards. Depending on the number of cores, your slave nodes typically require 24 GB to 48 GB of RAM for Hadoop applications. For large clusters, this amount of memory sufficiently provides extra RAM (approximately 4 GB) for the Hadoop framework and for your query and analysis processes (HBase and/or MapReduce).

To detect and correct random transient errors introduced by thermodynamic effects and cosmic rays, we strongly recommend using error-correcting code (ECC) memory. Error-correcting RAM allows you to trust the quality of your computations. Some parts (chip-kill/chip spare) have been shown to offer better protection than traditional designs, as they show less recurrence of bit errors. (See "DRAM Errors in the Wild: A Large-Scale Field Study," Schroeder et al., 2009.) If you want to retain the option of adding more memory to your servers in the future, ensure there is space to do this alongside the initial memory modules; it is expensive to replace all the memory modules.

Memory provisioning

Memory can also be provisioned at commodity prices on low-end server motherboards. It is typical to over-provision memory; the unused RAM will be consumed either by your Hadoop applications (typically when you run more processes in parallel) or by the infrastructure (for caching disk data to improve performance).

Processors

Although it is important to understand your workload pattern, for most systems we recommend using medium-clock-speed processors with fewer than two sockets. For most workloads, the extra performance per node is not cost-effective. For large clusters, use at least two quad-core CPUs for the slave machines.

Power considerations

Power is a major concern when designing Hadoop clusters. Instead of purchasing the biggest and fastest nodes, it is important to analyze the power utilization of the existing hardware. We have observed huge savings in pricing and power by avoiding the fastest CPUs, redundant power supplies, and so on. Nowadays, vendors are building machines for cloud data centers that are designed to reduce cost and power and that are lightweight. Supermicro, Dell, and HP all have such product lines for cloud providers. So, if you are buying in large volume, we recommend evaluating these stripped-down cloud servers. For slave nodes, a single power supply unit (PSU) is sufficient, but for master servers use redundant PSUs. Server designs that share PSUs across adjacent servers can offer increased reliability without increased cost.
Some co-location sites bill based on the maximum possible power budget rather than actual usage. In such locations, the benefits of the power-saving features of the latest CPUs are not fully realized. We therefore recommend checking the power billing options of the site in advance.

Power consumption of the cluster

Electricity and cooling account for 33.33% to 50% of a piece of equipment's total life-cycle cost in modern data centers.
Network

This is the most challenging parameter to estimate, because Hadoop workloads vary a lot. The key is buying enough network capacity, at reasonable cost, that all nodes in the cluster can communicate with each other at reasonable speeds. Large clusters typically use dual 1 GB links for all nodes in each 20-node rack and 2 x 10 GB interconnect links per rack going up to a pair of central switches.

A good network design will consider the possibility of unacceptable congestion at critical points in the network under realistic loads. Generally accepted oversubscription ratios are around 4:1 at the server access layer and 2:1 between the access layer and the aggregation layer or core. Lower oversubscription ratios can be considered if higher performance is required. Additionally, we also recommend 1 GbE oversubscription between racks.

It is critical to have dedicated switches for the cluster instead of trying to allocate a VC in existing switches; the load of a Hadoop cluster would impact the other users of the switch. It is equally critical to work with the networking team to ensure that the switches suit both Hadoop and their monitoring tools. Design the networking so as to retain the option of adding more racks of Hadoop/HBase servers; getting the networking wrong can be expensive to fix. The quoted bandwidth of a switch is analogous to the miles-per-gallon rating of an automobile: you are unlikely to replicate it. Deep buffering is preferable to low latency in switches. Enabling jumbo frames across the cluster improves bandwidth and, because each frame's checksum covers more data, may also improve packet integrity.

Network strategy for your Hadoop clusters

Analyze the ratio of network cost to computer cost. Ensure that the network cost is always around 20% of your total cost. Network costs should include your complete network: core switches, rack switches, any network cards needed, and so on. Keep in mind that Hadoop grew up with commodity…
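The oversubscription arithmetic for the rack design described above is straightforward. A minimal sketch (the function name is ours):

```python
def oversubscription(nodes_per_rack, gbps_per_node, uplink_gbps):
    """Ratio of aggregate access-layer bandwidth feeding a rack
    switch to the bandwidth of the rack's uplinks."""
    return (nodes_per_rack * gbps_per_node) / uplink_gbps

# 20 nodes per rack, each with dual 1 GbE links (2 Gb/s per node),
# and 2 x 10 GbE uplinks per rack (20 Gb/s total):
print(oversubscription(20, 2 * 1, 2 * 10))  # -> 2.0, i.e. 2:1
```

A 2:1 result for this configuration sits within the generally accepted range quoted above for the access-to-aggregation boundary.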
Storage options for JobTracker servers

JobTracker servers do not need RAID storage, because they save their persistent state to HDFS; the JobTracker can actually be run on a slave node with a bit of extra RAM. However, using the same hardware specification as the NameNode server provides a plan for migrating the NameNode to the same server as the JobTracker in the case of a NameNode failure, and a copy of the NameNode's state can be saved to network storage.

Memory sizing

The amount of memory required for the master nodes depends on the number of file system objects (files and block replicas) to be created and tracked by the NameNode. 64 GB of RAM supports approximately 100 million files. Some sites are now experimenting with 128 GB of RAM for even larger namespaces.

Processors

The NameNodes and their clients are very chatty. We therefore recommend providing 16 or even 24 CPU cores to handle messaging traffic for the master nodes.

Network

Providing multiple network ports and 10 GB bandwidth to the switch is also acceptable (if the switch can handle it).
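The 64 GB per ~100 million objects rule of thumb can be scaled linearly for planning purposes. A rough sketch, assuming linear scaling holds (the function name is ours):

```python
def namenode_ram_gb(fs_objects, gb_per_100m=64):
    """Estimate NameNode RAM from the document's rule of thumb:
    64 GB of RAM supports roughly 100 million file system objects
    (files and block replicas). Assumes linear scaling."""
    return fs_objects / 100e6 * gb_per_100m

# Planning for a 250-million-object namespace:
print(namenode_ram_gb(250e6))  # -> 160.0 GB
```

Real heap requirements depend on file-to-block ratios and block size, so treat this as a first-order estimate, not a hard sizing formula.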
Memory sizing
HBase Master node(s) are not as compute-intensive as a typical RegionServer or the NameNode server, so a more modest memory setting can be chosen for the HBase Master. RegionServer memory requirements depend heavily on the workload characteristics of your HBase cluster. Although over-provisioning memory benefits all workload patterns, with very large heap sizes Java's stop-the-world garbage collection pauses may cause problems. In addition, when running an HBase cluster alongside Hadoop core, you must ensure that you over-provision the memory for Hadoop MapReduce by at least 1 GB to 2 GB per task on top of the HBase memory.
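A per-node memory budget for a co-located HBase/MapReduce slave can be sketched from these figures (the 4 GB OS/daemon headroom is our assumption, not a figure from the document):

```python
def slave_ram_gb(hbase_heap_gb, mr_task_slots, gb_per_task=2,
                 os_overhead_gb=4):
    """Per-node memory budget when HBase and MapReduce share a
    slave: RegionServer heap, plus 1-2 GB per MapReduce task slot
    (per the document), plus assumed headroom for the OS and
    Hadoop daemons."""
    return hbase_heap_gb + mr_task_slots * gb_per_task + os_overhead_gb

# e.g. a 16 GB RegionServer heap with 10 task slots at 2 GB each:
print(slave_ram_gb(16, 10))  # -> 40 GB
```

A result like 40 GB lands inside the 24-48 GB range recommended for slave nodes earlier in this document.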
Other Issues
Weight
The storage density of the latest generation of servers means that the weight of the racks needs to be taken into account. You should verify that the weight of a rack does not exceed the load capacity of the data center's floor.
Support contracts
The principle to follow here is: care for the master nodes, keep an eye on the slave nodes. You do not need traditional enterprise-class support contracts for the majority of the nodes in the cluster, as their failures are a matter of statistics rather than a crisis. The money saved on support can go into more slave nodes.
Commissioning
Hortonworks plans to cover the best practices for commissioning a Hadoop cluster in a future document. For now, note that the smoke tests that come with the Hadoop cluster are a good initial test, followed by TeraSort. Some of the major server vendors offer in-factory commissioning of Hadoop clusters for an extra fee. This has a direct benefit in ensuring that the cluster is working before you take delivery and pay for it. There is an indirect benefit as well: if TeraSort performance is lower on-site than in-factory, the network is the likely culprit, and you can track down the problem faster.
Conclusion
Achieving optimal results from a Hadoop implementation begins with choosing the correct hardware and software stacks. The effort involved in the planning stages can pay off dramatically in terms of the performance and the total cost of ownership (TCO) associated with the environment. The following composite system stack recommendations can help organizations in the planning stages:
Machine Type | Workload Pattern / Cluster Type | Storage | Processor (# of cores) | Memory (GB) | Network
Slaves | Balanced workload | Four to six 1 TB disks | One Quad | 24 | 1 GB Ethernet all-to-all
Slaves | Compute-intensive workload | Four to six 1 TB or 2 TB disks | Dual Quad | 24-48 | 1 GB Ethernet all-to-all
Slaves | I/O-intensive workload | Twelve 1 TB disks | Dual Quad | 24-48 | 1 GB Ethernet all-to-all
Slaves | HBase clusters | Twelve 1 TB disks | Dual Quad | 48-96 | 1 GB Ethernet all-to-all
Masters | All workload patterns / HBase clusters | Four to six 2 TB disks | Dual Quad | 24-48 | Dual 1 GB links for all nodes in a 20-node rack and 2 x 10 GB interconnect links per rack going to a pair of central switches
This work by Hortonworks, Inc. is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.