How to Install and Configure Hadoop Clusters, from a Few Nodes to Extremely Large Clusters with Thousands of Nodes

To play with Hadoop, you may first want to install it on a single machine (see How to Set up a Hadoop Single Node Cluster).


How to Install and Configure Hadoop Clusters

Step 1:

Unpack the software on all the machines in the cluster and divide the hardware up by function.

Step 2:

Divide the machines in the cluster into Masters and Workers.

Masters: typically one machine in the cluster is designated as the NameNode and another machine, exclusively, as the ResourceManager. These are the masters.

Services such as the Web App Proxy Server and the MapReduce Job History Server are usually run on dedicated hardware.

Workers: the rest of the machines in the cluster act as both DataNode and NodeManager. These are the workers.

How to Configure Hadoop in Non-Secure Mode

To configure the Hadoop cluster you will need to configure the environment in which the Hadoop daemons execute as well as the configuration parameters for the Hadoop daemons.

Configuring Environment of Hadoop Daemons

To start site-specific customization of the Hadoop daemons' process environment, administrators should use the etc/hadoop/hadoop-env.sh script and, optionally, the etc/hadoop/mapred-env.sh and etc/hadoop/yarn-env.sh scripts.

At the very least, you must specify JAVA_HOME so that it is correctly defined on each remote node.
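For example, in etc/hadoop/hadoop-env.sh (the JDK path below is a placeholder; use the path actually installed on your nodes):

    export JAVA_HOME=/usr/java/latest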

Administrators can configure individual daemons with the following configuration options:

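In Hadoop 3.x, the per-daemon environment variables are:

  • NameNode: HDFS_NAMENODE_OPTS
  • DataNode: HDFS_DATANODE_OPTS
  • Secondary NameNode: HDFS_SECONDARYNAMENODE_OPTS
  • ResourceManager: YARN_RESOURCEMANAGER_OPTS
  • NodeManager: YARN_NODEMANAGER_OPTS
  • WebAppProxy: YARN_PROXYSERVER_OPTS
  • MapReduce Job History Server: MAPRED_HISTORYSERVER_OPTS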

For example, to configure the NameNode to use parallelGC, the following statement should be added to hadoop-env.sh:
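A minimal sketch for Hadoop 3.x (the 4 GB heap is an example value, not a recommendation):

    export HDFS_NAMENODE_OPTS="-XX:+UseParallelGC -Xmx4g"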

It is also traditional to configure HADOOP_HOME in the system-wide shell environment configuration.

For example, a simple script inside /etc/profile.d:
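A minimal sketch (replace /path/to/hadoop with your actual install location):

    HADOOP_HOME=/path/to/hadoop
    export HADOOP_HOME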


Configuration parameters for the Hadoop daemons

The important parameters are specified in the following configuration files:

  • etc/hadoop/core-site.xml

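The important parameters here are fs.defaultFS (the NameNode URI) and io.file.buffer.size (the read/write buffer size used in SequenceFiles). A minimal sketch, with a placeholder NameNode host and port:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode-host:9000/</value>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>
    </configuration>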

  • etc/hadoop/hdfs-site.xml
  • Configurations for NameNode:

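The commonly set NameNode properties in hdfs-site.xml are (all values below are examples):

  • dfs.namenode.name.dir: local path(s) where the NameNode persistently stores the namespace and transaction logs; listing several directories replicates the metadata for redundancy
  • dfs.hosts / dfs.hosts.exclude: files listing permitted/excluded DataNodes
  • dfs.blocksize: e.g. 268435456 for a 256 MB HDFS block size
  • dfs.namenode.handler.count: e.g. 100; more NameNode server threads to handle RPCs from a large number of DataNodes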

  • Configurations for DataNode:

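The main DataNode property in hdfs-site.xml is:

  • dfs.datanode.data.dir: comma-separated list of local paths where the DataNode stores its blocks; data is typically spread across all the named directories, usually on different devices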

  • etc/hadoop/yarn-site.xml
  • Configurations for ResourceManager and NodeManager:

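The yarn-site.xml properties shared by both daemons are:

  • yarn.acl.enable: true or false, to enable ACLs (defaults to false)
  • yarn.admin.acl: the admin ACL; defaults to *, which means anyone
  • yarn.log-aggregation-enable: true or false, to enable log aggregation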

  • Configurations for ResourceManager:

Configure Hadoop Cluster & Configurations for ResourceManager Part 1

Configure Hadoop Cluster & Configurations for ResourceManager Part 2

Configure Hadoop Cluster & Configurations for ResourceManager Part3

  • Configurations for NodeManager:

Configure Hadoop Cluster & Configurations for NodeManager Configure Hadoop Cluster

Configure Hadoop Cluster part2
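The commonly set NodeManager properties (all values below are examples):

  • yarn.nodemanager.resource.memory-mb: total physical memory available on the node for containers
  • yarn.nodemanager.vmem-pmem-ratio: maximum ratio of virtual to physical memory a task may use
  • yarn.nodemanager.local-dirs / yarn.nodemanager.log-dirs: comma-separated local paths for intermediate data and logs; multiple paths spread disk I/O
  • yarn.nodemanager.log.retain-seconds: how long to keep logs when log aggregation is disabled, e.g. 10800
  • yarn.nodemanager.remote-app-log-dir: HDFS directory to which aggregated logs are moved, e.g. /logs
  • yarn.nodemanager.aux-services: set to mapreduce_shuffle to enable the shuffle service that MapReduce applications need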

How to Operate the Hadoop Cluster

Step 1:

Start both the HDFS and YARN clusters.

FOR HDFS

Step 2:

The first time you bring up HDFS, it must be formatted. Format a new distributed filesystem with the following command:
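Run once, as the hdfs user, on the NameNode (the cluster name is a placeholder):

    $HADOOP_HOME/bin/hdfs namenode -format <cluster_name>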

Step 3:

Start the HDFS NameNode with the following command on the designated node:
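In Hadoop 3.x, run as the hdfs user:

    $HADOOP_HOME/bin/hdfs --daemon start namenode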

Step 4:

Start an HDFS DataNode with the following command on each designated node:
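Run as the hdfs user on each DataNode host:

    $HADOOP_HOME/bin/hdfs --daemon start datanode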

Step 5:

If etc/hadoop/workers and passwordless ssh are configured (see How to Set up a Hadoop Single Node Cluster), all of the HDFS processes can instead be started with a single utility script:
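Run as the hdfs user:

    $HADOOP_HOME/sbin/start-dfs.sh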

FOR YARN

Step 1:

Start YARN with the following command, run on the designated ResourceManager host:
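In Hadoop 3.x, run as the yarn user:

    $HADOOP_HOME/bin/yarn --daemon start resourcemanager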

Step 2:

Start a NodeManager on each designated host with the following command:
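Run as the yarn user on each worker:

    $HADOOP_HOME/bin/yarn --daemon start nodemanager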

Step 3:

To start a standalone WebAppProxy server, use the following command:
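Run as the yarn user on the proxy host (if multiple servers are used for load balancing, run it on each of them):

    $HADOOP_HOME/bin/yarn --daemon start proxyserver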

Step 4:

Similarly, if etc/hadoop/workers and passwordless ssh are configured (see How to Set up a Hadoop Single Node Cluster), all of the YARN processes can be started with a single utility script:
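Run as the yarn user:

    $HADOOP_HOME/sbin/start-yarn.sh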

Step 5:

To start the MapReduce JobHistory Server, use the following command:
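Run as the mapred user on the designated server:

    $HADOOP_HOME/bin/mapred --daemon start historyserver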