How to Set up a Hadoop Single Node Cluster

Hadoop: Setting up a Single Node Cluster

A single-node Hadoop cluster lets you quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

How to Start Hadoop Single Node Cluster

Step 1:

Download a Hadoop distribution and unpack it. In the unpacked distribution, edit the file etc/hadoop/hadoop-env.sh to define environment parameters such as JAVA_HOME, the root of your Java installation.
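As a minimal sketch of that edit, the snippet below appends a JAVA_HOME setting to hadoop-env.sh. The HADOOP_HOME variable and the /usr/java/latest path are assumptions for illustration; point them at your own unpacked distribution and Java installation.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked
# (defaults to the current directory).
HADOOP_HOME="${HADOOP_HOME:-$PWD}"
ENV_FILE="$HADOOP_HOME/etc/hadoop/hadoop-env.sh"

# No-op inside a real distribution, where etc/hadoop already exists.
mkdir -p "$HADOOP_HOME/etc/hadoop"

# Assumption: /usr/java/latest is a placeholder for your Java root.
# Append the setting only if no active (uncommented) one is present.
if ! grep -q '^export JAVA_HOME=' "$ENV_FILE" 2>/dev/null; then
  echo 'export JAVA_HOME=/usr/java/latest' >> "$ENV_FILE"
fi
```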

To display the usage documentation for the hadoop script, run bin/hadoop with no arguments.
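A guarded way to run it might look like this. HADOOP_HOME is an assumed variable naming the unpacked distribution, and the check simply avoids a confusing error when the script is not where expected.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"

if [ -x "$HADOOP_HOME/bin/hadoop" ]; then
  # With no arguments the script prints its usage text;
  # it may exit non-zero after doing so, hence the "|| true".
  "$HADOOP_HOME/bin/hadoop" || true
else
  echo "bin/hadoop not found under $HADOOP_HOME" >&2
fi
```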

You can now start your Hadoop cluster in one of the three supported modes:

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode

Local (Standalone) Mode to start Hadoop Single Node Cluster

By default, Hadoop is configured to run in standalone (non-distributed) mode as a single Java process, which is useful for debugging.

The standard example for this mode copies the unpacked configuration directory (etc/hadoop) to use as input, then finds and displays every match of a given regular expression. The output is written to the given output directory.
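The standalone example can be sketched as follows, using the grep job from the examples jar that ships with the distribution. HADOOP_HOME and the glob used to find the jar are assumptions (the jar name includes the release version), and 'dfs[a-z.]+' is the expression used in the Apache single-node guide.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"

# Locate the versioned examples jar with a glob.
EXAMPLES_JAR=""
for jar in "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
  if [ -f "$jar" ]; then EXAMPLES_JAR="$jar"; break; fi
done

if [ -x "$HADOOP_HOME/bin/hadoop" ] && [ -n "$EXAMPLES_JAR" ]; then
  (
    cd "$HADOOP_HOME" || exit 1
    mkdir -p input
    cp etc/hadoop/*.xml input            # configuration files as input
    # The job fails if the local "output" directory already exists.
    bin/hadoop jar "$EXAMPLES_JAR" grep input output 'dfs[a-z.]+'
    cat output/*                         # display the matches
  )
else
  echo "Hadoop distribution not found under $HADOOP_HOME" >&2
fi
```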

Pseudo-Distributed Mode to start Hadoop Single Node Cluster

How to Configure Hadoop in Pseudo-Distributed Mode?

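Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process. Configure the following files: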

etc/hadoop/core-site.xml: set fs.defaultFS to hdfs://localhost:9000.
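One way to write core-site.xml from the shell is sketched below. It assumes the usual single-node value for fs.defaultFS (hdfs://localhost:9000) and an assumed HADOOP_HOME variable; it overwrites the file, so merge by hand if you already have custom settings.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"
CONF_DIR="$HADOOP_HOME/etc/hadoop"
mkdir -p "$CONF_DIR"   # no-op inside a real distribution

# Point the default filesystem at a NameNode on localhost.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
```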

etc/hadoop/hdfs-site.xml: set dfs.replication to 1, since a single node can hold only one copy of each block.
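A matching sketch for hdfs-site.xml, again assuming a HADOOP_HOME variable and overwriting the existing file:

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"
CONF_DIR="$HADOOP_HOME/etc/hadoop"
mkdir -p "$CONF_DIR"   # no-op inside a real distribution

# One replica per block is all a single DataNode can provide.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
```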

Check that you can ssh to localhost without a passphrase by running ssh localhost.
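For a non-interactive version of that check, something like the following may help. The BatchMode and ConnectTimeout options are choices made for this sketch so that ssh fails instead of prompting; the plain ssh localhost command works just as well at a terminal.

```shell
# BatchMode makes ssh fail rather than prompt for a password or passphrase.
# A first-ever connection can also fail here until the host key is accepted.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
  ssh_status="ok"
else
  ssh_status="needs-setup"
fi
echo "passphraseless ssh to localhost: $ssh_status"
```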

If you cannot ssh to localhost without a passphrase, generate a key pair with ssh-keygen and append the public key to ~/.ssh/authorized_keys.
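The key setup can be sketched as below. It assumes an RSA key at the default ~/.ssh/id_rsa path with an empty passphrase, and it skips key generation when a key already exists so nothing is overwritten.

```shell
SSH_DIR="$HOME/.ssh"

if command -v ssh-keygen >/dev/null 2>&1; then
  mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
  # Generate a key with an empty passphrase unless one is already there.
  [ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -q -t rsa -P '' -f "$SSH_DIR/id_rsa"
  # Authorize the public key for logins to this machine.
  cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
  chmod 0600 "$SSH_DIR/authorized_keys"
else
  echo "ssh-keygen not found; install an OpenSSH client first" >&2
fi
```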

How to Execute: Running a MapReduce Job Locally

Step 1

First, format the filesystem by running bin/hdfs namenode -format.
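A guarded form of the format step, assuming HADOOP_HOME names the unpacked distribution. Formatting initializes the NameNode metadata, so it is normally done once on a fresh installation.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"

if [ -x "$HADOOP_HOME/bin/hdfs" ]; then
  # Initializes a new, empty HDFS namespace.
  "$HADOOP_HOME/bin/hdfs" namenode -format
else
  echo "bin/hdfs not found under $HADOOP_HOME" >&2
fi
```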

Step 2

Start the NameNode and DataNode daemons by running sbin/start-dfs.sh.
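Starting HDFS might look like this, with HADOOP_HOME again an assumed variable. The script relies on the passphraseless ssh to localhost configured earlier.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"

if [ -x "$HADOOP_HOME/sbin/start-dfs.sh" ]; then
  # Starts the NameNode and DataNode daemons.
  "$HADOOP_HOME/sbin/start-dfs.sh"
else
  echo "sbin/start-dfs.sh not found under $HADOOP_HOME" >&2
fi
```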

Step 3

Browse the NameNode web interface, which by default is available at http://localhost:9870/.

Step 4

Create the HDFS directories required to execute MapReduce jobs, typically a home directory of the form /user/<username>.
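A sketch of the directory creation follows. It assumes your HDFS user name matches the local account name reported by id -un, which is the usual case on an unsecured single-node setup, and that HADOOP_HOME names the unpacked distribution.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"
# Assumption: the HDFS user is the local account running this script.
HDFS_HOME_DIR="/user/$(id -un)"

if [ -x "$HADOOP_HOME/bin/hdfs" ]; then
  # -p also creates the parent /user directory if it is missing.
  "$HADOOP_HOME/bin/hdfs" dfs -mkdir -p "$HDFS_HOME_DIR"
else
  echo "bin/hdfs not found under $HADOOP_HOME" >&2
fi
```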

Step 5

Copy the input files into the distributed filesystem with bin/hdfs dfs -put.
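As an illustration, the configuration files can serve as input, mirroring the standalone example. The relative HDFS path input resolves under your HDFS home directory; HADOOP_HOME is an assumed variable.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"

if [ -x "$HADOOP_HOME/bin/hdfs" ]; then
  # Relative HDFS paths resolve to /user/<username>/...
  "$HADOOP_HOME/bin/hdfs" dfs -mkdir -p input
  "$HADOOP_HOME/bin/hdfs" dfs -put "$HADOOP_HOME"/etc/hadoop/*.xml input
else
  echo "bin/hdfs not found under $HADOOP_HOME" >&2
fi
```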

Step 6

Once a job has run, view its output files on the distributed filesystem with bin/hdfs dfs -cat.
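Output only exists after a job has written it, so this sketch first runs the grep example from the bundled examples jar and then prints the result. The jar glob, the HADOOP_HOME variable, and the input and output path names are assumptions carried over from the standalone example.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"

# Locate the versioned examples jar with a glob.
EXAMPLES_JAR=""
for jar in "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
  if [ -f "$jar" ]; then EXAMPLES_JAR="$jar"; break; fi
done

if [ -x "$HADOOP_HOME/bin/hadoop" ] && [ -n "$EXAMPLES_JAR" ]; then
  # Reads HDFS "input" and writes HDFS "output" (must not exist yet).
  "$HADOOP_HOME/bin/hadoop" jar "$EXAMPLES_JAR" grep input output 'dfs[a-z.]+'
  # Quote the glob so HDFS expands it, not the local shell.
  "$HADOOP_HOME/bin/hdfs" dfs -cat 'output/*'
else
  echo "Hadoop distribution not found under $HADOOP_HOME" >&2
fi
```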

Step 7

When you are done, stop the daemons by running sbin/stop-dfs.sh.
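The shutdown mirrors the startup step; HADOOP_HOME is the same assumed variable.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"

if [ -x "$HADOOP_HOME/sbin/stop-dfs.sh" ]; then
  # Stops the NameNode and DataNode daemons.
  "$HADOOP_HOME/sbin/stop-dfs.sh"
else
  echo "sbin/stop-dfs.sh not found under $HADOOP_HOME" >&2
fi
```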

How to Run a MapReduce Job on YARN in Pseudo-Distributed Mode

Step 1

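Assuming the filesystem is formatted, HDFS is running, and your HDFS home directory exists (Steps 1 to 4 above), configure the following parameters: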

etc/hadoop/mapred-site.xml: set mapreduce.framework.name to yarn.
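A sketch of mapred-site.xml written from the shell. Besides the framework name, it sets mapreduce.application.classpath as the Hadoop 3.x single-node guide does; treat that second property as version dependent. The heredoc delimiter is quoted so that $HADOOP_MAPRED_HOME is written literally for Hadoop to resolve. HADOOP_HOME is an assumed variable, and the file is overwritten.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"
CONF_DIR="$HADOOP_HOME/etc/hadoop"
mkdir -p "$CONF_DIR"   # no-op inside a real distribution

# Quoted 'EOF' keeps $HADOOP_MAPRED_HOME unexpanded in the file.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
EOF
```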

etc/hadoop/yarn-site.xml: set yarn.nodemanager.aux-services to mapreduce_shuffle.
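A corresponding sketch for yarn-site.xml. The env-whitelist value is the one given in the Hadoop 3.x single-node guide and may differ between releases, so check it against the documentation for your version. HADOOP_HOME is an assumed variable, and the file is overwritten.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"
CONF_DIR="$HADOOP_HOME/etc/hadoop"
mkdir -p "$CONF_DIR"   # no-op inside a real distribution

# Enable the shuffle service MapReduce needs on each NodeManager.
# Assumption: the whitelist below matches your Hadoop 3.x release.
cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
EOF
```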

Step 2

Start the ResourceManager and NodeManager daemons by running sbin/start-yarn.sh.
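Starting YARN follows the same pattern as starting HDFS; HADOOP_HOME is an assumed variable.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"

if [ -x "$HADOOP_HOME/sbin/start-yarn.sh" ]; then
  # Starts the ResourceManager and NodeManager daemons.
  "$HADOOP_HOME/sbin/start-yarn.sh"
else
  echo "sbin/start-yarn.sh not found under $HADOOP_HOME" >&2
fi
```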

Step 3

Browse the ResourceManager web interface, which by default is available at http://localhost:8088/.

Step 4

Run a MapReduce job.
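Any job will do here. As one possibility, this sketch reruns the bundled grep example, which is now scheduled by YARN. It assumes the HDFS input directory from the earlier steps still exists, and it first removes any previous output directory because the job refuses to overwrite one. The jar glob and HADOOP_HOME are assumptions.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"

# Locate the versioned examples jar with a glob.
EXAMPLES_JAR=""
for jar in "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
  if [ -f "$jar" ]; then EXAMPLES_JAR="$jar"; break; fi
done

if [ -x "$HADOOP_HOME/bin/hadoop" ] && [ -n "$EXAMPLES_JAR" ]; then
  # -f suppresses the error when "output" does not exist.
  "$HADOOP_HOME/bin/hdfs" dfs -rm -r -f output
  "$HADOOP_HOME/bin/hadoop" jar "$EXAMPLES_JAR" grep input output 'dfs[a-z.]+'
  "$HADOOP_HOME/bin/hdfs" dfs -cat 'output/*'
else
  echo "Hadoop distribution not found under $HADOOP_HOME" >&2
fi
```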

Step 5

When you are done, stop the daemons by running sbin/stop-yarn.sh.
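A guarded form of the YARN shutdown, with HADOOP_HOME as an assumed variable. HDFS keeps running until sbin/stop-dfs.sh is run separately.

```shell
# Assumption: HADOOP_HOME is where the distribution was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$PWD}"

if [ -x "$HADOOP_HOME/sbin/stop-yarn.sh" ]; then
  # Stops the ResourceManager and NodeManager daemons.
  "$HADOOP_HOME/sbin/stop-yarn.sh"
else
  echo "sbin/stop-yarn.sh not found under $HADOOP_HOME" >&2
fi
```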