Hadoop – How to Set Up a Hadoop Environment

Hadoop is supported on the GNU/Linux platform and its flavors. This guide from tipcircle.com walks through setting up a Hadoop environment on Linux.

HOW TO SET UP A HADOOP ENVIRONMENT

Pre-installation Setup

Before installing Hadoop in a Linux environment, we need to set up Linux with SSH (Secure Shell). Follow the steps below:

Creating a User

Follow the steps given below to create a user:

  1. Switch to the root account using the command “su”.
  2. Create a user from the root account using the command “useradd username”.
  3. You can now switch to the new user account using the command “su username”.

Type the following commands in a Linux terminal to create a user:
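
The steps above can be sketched as a small shell function. The account name hadoop and the function name create_hadoop_user are placeholders chosen for illustration, and the -m flag (create a home directory) is an addition to the plain “useradd username” form. The function has to be run from the root account.

```shell
# Placeholder account name; substitute your own.
HADOOP_USER="hadoop"

# Run as root (for example after "su").
create_hadoop_user() {
  useradd -m "$HADOOP_USER" &&   # create the account with a home directory
  passwd "$HADOOP_USER"          # set its password interactively
}

# As root:           create_hadoop_user
# Then switch with:  su - "$HADOOP_USER"
```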

SSH Setup and Key Generation

Generate an RSA key pair, copy the public key from id_rsa.pub to authorized_keys, and give the owner read and write permissions on the authorized_keys file.
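
A minimal sketch of these steps, assuming OpenSSH's ssh-keygen is installed. The function name setup_ssh_keys and its optional directory argument are illustrative, and the empty passphrase (-N "") is an assumption made so that the key can be used without a prompt.

```shell
# Sets up key-based SSH login for the current user.
# Optional first argument: the SSH directory (defaults to ~/.ssh).
setup_ssh_keys() {
  ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

  # Generate an RSA key pair with an empty passphrase unless one already exists.
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
  fi

  # Authorize the public key and restrict the file to owner read/write.
  cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys" &&
  chmod 600 "$ssh_dir/authorized_keys"
}
```

After calling setup_ssh_keys, “ssh localhost” should log in without asking for a password, provided an SSH server is running on the machine.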

Installing Java

Verify that Java is installed on your system using the command “java -version”.

If Java is already installed, the command prints the installed version.

If Java is not installed on your system, install it from http://www.oracle.com

Downloading Hadoop

Download Hadoop 2.4.1 from the Apache Software Foundation and extract it.

You can download and extract it using the following commands.
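
One way to do this is sketched below. The download URL follows the layout of the Apache release archive and should be treated as an assumption, as should the /usr/local install location and the function name install_hadoop. The sketch also assumes wget is available.

```shell
# Version from the text; the URL follows the Apache archive layout (assumed).
HADOOP_VERSION="2.4.1"
HADOOP_TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${HADOOP_TARBALL}"

# Downloads and unpacks Hadoop under the given directory (default /usr/local,
# which normally requires root), leaving it in <dir>/hadoop.
install_hadoop() {
  (
    cd "${1:-/usr/local}" || exit 1
    wget "$HADOOP_URL" &&
    tar -xzf "$HADOOP_TARBALL" &&
    mv "hadoop-${HADOOP_VERSION}" hadoop
  )
}
```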

Hadoop Operation Modes

A Hadoop cluster can be operated in one of three supported modes:

  1. Local/Standalone Mode : After you download Hadoop, it is configured in standalone mode by default and runs as a single Java process.
  2. Pseudo Distributed Mode : This is a distributed simulation on a single machine. Each Hadoop daemon, such as HDFS, YARN and MapReduce, runs as a separate Java process. This mode is useful for development.
  3. Fully Distributed Mode : This mode is fully distributed, with a minimum of two machines forming a cluster.

We will discuss the installation for these modes separately.

  • Installing Hadoop in Standalone Mode

Make sure that Hadoop is working properly by running the following command:
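
The check itself is “hadoop version”. The sketch below wraps it in a hypothetical helper, check_hadoop, that also reports the case where the hadoop command is not on the PATH yet.

```shell
# Prints the Hadoop version, or reports that the command cannot be found.
check_hadoop() {
  if command -v hadoop >/dev/null 2>&1; then
    hadoop version
  else
    echo "hadoop not found on PATH" >&2
    return 1
  fi
}
```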

If everything is fine with your setup, the command prints the Hadoop version details.

This means your Hadoop standalone mode setup is working.

Setting Up the Hadoop Environment

Hadoop environment variables can be set by appending the following commands to the ~/.bashrc file.
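
For standalone mode, a minimal set might look like the sketch below. The install location /usr/local/hadoop is an assumption. The PATH line is added so that the hadoop command can be found from any directory.

```shell
# Append to ~/.bashrc. /usr/local/hadoop is an assumed install location.
export HADOOP_HOME=/usr/local/hadoop
export PATH="$PATH:$HADOOP_HOME/bin"
```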

  • Step 1

Create temporary content files in an input directory. You can create this directory anywhere you would like to work.
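
As a sketch, the text files that typically ship in the top level of the Hadoop installation directory can serve as sample content. Using those files is an assumption made for illustration, and prepare_input is a hypothetical helper name.

```shell
# Creates ./input and copies the text files from a Hadoop directory into it.
# Optional first argument: the source directory (defaults to $HADOOP_HOME).
prepare_input() {
  src_dir="${1:-$HADOOP_HOME}"
  mkdir -p input &&
  cp "$src_dir"/*.txt input &&
  ls -l input
}
```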

Listing the input directory should show the files that were copied into it.

  • Step 2

Start a Hadoop job to count the total number of words in all the files available in the input directory.
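
A sketch using the WordCount program from the examples jar bundled with Hadoop. The jar path assumes the 2.4.1 release layout under $HADOOP_HOME, and run_wordcount is a hypothetical helper name.

```shell
# Runs the bundled WordCount example on ./input and writes results to ./output.
# The output directory must not already exist, or the job fails.
run_wordcount() {
  hadoop jar \
    "$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar" \
    wordcount input output
}
```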

Step 2 saves its output in the output/part-r-00000 file, which you can check using “cat output/*”.

  • Installing Hadoop in Pseudo Distributed Mode

Setting Up the Hadoop Environment

  • Step 1

Hadoop environment variables can be set by appending the following commands to the ~/.bashrc file.

Apply the changes to the current shell session using “source ~/.bashrc”.
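
For pseudo distributed mode, a commonly used set of variables is sketched below. The install location /usr/local/hadoop is an assumption, and the exact list of variables is illustrative rather than the author's original listing.

```shell
# Append to ~/.bashrc. /usr/local/hadoop is an assumed install location.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export HADOOP_YARN_HOME="$HADOOP_HOME"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
# sbin holds the daemon start/stop scripts, bin the hadoop and hdfs commands.
export PATH="$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin"
```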

  • Step 2

Hadoop Configuration

You can find all the Hadoop configuration files in “$HADOOP_HOME/etc/hadoop”.

To develop Hadoop programs in Java, you have to reset the Java environment variable in the hadoop-env.sh file by replacing the JAVA_HOME value with the location of Java on your system.
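
The line to change in hadoop-env.sh might look like the sketch below. The path shown is only an example of where a JDK may live on a Debian-style system; it is an assumption and needs to match your own installation.

```shell
# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, replace the existing JAVA_HOME line.
# The path below is only an example; use the Java location on your system.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```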

 

Now open core-site.xml and add the following properties between the <configuration> and </configuration> tags.
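
For a single-machine setup, the core-site.xml entry usually names the default file system. The fragment below is a sketch: the host and port (localhost:9000) are assumed values. In Hadoop 2.x, fs.defaultFS replaces the older, deprecated name fs.default.name.

```xml
<property>
  <!-- Default file system URI; host and port are assumed values -->
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
```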

Now open hdfs-site.xml and add the following properties between the <configuration> and </configuration> tags.
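
A sketch of typical hdfs-site.xml entries for one machine: a replication factor of 1 and local storage directories for the NameNode and DataNode. The two directory paths are hypothetical and should point to locations the Hadoop user can write to.

```xml
<property>
  <!-- One copy of each block is enough on a single machine -->
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <!-- Hypothetical local path for NameNode metadata -->
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
  <!-- Hypothetical local path for DataNode blocks -->
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
</property>
```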

Now open yarn-site.xml and add the following properties between the <configuration> and </configuration> tags.
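
A minimal yarn-site.xml sketch, assuming the goal is to run MapReduce jobs on YARN. It enables the shuffle service that MapReduce needs on the NodeManager.

```xml
<property>
  <!-- Auxiliary service required by MapReduce jobs running on YARN -->
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```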

Now configure mapred-site.xml. This file is used to specify which MapReduce framework we are using.

Hadoop ships only a template for this file, so first copy mapred-site.xml.template to mapred-site.xml in the configuration directory.
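
The copy can be sketched as follows. The helper name create_mapred_site is hypothetical, and the default directory assumes the $HADOOP_HOME/etc/hadoop location mentioned earlier.

```shell
# Copies the shipped template to mapred-site.xml.
# Optional first argument: the configuration directory.
create_mapred_site() {
  conf_dir="${1:-$HADOOP_HOME/etc/hadoop}"
  cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
}
```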

Then add the following properties between the <configuration> and </configuration> tags in this file.
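
Assuming YARN is the framework in use, as the yarn-site.xml step suggests, the mapred-site.xml entry would look like this sketch.

```xml
<property>
  <!-- Run MapReduce jobs on YARN -->
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```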