Hadoop – What is HDFS (Hadoop Distributed File System)

HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers.

HDFS has been reported to scale to 200 PB of storage and to a single cluster of 4500 servers, supporting close to a billion files and blocks.

HDFS has many goals. Here are some of the most notable:

  • Fault tolerance by detecting faults and recovering from them quickly and automatically.
  • Data access via MapReduce streaming.
  • Scalability to reliably store and process large amounts of data.
  • Economy by distributing data across clusters of commodity servers.
  • Reliability by automatically maintaining multiple copies of data.
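The fault-tolerance and reliability goals can be illustrated with a small toy model. This is only a sketch, not how HDFS is implemented: the node names, block ids, the replication factor of 3, and both helper functions are assumptions made up for the example.

```python
import random

REPLICATION = 3  # assumed replication factor for this sketch


def place_replicas(block_ids, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy random placement)."""
    rng = random.Random(seed)
    return {b: set(rng.sample(sorted(nodes), replication)) for b in block_ids}


def handle_node_failure(placement, failed, live_nodes, replication=REPLICATION):
    """Forget the failed node, then copy under-replicated blocks to other live nodes."""
    for holders in placement.values():
        holders.discard(failed)
        spare = sorted(live_nodes - holders)
        while len(holders) < replication and spare:
            holders.add(spare.pop(0))
    return placement


# Hypothetical five-node cluster holding three blocks.
nodes = {"dn1", "dn2", "dn3", "dn4", "dn5"}
placement = place_replicas(["blk_1", "blk_2", "blk_3"], nodes)

# Simulate losing dn2: every block should end up with three replicas again.
placement = handle_node_failure(placement, "dn2", nodes - {"dn2"})
for block, holders in sorted(placement.items()):
    print(block, sorted(holders))
```

After the simulated failure, every block is back to three copies on the remaining nodes, which is the idea behind "automatically maintaining multiple copies of data".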

Application interfaces into HDFS

HDFS provides a native Java application programming interface (API) and a native C-language wrapper for the Java API. You can access HDFS in many different ways.

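The following applications can interface with HDFS:

  1. FileSystem (FS) shell, a command-line interface similar to common Linux and UNIX shells.
  2. DFSAdmin, a command set for administering an HDFS cluster.
  3. fsck, a command that checks files for inconsistencies such as missing blocks.
  4. Name nodes and data nodes, which have built-in web servers that let administrators check the status of a cluster.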
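As a rough illustration of the first three tools, the sketch below builds the command lines without running them, so it works on a machine that has no Hadoop installation. The helper `hdfs_command` and the path `/user/example` are hypothetical; the subcommands and flags (`dfs -ls`, `dfsadmin -report`, `fsck -files -blocks`) are the usual ones, but check them against your Hadoop version.

```python
def hdfs_command(tool, *args):
    """Return the argv list for an `hdfs` subcommand without executing it."""
    return ["hdfs", tool, *args]


# FS shell: list a directory (the path is a made-up example).
ls_cmd = hdfs_command("dfs", "-ls", "/user/example")

# DFSAdmin: print a cluster report.
report_cmd = hdfs_command("dfsadmin", "-report")

# fsck: check files under a path and show their blocks.
fsck_cmd = hdfs_command("fsck", "/user/example", "-files", "-blocks")

for cmd in (ls_cmd, report_cmd, fsck_cmd):
    print(" ".join(cmd))
```

On a machine where Hadoop is installed, any of these lists could be passed to `subprocess.run(cmd, check=True)` to execute the command.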

HDFS architecture

An HDFS cluster has a single name node (NameNode) that manages the file system namespace and regulates client access to files. In addition, data nodes (DataNodes) store the contents of files as blocks.
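To make "store data as blocks" concrete, here is a minimal sketch of cutting a file's bytes into fixed-size blocks. The function name and the 6-byte block size are assumptions chosen to keep the output readable; real clusters use far larger blocks.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut `data` into consecutive chunks of at most `block_size` bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# A 16-byte "file" split with a toy block size of 6 bytes.
blocks = split_into_blocks(b"hello hdfs world", 6)
print(blocks)
```

Only the last block may be shorter than the block size, and joining the blocks in order gives back the original bytes.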

Name nodes and data nodes

  • The name node manages file system namespace operations such as opening, closing, and renaming files.
  • The name node also maps data blocks to data nodes, which handle read and write requests from HDFS clients.
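The two responsibilities above can be sketched as a toy class that keeps only metadata. This is not the real NameNode code; the class `ToyNameNode`, its method names, the paths, and the block ids are all invented for illustration.

```python
class ToyNameNode:
    """Holds metadata only: file paths, their block ids, and block locations."""

    def __init__(self):
        self.namespace = {}   # path -> ordered list of block ids
        self.block_map = {}   # block id -> list of data node names

    def create(self, path, block_locations):
        """Register a file given {block_id: [data node names]}."""
        if path in self.namespace:
            raise FileExistsError(path)
        self.namespace[path] = list(block_locations)
        self.block_map.update({b: list(n) for b, n in block_locations.items()})

    def rename(self, old, new):
        """Renaming changes the namespace only; no block is moved."""
        if new in self.namespace:
            raise FileExistsError(new)
        self.namespace[new] = self.namespace.pop(old)

    def locate(self, path):
        """Tell a client which data nodes hold each block of a file."""
        return [(b, self.block_map[b]) for b in self.namespace[path]]


nn = ToyNameNode()
nn.create("/logs/a.txt", {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]})
nn.rename("/logs/a.txt", "/logs/b.txt")
locations = nn.locate("/logs/b.txt")
print(locations)
```

In this model a client asks the name node where the blocks are and then talks to the listed data nodes for the actual bytes.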

What is the relationship between name nodes and data nodes?

  • Data nodes continuously loop, asking the name node for instructions.
  • The name node maintains and administers changes to the file system namespace.
  • Each data node maintains an open server socket so that client code can read data.
  • The host and port of this server socket are known to the name node, which provides them to interested clients.
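The relationship described in this list can be modeled in a few lines. Treat it as a sketch under assumptions: the class `HeartbeatNameNode`, the instruction strings, and the host and port values are hypothetical, and no real sockets are opened.

```python
class HeartbeatNameNode:
    """Toy name node that only answers calls made by data nodes and clients."""

    def __init__(self):
        self.addresses = {}   # data node name -> (host, port) of its server socket
        self.pending = {}     # data node name -> instructions waiting for it

    def schedule(self, name, instruction):
        """Queue an instruction until the data node next asks for work."""
        self.pending.setdefault(name, []).append(instruction)

    def heartbeat(self, name, host, port):
        """Called by a data node in its loop; returns any queued instructions."""
        self.addresses[name] = (host, port)
        instructions = self.pending.get(name, [])
        self.pending[name] = []
        return instructions

    def address_of(self, name):
        """Called by a client that wants to read from a data node."""
        return self.addresses[name]


nn = HeartbeatNameNode()
first = nn.heartbeat("dn1", "10.0.0.11", 50010)    # nothing queued yet
nn.schedule("dn1", "replicate blk_7 to dn3")
second = nn.heartbeat("dn1", "10.0.0.11", 50010)   # picks up the instruction
third = nn.heartbeat("dn1", "10.0.0.11", 50010)    # queue is empty again
print(first, second, third, nn.address_of("dn1"))
```

Note that the name node never contacts a data node in this model: instructions are handed over only as replies to the data node's own requests, which matches the "continuously loop, asking for instructions" description.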

Overview: File creation process in HDFS

HDFS is built using the Java programming language, so any machine that supports Java can use HDFS to create a file.