Hadoop Commands Guide
All hadoop commands are invoked by the bin/hadoop script. Running the hadoop script without any arguments prints the description for all commands.
This post covers the most commonly used Hadoop commands. Let's start:
Overview of hadoop commands
All of the Hadoop commands and subprojects follow the same basic structure:
shellcommand [SHELL_OPTIONS] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]
- shellcommand: The command of the project being invoked, for example hadoop, hdfs, or yarn.
- SHELL_OPTIONS: Options that the shell processes prior to executing Java.
- COMMAND: Action to perform.
- GENERIC_OPTIONS: The common set of options supported by multiple commands.
- COMMAND_OPTIONS: Options specific to the command being run.
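The structure above can be sketched as a small helper that assembles an argument list in the documented order. This is only an illustration: the helper name build_command is hypothetical, the configuration path and property value are made-up placeholders, and nothing is executed.

```python
import shlex


def build_command(shellcommand, command=None, shell_options=(),
                  generic_options=(), command_options=()):
    """Assemble argv in the documented order:
    shellcommand [SHELL_OPTIONS] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]
    """
    argv = [shellcommand, *shell_options]
    if command is not None:
        argv.append(command)
    argv.extend(generic_options)
    argv.extend(command_options)
    return argv


# Hypothetical values, chosen only to show where each part lands.
argv = build_command(
    "hadoop",
    command="fs",
    shell_options=["--config", "/tmp/conf"],
    generic_options=["-D", "dfs.replication=1"],
    command_options=["-ls", "/"],
)

# Print the command line as it would be typed in a shell.
print(shlex.join(argv))
```

Keeping the parts as separate lists makes it harder to put a shell option after the command by mistake, where the shell script would no longer process it.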
All of the shell commands accept a common set of options. The following shell options are commonly used with Hadoop commands:
- --buildpaths: Enables developer versions of jars.
- --config confdir: Overrides the default configuration directory, which is $HADOOP_HOME/etc/hadoop.
- --debug: Enables shell-level configuration debugging information.
- --help: Prints shell script usage information.
- --hostnames: When --workers is used, overrides the workers file with a space-delimited list of hostnames on which to execute a multi-host subcommand.
- --hosts: When --workers is used, overrides the workers file with another file that contains a list of hostnames on which to execute a multi-host subcommand.
- --workers: If possible, executes this command on all hosts in the workers file.
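The interaction between --workers, --hostnames and --hosts is easy to get wrong, so here is a minimal sketch of how the options might be assembled. The function name multi_host_options and the host names are hypothetical, and treating --hostnames and --hosts as mutually exclusive is an assumption of this sketch, not something the list above states.

```python
def multi_host_options(hostnames=None, hosts_file=None):
    """Return shell options for running a subcommand on several hosts.

    --hostnames takes ONE space-delimited string; --hosts takes a file path.
    Both are only meaningful together with --workers.
    """
    # Assumption of this sketch: only one override of the workers file at a time.
    if hostnames and hosts_file:
        raise ValueError("pass either hostnames or hosts_file, not both")
    options = ["--workers"]
    if hostnames:
        # The whole list is passed as a single argument.
        options += ["--hostnames", " ".join(hostnames)]
    elif hosts_file:
        options += ["--hosts", hosts_file]
    return options


# Hypothetical host names.
print(multi_host_options(hostnames=["node1", "node2"]))
```

When such a command is typed by hand, the space-delimited list has to be quoted so that the shell passes it as one argument.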
The commands run through these shell scripts fall into two groups:
- User Commands: Commands useful for users of a hadoop cluster.
- Administration Commands: Commands useful for administrators of a hadoop cluster.
User Hadoop Commands
The archive command creates Hadoop archives. Hadoop archives are special-format archives. A Hadoop archive maps to a file system directory, and “.har” is the extension for archives.
The checknative command checks the availability of the Hadoop native code. It takes two options, checknative [-a] [-h]:
- -a: Checks that all libraries are available.
- -h: Prints help.
The classpath command takes three options, classpath [--glob | --jar <path> | -h | --help]:
- --glob: Expands wildcards.
- --jar path: Writes the classpath as a manifest in a jar named path.
- -h, --help: Prints help.
Prints the class path needed to get the Hadoop jar and the required libraries. If called without arguments, then prints the classpath set up by the command scripts, which is likely to contain wildcards in the classpath entries. Additional options print the classpath after wildcard expansion or write the classpath into the manifest of a jar file. The latter is useful in environments where wildcards cannot be used and the expanded classpath exceeds the maximum supported command line length.
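To see why wildcard expansion matters, the following sketch splits classpath text into entries and reports which ones still contain wildcards. The function name split_classpath and the sample paths are hypothetical; real output depends on the installation.

```python
import os


def split_classpath(output, sep=os.pathsep):
    """Split classpath text into entries and list those still holding wildcards."""
    entries = [entry for entry in output.strip().split(sep) if entry]
    wildcards = [entry for entry in entries if "*" in entry]
    return entries, wildcards


# Hypothetical text in the style printed when no options are given.
sample = "/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*\n"
entries, wildcards = split_classpath(sample, sep=":")

print(len(entries), "entries;", len(wildcards), "with wildcards")
```

On a machine with Hadoop installed, the sample string could be replaced by the captured output of the classpath command, with or without --glob, to compare the two forms.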
The credential command manages credentials, passwords and secrets within credential providers. Its subcommands include:
- create alias [-provider provider-path] [-strict] [-value credential-value]: Prompts the user for a credential to be stored as the given alias. The hadoop.security.credential.provider.path within the core-site.xml file will be used unless a -provider is indicated. The -strict flag will cause the command to fail if the provider uses a default password. Use the -value flag to supply the credential value instead of being prompted.
- delete alias [-provider provider-path] [-strict] [-f]: Deletes the credential with the provided alias. The hadoop.security.credential.provider.path within the core-site.xml file will be used unless a -provider is indicated. The -strict flag will cause the command to fail if the provider uses a default password. The command asks for confirmation unless -f is specified.
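As a rough sketch of how the create flags combine, the helper below builds the argument list without running anything. The name credential_create, the alias db.password and the provider path are hypothetical placeholders.

```python
def credential_create(alias, provider=None, strict=False, value=None):
    """Build argv for: credential create alias [-provider p] [-strict] [-value v]."""
    argv = ["hadoop", "credential", "create", alias]
    if provider is not None:
        # Without -provider, the provider path from core-site.xml is used.
        argv += ["-provider", provider]
    if strict:
        # Fail if the provider uses a default password.
        argv.append("-strict")
    if value is not None:
        # Supplies the credential instead of prompting for it.
        argv += ["-value", value]
    return argv


# Hypothetical alias and provider path.
print(credential_create("db.password",
                        provider="jceks://file/tmp/test.jceks",
                        strict=True))
```

Leaving value unset keeps the secret out of the argument list, so the command prompts for it instead.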
The distch command changes the ownership and permissions on many files at once. Its options are:
- -f: List of objects to change.
- -i: Ignore failures.
- -log: Directory to log output.
We have covered DistCp in a separate post.
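The dtutil command fetches and manages Hadoop delegation tokens inside credentials files. Its subcommands are:
- print: Print out the fields in the tokens contained in filename.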
- get URL: Fetch a token from service at URL and place it in filename.
- append: Append the contents of the first N filenames onto the last filename.
- remove -alias alias: From each file specified, remove the tokens matching alias and write out each file using specified format.
- cancel -alias alias: Just like remove, except the tokens are also cancelled using the service specified in the token object.
- renew -alias alias: For each file specified, renew the tokens matching alias and write out each file using specified format.
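The argument order of append and remove, as described above, can be sketched with two small builders. The function names and the token file names are hypothetical, optional flags such as the output format are left out, and nothing is executed.

```python
def dtutil_append(sources, target):
    """Build argv for append: the first N files are appended onto the last one."""
    if not sources:
        raise ValueError("at least one source file is required")
    return ["hadoop", "dtutil", "append", *sources, target]


def dtutil_remove(alias, filenames):
    """Build argv for remove: drop tokens matching alias from each file."""
    if not filenames:
        raise ValueError("at least one file is required")
    return ["hadoop", "dtutil", "remove", "-alias", alias, *filenames]


# Hypothetical token file names and alias.
print(dtutil_append(["a.token", "b.token"], "all.token"))
print(dtutil_remove("my-service", ["all.token"]))
```

Making the target a separate parameter guards against the easy mistake of listing the destination file first.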
The jar command runs a jar file.
The jnipath command prints the computed java.library.path.
The kerbname command converts the named principal via the auth_to_local rules to the Hadoop user name.
Administration Hadoop Commands
The daemonlog command gets or sets the log level of a class in a running daemon:
- -getlevel host:port classname [-protocol (http|https)]: Prints the log level of the log identified by a qualified classname, in the daemon running at host:port. The -protocol flag specifies the protocol for connection.
- -setlevel host:port classname level [-protocol (http|https)]: Sets the log level of the log identified by a qualified classname, in the daemon running at host:port. The -protocol flag specifies the protocol for connection.
This command works by sending an HTTP/HTTPS request to the daemon’s internal Jetty servlet, so it supports only the following daemons:
- name node
- secondary name node
- data node
- journal node
- resource manager
- node manager
- Timeline server
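A minimal sketch of how the two daemonlog forms might be assembled is shown here. The function names are hypothetical, and the host, port, class name and level are example values only; the right host:port depends on which daemon's web endpoint is targeted.

```python
def _daemonlog(action, host, port, classname, extra=(), protocol=None):
    """Shared builder for the -getlevel and -setlevel forms."""
    if protocol not in (None, "http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")
    argv = ["hadoop", "daemonlog", action, f"{host}:{port}", classname, *extra]
    if protocol is not None:
        argv += ["-protocol", protocol]
    return argv


def daemonlog_getlevel(host, port, classname, protocol=None):
    return _daemonlog("-getlevel", host, port, classname, protocol=protocol)


def daemonlog_setlevel(host, port, classname, level, protocol=None):
    return _daemonlog("-setlevel", host, port, classname,
                      extra=[level], protocol=protocol)


# Example values only: a local daemon and one of its classes.
cls = "org.apache.hadoop.hdfs.server.namenode.NameNode"
print(daemonlog_setlevel("127.0.0.1", 9870, cls, "DEBUG"))
print(daemonlog_getlevel("127.0.0.1", 9870, cls, protocol="https"))
```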
The Hadoop shell commands read the following files:
- etc/hadoop/hadoop-env.sh: This file stores the global settings used by all Hadoop shell commands.
- etc/hadoop/hadoop-user-functions.sh: This file allows for advanced users to override some shell functionality.
- ~/.hadooprc: This stores the personal environment for an individual user. It is processed after the hadoop-env.sh and hadoop-user-functions.sh files and can contain the same settings.
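Because ~/.hadooprc is processed after hadoop-env.sh and may contain the same settings, a value set there would normally replace the global one, as with any shell files sourced in sequence. The toy model below illustrates that ordering with dictionaries. It does not parse real files, and the variable values are hypothetical.

```python
def effective_settings(*layers):
    """Merge settings in processing order; later layers win on conflicts."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical contents, listed in the order the files are processed.
hadoop_env = {"HADOOP_HEAPSIZE_MAX": "1g", "HADOOP_LOG_DIR": "/var/log/hadoop"}
hadooprc = {"HADOOP_LOG_DIR": "/tmp/my-logs"}

settings = effective_settings(hadoop_env, hadooprc)
print(settings["HADOOP_LOG_DIR"])
```

In this model the personal file only needs to list the settings a user wants to change; everything else falls through from the global file.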