Big Data University- Hadoop 101

Unit 2: HDFS Command Line (lesson transcript)


Welcome to the HDFS command line interface. In this presentation, I will cover the general usage of the HDFS command line interface and the commands that are specific to HDFS. The other commands should be familiar to anyone with UNIX experience and will not be covered.

HDFS can be manipulated through a Java API or through a command line interface. All commands for manipulating HDFS through Hadoop’s command line interface begin with hdfs dfs, that is, hdfs, a space, and dfs. This invokes the file system shell. The command name follows as an argument to hdfs dfs, and it starts with a dash. For example, ls, the common UNIX command for listing a directory, is written as -ls. As on UNIX systems, ls can take a path as an argument. In hdfs dfs -ls ., for instance, the path is the current directory, represented by a single dot.
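As a minimal sketch of the general form described above, the following lists a directory in HDFS. It assumes a Hadoop client is installed and configured. The guard around the command is only there so the snippet can be run on a machine without Hadoop.

```shell
# General form: hdfs dfs -<command> [arguments]
# "." is the current directory; in HDFS, relative paths resolve against
# the user's home directory (typically /user/<username>).
target="."

if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -ls "$target"
else
    # No Hadoop client on this machine: just show the command.
    echo "would run: hdfs dfs -ls $target"
fi
```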

As we saw for the ls command, the file system shell commands can take paths as arguments. These paths can be expressed as uniform resource identifiers, or URIs. The URI format consists of a scheme, an authority, and a path. Multiple schemes are supported. The local file system has a scheme of “file”. HDFS has a scheme called “hdfs”. For example, let us say you wish to copy a file called “myfile.txt” from your local filesystem to an HDFS file system on the localhost. You can do this with the cp command, which takes a URI for the source and a URI for the destination. The scheme and the authority do not always need to be specified. Instead, you may rely on their default values. These defaults can be overridden by specifying them in a file named core-site.xml in the conf directory of your Hadoop installation.

HDFS is not a fully POSIX-compliant file system, but it supports many of the same commands. Most HDFS commands are easily recognized UNIX commands like cat and chmod. There are also a few commands that are specific to HDFS, such as copyFromLocal. We’ll examine a few of these.
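To make the URI form concrete, here is a sketch of the myfile.txt copy with both the source and the destination written out in full. The directory names are hypothetical, and the snippet assumes a NameNode is reachable on localhost at the default port. The last command prints the configured default file system (the fs.defaultFS property), which is what applies when the scheme and authority are left out.

```shell
# Source and destination as full URIs: scheme://authority/path
# The directories below are hypothetical. file:/// has an empty authority.
src="file:///home/hadoop/myfile.txt"
dst="hdfs://localhost/user/hadoop/myfile.txt"

if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -cp "$src" "$dst"
    # Print the default file system used when scheme and authority are omitted.
    hdfs getconf -confKey fs.defaultFS
else
    # No Hadoop client on this machine: just show the command.
    echo "would run: hdfs dfs -cp $src $dst"
fi
```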

copyFromLocal and put are two HDFS-specific commands that do the same thing: copy files from the local filesystem to a location on another filesystem. Their opposite is the copyToLocal command, which can also be invoked as get. It copies files out of the filesystem you specify and into the local filesystem. getmerge is an enhanced form of get that can merge files from multiple locations into a single local file. setrep lets you override the default level of replication. You can do this for one file or, with the -R option, for an entire tree.
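The copy commands above might be used as follows. This is only a sketch: the file and directory names are hypothetical, and it assumes a configured Hadoop client and an existing local myfile.txt.

```shell
local_file="myfile.txt"         # hypothetical local file
hdfs_dir="/user/hadoop/demo"    # hypothetical HDFS directory

if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p "$hdfs_dir"
    # Local -> HDFS (copyFromLocal could be used in place of put)
    hdfs dfs -put "$local_file" "$hdfs_dir/"
    # HDFS -> local (copyToLocal could be used in place of get)
    hdfs dfs -get "$hdfs_dir/$local_file" "copy_of_$local_file"
    # Concatenate the files under the HDFS directory into one local file
    hdfs dfs -getmerge "$hdfs_dir" merged.txt
else
    # No Hadoop client on this machine: just show the first command.
    echo "would run: hdfs dfs -put $local_file $hdfs_dir/"
fi
```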

The setrep command returns immediately after requesting the new replication level. If you want it to block until the replication change is complete, pass the -w option.

IBM, with BigInsights, provides the Ambari Console as a graphical way to work with HDFS. The services tab provides a simple way to view the status of the Hadoop components. You can create a file view to browse and work with directories and files.
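Returning to the command line, the two setrep behaviors described earlier can be sketched like this. The replication factor and the path are hypothetical, and the snippet assumes a configured Hadoop client.

```shell
replication=2                               # hypothetical target replication factor
hdfs_file="/user/hadoop/demo/myfile.txt"    # hypothetical HDFS path

if command -v hdfs >/dev/null 2>&1; then
    # Returns as soon as the new replication level has been requested
    hdfs dfs -setrep "$replication" "$hdfs_file"
    # -w blocks until the file reaches the requested replication level
    hdfs dfs -setrep -w "$replication" "$hdfs_file"
else
    # No Hadoop client on this machine: just show the command.
    echo "would run: hdfs dfs -setrep -w $replication $hdfs_file"
fi
```

Waiting with -w can take a long time on large files or directory trees, so scripts may prefer the non-blocking form unless a later step depends on the new replication level.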

