Tuesday, September 9, 2014

Installing Hadoop for the first time!

All versions are available here:
http://mirror.tcpdiag.net/apache/hadoop/common/

I picked 2.4.1 (the current stable version).
Following the instructions on the official site:
http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-common/SingleNodeSetup.html

1: pretty smooth until I saw:
In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.

Note: there is no folder named "conf". By comparing with the 2.5.0 install instructions, I found the correct file is: etc/hadoop/hadoop-env.sh
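For example, a minimal sketch of the edit (the JDK path below is an assumption; point it at your own Java installation root):

```shell
# etc/hadoop/hadoop-env.sh -- define JAVA_HOME explicitly.
# The path below is only an example; use the root of your own JDK.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```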

2: another typo in the official instructions:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar  grep input output 'dfs[a-z.]+'
$ cat output/*
Again, there is no "conf" folder; here it should be:
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*


A good unofficial installation guide:
http://data4knowledge.org/2014/08/16/installing-hadoop-2-4-1-detailed/

Handling warnings you may see:
http://chawlasumit.wordpress.com/2014/06/17/hadoop-java-hotspottm-execstack-warning/

If ssh gives you problems, make sure:
1: the ssh server is running.
2: run: /etc/init.d/ssh reload
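The start scripts also need passwordless ssh to localhost. A sketch of the standard OpenSSH setup (skip key generation if ~/.ssh/id_rsa already exists):

```shell
# Generate a passphrase-less key and authorize it for localhost logins.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify: this should log in and exit without prompting for a password.
ssh localhost exit
```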

A tutorial for dummies:


if you see "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable":
the solution is here:
http://stackoverflow.com/questions/19943766/hadoop-unable-to-load-native-hadoop-library-for-your-platform-error-on-centos

After everything is correctly installed and launched, you can check the status with:
$ jps
The output should look like:

23208 SecondaryNameNode
22857 NameNode
22997 DataNode
26575 Jps

A detailed single-node tutorial:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/



Formatting the Namenode


The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystems of your cluster. You need to do this the first time you set up a Hadoop installation. Do not format a running Hadoop filesystem: this will erase all of your data. Before formatting, ensure that the dfs.name.dir directory exists. If you just used the default, then mkdir -p /tmp/hadoop-username/dfs/name will create the directory. To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format


"no such file or directory":

http://stackoverflow.com/questions/20821584/hadoop-2-2-installation-no-such-file-or-directory

hadoop fs -mkdir -p /user/[current login user]
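A sketch of the same fix without hard-coding the user name (assumes the HDFS daemons are already running):

```shell
# Create your HDFS home directory; $(whoami) expands to the current login user.
bin/hadoop fs -mkdir -p /user/$(whoami)
# Verify it exists.
bin/hadoop fs -ls /user
```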

"datanode is not running":
This fix is for newer versions of Hadoop (I am running 2.4.0):
  • Stop the cluster: sbin/stop-all.sh
  • Go to etc/hadoop for the config files.
  • In hdfs-site.xml, look for the directory paths set by dfs.namenode.name.dir and dfs.datanode.data.dir.
  • Delete both directories recursively (rm -r).
  • Format the namenode: bin/hadoop namenode -format
  • Finally: sbin/start-all.sh
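The steps above can be sketched as one script, run from the Hadoop install root. The two directory paths are examples only (the defaults under /tmp); substitute the paths your hdfs-site.xml actually sets:

```shell
# Stop everything, wipe the (example) name/data dirs, reformat, restart.
sbin/stop-all.sh
rm -r /tmp/hadoop-$(whoami)/dfs/name
rm -r /tmp/hadoop-$(whoami)/dfs/data
bin/hadoop namenode -format
sbin/start-all.sh
```

Note this destroys everything stored in HDFS, so only do it on a throwaway single-node setup.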
How do you copy a file from the local system to HDFS?
hadoop fs -copyFromLocal localfile.txt /user/hduser/input/input1.data


Then run the example:

$ bin/hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hdgepo/input /user/hdgepo/output


The general form is bin/hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount <input> <output>, where <input> is a text file or a directory containing text files, and <output> is the name of a directory that will be created to hold the output. The output directory must not exist before running the command, or you will get an error.
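Because the output directory must not exist, re-running the job needs a cleanup step first. A sketch (the /user paths are examples; part-r-00000 is the standard reducer output file name):

```shell
# Remove any previous run's output, then run wordcount again.
OUT=/user/$(whoami)/output
bin/hadoop fs -rm -r "$OUT"
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar \
  wordcount /user/$(whoami)/input "$OUT"
# Inspect the result.
bin/hadoop fs -cat "$OUT/part-r-00000"
```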


Run your own Hadoop:
https://github.com/uwsampa/graphbench/wiki/Standalone-Hadoop

Useful hadoop fs commands:
http://www.bigdataplanet.info/2013/10/All-Hadoop-Shell-Commands-you-need-Hadoop-Tutorial-Part-5.html
Cluster Setup:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html

Web interface for hadoop 2.4.1:
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/ClusterSetup.html#Web_Interfaces
