Tuesday, September 9, 2014

Installing Hadoop for the first time!

All versions are available here:
http://mirror.tcpdiag.net/apache/hadoop/common/

I picked 2.4.1 (the current stable version).
Following the instructions on the official site:
http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-common/SingleNodeSetup.html

1: pretty smooth until I saw:
In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.

Note: there is no folder named "conf". By comparing with the 2.5.0 install instructions, I found the correct file is: etc/hadoop/hadoop-env.sh
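For example, a minimal sketch of the edit (the JDK path below is an assumption; point it at your own Java installation root):

```shell
# etc/hadoop/hadoop-env.sh -- define JAVA_HOME explicitly.
# The path below is only an example; use the root of your own JDK.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```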

2: another typo in the official instructions:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar  grep input output 'dfs[a-z.]+'
$ cat output/*
Again, there is no "conf" folder; here it should be:
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*


A good unofficial installation guide:
http://data4knowledge.org/2014/08/16/installing-hadoop-2-4-1-detailed/

Handling warnings you may see:
http://chawlasumit.wordpress.com/2014/06/17/hadoop-java-hotspottm-execstack-warning/

If ssh gives you problems, make sure:
1: the ssh server is running.
2: run: /etc/init.d/ssh reload
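The start scripts also need passwordless ssh to localhost. A sketch of the standard OpenSSH setup (skip key generation if ~/.ssh/id_rsa already exists):

```shell
# Generate a passphrase-less key and authorize it for localhost logins.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify: this should log in and exit without prompting for a password.
ssh localhost exit
```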

A tutorial for dummies:


if you see "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable":
the solution is here:
http://stackoverflow.com/questions/19943766/hadoop-unable-to-load-native-hadoop-library-for-your-platform-error-on-centos

After everything is correctly installed and launched, you can check the status with:
$ jps
The output should look like:

23208 SecondaryNameNode
22857 NameNode
22997 DataNode
26575 Jps

A detailed single-node tutorial:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/



Formatting the Namenode


The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystems of your cluster. You need to do this the first time you set up a Hadoop installation. Do not format a running Hadoop filesystem: this will erase all of your data. Before formatting, ensure that the dfs.name.dir directory exists. If you just used the default, then mkdir -p /tmp/hadoop-username/dfs/name will create the directory. To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format


"no such file or directory":

http://stackoverflow.com/questions/20821584/hadoop-2-2-installation-no-such-file-or-directory

hadoop fs -mkdir -p /user/[current login user]
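A sketch of the same fix without hard-coding the user name (assumes the HDFS daemons are already running):

```shell
# Create your HDFS home directory; $(whoami) expands to the current login user.
bin/hadoop fs -mkdir -p /user/$(whoami)
# Verify it exists.
bin/hadoop fs -ls /user
```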

"datanode is not running":
This fix is for newer versions of Hadoop (I am running 2.4.0):
  • Stop the cluster: sbin/stop-all.sh
  • Go to etc/hadoop for the config files.
  • In hdfs-site.xml, look for the directory paths set by dfs.namenode.name.dir and dfs.datanode.data.dir.
  • Delete both directories recursively (rm -r).
  • Format the namenode: bin/hadoop namenode -format
  • Finally: sbin/start-all.sh
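The steps above can be sketched as one script, run from the Hadoop install root. The two directory paths are examples only (the defaults under /tmp); substitute the paths your hdfs-site.xml actually sets:

```shell
# Stop everything, wipe the (example) name/data dirs, reformat, restart.
sbin/stop-all.sh
rm -r /tmp/hadoop-$(whoami)/dfs/name
rm -r /tmp/hadoop-$(whoami)/dfs/data
bin/hadoop namenode -format
sbin/start-all.sh
```

Note this destroys everything stored in HDFS, so only do it on a throwaway single-node setup.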
How do you copy a file from the local system to HDFS?
hadoop fs -copyFromLocal localfile.txt /user/hduser/input/input1.data


Then run the example:

$ bin/hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hdgepo/input /user/hdgepo/output


The general form is bin/hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount <input> <output>, where <input> is a text file or a directory containing text files, and <output> is the name of a directory that will be created to hold the output. The output directory must not exist before running the command, or you will get an error.
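Because the output directory must not exist, re-running the job needs a cleanup step first. A sketch (the /user paths are examples; part-r-00000 is the standard reducer output file name):

```shell
# Remove any previous run's output, then run wordcount again.
OUT=/user/$(whoami)/output
bin/hadoop fs -rm -r "$OUT"
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar \
  wordcount /user/$(whoami)/input "$OUT"
# Inspect the result.
bin/hadoop fs -cat "$OUT/part-r-00000"
```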


Run your own Hadoop:
https://github.com/uwsampa/graphbench/wiki/Standalone-Hadoop

Useful hadoop fs commands:
http://www.bigdataplanet.info/2013/10/All-Hadoop-Shell-Commands-you-need-Hadoop-Tutorial-Part-5.html
Cluster Setup:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html

Web interface for hadoop 2.4.1:
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/ClusterSetup.html#Web_Interfaces
