All Hadoop versions are available here:
http://mirror.tcpdiag.net/apache/hadoop/common/
I picked 2.4.1 (the current stable version).
Following the instructions on the official site:
http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-common/SingleNodeSetup.html
1: Pretty smooth until I saw:
In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
Note: there is no folder named "conf". By comparing with the install instructions for 2.5.0, I found the correct file is etc/hadoop/hadoop-env.sh.
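For example, a one-line edit there (the JDK path below is just a placeholder; point it at the root of your own Java installation):
# in etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64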
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
Again, there is no "conf" folder; the commands should be:
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*
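On an untouched config, the grep job typically finds a single match, so the last command prints something like (exact output may vary if you have edited the config files):
1       dfsadmin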
A good unofficial installation guide:
http://data4knowledge.org/2014/08/16/installing-hadoop-2-4-1-detailed/
Handling warnings you may see:
http://chawlasumit.wordpress.com/2014/06/17/hadoop-java-hotspottm-execstack-warning/
If ssh gives you trouble, make sure:
1: the ssh server is running.
2: you have run: /etc/init.d/ssh reload
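Hadoop also needs passwordless ssh to localhost; the standard setup (per the official docs, using an RSA key here) is:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost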
A tutorial for dummies:
If you see "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable", the solution is here:
http://stackoverflow.com/questions/19943766/hadoop-unable-to-load-native-hadoop-library-for-your-platform-error-on-centos
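One commonly cited fix from that thread is to point Hadoop at its bundled native libraries, e.g. in ~/.bashrc (assuming HADOOP_HOME is set to your install directory):
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"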
After everything is correctly installed and launched, you can check the status with:
$ jps
The output should list the running daemons, e.g.:
23208 SecondaryNameNode
22857 NameNode
26575 Jps
22997 DataNode
Another good single-node tutorial:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Formatting the Namenode
The first step to starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystems of your cluster. You need to do this the first time you set up a Hadoop installation. Do not format a running Hadoop filesystem; this will cause all your data to be erased. Before formatting, ensure that the dfs.name.dir directory exists. If you just used the default, then mkdir -p /tmp/hadoop-username/dfs/name will create the directory. To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format
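Note that path is from an older guide; in 2.4.1 the same thing is done from the install directory with:
$ bin/hdfs namenode -format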
"no such file or directory":
http://stackoverflow.com/questions/20821584/hadoop-2-2-installation-no-such-file-or-directoryhadoop fs -mkdir -p /user/[current login user]
"datanode is not running":
This is for newer versions of Hadoop (I am running 2.4.0):
- In this case, stop the cluster: sbin/stop-all.sh
- Then go to etc/hadoop (under the install directory) for the config files.
- In hdfs-site.xml, look for the directory paths corresponding to dfs.namenode.name.dir and dfs.datanode.data.dir.
- Delete both directories recursively (rm -r).
- Now format the namenode: bin/hadoop namenode -format
- And finally: sbin/start-all.sh (see the sketch of these steps below)
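A minimal sketch of those steps (the dfs paths below are placeholders; use whatever your hdfs-site.xml actually specifies):
$ sbin/stop-all.sh
# find dfs.namenode.name.dir and dfs.datanode.data.dir in etc/hadoop/hdfs-site.xml,
# then remove those directories (example default-style paths shown)
$ rm -r /tmp/hadoop-username/dfs/name /tmp/hadoop-username/dfs/data
$ bin/hadoop namenode -format
$ sbin/start-all.sh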
How to copy a file from the local system to HDFS?
$ hadoop fs -copyFromLocal localfile.txt /user/hduser/input/input1.data
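You can verify the copy with a standard listing:
$ hadoop fs -ls /user/hduser/input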
Then run the example:
$ hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount <input> <output>
where <input> is a text file or a directory containing text files, and <output> is the name of a directory that will be created to hold the output. The output directory must not exist before running the command or you will get an error.
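For instance, reusing the file copied above (paths are from that earlier example; with the default single reducer the result lands in part-r-00000):
$ hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hduser/input /user/hduser/output
$ hadoop fs -cat /user/hduser/output/part-r-00000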
Run your own Hadoop:
https://github.com/uwsampa/graphbench/wiki/Standalone-Hadoop
Useful hadoop fs commands:
http://www.bigdataplanet.info/2013/10/All-Hadoop-Shell-Commands-you-need-Hadoop-Tutorial-Part-5.html
Cluster Setup:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
Web interface for Hadoop 2.4.1:
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/ClusterSetup.html#Web_Interfaces
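Per that page, the handy defaults on a single node are:
NameNode web UI: http://localhost:50070
ResourceManager web UI: http://localhost:8088
MapReduce JobHistory Server: http://localhost:19888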