Monday, September 15, 2014

Build a virtualized, fully distributed environment on a single laptop.

We use LXC to achieve that.
Good tutorials on LXC:
http://en.community.dell.com/techcenter/os-applications/w/wiki/6950.lxc-containers-in-ubuntu-server-14-04-lts
http://en.community.dell.com/techcenter/os-applications/w/wiki/7440.lxc-containers-in-ubuntu-server-14-04-lts-part-2

http://wupengta.blogspot.com/2012/08/lxchadoop.html

golden tutorial:
http://www.kumarabhishek.co.vu/

Once you have installed LXC and created a container, you can check it under /var/lib/lxc.
Note, you have to be the root user to check it:

gstanden@vmem1:/usr/share/lxc/templates$ cd /var/lib/lxc
bash: cd: /var/lib/lxc: Permission denied
gstanden@vmem1:/usr/share/lxc/templates$ sudo cd /var/lib/lxc
sudo: cd: command not found
gstanden@vmem1:/usr/share/lxc/templates$ sudo su
root@vmem1:~# cd /var/lib/lxc

#ifconfig -a
lxcbr0    Link encap:Ethernet  HWaddr fe:d3:07:23:4d:71  

          inet addr:10.0.3.1  Bcast:10.0.3.255  Mask:255.255.255.0

LXC creates this NATed bridge "lxcbr0" at host startup; the containers will all be attached to "lxcbr0" and reach each other (and the outside world) through it.
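A quick way to see what lxcbr0 looks like on the host (a sketch; /etc/default/lxc-net is where the Ubuntu 14.04 lxc packages keep the bridge settings, so that file name is an assumption if you are on another distro):

$ brctl show lxcbr0        # lists the bridge and any container veth interfaces attached to it
$ cat /etc/default/lxc-net # LXC_BRIDGE, LXC_ADDR (10.0.3.1), LXC_NETWORK, DHCP range
$ ps aux | grep dnsmasq    # dnsmasq listens on 10.0.3.1 and hands out the 10.0.3.x leases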


>sudo lxc-create  -t ubuntu -n hdp1
>sudo lxc-start -d -n hdp1
>sudo lxc-console -n hdp1
>sudo lxc-info -n hdp1
Name:       hdp1
State:      RUNNING
PID:        17954
IP:         10.0.3.156
CPU use:    2.18 seconds
BlkIO use:  160.00 KiB
Memory use: 9.13 MiB
>sudo lxc-stop -n lxc-test
>sudo lxc-destroy -n lxc-test

ubuntu@hdp1# sudo useradd -m hduser1

ubuntu@hdp1:~$ sudo passwd hduser1
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully

Then install the JDK on the VM (the one where we created "hduser1"):
apt-get install openjdk-7-jdk
Then we should set JAVA_HOME, etc., in .bashrc:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH="$PATH:$JAVA_HOME/bin:/home/hduser1/hadoop-2.4.1/bin:$JRE_HOME/bin"
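To double check the Java setup (assuming the openjdk-7 package above lands in the usual /usr/lib/jvm path):

$ source ~/.bashrc
$ echo $JAVA_HOME   # should print /usr/lib/jvm/java-1.7.0-openjdk-amd64
$ java -version     # should report OpenJDK 1.7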

configure network:
http://www.kumarabhishek.co.vu/
http://tobala.net/download/lxc/
http://containerops.org/2013/11/19/lxc-networking/
Now I have 5 LXC virtual machines.
hdp1 : namenode,jobtracker,secondarynamenode
hdp2 : datanodes,tasktrackers
hdp3 : datanodes,tasktrackers
hdp4 : datanodes,tasktrackers
hdp5 : datanodes,tasktrackers

For each VM, check and change two files:
1:/var/lib/lxc/hdp1/config
  make sure this line exists:
     lxc.network.link = lxcbr0
    "lxcbr0" is the bridge created by LXC, whose virtual IP is: 10.0.3.1, who also has the same hostname as the host machine.
2:/var/lib/lxc/hdp1/rootfs/etc/network/interfaces
change the second stanza to assign a static IP address (a way to apply and verify it follows below):
auto eth0
iface eth0 inet static
    address 10.0.3.101
    netmask 255.255.0.0
    broadcast 10.0.255.255
    gateway 10.0.3.1
    dns-nameservers 10.0.3.1
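A rough way to apply and verify the new address (a sketch; lxc-attach assumes the stock Ubuntu 14.04 lxc tools and a running container):

$ sudo lxc-stop -n hdp1 && sudo lxc-start -d -n hdp1   # restart so the new interfaces file takes effect
$ sudo lxc-attach -n hdp1 -- ip addr show eth0         # should now show 10.0.3.101
$ sudo lxc-attach -n hdp1 -- ping -c 2 10.0.3.1        # the lxcbr0 gateway should answer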
Once the master node is configured, we clone the container.
To clone our hdp1 container, we first need to stop it if it's running:
$ sudo lxc-stop -n hdp1
Then clone:
sudo lxc-clone -o hdp1 -n hdpX #replace X with 2,3,...,N
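A sketch of the whole clone step (the sed pattern assumes each interfaces file contains exactly the stanza shown above, so adjust it if yours differs):

# run on the host; hdp1 must be stopped before cloning
sudo lxc-stop -n hdp1
for X in 2 3 4 5; do
  sudo lxc-clone -o hdp1 -n hdp$X
  # give each clone its own static address: 10.0.3.102 ... 10.0.3.105
  sudo sed -i "s/address 10.0.3.101/address 10.0.3.10$X/" /var/lib/lxc/hdp$X/rootfs/etc/network/interfaces
done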


Then, for each VM, we need to edit /etc/hosts to reflect the changes we made to /etc/hosts on the host machine (the host machine itself is already done; a loop for pushing these entries is sketched after the list):
10.0.3.101 hdp1
10.0.3.102 hdp2
10.0.3.103 hdp3
10.0.3.104 hdp4
10.0.3.105 hdp5
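One way to push the same entries into every container from the host (a sketch; the rootfs paths follow the layout shown earlier):

for X in 1 2 3 4 5; do
  # append the cluster host names to each container's /etc/hosts
  sudo tee -a /var/lib/lxc/hdp$X/rootfs/etc/hosts > /dev/null <<'EOF'
10.0.3.101 hdp1
10.0.3.102 hdp2
10.0.3.103 hdp3
10.0.3.104 hdp4
10.0.3.105 hdp5
EOF
done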

http://jcinnamon.wordpress.com/lxc-hadoop-fully-distributed-on-single-machine/




How to create multiple bridges?

add a bridge interface:
sudo brctl addbr br100

to delete a bridge interface:
# ip link set br100 down
# brctl delbr br100

Setting up a bridge is pretty straightforward. First you create a new bridge, then add as many interfaces to it as you want:
# brctl addbr br0
# brctl addif br0 eth0
# brctl addif br0 eth1
# ifconfig br0 192.168.32.1 netmask 255.255.255.0 up
The name br0 is just a suggestion, following the loose convention for interface names: an identifier followed by a number. However, you're free to choose anything you like. You can name your bridge pink_burning_elephant if you want to. I just don't know whether you'll remember in 5 years why you have iptables rules for a burning elephant.


Good tutorial on the brctl command:
http://www.lainoox.com/bridge-brctl-tutorial-linux/


Multi-Cluster Multi-Node Distributed Virtual Network Setup

Bridge Mode

Tuesday, September 9, 2014

Install Hadoop for the first time!

all versions are  available here:
http://mirror.tcpdiag.net/apache/hadoop/common/

I picked 2.4.1 (the current stable version).
Following the instructions on the official site:
http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-common/SingleNodeSetup.html

1: pretty smooth until I saw:
In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.

Note, there is no folder named "conf". By comparing it with the install instructions for 2.5.0, I found the correct file should be: etc/hadoop/hadoop-env.sh
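For reference, the only line I needed to change in that file (the path matches the OpenJDK 7 install from the LXC post above):

# etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64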

2: another typo in the official instruction:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar  grep input output 'dfs[a-z.]+'
$ cat output/*
Again, there is no "conf" folder; here it should be:
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar  share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar  grep input output 'dfs[a-z.]+'
$ cat output/*


A good unofficial installation guide:
http://data4knowledge.org/2014/08/16/installing-hadoop-2-4-1-detailed/

handling warnings you may see:
http://chawlasumit.wordpress.com/2014/06/17/hadoop-java-hotspottm-execstack-warning/

if ssh has some problem, make sure (a fuller sketch follows below):
1: the ssh server is running.
2: run: /etc/init.d/ssh reload
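A fuller sketch of that checklist (assuming openssh-server is installed; the passwordless key is what the Hadoop start scripts need in order to reach localhost):

$ sudo service ssh status                   # 1: the ssh server must be running
$ sudo /etc/init.d/ssh reload               # 2: reload its configuration
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa  # create a key with an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost                             # should log in without a password prompt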

a tutorial for dummies:


if you see "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable":
the solution is here:
http://stackoverflow.com/questions/19943766/hadoop-unable-to-load-native-hadoop-library-for-your-platform-error-on-centos

After everything is correctly installed and launched,
you can check the status with:
$ jps
The output is:

23208 SecondaryNameNode
22857 NameNode
26575 Jps
22997 DataNode

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/



Formatting the Namenode


The first step to starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystems of your cluster. You need to do this the first time you set up a Hadoop installation. Do not format a running Hadoop filesystem; this will cause all of your data to be erased. Before formatting, ensure that the dfs.name.dir directory exists. If you just used the default, then mkdir -p /tmp/hadoop-username/dfs/name will create the directory. To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format
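That snippet is from the older docs; with the 2.4.1 layout the same step, run from the distribution root, looks roughly like this (the hadoop namenode -format form still works but prints a deprecation warning):

$ bin/hdfs namenode -format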


"no such file or directory":

http://stackoverflow.com/questions/20821584/hadoop-2-2-installation-no-such-file-or-directory

hadoop fs -mkdir -p /user/[current login user]

"datanode is not running":
This is for newer versions of Hadoop (I am running 2.4.0); the full sequence is sketched after this list.
  • In this case, stop the cluster: sbin/stop-all.sh
  • Then go to etc/hadoop for the config files. In the file hdfs-site.xml, look for the directory paths corresponding to dfs.namenode.name.dir and dfs.datanode.data.dir.
  • Delete both directories recursively (rm -r).
  • Now format the namenode via bin/hadoop namenode -format
  • And finally sbin/start-all.sh
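Putting those steps together (a sketch run from the Hadoop install directory; the two rm paths are placeholders for whatever your hdfs-site.xml actually points at):

$ sbin/stop-all.sh
$ grep -A 1 "dfs." etc/hadoop/hdfs-site.xml          # find the name/data directory settings
$ rm -r /path/to/namenode/dir /path/to/datanode/dir  # placeholders: use the paths from hdfs-site.xml
$ bin/hadoop namenode -format
$ sbin/start-all.sh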
how to copy a file from the local system to HDFS?
hadoop fs -copyFromLocal localfile.txt /user/hduser/input/input1.data


Then run an example:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hdgepo/input /user/hdgepo/output


The general form is bin/hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount <input> <output>, where <input> is a text file or a directory containing text files, and <output> is the name of a directory that will be created to hold the output. The output directory must not exist before running the command or you will get an error.
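A small usage note: since the output directory must not already exist, when re-running the example I simply delete it first (paths follow the wordcount command above; the part-r-00000 name assumes the default single reducer):

$ bin/hadoop fs -rm -r /user/hdgepo/output
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hdgepo/input /user/hdgepo/output
$ bin/hadoop fs -cat /user/hdgepo/output/part-r-00000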


Run your own Hadoop:
https://github.com/uwsampa/graphbench/wiki/Standalone-Hadoop

useful hadoop fs commands:
http://www.bigdataplanet.info/2013/10/All-Hadoop-Shell-Commands-you-need-Hadoop-Tutorial-Part-5.html
Cluster Setup:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html

Web interface for hadoop 2.4.1:
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/ClusterSetup.html#Web_Interfaces

Sunday, September 7, 2014

Conquering Spark

Spark is hot! Indeed.
I have no knowledge of Hadoop or Internet programming, but I still want to conquer Spark.

The first thing I learned came from downloading Spark.
https://spark.apache.org/downloads.html

They have :
Pre-built packages:
Pre-built packages, third-party (NOTE: may include non ASF-compatible licenses):
What do all these abbreviations stand for?
HDFS, HDP1, CDH3, CDH4, HDP2, CDH5, MapRv3 and MapRv4

Simply put, they are all distributions of Hadoop. Just like a Linux distribution gives you more than Linux, CDH delivers the core elements of Hadoop – scalable storage and distributed computing – along with additional components such as a user interface, plus necessary enterprise capabilities such as security, and integration with a broad range of hardware and software solutions.
http://www.dbms2.com/2012/06/19/distributions-cdh-4-hdp-1-hadoop-2-0/

HDP1 and HDP2: two versions of Hortonworks Data Platform. 
Hortonworks is a company built around Hadoop whose goal is to promote the usage of Hadoop. Its product, the Hortonworks Data Platform (HDP), includes Apache Hadoop and is used for storing, processing, and analyzing large volumes of data. The platform is designed to deal with data from many sources and formats. It includes various Apache Hadoop projects, including the Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, and ZooKeeper, as well as additional components.
official site of HDP: http://hortonworks.com/
its wiki: http://en.wikipedia.org/wiki/Hortonworks

CDH3, CDH4, CDH5: versions of Cloudera Distribution Including Apache Hadoop

Its wiki: http://en.wikipedia.org/wiki/Cloudera

MapRv3, MapRv4: versions of the Hadoop distribution from the company MapR


3 pillars of Hadoop: HDFS, MapReduce, YARN

Spark may replace MapReduce in the future.

http://hortonworks.com/hadoop/hdfs/

To run Spark, you need to install a Hadoop distribution such as CDH, HDP, or MapR, or you can run Spark standalone.
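To try the standalone route first, a pre-built package is enough (a sketch; the tarball name below is a placeholder for whichever pre-built package you actually download):

$ tar xzf spark-1.x.y-bin-hadoop2.tgz   # placeholder file name from the downloads page
$ cd spark-1.x.y-bin-hadoop2
$ ./bin/spark-shell                     # local mode, no cluster required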


Essentials for distributed development

VirtualBox

Docker: uses your existing kernel as its kernel and just creates a container around your apps; all containers share the host's kernel.

Vagrant: a tool that manages full VMs (e.g., VirtualBox), which gives better isolation than containers.

How to install Docker on Ubuntu 14.04
http://docs.docker.com/installation/ubuntulinux/


Docker vs. Vagrant
http://www.scriptrock.com/articles/docker-vs-vagrant


Installing Hadoop 2.4 on Ubuntu 14.04:
http://dogdogfish.com/2014/04/26/installing-hadoop-2-4-on-ubuntu-14-04/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

Using Docker to try Hadoop:
http://techtraits.com/hadoopsetup/

Distributed environment on a single laptop:
http://ofirm.wordpress.com/2014/01/05/creating-a-virtualized-fully-distributed-hadoop-cluster-using-linux-containers/

 

Sunday, July 27, 2014

example using matchShapes in OpenCV

#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <cmath>
#include <iostream>
#include <cstdio>
#include <vector>
using namespace std;
using namespace cv;

int main(int argc, char* argv[])
{
    if (argc != 3)
    {
        cout << "usage: ./ms main.png match.png" << endl;
        return 1;
    }

    Mat src = imread(argv[1]);
    if (src.empty())
    {
        cout << "figure " << argv[1] << " is not located!" << endl;
        return 1;
    }

    Mat match = imread(argv[2]);
    if (match.empty())
    {
        cout << "figure " << argv[2] << " is not located!" << endl;
        return 1;
    }

    // convert both images to grayscale
    Mat srcGray, matchGray;
    cvtColor(src, srcGray, CV_BGR2GRAY);
    cvtColor(match, matchGray, CV_BGR2GRAY);

    // binarize (inverted, so dark shapes on a light background become white blobs)
    Mat src_th, match_th;
    threshold(srcGray, src_th, 125, 255, THRESH_BINARY_INV);
    threshold(matchGray, match_th, 125, 255, THRESH_BINARY_INV);

    // extract contours from both binary images
    vector<vector<Point> > src_contours;
    vector<Vec4i> src_hierarchy;
    findContours(src_th, src_contours, src_hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE);

    vector<vector<Point> > match_contours;
    vector<Vec4i> match_hierarchy;
    findContours(match_th, match_contours, match_hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE);

    // compare every contour of the main image against the template contour
    // (match_contours[1] is the second contour found in the match image)
    double matchR;
    for (size_t i = 0; i < src_contours.size(); ++i)
    {
        matchR = matchShapes(src_contours[i], match_contours[1], CV_CONTOURS_MATCH_I1, 0);
        cout << "match result of contour " << i << " is: " << matchR << endl;
    }
    return 0;
}

However, the matchShapes function does not work very well.

We are using the 2nd figure to match against each contour in the 1st figure. Here is the result:

match result of contour 0 is: 0.11592
match result of contour 1 is: 0.151644
match result of contour 2 is: 0.282304
match result of contour 3 is: 0.390383
match result of contour 4 is: 0.540419
match result of contour 5 is: 0.757443

Saturday, July 26, 2014

different stages of using OpenCV

1: read in images.
Mat src = imread("and.png");

2: convert to a gray image.

 Mat gray;
cvtColor(src, gray, CV_BGR2GRAY);
3: find edges. There are two ways: threshold and Canny. This step is optional. If the picture is only black and white, then this step is not necessary; we can just use findContours.
Mat th; Mat bw;
//method 1:  threshold(gray, th,125, 255,THRESH_BINARY);

//method 2:  Canny(gray, bw, 100, 200); 


       Note the differences between the above two images.
4: find contours and draw contours.
4.1:  findContours(th, contours, hierarchy,CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE);
Note the red rectangle. If we change:
threshold(gray, th, 125, 255, THRESH_BINARY);
to
threshold(gray, th, 125, 255, THRESH_BINARY_INV);
you will not see that red rectangle.

   4.2: findContours(bw, contours2, hierarchy2,CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE);
Why do we get 4 contours instead of 2, since there are only two shapes in the Canny output?
Because the Canny output is a thin edge line, and findContours finds a contour on each side of it, one just outside and one just inside each edge.


5: write out the image.

imwrite("output.png", color); 

Sunday, April 13, 2014

fuzzy sort

Regions that all share a common intersection do not require sorting among themselves.

Solution is from here:
http://alumni.media.mit.edu/~dlanman/courses/cs157/HW3.pdf

Some explanation:
Step 1: Find-Intersection
This step finds a region [a, b] that is the common intersection of the pivot with the regions [a_i, b_i] overlapping it, so it is a minimal region.

Step 2: Partition-Right and Partition-Left-Middle
After these two partitions, the middle part does not need to be sorted.

Step 3: recursively sort the left part and the right part.

A wrong solution I tried before:

Pick a pivot region arbitrarily, and then divide the whole array into 3 parts: left, middle, right.

The left part contains regions whose b_i is less than the pivot, the right part contains regions whose a_i is greater than the pivot, and the middle part is the set of regions that intersect the given pivot region.

The problem here is that the middle part may contain two regions that are disjoint; those two regions still need to be sorted relative to each other, which violates the assumption that the middle part does not require sorting. For example, if the pivot region is [5, 9] and the middle part is [6, 7], [8, 9], and [5, 9], then [6, 7] and [8, 9] definitely need to be sorted.



Friday, March 14, 2014

Comments on "Android Development Tutorial"

I am following the official tutorial:
https://developer.android.com/training/index.html

When I finished the 1st step, "Creating an Android Project" (https://developer.android.com/training/basics/firstapp/creating-project.html), I noticed my "src" and "layout" folders were empty, which they are not supposed to be.
The solution is:
"Elclipse"--> "Help" -> "Install new software" and install (this will update it) from this url: https://dl-ssl.google.com/android/eclipse/

Then re-run "Creating an Android Project" above, and everything becomes normal.

The 2nd problem you may see is with AVD creation. Sometimes, after you have filled out all the information needed, the "OK" button does not work. In such a case, you can go to \adtx86_64-20131030\sdk\tools, find "monitor.bat", and run it. Then create the AVD from there.