Deryk's stack: Conquering Spark

Spark is hot! Indeed.
I have no background in Hadoop or Internet programming, but I still want to conquer Spark.

The first thing I learned came from the Spark download page:
https://spark.apache.org/downloads.html

It offers:

Pre-built packages:

For Hadoop 1 (HDP1, CDH3): find an Apache mirror or direct file download
For CDH4: find an Apache mirror or direct file download
For Hadoop 2 (HDP2, CDH5): find an Apache mirror or direct file download

Pre-built packages, third-party (NOTE: may include non-ASF-compatible licenses):

For MapRv3: direct file download (external)
For MapRv4: direct file download (external)

What do all these abbreviations stand for?

HDFS, HDP1, CDH3, CDH4, HDP2, CDH5, MapRv3 and MapRv4

Simply put, apart from HDFS (the Hadoop Distributed File System, Hadoop's storage layer), they are all distributions of Hadoop. Just as a Linux distribution gives you more than the Linux kernel, CDH delivers the core elements of Hadoop (scalable storage and distributed computing) along with additional components such as a user interface, plus necessary enterprise capabilities such as security and integration with a broad range of hardware and software solutions.

http://www.dbms2.com/2012/06/19/distributions-cdh-4-hdp-1-hadoop-2-0/

HDP1 and HDP2: two versions of Hortonworks Data Platform.

Hortonworks is a company that builds on and promotes Apache Hadoop. Its product, the Hortonworks Data Platform (HDP), includes Apache Hadoop and is used for storing, processing and analyzing large volumes of data. The platform is designed to handle data from many sources and in many formats. It bundles various Apache Hadoop projects, including the Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase and ZooKeeper, plus additional components.

Official site of HDP: http://hortonworks.com/

Its Wikipedia page: http://en.wikipedia.org/wiki/Hortonworks

CDH3, CDH4, CDH5: versions of Cloudera Distribution Including Apache Hadoop

Its Wikipedia page: http://en.wikipedia.org/wiki/Cloudera

MapRv3, MapRv4: versions of the Hadoop distribution from the company MapR Technologies

The three pillars of Hadoop: HDFS (storage), MapReduce (processing) and YARN (resource management).

Spark may eventually replace MapReduce as Hadoop's processing engine.
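To make the MapReduce pillar concrete, here is a minimal pure-Python sketch of the classic word-count job, split into the map, shuffle and reduce phases that Hadoop MapReduce runs across a cluster. It runs on a single machine with no Hadoop or Spark installed; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, chosen only for illustration, and this is not how Hadoop itself is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases into one job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = [
        "Spark is hot",
        "Spark may replace MapReduce",
        "MapReduce is a pillar of Hadoop",
    ]
    print(sorted(word_count(docs).items()))
```

For example, `word_count(["Spark is hot", "Spark"])` returns `{"spark": 2, "is": 1, "hot": 1}`. In Spark's RDD API the same job is typically written as a chain of `flatMap`, `map` and `reduceByKey`; one commonly cited reason Spark is seen as a MapReduce replacement is that it can keep intermediate data in memory between steps instead of writing it to disk.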

http://hortonworks.com/hadoop/hdfs/

To run Spark, you can install a Hadoop distribution such as CDH, HDP or MapR, or you can run Spark on its own in standalone mode, without Hadoop.


Sunday, September 7, 2014
