I have no knowledge of Hadoop or Internet programming, but I still want to conquer Spark.
The first thing I learned came from downloading Spark.
https://spark.apache.org/downloads.html
They offer:
Pre-built packages:
- For Hadoop 1 (HDP1, CDH3): find an Apache mirror or direct file download
- For CDH4: find an Apache mirror or direct file download
- For Hadoop 2 (HDP2, CDH5): find an Apache mirror or direct file download
Pre-built packages, third-party (NOTE: may include non ASF-compatible licenses):
- For MapRv3: direct file download (external)
- For MapRv4: direct file download (external)
HDP1, CDH3, CDH4, HDP2, CDH5, MapRv3 and MapRv4: what are all these names?
Simply put, they are all distributions of Hadoop (HDFS itself is not a distribution but Hadoop's file system). Just like a Linux distribution gives you more than the Linux kernel, CDH delivers the core elements of Hadoop – scalable storage and distributed computing – along with additional components such as a user interface, plus enterprise capabilities such as security and integration with a broad range of hardware and software solutions.
http://www.dbms2.com/2012/06/19/distributions-cdh-4-hdp-1-hadoop-2-0/
HDP1 and HDP2: two versions of the Hortonworks Data Platform.
Hortonworks is a company that promotes the adoption of Hadoop. Its product, the Hortonworks Data Platform (HDP), packages Apache Hadoop for storing, processing, and analyzing large volumes of data from many sources and in many formats. The platform includes various Apache Hadoop projects, among them the Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase and ZooKeeper, plus additional components.
official site of HDP: http://hortonworks.com/
its wiki: http://en.wikipedia.org/wiki/Hortonworks
CDH3, CDH4, CDH5: versions of Cloudera Distribution Including Apache Hadoop
Its wiki: http://en.wikipedia.org/wiki/Cloudera
MapRv3, MapRv4: versions of the Hadoop distribution from the company MapR.
The 3 pillars of Hadoop: HDFS, MapReduce and YARN.
Spark may replace MapReduce in the future.
http://hortonworks.com/hadoop/hdfs/
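To see what MapReduce actually does (and what Spark generalizes), here is a toy word count in plain Python. This is only an illustration of the map and reduce phases on one machine; the function names are mine, not Hadoop's, and a real cluster would shuffle the pairs between the two phases across many nodes.

```python
from collections import defaultdict

def map_phase(lines):
    # map: emit a (word, 1) pair for every word in every line
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # reduce: sum the counts for each distinct word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["spark may replace mapreduce", "spark runs on yarn"]
print(reduce_phase(map_phase(lines)))
# {'spark': 2, 'may': 1, 'replace': 1, 'mapreduce': 1, 'runs': 1, 'on': 1, 'yarn': 1}
```

Spark expresses the same computation as transformations on distributed datasets, which is why it can stand in for MapReduce.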
To run Spark on a cluster, you can install a Hadoop distribution such as CDH, HDP or MapR; alternatively, you can run Spark in standalone mode without any Hadoop installation.
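The standalone route is the quickest way to try it. A minimal sketch of downloading a pre-built package and starting the interactive shell; the version number and archive URL are only an example (pick the package matching your setup on the downloads page):

```shell
# Download a pre-built package (example: Spark 1.1.0 built for Hadoop 2.4 --
# substitute whatever version the downloads page currently lists)
wget https://archive.apache.org/dist/spark/spark-1.1.0/spark-1.1.0-bin-hadoop2.4.tgz
tar -xzf spark-1.1.0-bin-hadoop2.4.tgz
cd spark-1.1.0-bin-hadoop2.4

# Start the interactive Scala shell locally, with 2 worker threads --
# no Hadoop cluster required
./bin/spark-shell --master local[2]
```

Inside the shell you can already run Spark jobs against local files, which is enough to start learning before touching any Hadoop distribution.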