Hadoop Ecosystem


Hadoop is the core of a growing ecosystem of software components. Hadoop itself consists only of a distributed file system (HDFS) and a programming layer based on the functional MapReduce model for processing the stored data. Around these basic components, the Hadoop ecosystem has developed tools that make it easier to interact with Hadoop and to integrate it into other environments. This talk gives a brief introduction to Hadoop and then covers the most important tools around it, with demos where possible:

- Sqoop & Flume: getting data into HDFS from logs and databases
- Hive & Pig: high-level access to MapReduce
- HBase: a random-access, low-latency key/value store on top of Hadoop
- Mahout: a machine learning library for MapReduce
- Whirr: installing a Hadoop cluster on external cloud providers
- Oozie: a workflow engine that submits Hadoop jobs automatically
- and some more tools

The talk is aimed at new Hadoop users who want an overview of the many tools available around Hadoop.
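To make the MapReduce model a little more concrete, here is a minimal single-process sketch of the classic word-count job in Python. It only imitates the three phases (map, shuffle/group by key, reduce) that Hadoop runs in a distributed way over data stored in HDFS. The function names are hypothetical, chosen for illustration, and this is not Hadoop's actual (Java) API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key; in Hadoop the framework
    # does this between the map and reduce tasks, across the cluster
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases in one process (hypothetical helper)
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog"]))
```

Because map and reduce are pure functions over independent records and keys, a framework like Hadoop can run many copies of them in parallel on different machines; tools such as Hive and Pig generate jobs of this shape from higher-level queries and scripts.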

A video recording of Kai Vogt's talk is available.

Schedule info
Time slot: 4 June, 11:00-11:20
Experience level:
Presentation format: Short (20 min)