This section contains a series of How-Tos that demonstrate the integration between Pentaho and Hadoop using a sample weblog dataset.
The how-tos are organized by topic, with each set explaining various techniques for loading, transforming, extracting, and reporting on data within a Hadoop cluster. You are encouraged to perform the how-tos in order, as the output of one is sometimes used as the input of another. However, if you would like to jump to a how-to in the middle of the flow, instructions for preparing the input data are provided.
The first three videos compare creating and executing a simple MapReduce job with Pentaho Kettle against solving the same problem in Java. The Kettle transformation shown here runs as the Mapper and Reducer within the cluster.
Video: https://www.youtube.com/watch?v=KZe1UugxXcs
What would the same task as "1) Pentaho MapReduce with Kettle" look like if you coded it in Java? At half an hour long, you may not want to watch the entire video...
Video: https://www.youtube.com/watch?v=cfFq1XB4kww
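For context, below is a minimal sketch of what a hand-coded Hadoop job involves. It assumes a word-count-style task of counting hits per client IP in the weblog data; the exact task in the video may differ, and the class name and field layout here are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WeblogHitCount {

    // Mapper: emit (ip, 1) for every weblog line.
    public static class HitMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text ip = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumes the client IP is the first whitespace-delimited field.
            String[] fields = value.toString().split("\\s+");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                ip.set(fields[0]);
                context.write(ip, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each IP.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : values) {
                total += v.get();
            }
            context.write(key, new IntWritable(total));
        }
    }

    // Driver: wire the mapper, reducer, and I/O paths together.
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "weblog hit count");
        job.setJarByClass(WeblogHitCount.class);
        job.setMapperClass(HitMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Even this stripped-down version needs a Mapper class, a Reducer class, and driver boilerplate before any business logic appears, which is the productivity gap the comparison highlights.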
A quick summary of the previous two videos, "1) Pentaho MapReduce with Kettle" and "2) Straight Java", explaining why Pentaho Kettle boosts productivity and maintainability.
Video: https://www.youtube.com/watch?v=ZnyuTICOrhk
A quick example of loading data into the Hadoop Distributed File System (HDFS) using Pentaho Kettle.
Video: https://www.youtube.com/watch?v=Ylekzmd6TAc
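If you want to script the same load outside of Kettle, the Hadoop FileSystem API can copy a local file into HDFS. This is a minimal sketch, not the Kettle approach shown in the video; the NameNode URI and file paths are hypothetical placeholders.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical NameNode address; replace with your cluster's URI.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Copy a local weblog file into HDFS (both paths are placeholders).
        fs.copyFromLocalFile(new Path("/tmp/weblogs.txt"),
                             new Path("/user/pdi/weblogs/raw/weblogs.txt"));
        fs.close();
    }
}
```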
A quick example of extracting data from the Hadoop Distributed File System (HDFS) using Pentaho Kettle.
Video: https://www.youtube.com/watch?v=3Xew58LcMbg
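The reverse direction looks much the same through the FileSystem API. Again, this is a hedged sketch with placeholder URI and paths, not the Kettle transformation shown in the video.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExtract {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical NameNode address; replace with your cluster's URI.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Copy a file out of HDFS to the local filesystem (paths are placeholders).
        fs.copyToLocalFile(new Path("/user/pdi/weblogs/raw/weblogs.txt"),
                           new Path("/tmp/weblogs_extracted.txt"));
        fs.close();
    }
}
```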