This section contains a series of How-Tos that demonstrate the integration between Pentaho and Hadoop using a sample weblog dataset.
The how-tos are organized by topic, with each set explaining various techniques for loading, transforming, extracting, and reporting on data within a Hadoop cluster. You are encouraged to perform the how-tos in order, as the output of one is sometimes used as the input of another. However, if you would like to jump to a how-to in the middle of the flow, instructions for preparing the input data are provided.
The first three videos compare creating and executing a simple MapReduce job with Pentaho Kettle against solving the same problem in Java. The Kettle transformation shown here runs as the Mapper and Reducer within the cluster.
Video: https://www.youtube.com/watch?v=KZe1UugxXcs
What would the same task as "1) Pentaho MapReduce with Kettle" look like if you coded it in Java? At half an hour long, you may not want to watch the entire video...
Video: https://www.youtube.com/watch?v=cfFq1XB4kww
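For context, below is a minimal sketch of what a hand-coded Hadoop job involves. It assumes a word-count-style task of counting hits per client IP in the weblog data; the exact task in the video may differ, and the class name and field layout here are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WeblogHitCount {

    // Mapper: emit (ip, 1) for every weblog line.
    public static class HitMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text ip = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumes the client IP is the first whitespace-delimited field.
            String[] fields = value.toString().split("\\s+");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                ip.set(fields[0]);
                context.write(ip, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each IP.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : values) {
                total += v.get();
            }
            context.write(key, new IntWritable(total));
        }
    }

    // Driver: wire the mapper, reducer, and I/O paths together.
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "weblog hit count");
        job.setJarByClass(WeblogHitCount.class);
        job.setMapperClass(HitMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Even this stripped-down version needs a Mapper class, a Reducer class, and driver boilerplate before any business logic appears, which is the productivity gap the comparison highlights.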
A quick summary of the previous two videos, "1) Pentaho MapReduce with Kettle" and "2) Straight Java", explaining why Pentaho Kettle boosts productivity and maintainability.
Video: https://www.youtube.com/watch?v=ZnyuTICOrhk
A quick example of loading data into the Hadoop Distributed File System (HDFS) using Pentaho Kettle.
Video: https://www.youtube.com/watch?v=Ylekzmd6TAc
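If you want to script the same load outside of Kettle, the Hadoop FileSystem API can copy a local file into HDFS. This is a minimal sketch, not the Kettle approach shown in the video; the NameNode URI and file paths are hypothetical placeholders.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical NameNode address; replace with your cluster's URI.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Copy a local weblog file into HDFS (both paths are placeholders).
        fs.copyFromLocalFile(new Path("/tmp/weblogs.txt"),
                             new Path("/user/pdi/weblogs/raw/weblogs.txt"));
        fs.close();
    }
}
```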
A quick example of extracting data from the Hadoop Distributed File System (HDFS) using Pentaho Kettle.
Video: https://www.youtube.com/watch?v=3Xew58LcMbg
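The reverse direction looks much the same through the FileSystem API. Again, this is a hedged sketch with placeholder URI and paths, not the Kettle transformation shown in the video.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExtract {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical NameNode address; replace with your cluster's URI.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Copy a file out of HDFS to the local filesystem (paths are placeholders).
        fs.copyToLocalFile(new Path("/user/pdi/weblogs/raw/weblogs.txt"),
                           new Path("/tmp/weblogs_extracted.txt"));
        fs.close();
    }
}
```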