Transforming Data within a Hadoop Cluster
These guides show how to transform data within the Hadoop cluster using Pentaho MapReduce, Hive, and Pig.
- Using Pentaho MapReduce to Parse Weblog Data — How to use Pentaho MapReduce to convert raw weblog data into parsed, delimited records.
- Using Pentaho MapReduce to Generate an Aggregate Dataset — How to use Pentaho MapReduce to transform and summarize detailed data into an aggregate dataset.
- Transforming Data within Hive — How to read data from a Hive table, transform it, and write it to a Hive table within the workflow of a PDI job.
- Transforming Data with Pig — How to invoke a Pig script from a PDI job.
- Using Pentaho MapReduce to Parse Mainframe Data — How to use Pentaho to ingest a mainframe file into HDFS, then use Pentaho MapReduce to process it into delimited records.
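To give a flavor of the parsing step the weblog guides above cover, here is a minimal Hadoop Streaming-style mapper sketch in Python. It is not Pentaho-specific; the Common Log Format assumption and the chosen output fields are illustrative only — the actual guides build this logic as a PDI transformation inside Pentaho MapReduce.

```python
import re
import sys

# Apache Common Log Format (assumed input layout):
#   host ident authuser [date] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_weblog_line(line):
    """Convert one raw weblog line into a tab-delimited record, or None if malformed."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    f = m.groupdict()
    return "\t".join([f["host"], f["timestamp"], f["method"],
                      f["path"], f["status"], f["bytes"]])

if __name__ == "__main__":
    # Streaming-style mapper: read raw log lines on stdin, emit delimited records
    for raw in sys.stdin:
        record = parse_weblog_line(raw.rstrip("\n"))
        if record is not None:
            print(record)
```

In the Pentaho MapReduce guides, the equivalent of `parse_weblog_line` is expressed graphically with PDI steps (e.g. regex evaluation and field selection) rather than hand-written code.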