Transforming Data within a Hadoop Cluster
These guides show how to transform data within the Hadoop cluster using Pentaho MapReduce, Hive, and Pig.
- Using Pentaho MapReduce to Parse Weblog Data — How to use Pentaho MapReduce to convert raw weblog data into parsed, delimited records.
- Using Pentaho MapReduce to Generate an Aggregate Dataset — How to use Pentaho MapReduce to transform and summarize detailed data into an aggregate dataset.
- Transforming Data within Hive — How to read data from a Hive table, transform it, and write it to a Hive table within the workflow of a PDI job.
- Transforming Data with Pig — How to invoke a Pig script from a PDI job.
- Using Pentaho MapReduce to Parse Mainframe Data — How to use Pentaho to ingest a mainframe file into HDFS, then use Pentaho MapReduce to process it into delimited records.
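To give a flavor of the parsing step the weblog guides above cover, here is a minimal Hadoop Streaming-style mapper sketch in Python. It is not Pentaho-specific; the Common Log Format assumption and the chosen output fields are illustrative only — the actual guides build this logic as a PDI transformation inside Pentaho MapReduce.

```python
import re
import sys

# Apache Common Log Format (assumed input layout):
#   host ident authuser [date] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_weblog_line(line):
    """Convert one raw weblog line into a tab-delimited record, or None if malformed."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    f = m.groupdict()
    return "\t".join([f["host"], f["timestamp"], f["method"],
                      f["path"], f["status"], f["bytes"]])

if __name__ == "__main__":
    # Streaming-style mapper: read raw log lines on stdin, emit delimited records
    for raw in sys.stdin:
        record = parse_weblog_line(raw.rstrip("\n"))
        if record is not None:
            print(record)
```

In the Pentaho MapReduce guides, the equivalent of `parse_weblog_line` is expressed graphically with PDI steps (e.g. regex evaluation and field selection) rather than hand-written code.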