This guide explains how to use compression with Pentaho MapReduce. The examples use the Snappy compression codec, but you may use any compression codec supported by Hadoop. The following scenarios are covered:
- Reading Compressed Files
- Writing Compressed Files
- Compressing Intermediate Data
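For orientation, the writing and intermediate-data scenarios above map onto standard Hadoop job properties. The names below are the Hadoop 1.x forms (verify them against your Hadoop version), shown here with the Snappy codec used in this guide's examples; this is a sketch, not the exact configuration of any particular step:

```
# Writing Compressed Files: compress the final job output
mapred.output.compress=true
mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec

# Compressing Intermediate Data: compress map output before the shuffle
mapred.compress.map.output=true
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
```

Reading compressed input requires no property at all; Hadoop chooses a codec from the input file's extension.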
Prerequisites
To follow along with this how-to guide, you will need the following:
- Hadoop
- Pentaho Data Integration
- Pentaho Hadoop Distribution
- Compression Codec Installed on Hadoop
Step-By-Step Instructions
Reading Compressed Files
In this task, you will configure Pentaho MapReduce to read compressed files into the Map/Reduce Input step.
Several compression codecs are supported by Pentaho MapReduce automatically: Hadoop selects the appropriate codec from the input file's extension (for example, .gz or .bz2), so no configuration is needed to read files compressed with these codecs.
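To illustrate what this extension-based codec selection amounts to, here is a standalone Java sketch using only the JDK's java.util.zip (no Hadoop dependency). The class and method names are hypothetical, and Hadoop's actual mechanism (CompressionCodecFactory) supports more codecs; this only shows the idea of decompressing transparently based on the file name:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.zip.*;

// Hypothetical sketch: Hadoop performs an equivalent extension check
// before handing the (decompressed) stream to the mapper.
public class CodecByExtension {

    // Open a file, decompressing transparently when the name ends in ".gz".
    static String readPossiblyCompressed(Path path) throws IOException {
        InputStream in = Files.newInputStream(path);
        if (path.toString().endsWith(".gz")) {
            in = new GZIPInputStream(in); // Hadoop would pick GzipCodec here
        }
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small gzip-compressed sample file, then read it back.
        Path tmp = Files.createTempFile("weblogs", ".txt.gz");
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(tmp)),
                StandardCharsets.UTF_8)) {
            w.write("127.0.0.1 GET /index.html\n");
        }
        System.out.print(readPossiblyCompressed(tmp));
    }
}
```

The caller never sees compressed bytes, which is exactly the experience inside a Pentaho MapReduce mapper when the input files use one of the automatically detected extensions.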
- Create Year Partitioner Class: In a text editor create a new file named YearPartitioner.java containing the following code: