Additional Configuration for using MR1 with CDH5

This additional configuration is required to access a CDH 5.0 cluster that is configured for MapReduce 1 (MR1). This feature was removed in CDH 5.1.

Before you start

These steps assume that you have already followed the Set Active Hadoop Distribution instructions and are completing your shim setup for a CDH 5.0 distribution. If you have not, or if these terms are unfamiliar, read Configuring Pentaho for your Hadoop Distro and Version first.

Configuring MapReduce 1

CDH 5.0 is configured to use MapReduce 2 by default. If your cluster is set up to use MapReduce 1 instead, you must change one property in the shim's config.properties file. If you are not sure which version of MapReduce your CDH 5 cluster is using, check with your system administrator.

  1. Navigate to the CDH 5.0 shim folder in the hadoop-configurations folder that matches the shim you previously configured, for example cdh50. These steps refer to that directory as cdhXX. Its location differs for each application:
    • DI Server - data-integration-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/cdhXX
    • BA Server - biserver-ee/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/cdhXX
    • Spoon - data-integration/plugins/pentaho-big-data-plugin/cdhXX
  2. Edit the config.properties file.
  3. Change the shim.current.config property from mr2 to mr1. It should look like:
              shim.current.config=mr1
    
  4. Save and close the file.
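
If you need to apply this change to several installations, the edit in steps 2 through 4 can be scripted. The following Python sketch is only an illustration and is not part of the Pentaho tooling: the function names set_shim_config and update_properties_file are hypothetical, and the script assumes that config.properties already contains an uncommented shim.current.config line. It rewrites only that line and leaves every other line untouched.

```python
import re
from pathlib import Path

# Matches an uncommented "shim.current.config=<value>" line.
# Group 1 keeps the key, separator, and spacing exactly as written.
_SHIM_LINE = re.compile(
    r"^([ \t]*shim\.current\.config[ \t]*[=:][ \t]*)[^\r\n]*",
    re.MULTILINE,
)


def set_shim_config(text: str, value: str = "mr1") -> str:
    """Return the properties text with shim.current.config set to value."""
    if not _SHIM_LINE.search(text):
        # Assumption: the property must already exist; do not guess where to add it.
        raise ValueError("shim.current.config not found")
    return _SHIM_LINE.sub(lambda m: m.group(1) + value, text)


def update_properties_file(path: Path, value: str = "mr1") -> None:
    """Rewrite config.properties in place (hypothetical helper)."""
    # latin-1 round-trips every byte, and working on bytes preserves line endings.
    original = path.read_bytes().decode("latin-1")
    path.write_bytes(set_shim_config(original, value).encode("latin-1"))
```

To use it, call update_properties_file with the path to the config.properties file inside the cdhXX folder of each application listed in step 1. Back up the file first, because the helper overwrites it.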