{scrollbar}

h1. Client Configuration

These instructions are for Hadoop distros other than MapR; if you are using MapR, go to the [Configure Pentaho for MapR] page.

h2. Kettle Client

# Download and extract Kettle CE from the [Downloads] page.
_The Kettle Client comes pre-configured for Apache Hadoop 0.20.2. If you are using this distro and version, no further configuration is required._
# Configure PDI Client for a different version of Hadoop
## Delete $PDI_HOME/libext/pentaho/hadoop-0.20.2-core.jar
## For all other distributions you should replace this core jar with the one from your cluster. For example, if you are using Cloudera CDH3u3:
Copy $HADOOP_HOME/hadoop-core-0.20.2-cdh3u3.jar to $PDI_HOME/libext/pentaho
## For certain Hadoop distributions or versions (for example Hadoop 0.20.205) you also need to have Apache Commons Configuration included in your set of PDI libraries.  In that case copy [commons-configuration-1.7.jar|http://commons.apache.org/configuration/download_configuration.cgi] to $PDI_HOME/libext/commons
## For certain Hadoop distributions or versions (for example Cloudera CDH3 Update 3) you also need to copy $HADOOP_HOME/lib/guava-r09-jarjar.jar to $PDI_HOME/libext/pentaho.
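
As an illustration, the steps above might look like the following shell session for Cloudera CDH3u3. The $HADOOP_HOME and $PDI_HOME locations are assumptions for this example; adjust the paths and jar names to match your distribution.
{code}
# Example only: paths and jar names assume CDH3u3 and a local Kettle CE install.
export HADOOP_HOME=/usr/lib/hadoop                # assumed cluster install location
export PDI_HOME=/opt/pentaho/data-integration     # assumed Kettle CE install location

# Remove the bundled Apache Hadoop 0.20.2 core jar
rm $PDI_HOME/libext/pentaho/hadoop-0.20.2-core.jar

# Replace it with the core jar from your cluster
cp $HADOOP_HOME/hadoop-core-0.20.2-cdh3u3.jar $PDI_HOME/libext/pentaho/

# Only if your distribution or version requires them (see the steps above):
# cp commons-configuration-1.7.jar $PDI_HOME/libext/commons/
# cp $HADOOP_HOME/lib/guava-r09-jarjar.jar $PDI_HOME/libext/pentaho/
{code}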

h2. Pentaho Report Designer (PRD)

# Download and extract PRD from the [Downloads] page.
_The PRD comes pre-configured for Apache Hadoop 0.20.2. If you are using this distro and version, no further configuration is required._
# Configure PRD for a different version of Hadoop
## Delete $PRD_HOME/lib/jdbc/hadoop-0.20.2-core.jar
## Copy $HADOOP_HOME/hadoop-core.jar from your distribution into $PRD_HOME/lib/jdbc
## For certain Hadoop distributions or versions (for example Hadoop 0.20.205) you also need to have Apache Commons Configuration included in your set of PRD libraries. In that case copy [commons-configuration-1.7.jar|http://commons.apache.org/configuration/download_configuration.cgi] to $PRD_HOME/lib
## For certain Hadoop distributions or versions (for example Cloudera CDH3 Update 3) you also need to copy $HADOOP_HOME/lib/guava-r09-jarjar.jar to $PRD_HOME/lib.
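
The equivalent PRD changes could be scripted as follows. The $HADOOP_HOME and $PRD_HOME values are example locations only, and the core jar name depends on your distribution.
{code}
# Example only: adjust paths and jar names to your environment.
export HADOOP_HOME=/usr/lib/hadoop               # assumed
export PRD_HOME=/opt/pentaho/report-designer     # assumed PRD install location

# Swap the bundled Hadoop core jar for the one from your distribution
rm $PRD_HOME/lib/jdbc/hadoop-0.20.2-core.jar
cp $HADOOP_HOME/hadoop-core-*.jar $PRD_HOME/lib/jdbc/

# Only if needed for your Hadoop variant (see the steps above):
# cp commons-configuration-1.7.jar $PRD_HOME/lib/
# cp $HADOOP_HOME/lib/guava-r09-jarjar.jar $PRD_HOME/lib/
{code}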

h1. Pentaho Hadoop Node Configuration (PHD)

As of Kettle version 4.3, there is no longer a requirement to put any Kettle software on TaskTracker nodes. If you have installed a previous version of the PHD (pentaho-mapreduce package) on your cluster, you will have to remove it. Perform the following steps to remove a previous PHD:

# Delete /opt/pentaho/pentaho-mapreduce and all of its files and subdirectories
# Update $HADOOP_HOME/conf/hadoop-env.sh and *remove* /opt/pentaho/pentaho-mapreduce/lib/\* from the HADOOP_CLASSPATH
# If HADOOP_CLASSPATH is currently set in your environment with /opt/pentaho/pentaho-mapreduce/lib/\* specified in it, you will need to set your HADOOP_CLASSPATH variable to its previous value minus the Pentaho reference and re-export it (see the sketch after this list).
# Update $HADOOP_HOME/conf/mapred-site.xml and *remove* the following properties if they exist:
## {code:xml}<property>
  <name>pentaho.kettle.home</name>
  <value>/opt/pentaho/pentaho-mapreduce</value>
</property>

<property>
  <name>pentaho.kettle.plugins.dir</name>
  <value>/opt/pentaho/pentaho-mapreduce/plugins</value>
</property>
{code}
## Note: {{pentaho.kettle.home}} should contain a .kettle/ folder with kettle.properties. A shared.xml can also be placed there if you require any Kettle shared objects in the transformations executed within Pentaho MapReduce. These files are created automatically if the user the TaskTracker process runs as has write access to {{pentaho.kettle.home}}; if not, you should manually create at least .kettle/kettle.properties.
# Restart the Hadoop JobTracker and TaskTracker nodes (usually stop-mapred/start-mapred will do the trick) for the HADOOP_CLASSPATH change to take effect.
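
For reference, a minimal shell sketch of the removal procedure on a node might look like the following. It assumes the PHD was installed in the default /opt/pentaho/pentaho-mapreduce location and that the classpath entry lives in hadoop-env.sh; edit the files by hand if your setup differs.
{code}
# Example only: run on each node where a previous PHD was installed.

# 1. Remove the PHD files
rm -rf /opt/pentaho/pentaho-mapreduce

# 2. In $HADOOP_HOME/conf/hadoop-env.sh, remove the Pentaho entry from HADOOP_CLASSPATH,
#    i.e. delete (or trim) a line similar to:
#    export HADOOP_CLASSPATH=/opt/pentaho/pentaho-mapreduce/lib/*:$HADOOP_CLASSPATH

# 3. In $HADOOP_HOME/conf/mapred-site.xml, remove the pentaho.kettle.home and
#    pentaho.kettle.plugins.dir <property> blocks shown above.

# 4. Restart MapReduce so the classpath change takes effect
$HADOOP_HOME/bin/stop-mapred.sh
$HADOOP_HOME/bin/start-mapred.sh
{code}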

{include:Known Configuration Issues}