{scrollbar}

h1. Client Configuration

These instructions are for Hadoop distros other than MapR. If you are using MapR, go to the [Configure Pentaho for MapR] page.

h2. Kettle Client

# Download and extract Kettle CE from the [Downloads] page.
_The Kettle Client comes pre-configured for Apache Hadoop 0.20.2. If you are using this distro and version, no further configuration is required._
# Configure PDI Client for a different version of Hadoop
## Delete $PDI_HOME/libext/pentaho/hadoop-0.20.2-core.jar
## For all other distributions, replace this core jar with the one from your cluster. For example, if you are using Cloudera CDH3u3:
Copy $HADOOP_HOME/hadoop-core-0.20.2-cdh3u3.jar to $PDI_HOME/libext/pentaho
## For Hadoop 0.20.205, you also need Apache Commons Configuration in your set of PDI libraries. In that case, copy [commons-configuration-1.7.jar|http://commons.apache.org/configuration/download_configuration.cgi] to $PDI_HOME/libext/commons
## For Cloudera CDH3 Update 3, you also need to copy $HADOOP_HOME/lib/guava-r09-jarjar.jar to $PDI_HOME/libext/pentaho.
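
The jar swap above can be scripted. The following is only a minimal sketch: the function name {{swap_hadoop_core_jar}} is hypothetical, and it assumes the bundled jar is named hadoop-0.20.2-core.jar as stated in the steps. Both the library directory and the cluster jar are passed as arguments, so nothing about your installation is assumed.

```shell
#!/bin/sh
# Sketch: swap the bundled Hadoop core jar in a client library directory
# for the one shipped with your cluster. The function name is hypothetical.
swap_hadoop_core_jar() {
    lib_dir="$1"      # e.g. "$PDI_HOME/libext/pentaho"
    cluster_jar="$2"  # e.g. "$HADOOP_HOME/hadoop-core-0.20.2-cdh3u3.jar"

    if [ ! -d "$lib_dir" ]; then
        echo "library directory not found: $lib_dir" >&2
        return 1
    fi
    if [ ! -f "$cluster_jar" ]; then
        echo "cluster jar not found: $cluster_jar" >&2
        return 1
    fi

    # Remove the bundled Apache Hadoop 0.20.2 jar (no error if already gone).
    rm -f "$lib_dir/hadoop-0.20.2-core.jar"
    # Copy in the jar that matches the cluster.
    cp "$cluster_jar" "$lib_dir/"
}
```

For the Cloudera example, this would be called as {{swap_hadoop_core_jar "$PDI_HOME/libext/pentaho" "$HADOOP_HOME/hadoop-core-0.20.2-cdh3u3.jar"}}. The extra Commons Configuration and Guava jars still need to be copied separately where they apply.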

h2. Pentaho Report Designer (PRD)

# Download and extract PRD from the [Downloads] page.
_The PRD comes pre-configured for Apache Hadoop 0.20.2. If you are using this distro and version, no further configuration is required._
# Configure PRD for a different version of Hadoop
## Delete $PRD_HOME/lib/jdbc/hadoop-0.20.2-core.jar
## Copy $HADOOP_HOME/hadoop-core.jar from your distribution into $PRD_HOME/lib/jdbc
## For Hadoop 0.20.205, you also need Apache Commons Configuration in your set of PRD libraries. In that case, copy [commons-configuration-1.7.jar|http://commons.apache.org/configuration/download_configuration.cgi] to $PRD_HOME/lib/jdbc
## For Cloudera CDH3 Update 3, you also need to copy $HADOOP_HOME/lib/guava-r09-jarjar.jar to $PRD_HOME/lib/jdbc.

h2. Pentaho BI Server

# Download and extract the BI Server from the [Downloads] page.
_The BI Server comes pre-configured for Apache Hadoop 0.20.2. If you are using this distro and version, no further configuration is required._
# Configure BI Server for a different version of Hadoop
## Delete $BI_SERVER_HOME/tomcat/webapps/pentaho/WEB-INF/lib/hadoop-0.20.2-core.jar
## Copy $HADOOP_HOME/hadoop-core.jar from your distribution into $BI_SERVER_HOME/tomcat/webapps/pentaho/WEB-INF/lib/
## For Hadoop 0.20.205, you also need Apache Commons Configuration in your set of BI Server libraries. In that case, copy [commons-configuration-1.7.jar|http://commons.apache.org/configuration/download_configuration.cgi] to $BI_SERVER_HOME/tomcat/webapps/pentaho/WEB-INF/lib
## For Cloudera CDH3 Update 3, you also need to copy $HADOOP_HOME/lib/guava-r09-jarjar.jar to $BI_SERVER_HOME/tomcat/webapps/pentaho/WEB-INF/lib.
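
After changing the jars, it can help to confirm which Hadoop core jar each client library directory actually contains. The sketch below is illustrative only: the function name {{check_hadoop_core_jar}} is hypothetical, and it assumes cluster jars follow the hadoop-core\*.jar naming used in the steps above.

```shell
#!/bin/sh
# Sketch: report the state of the Hadoop core jar in one library directory.
# Prints one of: bundled, replaced, both, missing.
# The function name and the hadoop-core*.jar pattern are assumptions.
check_hadoop_core_jar() {
    lib_dir="$1"
    has_bundled=no
    has_cluster=no

    # The jar shipped with the Pentaho client for Apache Hadoop 0.20.2.
    if [ -e "$lib_dir/hadoop-0.20.2-core.jar" ]; then
        has_bundled=yes
    fi
    # Any jar copied in from a cluster distribution.
    for jar in "$lib_dir"/hadoop-core*.jar; do
        if [ -e "$jar" ]; then
            has_cluster=yes
        fi
    done

    case "$has_bundled:$has_cluster" in
        yes:no)  echo "bundled" ;;
        no:yes)  echo "replaced" ;;
        yes:yes) echo "both" ;;
        *)       echo "missing" ;;
    esac
}
```

Running it against $PDI_HOME/libext/pentaho, $PRD_HOME/lib/jdbc and $BI_SERVER_HOME/tomcat/webapps/pentaho/WEB-INF/lib should print {{replaced}} for each directory you reconfigured. A result of {{both}} suggests the delete step was skipped.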

h1. Pentaho Hadoop Node Configuration (PHD)

{color:red}As of Kettle version 4.3, there is no longer a requirement to put any Kettle software on TaskTracker Nodes. {color} 

If you have installed a previous version of the PHD on your cluster, you will have to remove it. Perform the following steps to remove a previous PHD:

# Delete /opt/pentaho and all its files and subdirectories
# Update $HADOOP_HOME/conf/hadoop-env.sh and *remove* /opt/pentaho/pentaho-mapreduce/lib/\* from the HADOOP_CLASSPATH
# If HADOOP_CLASSPATH is currently set in your environment with /opt/pentaho/pentaho-mapreduce/lib/\* in it, set HADOOP_CLASSPATH to its previous value minus the Pentaho reference and re-export it.
# Update $HADOOP_HOME/conf/mapred-site.xml and *remove* the following properties if they exist:
## {code:xml}<property>
  <name>pentaho.kettle.home</name>
  <value>/opt/pentaho/pentaho-mapreduce</value>
</property>

<property>
  <name>pentaho.kettle.plugins.dir</name>
  <value>/opt/pentaho/pentaho-mapreduce/plugins</value>
</property>
{code}
# Restart the Hadoop JobTracker and TaskTracker nodes (usually stop-mapred.sh followed by start-mapred.sh will do the trick) for the HADOOP_CLASSPATH and mapred-site.xml changes to take effect.
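
Removing the Pentaho entry from an exported HADOOP_CLASSPATH can be done by hand or with a small helper. The sketch below is one possible approach, not part of any Pentaho or Hadoop tooling: the function name {{strip_pentaho_classpath}} is hypothetical, and it drops only the exact entry /opt/pentaho/pentaho-mapreduce/lib/\* named in the steps above.

```shell
#!/bin/sh
# Sketch: print a colon-separated classpath with the old PHD entry removed.
# The function body runs in a subshell, so the IFS and globbing changes
# do not leak into the calling shell. The function name is hypothetical.
strip_pentaho_classpath() (
    IFS=:
    set -f   # disable globbing so entries such as lib/* are not expanded
    result=""
    for entry in $1; do
        case "$entry" in
            '/opt/pentaho/pentaho-mapreduce/lib/*') ;;  # drop the PHD entry
            '') ;;                                      # skip empty entries
            *) result="${result:+$result:}$entry" ;;
        esac
    done
    printf '%s\n' "$result"
)
```

A typical use would be {{export HADOOP_CLASSPATH="$(strip_pentaho_classpath "$HADOOP_CLASSPATH")"}}. The same entry still has to be removed from $HADOOP_HOME/conf/hadoop-env.sh, since that file is read again whenever Hadoop starts.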

{include:Known Configuration Issues}