Client Configuration
These instructions are for Hadoop distributions other than MapR. If you are using MapR, go to the Configure Pentaho for MapR page.
Kettle Client
- Download and extract Kettle CE from the Downloads page.
The Kettle Client comes pre-configured for Apache Hadoop 0.20.2. If you are using this distribution and version, no further configuration is required.
- Configure the PDI Client for a different version of Hadoop:
- Delete $PDI_HOME/libext/pentaho/hadoop-0.20.2-core.jar
- For all other distributions, replace this core JAR with the one from your cluster. For example, if you are using Cloudera CDH3u3, copy $HADOOP_HOME/hadoop-core-0.20.2-cdh3u3.jar to $PDI_HOME/libext/pentaho
- For Hadoop 0.20.205 you also need Apache Commons Configuration included in your set of PDI libraries. In that case, copy commons-configuration-1.7.jar to $PDI_HOME/libext/commons
- For Cloudera CDH3 Update 3, you also need to copy $HADOOP_HOME/lib/guava-r09-jarjar.jar to $PDI_HOME/libext/pentaho.
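As a sketch, the CDH3u3 steps above could be run as follows (this assumes $HADOOP_HOME and $PDI_HOME are already set in your shell; the exact JAR names depend on your cluster version):

```shell
# Remove the bundled Apache Hadoop 0.20.2 core JAR
rm "$PDI_HOME/libext/pentaho/hadoop-0.20.2-core.jar"

# Copy the core JAR from your cluster in its place (CDH3u3 example)
cp "$HADOOP_HOME/hadoop-core-0.20.2-cdh3u3.jar" "$PDI_HOME/libext/pentaho/"

# CDH3u3 also requires Guava
cp "$HADOOP_HOME/lib/guava-r09-jarjar.jar" "$PDI_HOME/libext/pentaho/"
```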
Pentaho Report Designer (PRD)
- Download and extract PRD from the Downloads page.
PRD comes pre-configured for Apache Hadoop 0.20.2. If you are using this distribution and version, no further configuration is required.
- Configure PRD for a different version of Hadoop:
- Delete $PRD_HOME/lib/jdbc/hadoop-0.20.2-core.jar
- Copy $HADOOP_HOME/hadoop-core.jar from your distribution into $PRD_HOME/lib/jdbc
- For Hadoop 0.20.205 you also need Apache Commons Configuration included in your set of PRD libraries. In that case, copy commons-configuration-1.7.jar to $PRD_HOME/lib/jdbc
- For Cloudera CDH3 Update 3, you also need to copy $HADOOP_HOME/lib/guava-r09-jarjar.jar to $PRD_HOME/lib/jdbc.
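A sketch of the same swap for PRD, assuming $HADOOP_HOME and $PRD_HOME are set and using the CDH3u3 JAR names as an example:

```shell
# Remove the bundled Apache Hadoop 0.20.2 core JAR
rm "$PRD_HOME/lib/jdbc/hadoop-0.20.2-core.jar"

# Copy your distribution's core JAR into the PRD JDBC library directory
cp "$HADOOP_HOME/hadoop-core-0.20.2-cdh3u3.jar" "$PRD_HOME/lib/jdbc/"
```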
Pentaho Hadoop Node Configuration (PHD)
As of Kettle version 4.3, there is no longer a requirement to install any Kettle software on TaskTracker nodes.
If you have installed a previous version of the PHD on your cluster, you will have to remove it. Perform the following steps to remove a previous PHD:
- Delete /opt/pentaho and all of its files and subdirectories
- Update $HADOOP_HOME/conf/hadoop-env.sh and remove /opt/pentaho/pentaho-mapreduce/lib/* from the HADOOP_CLASSPATH
- If HADOOP_CLASSPATH is currently set in your environment with /opt/pentaho/pentaho-mapreduce/lib/* in it, reset HADOOP_CLASSPATH to its previous value without the Pentaho entry and re-export it.
- Update $HADOOP_HOME/conf/mapred-site.xml and remove the following properties if they exist:
<property>
  <name>pentaho.kettle.home</name>
  <value>/opt/pentaho/pentaho-mapreduce</value>
</property>
<property>
  <name>pentaho.kettle.plugins.dir</name>
  <value>/opt/pentaho/pentaho-mapreduce/plugins</value>
</property>
- Restart the Hadoop JobTracker and TaskTracker nodes (usually stop-mapred/start-mapred will do the trick) for the HADOOP_CLASSPATH change to take effect.
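The removal steps above might look like this on a node (a sketch only: it assumes $HADOOP_HOME is set, that the Pentaho entry appears in HADOOP_CLASSPATH as a colon-separated path element, and that your distribution ships the stop-mapred/start-mapred scripts in $HADOOP_HOME/bin):

```shell
# WARNING: removes the old Pentaho Hadoop Distribution from this node
rm -rf /opt/pentaho

# Strip the Pentaho entry from HADOOP_CLASSPATH in hadoop-env.sh
# (the sed pattern assumes the entry is a colon-separated path element)
sed -i 's|/opt/pentaho/pentaho-mapreduce/lib/\*:\?||g' "$HADOOP_HOME/conf/hadoop-env.sh"

# If HADOOP_CLASSPATH is set in the current shell, re-export it
# without the Pentaho entry
export HADOOP_CLASSPATH=$(echo "$HADOOP_CLASSPATH" | sed 's|/opt/pentaho/pentaho-mapreduce/lib/\*:\?||g')

# Restart MapReduce so the classpath change takes effect
"$HADOOP_HOME/bin/stop-mapred.sh"
"$HADOOP_HOME/bin/start-mapred.sh"
```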