Setting up and configuring the Pentaho Hadoop Node Distribution, Kettle (PDI) and Reporting

Preconfigured Packages

These instructions are specific to the MapR distribution of Hadoop. If you are not using MapR, go to the Configure Pentaho for Cloudera and Other Hadoop Versions page.

Client Configuration

MapR Client

  1. Follow the installation instructions provided by MapR for your architecture: Setting up the Client - MapR

PDI Client

  1. Download and extract Kettle CE from the Downloads page.
  2. Configure PDI Client for MapR
    1. Overview:
      1. The MapR native libraries for your architecture must be on the java.library.path
      2. The MapR Hadoop configuration directory must be on the classpath
      3. The MapR Hadoop core library must be on the classpath
    2. All architectures
      1. Update the $PDI_HOME/launcher/launcher.properties with the attached launcher.properties
      2. Delete $PDI_HOME/libext/pentaho/hadoop-0.20.2-core.jar
      3. Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar into $PDI_HOME/libext/pentaho
      4. Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/maprfs-0.1.jar into $PDI_HOME/libext/pentaho
    3. Linux x64
      1. Update the $PDI_HOME/spoon.sh with the attached spoon.sh
    4. Mac OS X 64-bit
      1. Update the Data Integration 64-bit.app/Contents/Info.plist with the attached Info.plist
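
The jar steps under "All architectures" can be scripted. The following is a minimal sketch, assuming the two arguments are your PDI and MapR installation directories. The function name swap_hadoop_jars is illustrative and is not part of Pentaho or MapR. It does not touch launcher.properties, spoon.sh or Info.plist, which still have to be updated with the attached files.

```shell
# Sketch only: swap the stock Hadoop core jar in PDI for the MapR jars.
# swap_hadoop_jars is a hypothetical helper name, not a Pentaho or MapR tool.
swap_hadoop_jars() {
  pdi_home=$1    # the value of $PDI_HOME
  mapr_home=$2   # the value of $MAPR_HOME
  target="$pdi_home/libext/pentaho"
  mapr_lib="$mapr_home/hadoop/hadoop-0.20.2/lib"

  # Fail before changing anything if either MapR jar is missing.
  for jar in hadoop-0.20.2-dev-core.jar maprfs-0.1.jar; do
    if [ ! -f "$mapr_lib/$jar" ]; then
      echo "missing $mapr_lib/$jar" >&2
      return 1
    fi
  done

  # Remove the stock jar (-f: no error if it is already gone), then copy.
  rm -f "$target/hadoop-0.20.2-core.jar"
  cp "$mapr_lib/hadoop-0.20.2-dev-core.jar" "$mapr_lib/maprfs-0.1.jar" "$target/"
}
```

Call it as swap_hadoop_jars "$PDI_HOME" "$MAPR_HOME". Checking for the MapR jars first means a wrong MAPR_HOME leaves the PDI install unchanged.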

Report Designer

  1. Download and extract PRD from the Downloads page.
  2. Configure PRD for MapR
    1. Delete $PRD_HOME/lib/jdbc/hadoop-0.20.2-core.jar
    2. Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar into $PRD_HOME/lib
    3. Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/maprfs-0.1.jar into $PRD_HOME/lib
    4. Linux x64:
      1. Add "-Djava.library.path=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Linux-amd64-64" to the last line in $PRD_HOME/report-designer.sh
    5. Mac OS X 64-bit:
      1. Add "-Djava.library.path=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Mac_OS_X-x86_64-64" to the "VMOptions" entry in $PRD_HOME/Pentaho\ Report\ Designer.app/Contents/Info.plist
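
The two java.library.path values above differ only in the platform directory. As a rough sketch, the option can be derived from the output of uname -s. The helper name mapr_native_opt is hypothetical, and the sketch assumes a 64-bit system with MapR installed under /opt/mapr, as in the paths above.

```shell
# Sketch only: print the JVM option that points at the MapR native libraries.
# mapr_native_opt is a hypothetical helper; pass it the output of `uname -s`.
mapr_native_opt() {
  native_root=/opt/mapr/hadoop/hadoop-0.20.2/lib/native
  case "$1" in
    Linux)  echo "-Djava.library.path=$native_root/Linux-amd64-64" ;;
    Darwin) echo "-Djava.library.path=$native_root/Mac_OS_X-x86_64-64" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}
```

Running mapr_native_opt "$(uname -s)" prints the string to add to report-designer.sh or to the VMOptions entry in Info.plist.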

Hadoop Node Configuration

Download the Pentaho Hadoop Node Distribution (PHD):

Ubuntu: phd-ce-mapr-bigdata-preview-4.3_all.deb

RedHat/CentOS: phd-ce-mapr-bigdata-preview-4.3.noarch.rpm

All TaskTracker nodes must have the pentaho-mapreduce (PHD) package installed. The packages require the MapR TaskTracker (mapr-tasktracker) package to be installed.

At a high level, the packages perform the following steps:

  1. Extract the Pentaho Hadoop Node Distribution archive into /opt/pentaho/pentaho-mapreduce
  2. Update $HADOOP_HOME/conf/hadoop-env.sh and add /opt/pentaho/pentaho-mapreduce/lib/* to the HADOOP_CLASSPATH
  3. Update $HADOOP_HOME/conf/mapred-site.xml and add the following properties if they do not exist:
       <property>
         <name>pentaho.kettle.home</name>
         <value>/opt/pentaho/pentaho-mapreduce</value>
       </property>
       <property>
         <name>pentaho.kettle.plugins.dir</name>
         <value>/opt/pentaho/pentaho-mapreduce/plugins</value>
       </property>
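
To confirm that steps 2 and 3 took effect on a node, the two configuration files can be checked for the expected entries. This is a minimal sketch. The helper name check_phd_conf is hypothetical, and it only looks for the strings listed above; it does not validate the XML or the full HADOOP_CLASSPATH line.

```shell
# Sketch only: confirm the package edits landed in a Hadoop conf directory.
# check_phd_conf is a hypothetical helper; pass it $HADOOP_HOME/conf.
check_phd_conf() {
  conf_dir=$1
  status=0

  # hadoop-env.sh should mention the Pentaho lib directory.
  if ! grep -qF '/opt/pentaho/pentaho-mapreduce/lib' "$conf_dir/hadoop-env.sh" 2>/dev/null; then
    echo "hadoop-env.sh: Pentaho lib directory not found" >&2
    status=1
  fi

  # mapred-site.xml should define both Pentaho properties.
  for name in pentaho.kettle.home pentaho.kettle.plugins.dir; do
    if ! grep -qF "<name>$name</name>" "$conf_dir/mapred-site.xml" 2>/dev/null; then
      echo "mapred-site.xml: property $name not found" >&2
      status=1
    fi
  done

  return $status
}
```

Run check_phd_conf "$HADOOP_HOME/conf" after installing the package. A non-zero exit status means at least one entry is missing.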
      

RedHat/CentOS

Install

  1. rpm -i phd-ce-mapr-bigdata-preview-4.3.noarch.rpm

Upgrade

  1. rpm -U --force phd-ce-mapr-bigdata-preview-4.3.noarch.rpm

Ubuntu

Install

  1. dpkg -i phd-ce-mapr-bigdata-preview-4.3_all.deb

Upgrade

Remove then reinstall:

  1. dpkg -r pentaho-mapreduce
  2. dpkg -i phd-ce-mapr-bigdata-preview-4.3_all.deb

Restart JobTracker and TaskTracker

To complete the installation, restart the JobTracker and TaskTracker nodes so that the HADOOP_CLASSPATH changes take effect.
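
On a MapR cluster the restart is typically issued through maprcli. The sketch below only prints the commands so they can be reviewed before running. The helper name print_restart_cmds is hypothetical, and the maprcli node services syntax shown is an assumption that should be verified against the documentation for your MapR release.

```shell
# Sketch only: print (do not run) maprcli commands to restart the services.
# print_restart_cmds is a hypothetical helper; the maprcli syntax is an
# assumption to verify against the documentation for your MapR release.
print_restart_cmds() {
  jt_host=$1   # first argument: the JobTracker host
  shift        # remaining arguments: the TaskTracker hosts
  echo "maprcli node services -jobtracker restart -nodes $jt_host"
  echo "maprcli node services -tasktracker restart -nodes $*"
}
```

For example, print_restart_cmds jt1 tt1 tt2 prints one command for the JobTracker host jt1 and one for the TaskTracker hosts tt1 and tt2 (placeholder host names).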
