Setting up and configuring the Pentaho Hadoop Node Distribution, Kettle (PDI) and Pentaho Report Designer
Preconfigured Packages
These instructions are specific to the MapR distribution of Hadoop. If you are not using MapR, see the Configure Pentaho for Cloudera and Other Hadoop Versions page.
Client Configuration
MapR Client
- Follow MapR's client installation instructions for your architecture (see "Setting up the Client" in the MapR documentation).
PDI Client
- Download and extract Kettle CE from the Downloads page.
- Configure PDI Client for MapR
- Overview:
- The MapR native libraries for your architecture must be added to the java.library.path
- The MapR Hadoop configuration directory must be on the classpath
- The MapR Hadoop core library must be on the classpath
- All architectures
- Replace $PDI_HOME/launcher/launcher.properties with the attached launcher.properties
- Delete $PDI_HOME/libext/pentaho/hadoop-0.20.2-core.jar
- Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar into $PDI_HOME/libext/bigdata
- Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/maprfs-0.1.jar into $PDI_HOME/libext/bigdata
- Linux x64
- Replace $PDI_HOME/spoon.sh with the attached spoon.sh
- Replace $PDI_HOME/pan.sh with the attached pan.sh
- Replace $PDI_HOME/kitchen.sh with the attached kitchen.sh
- Replace $PDI_HOME/carte.sh with the attached carte.sh
- Mac OS X 64-bit
- Replace $PDI_HOME/Data Integration 64-bit.app/Contents/Info.plist with the attached Info.plist
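The "All architectures" jar steps for the PDI client can be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name configure_pdi_jars is hypothetical. It covers only the jar deletion and copies: the attached launcher.properties, launch scripts and Info.plist still have to be applied by hand, since their contents are not reproduced on this page.

```shell
#!/bin/sh
# Sketch of the PDI "All architectures" jar steps.
# configure_pdi_jars is a hypothetical helper; arguments are PDI_HOME and MAPR_HOME.
# It does NOT apply the attached launcher.properties or launch scripts.
configure_pdi_jars() {
  pdi_home="$1"
  mapr_home="$2"
  mapr_lib="$mapr_home/hadoop/hadoop-0.20.2/lib"

  # Remove the stock Hadoop core jar bundled with PDI
  rm -f "$pdi_home/libext/pentaho/hadoop-0.20.2-core.jar"

  # Copy the MapR Hadoop core jar and the MapR filesystem jar into libext/bigdata
  mkdir -p "$pdi_home/libext/bigdata"
  cp "$mapr_lib/hadoop-0.20.2-dev-core.jar" "$pdi_home/libext/bigdata/" || return 1
  cp "$mapr_lib/maprfs-0.1.jar" "$pdi_home/libext/bigdata/" || return 1
}
```

Usage: `configure_pdi_jars "$PDI_HOME" "$MAPR_HOME"`. On a typical client MAPR_HOME is /opt/mapr, which matches the native library paths used for Report Designer on this page.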
Report Designer
- Download and extract PRD from the Downloads page.
- Configure PRD for MapR
- Delete $PRD_HOME/lib/jdbc/hadoop-0.20.2-core.jar
- Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar into $PRD_HOME/lib
- Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/maprfs-0.1.jar into $PRD_HOME/lib
- Linux x64:
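- Add "-Djava.library.path=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Linux-amd64-64" to the java command on the last line of $PRD_HOME/report-designer.sh. If that command uses -jar, place the option before -jar: the java launcher passes everything after -jar <file> to the application, not to the JVM.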
- Mac OS X:
- Add "-Djava.library.path=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Mac_OS_X-x86_64-64" to the "VMOptions" entry in $PRD_HOME/Pentaho\ Report\ Designer.app/Contents/Info.plist
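The Report Designer steps can be sketched in shell as well. This is a minimal sketch under stated assumptions: both function names are hypothetical, the host is assumed to be 64-bit x86, and the second function only prints the -Djava.library.path option for the current OS. Pasting it into report-designer.sh or the VMOptions entry is left to you, because the exact launcher line varies between PRD releases.

```shell
#!/bin/sh
# Sketch of the PRD-for-MapR steps. Function names are hypothetical.

# Swap the bundled Hadoop core jar for the MapR jars.
# Arguments: PRD_HOME and MAPR_HOME.
configure_prd_jars() {
  prd_home="$1"
  mapr_lib="$2/hadoop/hadoop-0.20.2/lib"
  rm -f "$prd_home/lib/jdbc/hadoop-0.20.2-core.jar"
  cp "$mapr_lib/hadoop-0.20.2-dev-core.jar" "$prd_home/lib/" || return 1
  cp "$mapr_lib/maprfs-0.1.jar" "$prd_home/lib/" || return 1
}

# Print the -Djava.library.path option for this OS (assumes a 64-bit x86 host).
# Argument: MAPR_HOME, for example /opt/mapr.
mapr_library_path_opt() {
  native="$1/hadoop/hadoop-0.20.2/lib/native"
  case "$(uname -s)" in
    Linux)  echo "-Djava.library.path=$native/Linux-amd64-64" ;;
    Darwin) echo "-Djava.library.path=$native/Mac_OS_X-x86_64-64" ;;
    *) echo "no MapR native library directory listed for $(uname -s)" >&2; return 1 ;;
  esac
}
```

For example, `mapr_library_path_opt /opt/mapr` on Linux prints the same option given in the Linux x64 step.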
Hadoop Node Configuration
Download the Pentaho Hadoop Node Distribution (PHD):
...
- Extract the Pentaho Hadoop Node Distribution archive into /opt/pentaho/pentaho-mapreduce
- Edit $HADOOP_HOME/conf/hadoop-env.sh and append /opt/pentaho/pentaho-mapreduce/lib/* to HADOOP_CLASSPATH
- Update $HADOOP_HOME/conf/mapred-site.xml and add the following properties if they do not exist:
<!-- add inside the existing <configuration> element -->
<property>
  <name>pentaho.kettle.home</name>
  <value>/opt/pentaho/pentaho-mapreduce</value>
</property>
<property>
  <name>pentaho.kettle.plugins.dir</name>
  <value>/opt/pentaho/pentaho-mapreduce/plugins</value>
</property>
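The two configuration-file edits on each node can be made repeatable with a small script. The following is a minimal sketch, assuming a POSIX shell; both function names are hypothetical. The first appends the PHD lib directory to HADOOP_CLASSPATH only if it is not already present. The second lists which of the two Pentaho properties are missing from mapred-site.xml, assuming each <name> element sits on a single line.

```shell
#!/bin/sh
# Sketch of the node-side configuration edits. Function names are hypothetical.

PHD_LIB_ENTRY='/opt/pentaho/pentaho-mapreduce/lib/*'

# Append the PHD lib directory to HADOOP_CLASSPATH in hadoop-env.sh,
# doing nothing if the entry is already there (safe to re-run).
add_pentaho_to_hadoop_env() {
  env_file="$1"
  if grep -qF "$PHD_LIB_ENTRY" "$env_file"; then
    return 0
  fi
  # The single-quoted format keeps ${HADOOP_CLASSPATH...} literal, so it is
  # expanded when hadoop-env.sh is sourced. The double quotes in the written
  # line stop the shell from globbing lib/*; Java 6+ expands the wildcard itself.
  printf '\nexport HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}%s"\n' \
    "$PHD_LIB_ENTRY" >> "$env_file"
}

# Print each required Pentaho property name that mapred-site.xml lacks.
missing_pentaho_props() {
  site_file="$1"
  for prop in pentaho.kettle.home pentaho.kettle.plugins.dir; do
    grep -qF "<name>$prop</name>" "$site_file" || echo "$prop"
  done
}
```

For example, `add_pentaho_to_hadoop_env "$HADOOP_HOME/conf/hadoop-env.sh"` followed by `missing_pentaho_props "$HADOOP_HOME/conf/mapred-site.xml"`. An empty result from the second call means both properties are present.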
Red Hat/CentOS
Install
rpm -i phd-ce-mapr-bigdata-preview-4.3.noarch.rpm
Upgrade
rpm -U --force phd-ce-mapr-bigdata-preview-4.3.noarch.rpm
Ubuntu
Install
dpkg -i phd-ce-mapr-bigdata-preview-4.3_all.deb
Upgrade
Remove the installed package, then reinstall it:
dpkg -r pentaho-mapreduce
dpkg -i phd-ce-mapr-bigdata-preview-4.3_all.deb
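After installing or upgrading, you can confirm that the package is registered with the system package manager. This is a minimal sketch with a hypothetical function name. The package name pentaho-mapreduce comes from the dpkg -r command above. That the RPM registers under the same name is an assumption, so pass the name explicitly if rpm -qa shows something different.

```shell
#!/bin/sh
# Sketch: report whether the PHD package is installed.
# Default name pentaho-mapreduce comes from the Ubuntu removal step;
# the RPM package name is an assumption. Returns 0 if installed,
# 1 if not, 2 if neither dpkg-query nor rpm is available.
pentaho_mapreduce_installed() {
  pkg="${1:-pentaho-mapreduce}"
  if command -v dpkg-query >/dev/null 2>&1; then
    # Debian/Ubuntu: a fully installed package reports "install ok installed"
    dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'
  elif command -v rpm >/dev/null 2>&1; then
    # Red Hat/CentOS: rpm -q exits non-zero for packages that are not installed
    rpm -q "$pkg" >/dev/null 2>&1
  else
    return 2
  fi
}
```

Usage: `pentaho_mapreduce_installed && echo "PHD package installed"`.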
Restart JobTracker and TaskTracker
To complete the installation, restart the JobTracker and TaskTracker services so that the HADOOP_CLASSPATH changes take effect.
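On a MapR cluster the MapReduce services are usually restarted with maprcli. The sketch below assumes the `maprcli node services` subcommand with `-jobtracker restart` and `-tasktracker restart` options, as found in MapR releases of this era. Check the syntax against the maprcli reference for your version. The function name and the MAPRCLI override, which exists only so the function can be exercised without a cluster, are hypothetical.

```shell
#!/bin/sh
# Sketch: restart the JobTracker and TaskTrackers so the new
# HADOOP_CLASSPATH is picked up. Assumes "maprcli node services" with
# -jobtracker/-tasktracker restart options; verify for your MapR version.
# MAPRCLI is a hypothetical override that defaults to maprcli.
restart_mapreduce_services() {
  jobtracker_node="$1"
  shift
  # Restart the JobTracker on its node
  ${MAPRCLI:-maprcli} node services -nodes "$jobtracker_node" -jobtracker restart || return 1
  # Restart the TaskTracker on each remaining node passed as an argument
  for tt_node in "$@"; do
    ${MAPRCLI:-maprcli} node services -nodes "$tt_node" -tasktracker restart || return 1
  done
}
```

Usage, with hypothetical host names: `restart_mapreduce_services jt01 tt01 tt02`.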
...