Additional Configuration for MapR Shims
Additional configuration required to allow Pentaho to access MapR clusters.
PLEASE NOTE: This documentation is for Pentaho 5.2, 5.3, and 5.4. You can find documentation for Pentaho 6.x and later here: https://help.pentaho.com/Documentation.
Before you start
These steps assume that you have already followed the Set Active Hadoop Distribution instructions and are completing your shim setup for a MapR distribution. If you have not, or if you are unsure what this means, read Configuring Pentaho for your Hadoop Distro and Version.
NOTE: Pentaho does not support connections to Impala on a secured MapR 4.1 cluster. For more information, please see these references:
Windows Users
These steps apply to installing a shim into the DI and BA Servers as well as the Spoon, Report Designer, and Metadata Editor design tools.
NOTE: If you are installing MapR 4.0.1 on Windows, use version 4.0.1.31009GA or later as your MapR client. You can obtain the software from MapR.
1. Install the MapR client, then verify that it is properly installed on your computer and can connect to and browse your MapR cluster. For more information on how to do this, visit the MapR site.
2. If you plan to use spoofing or impersonation to connect to the MapR client, specify the appropriate User ID (UID), Group ID (GID), and name as indicated in the MapR documentation. (NOTE: Make sure that the account you use for spoofing is created on the client and on each node. Each spoofing account should have the same UID and GID on every node as it has on the client.)
3. Copy the hbase-site.xml file from the cluster to these directories:
- data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX
- data-integration-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX
4. Navigate to the MapR shim folder in the hadoop-configurations folder that matches the shim you previously configured, such as mapr31 or mapr21. These steps refer to that directory as maprXX.
Component | Location of MapR Shim Folder |
---|---|
DI Server | data-integration-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX |
BA Server | biserver-ee/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX |
Spoon | data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX |
Report Designer | report-designer/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX |
Metadata Editor | metadata-editor/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX |
5. Edit the config.properties file.
6. Locate the windows.classpath property and edit it to match your local MapR client tools installation directory. Set the windows.classpath parameter so that it includes the following:
- Hadoop classpath
- Pentaho installation directory path
- MapR shim directory path
NOTE: The windows.classpath value should include lib/hadoop2-windows-patch-08072014.jar as its first entry. It should also include the Hadoop classpath of the MapR client on the current machine, the full directory path of the MapR shim folder under each Pentaho component, and the entry file:///c:/opt/mapr/lib. To determine your Hadoop classpath, run the hadoop classpath command and use its values in place of the ones shown in the example. Convert all directory paths to Windows URL format (for example, C:\opt\mapr becomes file:///C:/opt/mapr). The following is an example.
windows.classpath=lib/hadoop2-windows-patch-08072014.jar,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce,file:///C:/opt/mapr/sqoop/sqoop-1.4.5,file:///C:/opt/mapr/sqoop/sqoop-1.4.5/lib,file:///C:/contrib/capacity-scheduler,file:///C:/opt/Pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr401,file:///C:/opt/Pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr401/lib,file:///C:/opt/mapr/lib
7. Modify the windows.library.path property if necessary. Here is an example.
windows.library.path=C:\\opt\\mapr\\lib
8. Save and close the config.properties file.
9. Set the MAPR_HOME environment variable to the install location of the MapR client, then restart Windows.
10. If you are configuring the MapR 3.0.1 shim, download the additional jars listed in Download Additional Jars for MapR 3.0.1.
11. If you are configuring the MapR 2.1.2 shim, see MapR 2.1.2 Cluster Connection Workaround (MS Windows) under Resolving Known Issues.
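Steps 6 and 7 are mechanical enough to script. The following Python sketch assembles a windows.classpath value from the output of the hadoop classpath command. It assumes that the output is semicolon-separated (the Windows path separator), that wildcard entries such as lib\* should be reduced to their directory (which is how the example value above appears to have been built), and that each component's config.properties lists its own shim folder and that folder's lib subfolder, as the Spoon example does. The sample paths are hypothetical; substitute your own hadoop classpath output and maprXX folder.

```python
import re

PATCH_JAR = "lib/hadoop2-windows-patch-08072014.jar"  # must be the first entry
MAPR_LIB_URL = "file:///c:/opt/mapr/lib"              # must be included


def to_file_url(path: str) -> str:
    """Convert a Windows path such as C:\\opt\\mapr to file:///C:/opt/mapr."""
    path = path.strip().replace("\\", "/")
    # Assumption: reduce wildcard entries (/* or /*.jar) to the directory,
    # matching the directory-only entries in the example value.
    path = re.sub(r"/\*(\.jar)?$", "", path)
    # Collapse doubled slashes, e.g. C://opt becomes C:/opt.
    path = re.sub(r"/{2,}", "/", path)
    return "file:///" + path.lstrip("/")


def build_windows_classpath(hadoop_classpath: str, shim_dir: str) -> str:
    """Assemble a windows.classpath value from `hadoop classpath` output."""
    entries = [PATCH_JAR]

    def add(entry: str) -> None:
        if entry not in entries:  # skip duplicate entries
            entries.append(entry)

    # On Windows, classpath entries are separated by semicolons.
    for raw in hadoop_classpath.split(";"):
        if raw.strip():
            add(to_file_url(raw))
    shim = shim_dir.rstrip("\\/")
    add(to_file_url(shim))            # the MapR shim folder for this component
    add(to_file_url(shim + "\\lib"))  # and its lib subfolder
    add(MAPR_LIB_URL)
    return ",".join(entries)


if __name__ == "__main__":
    # Hypothetical sample; paste the output of `hadoop classpath` instead.
    sample = (r"C:\opt\mapr\hadoop\hadoop-2.4.1\etc\hadoop;"
              r"C:\opt\mapr\hadoop\hadoop-2.4.1\etc\hadoop;"
              r"C:\opt\mapr\hadoop\hadoop-2.4.1\share\hadoop\common\lib\*")
    shim = (r"C:\opt\Pentaho\design-tools\data-integration\plugins"
            r"\pentaho-big-data-plugin\hadoop-configurations\mapr401")
    print("windows.classpath=" + build_windows_classpath(sample, shim))
```

Duplicate entries are dropped because they are redundant on a classpath, and drive letters are kept exactly as they appear in your hadoop classpath output.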
Linux and Mac Users
These steps apply to installing a shim into the DI and BA Servers as well as the Spoon, Report Designer, and Metadata Editor design tools.
NOTE: Because of a known issue with the MapR library, MapR 4.0.1 is supported only on Pentaho Business Analytics components that are installed on Linux clients. MapR is aware of this issue.
1. Install the MapR client, then verify that it is properly installed on your computer and can connect to and browse your MapR cluster. For more information on how to do this, visit the MapR site.
2. Ensure that you have set up a login on your client computer that has the same user name, User ID (UID), and Group ID (GID) as an account on the MapR cluster. You must run the Pentaho applications from this account.
3. Copy the hbase-site.xml file from the cluster to these directories:
- data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX
- data-integration-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX
4. Navigate to the MapR shim folder in the hadoop-configurations folder that matches the shim you previously configured, such as mapr31 or mapr21. These steps refer to that directory as maprXX.
Component | Location of MapR Shim Folder |
---|---|
DI Server | data-integration-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX |
BA Server | biserver-ee/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX |
Spoon | data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX |
Report Designer | report-designer/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX |
Metadata Editor | metadata-editor/plugins/pentaho-big-data-plugin/hadoop-configurations/maprXX |
5. Edit the config.properties file.
6. Locate the linux.classpath property and edit it to match your local MapR client tools installation directory. Set the linux.classpath parameter so that it includes the following:
- Hadoop classpath
- Pentaho installation directory path
- MapR shim directory path
NOTE: The linux.classpath value should contain the Hadoop classpath of the MapR client on the current machine, the full directory path of the MapR shim folder under each Pentaho component, and the entry /opt/mapr/lib. To determine your Hadoop classpath, run the hadoop classpath command and use its values in place of the ones shown in the example. The following is an example.
linux.classpath=/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop,/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce,/opt/mapr/sqoop/sqoop-1.4.5,/opt/mapr/sqoop/sqoop-1.4.5/lib,/contrib/capacity-scheduler,/opt/Pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr401,/opt/Pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr401/lib,/opt/mapr/lib
7. Modify the linux.library.path if necessary. Here is an example.
linux.library.path=/opt/mapr/lib
8. Save and close the config.properties file.
9. Set the MAPR_HOME environment variable to the install location of the MapR client.
10. If you are configuring the MapR 3.0.1 shim, download the additional jars listed in Download Additional Jars for MapR 3.0.1.
11. If you are configuring the MapR 2.1.2 shim, see MapR 2.1.2 Cluster Connection Workaround (Linux and Mac) under Resolving Known Issues.
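The Linux and Mac classpath from steps 6 and 7 can be built the same way. This Python sketch runs hadoop classpath (assumed to be on your PATH through the MapR client), splits its colon-separated output, reduces wildcard entries to their directories (an assumption based on the directory-only entries in the example value), and appends the shim folder, its lib subfolder, and /opt/mapr/lib. The shim path in the script is hypothetical; use the maprXX folder of the component you are configuring.

```python
import re
import shutil
import subprocess
import sys

MAPR_LIB = "/opt/mapr/lib"


def build_linux_classpath(hadoop_classpath: str, shim_dir: str) -> str:
    """Assemble a linux.classpath value from `hadoop classpath` output."""
    entries = []

    def add(path: str) -> None:
        if path and path not in entries:  # skip blanks and duplicates
            entries.append(path)

    # On Linux and Mac, classpath entries are separated by colons.
    for raw in hadoop_classpath.strip().split(":"):
        # Assumption: reduce wildcard entries (/* or /*.jar) to the directory,
        # matching the directory-only entries in the example value.
        add(re.sub(r"/\*(\.jar)?$", "", raw.strip()))
    shim = shim_dir.rstrip("/")
    add(shim)           # the MapR shim folder for this Pentaho component
    add(shim + "/lib")  # and its lib subfolder
    add(MAPR_LIB)
    return ",".join(entries)


if __name__ == "__main__":
    if shutil.which("hadoop") is None:
        sys.exit("hadoop command not found; is the MapR client on your PATH?")
    result = subprocess.run(["hadoop", "classpath"], capture_output=True,
                            text=True, check=True)
    # Hypothetical shim location; use the maprXX folder you configured.
    shim = ("/opt/Pentaho/design-tools/data-integration/plugins/"
            "pentaho-big-data-plugin/hadoop-configurations/mapr401")
    print("linux.classpath=" + build_linux_classpath(result.stdout, shim))
```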
Set yarn.application.classpath property for Pentaho 5.3.x Clients (Windows Only)
If you are using Pentaho 5.3.x on Windows, set the yarn.application.classpath property in the yarn-site.xml file on the MapR client (<MAPR_HOME>/hadoop/hadoop-2.x.x/etc/hadoop/) as follows:
<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/lib/*:/usr/share/aws/emr/auxlib/*:$PWD/*:%PWD%/*</value>
</property>
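If you prefer to make this change with a script rather than by hand, the following Python sketch uses the standard xml.etree.ElementTree module to set yarn.application.classpath to the value above, adding the property if it is missing. It writes a .bak copy first because ElementTree does not preserve XML comments or the original formatting. The command-line usage (passing the path to yarn-site.xml as the only argument) is an assumption for illustration.

```python
import shutil
import sys
import xml.etree.ElementTree as ET

PROPERTY = "yarn.application.classpath"

# The value given above, one classpath entry per line for readability.
YARN_CLASSPATH = ":".join([
    "$HADOOP_CONF_DIR",
    "$HADOOP_COMMON_HOME/share/hadoop/common/*",
    "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*",
    "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*",
    "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*",
    "$HADOOP_YARN_HOME/share/hadoop/yarn/*",
    "$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*",
    "/usr/share/aws/emr/emrfs/lib/*",
    "/usr/share/aws/emr/lib/*",
    "/usr/share/aws/emr/auxlib/*",
    "$PWD/*",
    "%PWD%/*",
])


def set_property(xml_text, name: str, value: str) -> str:
    """Set a Hadoop-style <property> inside a <configuration> document."""
    root = ET.fromstring(xml_text)  # accepts str or bytes
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            break
    else:
        # No existing property with this name: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        value_el = ET.SubElement(prop, "value")
    value_el.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    path = sys.argv[1]  # <MAPR_HOME>/hadoop/hadoop-2.x.x/etc/hadoop/yarn-site.xml
    shutil.copyfile(path, path + ".bak")  # comments are not preserved
    with open(path, "rb") as f:
        updated = set_property(f.read(), PROPERTY, YARN_CLASSPATH)
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0"?>\n' + updated + "\n")
```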
Download Additional Jars for MapR 3.0.1
If you are using the MapR 3.0.1 shim, check whether the files in the following table are in the pentaho-big-data-plugin\hadoop-configurations\mapr3x directory.
If they are not, download them from repository.mapr.com (the specific directories are listed in the following table) and place them in that directory.
File | MapR Repository Location |
---|---|
hadoop-core-1.0.3-mapr-3.1.0.jar | |
hadoop-auth-1.0.3-mapr-3.1.0.jar | |
central-logging-1.0.3-mapr-3.1.0.jar | |
libprotodefs-1.0.3-mapr-3.1.0.jar | |
maprfs-1.0.3-mapr-3.1.0.jar | |
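Before downloading anything, you can check which of these jars are already present. This Python sketch lists the missing files. It assumes the jars belong directly in the mapr3x folder, as the text above states, and takes the path of that folder as its only argument (a hypothetical usage for illustration).

```python
import sys
from pathlib import Path

# File names from the table above.
REQUIRED_JARS = [
    "hadoop-core-1.0.3-mapr-3.1.0.jar",
    "hadoop-auth-1.0.3-mapr-3.1.0.jar",
    "central-logging-1.0.3-mapr-3.1.0.jar",
    "libprotodefs-1.0.3-mapr-3.1.0.jar",
    "maprfs-1.0.3-mapr-3.1.0.jar",
]


def missing_jars(shim_dir: str) -> list:
    """Return the required jars that are not files directly inside shim_dir."""
    folder = Path(shim_dir)
    return [jar for jar in REQUIRED_JARS if not (folder / jar).is_file()]


if __name__ == "__main__":
    # e.g. .../pentaho-big-data-plugin/hadoop-configurations/mapr3x
    missing = missing_jars(sys.argv[1])
    if missing:
        print("Download these from repository.mapr.com:")
        for jar in missing:
            print("  " + jar)
    else:
        print("All required jars are present.")
```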
Resolving Known Issues
MapR 2.1.2 Cluster Connection Workaround (MS Windows)
MapR version 2.1.2 has a known bug that prevents Spoon from connecting to clusters properly. To work around this bug, add a -Dmapr.library.flatclass parameter to the Java OPT variable in the Spoon.bat file. For example:
set OPT=%PENTAHO_DI_JAVA_OPTIONS% "-Djava.library.path=%LIBSPATH%" "-DKETTLE_HOME=%KETTLE_HOME%" "-DKETTLE_REPOSITORY=%KETTLE_REPOSITORY%" "-DKETTLE_USER=%KETTLE_USER%" "-DKETTLE_PASSWORD=%KETTLE_PASSWORD%" "-DKETTLE_PLUGIN_PACKAGES=%KETTLE_PLUGIN_PACKAGES%" "-DKETTLE_LOG_SIZE_LIMIT=%KETTLE_LOG_SIZE_LIMIT%" "-DKETTLE_JNDI_ROOT=%KETTLE_JNDI_ROOT%" "-Dpentaho.installed.licenses.file=%PENTAHO_INSTALLED_LICENSE_PATH%" "-Dmapr.library.flatclass"
MapR 2.1.2 Cluster Connection Workaround (Linux and Mac)
MapR version 2.1.2 has a known bug that prevents Spoon from connecting to clusters properly. To work around this bug, add a -Dmapr.library.flatclass parameter to the Java OPT variable that is in the spoon.sh file. For example:
OPT="$OPT $PENTAHO_DI_JAVA_OPTIONS -Djava.library.path=$LIBPATH -DKETTLE_HOME=$KETTLE_HOME -DKETTLE_REPOSITORY=$KETTLE_REPOSITORY -DKETTLE_USER=$KETTLE_USER -DKETTLE_PASSWORD=$KETTLE_PASSWORD -DKETTLE_PLUGIN_PACKAGES=$KETTLE_PLUGIN_PACKAGES -DKETTLE_LOG_SIZE_LIMIT=$KETTLE_LOG_SIZE_LIMIT -Dmapr.library.flatclass"
Drive Letter Casing Issue (Windows Only)
The MapR shim might fail to load correctly if the drive letter in the Windows classpath or library path is uppercase. This is a known issue with MapR software. If this happens, use a lowercase drive letter instead, for example: file:///c:/opt/mapr.
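If you hit this issue, the drive letters can be lowercased without retyping the whole value. The following Python sketch rewrites only the windows.classpath and windows.library.path lines of a config.properties file, changing a drive letter such as C: to c: at the start of each comma-separated entry or directly after file:///, and leaves every other line unchanged. It keeps a .bak copy of the original; passing the file path as the only argument is an assumption for illustration.

```python
import re
import shutil
import sys

# A drive letter followed by a colon, at the start of the value, at the start
# of a comma-separated entry, or directly after file:///.
_DRIVE = re.compile(r"(^|,|file:///)([A-Za-z]):")
_KEYS = ("windows.classpath", "windows.library.path")


def lowercase_drive_letters(value: str) -> str:
    """Rewrite drive letters such as C: as c: within a property value."""
    return _DRIVE.sub(lambda m: m.group(1) + m.group(2).lower() + ":", value)


def fix_config_lines(lines):
    """Lowercase drive letters only in the Windows path properties."""
    fixed = []
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip() in _KEYS:
            line = key + sep + lowercase_drive_letters(value)
        fixed.append(line)
    return fixed


if __name__ == "__main__":
    path = sys.argv[1]  # the config.properties file in your maprXX folder
    shutil.copyfile(path, path + ".bak")
    # latin-1 round-trips every byte; newline="" keeps the original line endings.
    with open(path, encoding="latin-1", newline="") as f:
        lines = f.readlines()
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.writelines(fix_config_lines(lines))
```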
MapR Release Notes
MapR Release Notes can be found here: http://doc.mapr.com/display/RelNotes/Release+Notes.