Helpful Commands for Working with Hadoop Configurations

Helpful scripts and commands for working with Hadoop configurations.

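Setting the current shim for all your Pentaho applications with one command.

This Linux shell command finds every plugin.properties file in the current directory and below, and sets active.hadoop.configuration to the supplied argument. The argument is the name of the Hadoop configuration (shim) to use. Because the command reads the argument from $1, run it from a script or shell function, or replace $1 with the shim name when typing it interactively.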

find . -wholename "*pentaho-big-data-plugin/plugin.properties" -exec sed -i "s/\(active.hadoop.configuration=\)\(.*\)/\1$1/g" {} \;
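Since $1 is only set inside a script or function, one way to use the command above is to wrap it in a shell function. The following is a minimal sketch, not part of Pentaho: the name set_shim and the optional search-root argument are assumptions made for illustration, and sed -i assumes GNU sed.

```shell
# set_shim is a hypothetical helper name; it is not shipped with Pentaho.
# Usage: set_shim <shim-name> [search-root]
set_shim() {
    if [ -z "$1" ]; then
        echo "usage: set_shim <shim-name> [search-root]" >&2
        return 1
    fi
    # Search from the given root, or from the current directory by default,
    # and rewrite the active.hadoop.configuration line in each file found.
    find "${2:-.}" -wholename "*pentaho-big-data-plugin/plugin.properties" \
        -exec sed -i "s/^\(active\.hadoop\.configuration=\).*/\1$1/" {} \;
}
```

For example, running set_shim cdh53 from the directory that contains your Pentaho installations would point each of them at a shim directory named cdh53 (an example name; use one that exists under hadoop-configurations).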

Printing the current shim configured for all your Pentaho applications with one command.

This Linux shell command finds every plugin.properties file in the current directory and below, and prints each file's path followed by its active.hadoop.configuration value.

find . -wholename "*pentaho-big-data-plugin/plugin.properties" -exec ls {} \; -exec grep -o "active\.hadoop\.configuration=[0-9A-Za-z\-]*" {} \; | cut -f2 -d=
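If you would rather see each path and its shim on a single line, the same lookup can be sketched as a small function. The name show_shims and the optional search-root argument are assumptions for illustration only.

```shell
# show_shims is a hypothetical helper name; it is not shipped with Pentaho.
# Usage: show_shims [search-root]
# Prints "<path>: <shim>" for each plugin.properties under the search root.
show_shims() {
    find "${1:-.}" -wholename "*pentaho-big-data-plugin/plugin.properties" |
    while IFS= read -r props; do
        # Strip the key and keep only the value of the first matching line.
        shim=$(sed -n 's/^active\.hadoop\.configuration=//p' "$props" | head -n 1)
        printf '%s: %s\n' "$props" "$shim"
    done
}
```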

Alias to find the shims directory and change your current directory to it:

alias goshims="cd \`find . -wholename \"*pentaho-big-data-plugin/hadoop-configurations\"\`"

Alias to find the active shim's directory and change your current directory to it:


alias goshim='SHIM_PROP="`find . -wholename \"*pentaho-big-data-plugin/plugin.properties\" -exec ls {} 2>/dev/null \; | head --lines=1`"; \
                          ACTIVE_SHIM="`grep -o \"active\.hadoop\.configuration=[0-9A-Za-z\-]*\" \"$SHIM_PROP\" | sed \"s/.*=//g\"`"; \
                          HADOOP_CONFIG_PATH="`grep -o \"hadoop\.configurations\.path=[0-9A-Za-z\-]*\" \"$SHIM_PROP\" | sed \"s/.*=//g\"`"; \
                          cd "`dirname \"$SHIM_PROP\"`/$HADOOP_CONFIG_PATH/$ACTIVE_SHIM"'
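The goshim alias is dense because of the nested quoting. As a more readable sketch of the same steps, here is a function version. The name goshim_fn is hypothetical, and like the alias it assumes that the first plugin.properties found under the current directory is the one you want.

```shell
# goshim_fn is a hypothetical name for a function version of the goshim alias.
goshim_fn() {
    local props active config_path
    # Use the first plugin.properties found under the current directory.
    props=$(find . -wholename "*pentaho-big-data-plugin/plugin.properties" 2>/dev/null | head -n 1)
    if [ -z "$props" ]; then
        echo "no pentaho-big-data-plugin/plugin.properties found under $(pwd)" >&2
        return 1
    fi
    # Read the active shim name and the shims directory from the properties file.
    active=$(sed -n 's/^active\.hadoop\.configuration=//p' "$props" | head -n 1)
    config_path=$(sed -n 's/^hadoop\.configurations\.path=//p' "$props" | head -n 1)
    # The shims directory is resolved relative to the plugin directory.
    cd "$(dirname "$props")/$config_path/$active"
}
```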

Replace the Hortonworks Sandbox hostname with my.resourcemanager.com (substitute your ResourceManager's hostname) in every *-site.xml file in the current directory and below:

find . -name "*-site.xml" -exec sed -i "s/sandbox.hortonworks.com/my.resourcemanager.com/g" {} \;

Replace the Cloudera QuickStart VM hostname with my.resourcemanager.com (substitute your ResourceManager's hostname) in every *-site.xml file in the current directory and below:

find . -name "*-site.xml" -exec sed -i "s/clouderamanager.cdh5.test/my.resourcemanager.com/g" {} \;
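Both hostname replacements follow the same pattern, so they can be sketched as one parameterized function. The name replace_host is hypothetical. Unlike the commands above, this sketch escapes the dots in the old hostname so they match literally, and it uses sed -i.bak (GNU sed assumed) to leave a backup copy next to each processed file.

```shell
# replace_host is a hypothetical helper name; it is not shipped with Pentaho.
# Usage: replace_host <old-hostname> <new-hostname> [search-root]
replace_host() {
    if [ "$#" -lt 2 ]; then
        echo "usage: replace_host <old-hostname> <new-hostname> [search-root]" >&2
        return 1
    fi
    local old_pattern
    # Escape dots so they match a literal "." instead of any character.
    old_pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    # -i.bak keeps a backup (for example yarn-site.xml.bak) of each processed file.
    find "${3:-.}" -name "*-site.xml" \
        -exec sed -i.bak "s/$old_pattern/$2/g" {} \;
}
```

For example, replace_host sandbox.hortonworks.com my.resourcemanager.com would perform the Hortonworks replacement from the current directory.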

Increase the Mondrian query timeout:

Analyzer reports that use Hive data sources (or anything else that generates a MapReduce job) can exceed the query timeout. The following command changes the timeout from 300 seconds (5 minutes) to 600 seconds (10 minutes); replace 600 with whatever value you need:

sed -i "s/^mondrian.rolap.queryTimeout=300$/mondrian.rolap.queryTimeout=600/" pentaho-solutions/system/mondrian/mondrian.properties
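The command above only matches when the current value is exactly 300. If the timeout has already been changed, a variant that accepts any existing numeric value may be more convenient. This is a sketch under assumptions: the name set_mondrian_timeout is hypothetical, the default path is the one used above, and sed -i assumes GNU sed.

```shell
# set_mondrian_timeout is a hypothetical helper name.
# Usage: set_mondrian_timeout <seconds> [path-to-mondrian.properties]
set_mondrian_timeout() {
    local file
    file="${2:-pentaho-solutions/system/mondrian/mondrian.properties}"
    # Accept only a non-empty string of digits as the timeout.
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: set_mondrian_timeout <seconds> [properties-file]" >&2
            return 1
            ;;
    esac
    # Replace whatever numeric timeout is currently configured.
    sed -i "s/^mondrian\.rolap\.queryTimeout=[0-9]*$/mondrian.rolap.queryTimeout=$1/" "$file"
    # Print the resulting line so the change can be checked at a glance.
    grep "^mondrian\.rolap\.queryTimeout=" "$file"
}
```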