Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The default Kettle environment installation path within HDFS is /opt/pentaho/mapreduce/$id, where $id is generally the version of Kettle the environment contains but can easily be a custom build that is tailored for a specific set of jobs.

...

Anchor
ConfigurationOptions
ConfigurationOptions

Configuration options

Pentaho MapReduce can be configured through the pentaho-mapreduce.properties found in the plugin's base directory, or overridden per Pentaho MapReduce job entry if they are defined in the User Defined properties tab.

...

Property Name

Description

pmr.kettle.installation.id

Version of Kettle to use from the Kettle HDFS installation directory. If not set we will use the version of Kettle that is used to submit the Pentaho MapReduce job.

pmr.kettle.dfs.install.dir

Installation path in HDFS for the Kettle environment used to execute a Pentaho MapReduce job. This can be a relative path, anchored to the user's home directory, or an absolute path if it starts with a /.

pmr.libraries.archive.file

Pentaho MapReduce Kettle environment runtime archive to be preloaded into kettle.hdfs.install.dir/pmr.kettle.installation.id

pmr.kettle.additional.plugins

Comma-separated list of additional plugins (by directory name) to be installed with the Kettle environment.
e.g. "steps/DummyPlugin,my-custom-plugin"

Anchor
customizing
customizing

Customizing the Kettle Environment used by Pentaho MapReduce

...

The installation environment used by Pentaho MapReduce will be installed to pmr.kettle.dfs.install.dir/pmr.kettle.installation.id when the Pentaho MapReduce job entry is executed. If the installation already exists no modifications will be made and the job will use the environment as is. That means any modifications after the initial run, or any custom pre-loading of a kettle environment, will be used as is by Pentaho MapReduce.

...

  1. Remove the pentaho.* properties from your mapred-site.xml
  2. Remove the directories those properties referenced
  3. Restart the TaskTracker process

Anchor
appendix-a
appendix-a

Appendix A: pentaho-mapreduce-libraries.zip structure

...

...

Code Block
pentaho-mapreduce-libraries.zip/
  `- lib/
      +- kettle-core-{version}.jar
      +- kettle-engine-{version}.jar
      `- .. (all other required Kettle dependencies and optional jars)

Anchor
appendix-b
appendix-b

Appendix B: Example Kettle environment installation directory structure within DFS

...

Code Block
/opt/pentaho/mapreduce/
  +- 4.3.0/
  |   +- lib/
  |   |   +- kettle-core-{version}.jar
  |   |   +- kettle-engine-{version}.jar
  |   |   `- .. (all other required Kettle dependencies and optional jars)
  |   `- plugins/
  |       +- pentaho-big-data-plugin/
  |       `- .. (additional optional plugins)
  `- custom/
      +- lib/
      |   +- kettle-core-{version}.jar
      |   +- kettle-engine-{version}.jar
      |   +- my-custom-code.jar
      |   `- .. (all other required Kettle dependencies and optional jars)
      `- plugins/
          +- pentaho-big-data-plugin/
          |   ..
          `- my-custom-plugin/
              ..