...

Hadoop's Distributed Cache is a mechanism for distributing files into the working directory of each map and reduce task. These files originate in HDFS. Pentaho MapReduce automatically configures the job to use a Kettle environment from HDFS (selected via pmr.kettle.installation.id; see Configuration options below). If the desired Kettle environment does not exist, Pentaho MapReduce takes care of "installing" it in HDFS before executing the job.
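The generic Hadoop API behind this mechanism can be sketched as follows. This is only an illustration of the Distributed Cache itself, not Pentaho MapReduce's internal code; the HDFS paths and installation id in it are made up for the example.

    // Illustration of Hadoop's Distributed Cache API (not Pentaho MapReduce's own code).
    // Files and archives registered here are copied from HDFS to each task node and
    // show up in the task's working directory under the "#fragment" link name.
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class DistributedCacheSketch {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "distributed-cache-sketch");

        // Hypothetical Kettle environment archive in HDFS; tasks would see it as "./kettle".
        job.addCacheArchive(new URI("hdfs:///opt/pentaho/mapreduce/1.2.3/kettle-env.zip#kettle"));

        // Individual files can be distributed the same way.
        job.addCacheFile(new URI("hdfs:///opt/pentaho/mapreduce/1.2.3/plugins.xml#plugins.xml"));

        // ... set mapper/reducer classes, input/output paths, then submit as usual.
      }
    }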

The default Kettle environment installation path within HDFS is /opt/pentaho/mapreduce/$id, where $id is typically the version of Kettle the environment contains, but it can just as easily identify a custom build tailored to a specific set of jobs.

Configuration options

Pentaho MapReduce can be configured through the pentaho-mapreduce.properties file found in the plugin's base directory.
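As a minimal illustration (the value below is an example, not a shipped default), the property named earlier on this page would be set in that file like so:

    # pentaho-mapreduce.properties (illustrative entry; the id value is an example)
    # Selects which Kettle environment under /opt/pentaho/mapreduce/ jobs should use.
    pmr.kettle.installation.id=1.2.3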

...