...

Hadoop's Distributed Cache is a mechanism to distribute files into the working directory of each map and reduce task. The origin of these files is HDFS. Pentaho MapReduce will automatically configure the job to use a Kettle environment from HDFS. If the desired Kettle environment does not already exist, Pentaho MapReduce will take care of "installing" it in HDFS before executing the job.

The default Kettle environment installation path within HDFS is /opt/pentaho/mapreduce/$id, where $id is generally the version of Kettle the environment contains, but it can just as easily identify a custom build tailored for a specific set of jobs.
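The path convention and the "install if missing" behavior described above can be sketched as follows. This is only an illustration of the logic, not Pentaho's actual implementation: the function names, the `exists` and `install` callbacks, and the `my-custom-build` identifier are all hypothetical, and a plain Python set stands in for HDFS.

```python
import posixpath
from typing import Callable

# Default root taken from the text above; everything else here is illustrative.
DEFAULT_INSTALL_ROOT = "/opt/pentaho/mapreduce"


def kettle_env_path(env_id: str, root: str = DEFAULT_INSTALL_ROOT) -> str:
    """Build the HDFS path for the Kettle environment identified by env_id."""
    if not env_id or "/" in env_id:
        raise ValueError("env_id must be a non-empty single path segment")
    return posixpath.join(root, env_id)


def ensure_kettle_env(
    env_id: str,
    exists: Callable[[str], bool],
    install: Callable[[str], None],
) -> str:
    """Install the environment only if it is not already present, then return its path.

    `exists` and `install` are hypothetical callbacks standing in for the
    real HDFS check and upload steps.
    """
    path = kettle_env_path(env_id)
    if not exists(path):
        install(path)
    return path


if __name__ == "__main__":
    # An in-memory set plays the role of HDFS for this demonstration.
    installed_paths = set()
    path = ensure_kettle_env(
        "my-custom-build",  # hypothetical $id for a custom build
        exists=installed_paths.__contains__,
        install=installed_paths.add,
    )
    print(path)
```

Calling `ensure_kettle_env` a second time with the same identifier skips the install step, which mirrors the idea that the environment is set up in HDFS once and then reused by later jobs.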

...