...
Hadoop's Distributed Cache is a mechanism to distribute files into the working directory of each map and reduce task. The origin of these files is HDFS. Pentaho MapReduce will automatically configure the job to use a Kettle environment from HDFS (configured via pmr.kettle.installation.id
, see Configuration #Configuration options). If the desired Kettle environment does not exist, Pentaho MapReduce will take care of "installing" it in HDFS before executing the job.
...