...

Customizing the Kettle Environment used by Pentaho MapReduce

The Kettle environment used by Pentaho MapReduce is installed to pmr.kettle.dfs.install.dir/pmr.kettle.installation.id when the Pentaho MapReduce job entry is executed. If the installation already exists, no modifications are made and the job uses the environment as it exists. That means any modifications made after the initial run, or any custom pre-loading of a Kettle environment, will be used as-is by Pentaho MapReduce.
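
You can inspect or reset that installation directly in HDFS. The paths below are illustrative only: they assume pmr.kettle.dfs.install.dir is /opt/pentaho/mapreduce and pmr.kettle.installation.id is custom; substitute your own configured values.

Code Block
# List the installed Kettle environment (illustrative paths, see note above).
hadoop fs -ls /opt/pentaho/mapreduce/custom

# Deleting the directory forces a fresh install on the next job run,
# discarding any manual changes made after the initial install.
hadoop fs -rm -r /opt/pentaho/mapreduce/custom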

Customizing the libraries used in a fresh Kettle environment install into HDFS

The pmr.libraries.archive.file contents are copied into HDFS at pmr.kettle.dfs.install.dir/pmr.kettle.installation.id. To make changes for initial installations, you must edit the archive referenced by this property.

  1. Unzip pentaho-mapreduce-libraries.zip; it contains a single lib/ directory with the required Kettle dependencies
  2. Copy any additional libraries into the lib/ directory
  3. Zip up the lib/ directory into pentaho-mapreduce-libraries-custom.zip so the archive contains lib/ with all jars within it. You may create subdirectories within lib/; all jars found in lib/ and its subdirectories will be added to the classpath of the executing job. (A shell sketch of these steps follows the list.)
  4. Edit pentaho-mapreduce.properties and set the following properties:
    Code Block
    pmr.kettle.installation.id=custom
    pmr.libraries.archive.file=pentaho-mapreduce-libraries-custom.zip
    
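As a rough sketch of steps 1-3, assuming the standard zip/unzip command-line tools are available and using a placeholder path for the extra library:

Code Block
# Illustrative only: file names follow the steps above; the extra jar path is a placeholder.
unzip pentaho-mapreduce-libraries.zip                # extracts the lib/ directory
cp /path/to/extra-library.jar lib/                   # add your additional jars
zip -r pentaho-mapreduce-libraries-custom.zip lib/   # lib/ must sit at the root of the archive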

The next time you execute Pentaho MapReduce, the custom Kettle environment will be copied into HDFS at pmr.kettle.dfs.install.dir/custom and used when executing the job. You can switch between Kettle environments by specifying the pmr.kettle.installation.id property as a User Defined property per Pentaho MapReduce job entry, or globally in the pentaho-mapreduce.properties file*.

*Note: Only if the installation referenced by pmr.kettle.installation.id does not exist will the archive file and additional plugins currently configured be used to "install" it into HDFS.

Customizing an existing Kettle environment in HDFS

You can customize an existing Kettle environment installed in HDFS by copying jars and plugins into HDFS. This can be done manually (hadoop fs -copyFromLocal <localsrc> ... <dst>) or with the Hadoop Copy Files job entry.
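
For example, assuming a pmr.kettle.dfs.install.dir of /opt/pentaho/mapreduce, an installation id of custom, and that the installation contains the usual lib/ and plugins/ directories (adjust all of these to your configuration; the local paths are placeholders):

Code Block
# Add an extra jar to the lib/ directory of the existing installation.
hadoop fs -copyFromLocal /path/to/extra-library.jar /opt/pentaho/mapreduce/custom/lib/

# Copy a Kettle plugin directory into the installation's plugins/ directory.
hadoop fs -copyFromLocal /path/to/my-plugin /opt/pentaho/mapreduce/custom/plugins/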

Upgrading from the Pentaho Hadoop Distribution (PHD)

...