...
Customizing the Kettle Environment used by Pentaho MapReduce
The Kettle environment used by Pentaho MapReduce is installed to pmr.kettle.dfs.install.dir/pmr.kettle.installation.id when the Pentaho MapReduce job entry is executed. If the installation already exists, no modifications are made and the job uses the environment as it exists. This means any modifications made after the initial run, or any custom pre-loading of a Kettle environment, will be used as-is by Pentaho MapReduce.
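For example, you can check whether an installation already exists in HDFS and, if you want the next run to perform a fresh install, remove it. This is a minimal sketch; /opt/pentaho/mapreduce and the installation id 5.4.0.1-130 are assumed values, so substitute your configured pmr.kettle.dfs.install.dir and pmr.kettle.installation.id:

  # List the Kettle installations under the configured DFS install directory
  # (/opt/pentaho/mapreduce is an assumed value for pmr.kettle.dfs.install.dir)
  hadoop fs -ls /opt/pentaho/mapreduce

  # Remove a specific installation so Pentaho MapReduce re-installs it on the next run
  # (5.4.0.1-130 is an assumed installation id)
  hadoop fs -rm -r /opt/pentaho/mapreduce/5.4.0.1-130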
Customizing the libraries used in a fresh Kettle environment install into HDFS
The contents of pmr.libraries.archive.file are copied into HDFS at pmr.kettle.dfs.install.dir/pmr.kettle.installation.id. To make changes for initial installations, you must edit the archive referenced by this property:
- Unzip pentaho-mapreduce-libraries.zip; it contains a single lib/ directory with the required Kettle dependencies
- Copy any additional libraries into the lib/ directory
- Zip up the lib/ directory into pentaho-mapreduce-libraries-custom.zip so the archive contains lib/ with all jars inside it. (You may create subdirectories within lib/; all jars found in lib/ and its subdirectories are added to the classpath of the executing job.) A consolidated shell sketch of these steps follows this list.
- Update pentaho-mapreduce.properties and set the following properties:

  pmr.kettle.installation.id=custom
  pmr.libraries.archive.file=pentaho-mapreduce-libraries-custom.zip
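A minimal shell sketch of the archive steps above, assuming the stock archive is in the current directory and my-custom-library.jar stands in for whatever jars you want to add:

  # Unzip the stock archive; it expands to a lib/ directory
  unzip pentaho-mapreduce-libraries.zip

  # Copy additional jars into lib/ (my-custom-library.jar is a placeholder name)
  cp my-custom-library.jar lib/

  # Re-zip lib/ (and any subdirectories) into the custom archive
  zip -r pentaho-mapreduce-libraries-custom.zip lib/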
The next time you execute Pentaho MapReduce, the custom Kettle environment will be copied into HDFS at pmr.kettle.dfs.install.dir/custom and used when executing the job. You can switch between Kettle environments by specifying the pmr.kettle.installation.id property as a User Defined property per Pentaho MapReduce job entry, or globally in the pentaho-mapreduce.properties file*.
*Note: Only if the installation referenced by pmr.kettle.installation.id does not exist will the archive file and additional plugins currently configured be used to "install" it into HDFS.
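For instance, switching environments globally is just a matter of changing the installation id in pentaho-mapreduce.properties. This is an illustrative sketch; 5.4.0.1-130 is an assumed installation id, so use whichever ids actually exist under your pmr.kettle.dfs.install.dir:

  # pentaho-mapreduce.properties
  # Use the customized environment installed under .../custom
  #pmr.kettle.installation.id=custom

  # Or switch back to another existing installation (example id; yours will differ)
  pmr.kettle.installation.id=5.4.0.1-130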
Customizing an existing Kettle environment in HDFS
You can customize an existing Kettle environment installed in HDFS by copying jars and plugins into it. This can be done manually (hadoop fs -copyFromLocal <localsrc> ... <dst>) or with the Hadoop Copy Files job entry.
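For example, a minimal sketch of adding a jar to an existing installation's lib/ directory, assuming pmr.kettle.dfs.install.dir is /opt/pentaho/mapreduce and the installation id is custom (both values, and the jar name, are illustrative):

  # Copy an extra jar into the lib/ directory of an existing HDFS installation
  # (/opt/pentaho/mapreduce/custom is an assumed install path)
  hadoop fs -copyFromLocal my-custom-library.jar /opt/pentaho/mapreduce/custom/lib/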
Upgrading from the Pentaho Hadoop Distribution (PHD)
...