...

Passing data between Hadoop and Kettle requires type conversion. The built-in Kettle types map to Hadoop types as follows:

| Kettle Type | Hadoop Type |
|---|---|
| ValueMetaInterface.TYPE_STRING | org.apache.hadoop.io.Text |
| ValueMetaInterface.TYPE_BIGNUMBER | org.apache.hadoop.io.Text |
| ValueMetaInterface.TYPE_DATE | org.apache.hadoop.io.Text |
| ValueMetaInterface.TYPE_INTEGER | org.apache.hadoop.io.LongWritable |
| ValueMetaInterface.TYPE_NUMBER | org.apache.hadoop.io.DoubleWritable |
| ValueMetaInterface.TYPE_BOOLEAN | org.apache.hadoop.io.BooleanWritable |
| ValueMetaInterface.TYPE_BINARY | org.apache.hadoop.io.BytesWritable |
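To make the mapping concrete, here is a minimal sketch of it as a lookup table. It is not Pentaho's converter API: the class name `KettleHadoopTypeMapping` and the method `hadoopTypeFor` are hypothetical, and the types are held as plain strings so the example runs without Kettle or Hadoop on the classpath. The double-precision row is keyed as `TYPE_NUMBER`, the constant Kettle defines for floating-point values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Kettle or Hadoop.
public class KettleHadoopTypeMapping {

    // Kettle type constant name -> fully qualified Hadoop Writable class name.
    private static final Map<String, String> KETTLE_TO_HADOOP = new LinkedHashMap<>();

    static {
        KETTLE_TO_HADOOP.put("TYPE_STRING", "org.apache.hadoop.io.Text");
        KETTLE_TO_HADOOP.put("TYPE_BIGNUMBER", "org.apache.hadoop.io.Text");
        KETTLE_TO_HADOOP.put("TYPE_DATE", "org.apache.hadoop.io.Text");
        KETTLE_TO_HADOOP.put("TYPE_INTEGER", "org.apache.hadoop.io.LongWritable");
        KETTLE_TO_HADOOP.put("TYPE_NUMBER", "org.apache.hadoop.io.DoubleWritable");
        KETTLE_TO_HADOOP.put("TYPE_BOOLEAN", "org.apache.hadoop.io.BooleanWritable");
        KETTLE_TO_HADOOP.put("TYPE_BINARY", "org.apache.hadoop.io.BytesWritable");
    }

    // Returns the Hadoop type name for a Kettle type, or fails loudly if unmapped.
    public static String hadoopTypeFor(String kettleType) {
        String hadoopType = KETTLE_TO_HADOOP.get(kettleType);
        if (hadoopType == null) {
            throw new IllegalArgumentException(
                "No built-in mapping for Kettle type: " + kettleType);
        }
        return hadoopType;
    }

    public static void main(String[] args) {
        // Print the whole table in insertion order.
        for (Map.Entry<String, String> entry : KETTLE_TO_HADOOP.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Note that three Kettle types (String, BigNumber, Date) share `Text`, so the mapping is not reversible from the Hadoop side alone.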

...

The currently supported configuration properties are:

| Property Name | Description |
|---|---|
| pmr.kettle.installation.id | Version of Kettle to use from the Kettle HDFS installation directory. If not set, the version of Kettle used to submit the Pentaho MapReduce job is used. |
| pmr.kettle.dfs.install.dir | Installation path in HDFS for the Kettle environment used to execute a Pentaho MapReduce job. A path that starts with / is absolute; any other path is relative to the user's home directory. |
| pmr.libraries.archive.file | Pentaho MapReduce Kettle environment runtime archive to be preloaded into pmr.kettle.dfs.install.dir/pmr.kettle.installation.id. |
| pmr.kettle.additional.plugins | Comma-separated list of additional plugins (by directory name) to be installed with the Kettle environment, e.g. "steps/DummyPlugin,my-custom-plugin". |
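As a rough illustration of how these properties combine, the sketch below resolves the install directory and splits the plugin list. It is not Pentaho's implementation: the class `PmrConfigSketch`, both helper methods, the home directory `/user/alice`, and the values `opt/pentaho/mapreduce` and `example-id` are all hypothetical, and trimming whitespace or skipping empty list entries is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch; names and sample values are assumptions for illustration.
public class PmrConfigSketch {

    // A path starting with "/" is absolute; anything else is anchored
    // to the user's home directory.
    public static String resolveInstallDir(String installDir, String userHome) {
        if (installDir.startsWith("/")) {
            return installDir;
        }
        String base = userHome.endsWith("/")
            ? userHome.substring(0, userHome.length() - 1)
            : userHome;
        return base + "/" + installDir;
    }

    // Splits the comma-separated plugin list into directory names.
    // Trimming and dropping empty entries is an assumption.
    public static List<String> parsePlugins(String value) {
        List<String> plugins = new ArrayList<>();
        if (value == null) {
            return plugins;
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                plugins.add(name);
            }
        }
        return plugins;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("pmr.kettle.dfs.install.dir", "opt/pentaho/mapreduce"); // hypothetical value
        props.setProperty("pmr.kettle.installation.id", "example-id");            // hypothetical value
        props.setProperty("pmr.kettle.additional.plugins", "steps/DummyPlugin,my-custom-plugin");

        // Assumed HDFS home directory for the submitting user.
        String root = resolveInstallDir(
            props.getProperty("pmr.kettle.dfs.install.dir"), "/user/alice");

        // The runtime archive would be preloaded under <install dir>/<installation id>.
        String target = root + "/" + props.getProperty("pmr.kettle.installation.id");

        System.out.println(target);
        System.out.println(parsePlugins(props.getProperty("pmr.kettle.additional.plugins")));
    }
}
```

Keeping path resolution in one small function makes the relative-versus-absolute rule easy to check in isolation before a job is submitted.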

...