...

You can customize an existing Kettle environment installation in HDFS by copying additional jars and plugins into it. This can be done manually (hadoop fs -copyFromLocal <localsrc> ... <dst>) or with the Hadoop Copy Files job entry.

See Appendix B for the supported directory structure in HDFS.
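
For example, copying additional jars and plugins into a custom environment manually might look roughly like the following. This is a sketch only: it assumes the /opt/pentaho/mapreduce/custom installation root shown in Appendix B, and the jar and plugin names are the illustrative ones from that appendix.

Code Block
# Add a custom jar to the environment's lib/ directory in HDFS
hadoop fs -copyFromLocal my-custom-code.jar /opt/pentaho/mapreduce/custom/lib/

# Add a custom plugin directory to the environment's plugins/ directory in HDFS
hadoop fs -copyFromLocal my-custom-plugin /opt/pentaho/mapreduce/custom/plugins/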

Adding JDBC drivers to the Kettle environment

JDBC drivers, along with their required dependencies, must be placed in the installation's lib/ directory.
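
For example, adding a driver might look like the following sketch. The driver jar name is a placeholder, and the path assumes the custom HDFS installation from Appendix B.

Code Block
# Place a JDBC driver jar (and any jars it depends on) in the environment's lib/ directory
hadoop fs -copyFromLocal my-jdbc-driver.jar /opt/pentaho/mapreduce/custom/lib/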

Upgrading from the Pentaho Hadoop Distribution (PHD)

...

  1. Remove the pentaho.* properties from your mapred-site.xml
  2. Remove the directories those properties referenced
  3. Restart the TaskTracker process (see the command sketch below)
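
On a Hadoop 1.x node, these steps might look roughly like the following. The removed path is a placeholder for whatever directories your pentaho.* properties actually referenced, and the daemon script location depends on your distribution.

Code Block
# 1. Edit mapred-site.xml and delete every <property> whose name starts with "pentaho."
# 2. Remove the directories those properties referenced (placeholder path shown);
#    use 'hadoop fs -rmr' instead if the directories live in HDFS
rm -rf /path/referenced/by/pentaho/properties
# 3. Restart the TaskTracker (Hadoop 1.x daemon script)
$HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker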

Appendix A: pentaho-mapreduce-libraries.zip structure

...

Code Block
pentaho-mapreduce-libraries.zip/
  `- lib/
      +- kettle-core-{version}.jar
      +- kettle-engine-{version}.jar
      `- .. (all other required Kettle dependencies and optional jars)

Appendix B: Example Kettle environment installation directory structure within HDFS

Code Block
/opt/pentaho/mapreduce/
  +- 4.3.0/
  |   +- lib/
  |   |   +- kettle-core-{version}.jar
  |   |   +- kettle-engine-{version}.jar
  |   |   `- .. (all other required Kettle dependencies and optional jars)
  |   `- plugins/
  |       +- pentaho-big-data-plugin/
  |       `- .. (additional optional plugins)
  `- custom/
      +- lib/
      |   +- kettle-core-{version}.jar
      |   +- kettle-engine-{version}.jar
      |   +- my-custom-code.jar
      |   `- .. (all other required Kettle dependencies and optional jars)
      `- plugins/
          +- pentaho-big-data-plugin/
          |   ..
          `- my-custom-plugin/
              ..