How to set up and configure Kettle for your specific Hadoop distribution.
The Pentaho applications come pre-configured for Apache Hadoop 0.20.2. If you are using this distro and version, no further configuration is required.
Documentation for configuring Pentaho for distros other than Apache Hadoop 0.20.2 is now located on the Pentaho Infocenter.
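To check which shim an installation is currently set to use, a small script can read the plugin's properties file. The sketch below is only an illustration: it assumes the active shim is selected by an `active.hadoop.configuration` entry in the big data plugin's `plugin.properties` file, which this page does not state, and the helper name `read_active_shim` and the sample text are hypothetical.

```python
from typing import Optional


def read_active_shim(properties_text: str,
                     key: str = "active.hadoop.configuration") -> Optional[str]:
    """Return the value stored under `key` in Java-properties-style text.

    The default key name is an assumption about the big data plugin's
    plugin.properties file; pass a different key if your install differs.
    """
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the two comment styles used in .properties files.
        if not line or line.startswith(("#", "!")):
            continue
        name, separator, value = line.partition("=")
        if separator and name.strip() == key:
            return value.strip()
    return None


# Hypothetical excerpt of a plugin.properties file, for illustration only.
sample = """\
# shim used by the big data plugin
active.hadoop.configuration=hadoop-20
"""

print(read_active_shim(sample))  # prints "hadoop-20"
```

With the default Apache Hadoop 0.20.2 setup described above, the value would be expected to match the `hadoop-20` shim listed in the Apache table below.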
Currently supported Hadoop distributions:
Pentaho uses an abstraction layer, which we call a shim, to keep up with the rapid and never-ending version updates of Hadoop distributions. The following tables show the current known support and status of various distributions. We generally do not have to update a shim for a minor or patch version change.
Hadoop Version | Shim | Pentaho Ver | Notes |
---|---|---|---|
CDH3u3, u4 and u5 | CDH3U4 | 4.8+ | Support will be dropped in 5.0 |
CDH4.0, 4.0.1 | CDH4 | 4.8+ | |
CDH4.1, 4.1.1 | CDH4 | 4.8+ | |
CDH4.1.2, 4.1.3 | NS* | 4.8.1.1+ | |
CDH4.2 | NS* | 4.8.1.1+ | |
Go to Cloudera releases
Hadoop Version | Shim | Pentaho Ver | Notes |
---|---|---|---|
0.20.x | hadoop-20 | 4.8 | |
1.0.x | NS* | | |
1.1.x | NS* | | Distro is Beta |
2.x.x | NS* | | Distro is Alpha |
Go to Apache releases
Hadoop Version | Shim | Pentaho Ver | Notes |
---|---|---|---|
1.1.3, 1.2.0 | Mapr | 4.8, 5.0 | |
2.0.x | NS* | 5.0 | Will be supported in Kettle 5.0 (next release) |
2.1.x | NS* | 5.0 | Will be supported in Kettle 5.0 (next release) |
Go to MapR releases
Hadoop Version | Shim | Pentaho Ver | Notes |
---|---|---|---|
IHD 2.3 | NS* | 5.0 | Will be supported in Kettle 5.0 (next release) |
Go to Intel releases
Hadoop Version | Shim | Pentaho Ver | Notes |
---|---|---|---|
HDP 1.2.x | NS* | 5.0 | Will be supported in Kettle 5.0 (next release), see PDI-8035. People have been successful using the hadoop-20 shim. |
HDP 2.x | NS* | 5.0 | Distro is Alpha |
Go to Hortonworks releases
Hadoop Version | Shim | Pentaho Ver | Notes |
---|---|---|---|
DSE 3.0 | NS* | | May be supported in 5.0 but not committed |
DSE 2.2.x | NS* | | |
Go to DataStax releases
* NS - Not supported. See Hadoop Configurations for information on how to create or modify a shim to support your configuration.
+ Pentaho Ver is the earliest version of the Pentaho suite that supports this shim. Subsequent Pentaho versions will also support this shim unless otherwise noted.
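The two footnotes above amount to a simple lookup rule: a distribution is usable when its shim is not NS and the installed Pentaho version is at least the listed Pentaho Ver, unless a note says otherwise. The following is a minimal sketch of that rule using a few rows copied from the tables. The names `ShimEntry`, `SUPPORT`, and `find_shim` are hypothetical and are not part of any Pentaho API.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Version = Tuple[int, ...]


class ShimEntry(NamedTuple):
    shim: Optional[str]                   # None stands for "NS" (not supported)
    earliest: Optional[Version]           # "Pentaho Ver" column, None if blank
    dropped_in: Optional[Version] = None  # from the Notes column, if any


# A few rows taken from the tables on this page; extend as needed.
SUPPORT: Dict[str, ShimEntry] = {
    "CDH3u4": ShimEntry("CDH3U4", (4, 8), dropped_in=(5, 0)),
    "CDH4.1": ShimEntry("CDH4", (4, 8)),
    "Apache 0.20.x": ShimEntry("hadoop-20", (4, 8)),
    "Apache 1.0.x": ShimEntry(None, None),
}


def parse_version(text: str) -> Version:
    """Turn a dotted version such as '4.8.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def find_shim(distro: str, pentaho_version: str) -> Optional[str]:
    """Return the shim name for a distro, or None when it is not supported."""
    entry = SUPPORT.get(distro)
    if entry is None or entry.shim is None or entry.earliest is None:
        return None
    version = parse_version(pentaho_version)
    if version < entry.earliest:
        return None  # older than the earliest supporting release
    if entry.dropped_in is not None and version >= entry.dropped_in:
        return None  # the "unless otherwise noted" case
    return entry.shim


print(find_shim("Apache 0.20.x", "4.8"))  # hadoop-20
print(find_shim("CDH3u4", "5.0"))         # None
```

Rows whose table entries are ambiguous (for example NS with a Pentaho Ver listed) are left out of the sketch on purpose; how to treat them depends on the support policy referenced below.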
The Pentaho support policy for Hadoop is available on the Pentaho Support Plan for Hadoop Distributions page.