Configuring Pentaho for your Hadoop Distro and Version (Pentaho Suite Version 5.1)

How to set up and configure Pentaho (Kettle, Pentaho Data Integration, Pentaho Business Analytics Suite) for your specific Hadoop distribution.

This page applies to Pentaho Suite version 5.1.  For 5.2 or later, go here.  For 5.0 or earlier, go here.

Pentaho supports different versions of Hadoop distributions from several vendors, such as Cloudera, Hortonworks and MapR. To support so many versions, Pentaho uses an abstraction layer, called a shim, that connects to the different Hadoop distributions. A shim is a small library that intercepts API calls and then redirects them, handles them itself, or changes the calling parameters. Pentaho periodically develops new shims as vendors release new Hadoop distributions and versions. These big data shims are tested and certified by Pentaho engineers. The following steps will help you set up Pentaho to work with your Hadoop cluster.
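To make the shim idea concrete, here is a minimal Python sketch of the general pattern. It is not Pentaho's implementation: every class, method and shim name below is hypothetical and only illustrates how one common call can be redirected or have its parameters changed for two differently shaped client APIs.

```python
class LegacyClient:
    """Hypothetical client of an older distribution: expects a full URI."""

    def read(self, uri):
        return f"legacy read of {uri}"


class ModernClient:
    """Hypothetical client of a newer distribution: expects host and path."""

    def open(self, host, path):
        return f"modern open of {path} on {host}"


class LegacyShim:
    """Adapts the common read_file() call to the legacy client."""

    def __init__(self, client):
        self._client = client

    def read_file(self, host, path):
        # Change the calling parameters: build the URI the legacy API expects.
        return self._client.read(f"hdfs://{host}{path}")


class ModernShim:
    """Adapts the common read_file() call to the modern client."""

    def __init__(self, client):
        self._client = client

    def read_file(self, host, path):
        # Redirect the call to the differently named method.
        return self._client.open(host, path)


# Registry of available shims, keyed by a (hypothetical) shim name.
SHIMS = {
    "legacy": lambda: LegacyShim(LegacyClient()),
    "modern": lambda: ModernShim(ModernClient()),
}


def load_shim(name):
    """Return the shim registered under the given name."""
    return SHIMS[name]()
```

Calling code only ever uses `load_shim(name).read_file(host, path)`, so switching distributions means switching the shim name rather than changing the caller.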

New Shims

Pentaho provides Enterprise Edition support for distributions by Cloudera, Hortonworks and MapR. Pentaho intends to support new distributions and versions from these vendors as soon as possible after general availability, although this can sometimes take a long time. Newly supported shims are made available via monthly service packs.

Due to the rapid pace of development and the frequency of releases by the distribution vendors, Pentaho can only test and fully support the last two major releases for each vendor. There is no reason that previous shims would stop working as new ones are released, but the limit for full support is two.

This page, and the version matrix on it, reference both supported and unsupported distributions. References to unsupported versions are for community users and do not imply support.

Determine the proper shim for your Hadoop Distro and version

Pentaho is pre-configured for Apache Hadoop 0.20.2. If you are using this distribution and version, no further configuration is required.

In the following table, click the tab of the Hadoop distribution that you are interested in, then locate the version of the distribution you want to use. Note the name of the corresponding shim and the minimum version of the Pentaho software that supports it.

For example, if you want to use Cloudera's CDH 4.2.1, click the Cloudera tab, then look in the Version column. CDH4.2.x is supported by the cdh42 shim. You need to have Pentaho Business Analytics (or Pentaho Data Integration) version 5.0 or later installed to use this shim.
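The lookup described above can be sketched in Python. The rows are copied from the Cloudera CDH4 entries of the support matrix on this page; the function names `find_shim` and `meets_minimum` are hypothetical helpers and not part of any Pentaho tool.

```python
import re

# (version pattern, shim name, minimum Pentaho Suite version)
# Copied from the CDH4 rows of the Cloudera tab; other rows are omitted.
CLOUDERA_SHIMS = [
    (r"4\.0(\.1)?|4\.1(\.1)?", "cdh4", "5.0"),  # CDH4.0, 4.0.1, 4.1, 4.1.1
    (r"4\.1\.2", "cdh412", "5.0"),              # CDH4.1.2
    (r"4\.1\.3", "cdh413", "5.0"),              # CDH4.1.3
    (r"4\.2(\.\d+)?", "cdh42", "5.0"),          # CDH4.2.x
    (r"4\.[3-6](\.\d+)?", "cdh42", "5.0"),      # CDH4.3 - CDH4.6
]


def find_shim(cdh_version):
    """Return (shim, minimum Pentaho version) for a CDH version, or None."""
    for pattern, shim, min_pentaho in CLOUDERA_SHIMS:
        if re.fullmatch(pattern, cdh_version):
            return shim, min_pentaho
    return None


def meets_minimum(installed, minimum):
    """Compare dotted version strings numerically, e.g. '5.1' >= '5.0'."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(minimum)
```

For instance, `find_shim("4.2.1")` yields the cdh42 shim with a minimum Pentaho version of 5.0, and `meets_minimum("5.1", "5.0")` confirms that a 5.1 installation satisfies that requirement.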


Pentaho Shim Support Matrix

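Apache

Version | Shim      | Pentaho Suite Ver+ | Download             | Notes
--------|-----------|--------------------|----------------------|-------------------
0.20.x  | hadoop-20 | 5.0                | included in 5.0, 5.1 |
1.0.x   | NS*       |                    |                      | No Support planned
1.1.x   | NS*       |                    |                      | No Support planned
1.2.x   | NS*       |                    |                      | No Support planned
2.x.x   | NS*       |                    |                      | No Support planned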

Go to Apache releases

Cloudera

Version                   | Shim    | Pentaho Suite Ver+ | Download                 | Notes
--------------------------|---------|--------------------|--------------------------|------
CDH4.0, 4.0.1, 4.1, 4.1.1 | cdh4    | 5.0                | download                 | The cdh42 shim also supports this configuration
CDH4.1.2                  | cdh412  | 5.0                | download                 | The cdh42 shim also supports this configuration
CDH4.1.3                  | cdh413  | 5.0                | download                 | The cdh42 shim also supports this configuration
CDH4.2.x                  | cdh42   | 5.0                | included in 5.0, 5.1     | Backward compatible with all earlier cdh4.x distributions
CDH4.3 - CDH4.6           | cdh42   | 5.0                | included in 5.0, 5.1     |
CDH4.7                    | ++cdh42 | NS                 | included in 5.0, 5.1     | ++Not yet QA tested but minor releases rarely have issues PDI-12313
CDH5                      | cdh50   | **5.0.4            | included with 5.0.6, 5.1 |
CDH5.1                    | cdh51   | 5.2                | included with 5.2        |

Go to Cloudera releases

*NOTE: the cdh42 shim supports all versions of CDH from 4.0 through 4.6.x

Hortonworks

Version         | Shim  | Pentaho Suite Ver+     | Download                  | Notes
----------------|-------|------------------------|---------------------------|------
HDP 1.2.x       | hdp12 | 4.8 + BD Plugin 1.3.2+ | download                  |
HDP 1.3.x       | hdp13 | 4.8 + BD Plugin 1.3.2+ | download                  |
HDP 1.3 for Win | *NS   |                        |                           | Testing and support is waiting for customer demand. Vote here: PDI-10266
HDP 2.0         | hdp20 | **5.0.4                | included in 5.0.4 and 5.1 |
HDP 2.1         | hdp21 | 5.2                    | included in 5.2           |

Go to Hortonworks releases

Version | Shim  | Pentaho Suite Ver+     | Download | Notes
--------|-------|------------------------|----------|------
IDH 2.3 | idh23 | 4.8 + BD Plugin 1.3.2+ | download