How to set up and configure Pentaho (Kettle, Pentaho Data Integration, Pentaho Business Analytics Suite) for your specific Hadoop distribution.
This page applies to Pentaho Suite versions 5.2 and later. For 5.1 or earlier,
...
click here
Pentaho supports multiple versions of Hadoop distributions from several vendors, such as Cloudera, Hortonworks, and MapR. To support this range of versions, Pentaho uses an abstraction layer, called a shim, that connects to each Hadoop distribution. A shim is a small library that intercepts API calls and either redirects them, handles them itself, or changes the calling parameters. Pentaho periodically develops new shims as vendors release new Hadoop distributions and versions, and Pentaho engineers test and certify these big data shims. The following steps will help you set up Pentaho to work with your Hadoop cluster.
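A shim is essentially an adapter layer. The Python sketch below is illustrative only and is not Pentaho's actual shim API (Pentaho's shims are Java libraries). The class names, the registry, and the shim names `hadoop1-example` and `hadoop2-yarn-example` are hypothetical. It shows one distribution-neutral call being translated into the configuration properties that a Hadoop 1 (MR1) or Hadoop 2 (YARN) cluster expects, which corresponds to the "changes the calling parameters" case described above.

```python
from abc import ABC, abstractmethod


class HadoopShim(ABC):
    """Distribution-neutral interface that the rest of the application calls (hypothetical)."""

    @abstractmethod
    def scheduler_config(self, address: str) -> dict[str, str]:
        """Translate a generic 'job scheduler address' into distribution-specific settings."""


class Hadoop1Shim(HadoopShim):
    # Hadoop 0.20.x / MR1 clusters accept jobs through a JobTracker.
    def scheduler_config(self, address: str) -> dict[str, str]:
        return {"mapred.job.tracker": address}


class YarnShim(HadoopShim):
    # Hadoop 2.x / MR2 clusters accept jobs through the YARN ResourceManager.
    def scheduler_config(self, address: str) -> dict[str, str]:
        return {
            "mapreduce.framework.name": "yarn",
            "yarn.resourcemanager.address": address,
        }


# Hypothetical registry: one shim per supported distribution and version.
SHIMS: dict[str, HadoopShim] = {
    "hadoop1-example": Hadoop1Shim(),
    "hadoop2-yarn-example": YarnShim(),
}


def load_shim(name: str) -> HadoopShim:
    """Return the registered shim, or fail clearly if no shim supports that distribution."""
    try:
        return SHIMS[name]
    except KeyError:
        raise ValueError(f"no shim registered for {name!r}") from None
```

Callers only ever use `load_shim(name).scheduler_config(address)`; supporting a new distribution means registering another `HadoopShim` subclass rather than changing the calling code, which is why new shims can be released independently of the rest of the product.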
...
- Support Matrix for 5.2
- Support Matrix for 5.3 (HDP 2.2 is supported with the March 5.3 patch release.)
NOTE: Pentaho is pre-configured for Apache Hadoop 0.20.2. If you are using this distribution and version, no further configuration is required.
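In Pentaho 5.x installations, the active shim is typically selected by the `active.hadoop.configuration` property in `plugins/pentaho-big-data-plugin/plugin.properties`, whose value names a folder under `hadoop-configurations`. Treat that path, the property name, and the default value `hadoop-20` as assumptions to verify against your own installation. The Python sketch below checks whether the property still points at the bundled Apache Hadoop 0.20.2 shim. Its properties parser is deliberately minimal and does not handle line continuations, escape sequences, or whitespace-only separators.

```python
from typing import Optional

# Assumed name of the bundled Apache Hadoop 0.20.2 shim folder; verify locally.
DEFAULT_SHIM = "hadoop-20"
# Assumed property that selects the active shim in plugin.properties.
SHIM_KEY = "active.hadoop.configuration"


def parse_properties(text: str) -> dict[str, str]:
    """Minimal Java-style .properties reader: 'key=value' or 'key: value', '#' and '!' comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        # Split on the first '=' or ':' separator.
        for i, ch in enumerate(line):
            if ch in "=:":
                key, value = line[:i].strip(), line[i + 1:].strip()
                break
        else:
            key, value = line, ""  # a key with no value
        props[key] = value
    return props


def active_shim(text: str) -> Optional[str]:
    """Return the configured shim name, or None if the property is absent."""
    return parse_properties(text).get(SHIM_KEY)


def uses_default_shim(text: str) -> bool:
    """True if the configuration still selects the assumed Apache Hadoop 0.20.2 shim."""
    return active_shim(text) == DEFAULT_SHIM
```

For example, `uses_default_shim(open("plugin.properties").read())` returning `False` would suggest that the installation has already been pointed at another distribution's shim.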
...
If you use CDH 5 and want to configure it to use MapReduce 1 (MR1) instead of MapReduce 2 (MR2), follow the instructions in Additional Configuration for using MR1 with CDH5.
Future Release Roadmap
Support for the following Hadoop configurations is planned for an upcoming patch or future release:
- CDH 5.3
...
- HDP 2.2
- Amazon EMR
- Spark on CDH 5.3
Next Steps
Now that you've configured Pentaho for your Hadoop distribution, there are many things you can do. Here are a few links to get you started!
...