...
Architecture Overview
TODO
Type Mapping
Passing data between Hadoop and Kettle requires some type conversion. The type mapping for the built-in Kettle types is:
Kettle Type | Hadoop Type |
| |
| |
| |
| |
| |
| |
| |
Distributed Cache
Pentaho MapReduce relies on Hadoop's Distributed Cache to distribute the Kettle environment, configuration, and plugins across the cluster. Using the Distributed Cache reduces network traffic for subsequent executions, because the Kettle environment is automatically configured on each node. It also allows you to use multiple versions of Kettle against a single cluster.
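The "stage once, reuse afterwards" idea can be illustrated with a small sketch. This is not Pentaho's implementation and does not use the Hadoop API: it assumes a hypothetical layout in which each Kettle version gets its own directory under a cache root, and it uses the local filesystem as a stand-in for the cluster. The names KettleEnvCache, envDir, and ensureStaged are invented for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: models a per-version cache on the local filesystem.
// The class and method names are hypothetical, not part of Kettle or Hadoop.
class KettleEnvCache {

    // Assumed layout: <cacheRoot>/<kettleVersion>/ holds one staged environment.
    static Path envDir(Path cacheRoot, String kettleVersion) {
        return cacheRoot.resolve(kettleVersion);
    }

    // Stages the environment for the given version if it is not already present.
    // Returns true when staging happened on this call, false when an existing
    // copy was reused (the case where no further transfer is needed).
    static boolean ensureStaged(Path cacheRoot, String kettleVersion) throws IOException {
        Path dir = envDir(cacheRoot, kettleVersion);
        if (Files.isDirectory(dir)) {
            return false; // already staged by an earlier execution
        }
        Files.createDirectories(dir);
        // Placeholder for copying the real environment, configuration, and plugins.
        Files.createFile(dir.resolve("STAGED"));
        return true;
    }
}
```

In this sketch, the first call for a given version label stages the environment and later calls reuse it. A different version label gets its own directory, which mirrors how several Kettle versions can coexist against a single cluster.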
...