April 14, 2006: Weekly Kettle Tip: Pointers Using Kettle, from Matt Casters, Chief of Data Integration, Pentaho
This week's Kettle tip was not written by me or one of the Kettle developers. It comes from people in Daytona Beach (FL) who have been using Kettle for a while now. I'm reposting their experiences and pointers unchanged (with permission) in the hope that it will help others.
Setting
We were new to Kettle and wanted to use it as an ETL tool to load a data warehouse from 50 different sources.
Mapping
1st misconception - we assumed mapping meant mapping one field to another (i.e. pulling data from a source table and mapping it to a destination table).
To perform this type of mapping, use the Select Values step: it allows you to extract data from the source and pass it on to the successor step (i.e. map an input field name to an output field name).
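If it helps to picture what that step does to each row, here is a small Python sketch of the idea (this is not Kettle's own API, just the concept of selecting and renaming fields; the field names are made up for illustration):

# A "Select Values"-style mapping: keep only the fields you need and
# rename them to the destination column names (names are illustrative).
def select_values(row, field_map):
    return {dest: row[src] for src, dest in field_map.items()}

source_row = {"cust_nm": "Acme Corp", "st_cd": "FL", "unused_col": 42}
field_map = {"cust_nm": "CUSTOMER_NAME", "st_cd": "STATE"}
print(select_values(source_row, field_map))
# {'CUSTOMER_NAME': 'Acme Corp', 'STATE': 'FL'}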
Constant values
Use of the Constant step - it allows you to write a fixed value into an output field (i.e. say you have a field STATE and you want to put "FL" into each record in the destination table; you would use a Constant step).
In one Constant step you can map many constants/fixed values at once.
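Conceptually the step just adds the same fixed fields to every row that passes through. A rough Python sketch of that idea (again not Kettle's API; the extra SOURCE_SYSTEM field is only there for illustration):

# A "Constant"-style step: merge one or more fixed values into each row.
def add_constants(rows, constants):
    for row in rows:
        yield {**row, **constants}

rows = [{"CUSTOMER_NAME": "Acme Corp"}, {"CUSTOMER_NAME": "Globex"}]
for row in add_constants(rows, {"STATE": "FL", "SOURCE_SYSTEM": "CRM"}):
    print(row)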
Table Output
Not all fields in the destination table have to have a value put into them.
The tool may give you warnings (during validation), but it's not necessary for every field in the destination to have a value passed to it.
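The same holds for databases in general: columns you don't supply simply fall back to NULL (or their default). A quick stand-alone illustration using sqlite3 as a stand-in database (table and column names are hypothetical):

# Insert into only some of a table's columns; the rest stay NULL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_name TEXT, state TEXT, notes TEXT)")
conn.execute("INSERT INTO customers (customer_name, state) VALUES (?, ?)",
             ("Acme Corp", "FL"))
print(conn.execute("SELECT * FROM customers").fetchall())
# [('Acme Corp', 'FL', None)] -- the unmapped column is just NULL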
Commit size
To improve the performance of transformations, set an appropriate commit size in the Table Output step.
Example:
Specifying an appropriate commit size in the "Table Output" step gives a major boost to the performance of a transformation.
We were testing with 5,000 records; using the default commit size of 0, it took 160 seconds.
Increasing the size to 1000 made the process run in 3.5 seconds.
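The reason is generic to databases: committing after every row (or in tiny batches) forces the database to flush on each insert, while committing every thousand rows amortises that cost. Here is a rough stand-alone sketch of the effect, with sqlite3 standing in for the warehouse (the timings will not match the Kettle numbers above):

# Compare per-row commits against batched commits for 5000 inserts.
import sqlite3, time

def load(path, rows, commit_every):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS t (n INTEGER)")
    for i, n in enumerate(rows, start=1):
        conn.execute("INSERT INTO t (n) VALUES (?)", (n,))
        if i % commit_every == 0:
            conn.commit()          # flush a batch
    conn.commit()                  # commit whatever is left
    conn.close()

start = time.time()
load("per_row.db", range(5000), commit_every=1)
print("per-row commits:", round(time.time() - start, 2), "s")

start = time.time()
load("batched.db", range(5000), commit_every=1000)
print("batched commits:", round(time.time() - start, 2), "s")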
I hope you found these tips useful. If you have others, feel free to post them here or send them over to me.
Kind regards,
Matt