Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »

Dear Kettle User,
Lately, more and more people ask about using Kettle to do data migration from one database to another. Usually this involves copying data from a propriatery (expensive) database to an open source database.

Data migration projects are arguably one of the most difficult there are because while in theory all relational database use SQL, in practice, database vendors do as much as possible to provide their users with propriatery extensions. Obviously they do this to create a vendor lock-in and to have that customer stay with the company for a long time thereafter.

However, that being said, there are a few options available in Kettle that will help you to do database migrations and also help you to automate the tasks.

One of the features that I will cover today is the use of the "Copy Tables Wizard" that is available in the job editor called Chef. Chef is part of the standard Pentaho Data Integration suite of products.

The requirements for this wizard to work is that you have a Kettle Repository installed and that you are logged into this repository. You also need to have created at least 2 database connections: one for the source database and one for the target database.

You can start the wizard by selecting the appropriate menu in Chef:

Then you need to select the source and target databases:

After that you can select the tables you would like to copy:

Finally, you are being asked to give the new job that this wizard will create a name and a location in the repository. (click on the "..." button to select a directory to store the job and transformations in)

After selecting "Finished" Chef will be creating a job containing a series of job entries that will create the SQL to create the tables and also the transformations to copy the data. When finished, this is what it typically looks like:

Running this job will cause the tables to be created and populated on the target database.

Here is an example of a generated SQL job entry dialog:

And here is an example of a generated transformation copy job entry:

Until next time,

Matt

  • No labels