A guide to setting up PDI in a Microsoft client-server style environment
A guide to setting up Kettle for client-server style development in a (oh no!) Windows environment.
This information is scattered throughout the forum, which made me wish I had a guide like this, so I thought it would help other newbies to have the information in one place.
Thanks to all the contributors on the forums that did all the ground work for the Windows environment.
Please feel free to edit/correct this guide as required.
Scenario:
Users need to use their own machines (desktops/notebooks) to develop but test/run their jobs/transformations on an ETL server.
In this scenario we will be using a central repository to store the ETL jobs. (Someone might want to add what to do differently when using svn).
What you'd need before you start...
- The latest JRE for the server and client
- The latest stable PDI (zip file)
- A kettle repository with a username and password for each developer
- A database to which the job/transformation logs are written
- Your .kettle and .pentaho directories containing the information about the repositories and db connections you will be using on this project (optional)
Setting up ETL Server:
Install the latest JRE if it's not already installed on the server.
Extract PDI to a directory of your choice.
(Note: For a 64-bit machine running the 64-bit JRE you will need to replace "YourPentahoDir\libswt\win32\swt.jar" with the 64-bit version.)
Now edit carte.bat as follows:
- change the last line so that Carte runs in the background without needing parameters, i.e.
start /belownormal javaw %OPT% org.pentaho.di.www.Carte 0.0.0.0 8080
(The port 8080 can of course be changed to whatever you require)
- set it to work in a non-user directory, for example by pointing the KETTLE_HOME environment variable at that directory (optional)
- set the memory (the -Xmx value) to what you require/have available
(the limit for a 32-bit JVM is around 1583 MB)
Now schedule carte.bat to run every time the machine is started, using Windows Task Scheduler.
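As a rough illustration of the scheduling step, the sketch below builds a schtasks command that registers carte.bat to run at system startup. The task name "Carte", the path C:\pdi\carte.bat and the choice of the SYSTEM account are assumptions made for the example, not part of the original setup; adjust them to your environment. If carte.bat relies on relative paths, the task may also need to start in the PDI directory.

```python
import subprocess


def build_schtasks_command(task_name, bat_path, run_as="SYSTEM"):
    """Return the schtasks argument list that runs bat_path at every system start."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,   # name shown in Task Scheduler (hypothetical)
        "/TR", bat_path,    # program to run, here carte.bat
        "/SC", "ONSTART",   # trigger: when the machine starts
        "/RU", run_as,      # account the task runs under (assumed SYSTEM)
        "/F",               # overwrite an existing task with the same name
    ]


def register_carte_task(task_name, bat_path):
    """Create the scheduled task; needs an elevated prompt on the ETL server."""
    subprocess.run(build_schtasks_command(task_name, bat_path), check=True)


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(" ".join(build_schtasks_command("Carte", r"C:\pdi\carte.bat")))
```

Printing the command first makes it easy to check the path before calling register_carte_task on the server.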
Restart the machine and test access to http://yourservername:8080/kettle/status (use the port you set in carte.bat).
If the status page loads, you're almost there.
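If you would rather check the status page from a script than from a browser, something along these lines should work. This is only a sketch: it assumes Carte is listening on port 8080 as configured in carte.bat and that it still uses its stock login of cluster/cluster; the function names are made up for this example.

```python
import base64
import urllib.error
import urllib.request


def build_status_url(host, port=8080):
    """Return the Carte status page URL for the given server."""
    return f"http://{host}:{port}/kettle/status"


def carte_is_up(host, port=8080, user="cluster", password="cluster", timeout=5):
    """Return True if the status page answers with HTTP 200, else False.

    The default user/password are an assumption (Carte's stock credentials).
    """
    request = urllib.request.Request(build_status_url(host, port))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, bad credentials, unknown host, ...
        return False


if __name__ == "__main__":
    print(carte_is_up("yourservername"))
```

A result of False only tells you the page could not be fetched; check the port, the credentials and whether the scheduled task actually started Carte.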
Now copy your .kettle and .pentaho directories into the directory you set up as your (kettle) home directory as this tells carte which db connections, etc to use.
(If you do not have these files yet remember to come back to this step once you have set up the first client.)
Finally, map a drive on the server itself to the directory where all the data and work files you need to process will be stored.
(This is the easiest way. It should be possible to use relative paths instead, but I did not have the time to research this.)
The server should now be ready to work with clients.
Setting up Client/Desktop machine(s):
Each user who wishes to develop/test PDI jobs/transformations using the server and repository needs to do the following.
Map a drive to the ETL server. (Make sure you assign the same drive letter to it that you did on the ETL server.)
Install the JRE if it is not already installed on the client.
Unzip PDI to a directory of your choice.
Copy the .kettle and .pentaho directories to C:\Documents and Settings\UserUsingPDI
(If you do not have these directories yet, don't worry: they will be created when you set up the first client.)
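With several developers to set up, the copy step can be scripted. The sketch below copies whichever of the two directories exist in a shared source location into a user's home directory. The function name and both paths are placeholders chosen for this example, and dirs_exist_ok needs Python 3.8 or later.

```python
import shutil
from pathlib import Path

# The two configuration directories this guide distributes to each developer.
CONFIG_DIRS = (".kettle", ".pentaho")


def install_pdi_config(source_dir, home_dir):
    """Copy .kettle/.pentaho from source_dir into home_dir.

    Existing files in home_dir are overwritten. Returns the names of the
    directories that were actually found and copied.
    """
    copied = []
    for name in CONFIG_DIRS:
        src = Path(source_dir) / name
        if src.is_dir():
            shutil.copytree(src, Path(home_dir) / name, dirs_exist_ok=True)
            copied.append(name)
    return copied


if __name__ == "__main__":
    # Hypothetical locations: a mapped project drive and the user's profile.
    print(install_pdi_config(r"X:\pdi-config", Path.home()))
```

An empty result means neither directory was found in the source location, which is the normal situation before the first client has been set up.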
Create a desktop shortcut to spoon.bat for convenience. (Optional)
Run spoon
If you had an existing .kettle directory with all the configuration settings, the user should be presented with a list of the available repositories. If no list appears, then either the repository has not been set up or the .kettle directory is in the wrong place.
If you did not have an existing .kettle directory, you have to create a new repository or set up the connection to an existing one.
I'm assuming the slave (ETL) server has not been set up yet, as otherwise you would not be reading this guide. The first user/developer to log on thus needs to define the slave server.
Do this by opening a job/transformation from the repository or create a new job to be saved to the repository. A simple read from db to dummy step would be sufficient for a new job.
In the left-hand pane select the View mode, expand the tree, right-click on the slave server entry, select New and set up the slave server.
Test the slave server by running the transformation, selecting remote execution and choosing the slave server you have just defined. You will be taken to the slave monitor pane, where you can see whether the job has executed successfully.
Once remote execution is working, right-click on the slave server and select Share to make it available to all the transformations/jobs in the repository.
Save the transformation and exit spoon.
The shared.xml file will now have been updated with the shared slave. Use this file to overwrite the one in the .kettle directory that is distributed to everyone on the project.
The next user will not have to set up the repository or slave server if the .kettle and .pentaho directories are copied before Spoon is started.
The developers will now be able to develop in a client-server style environment.
PS: Remember to have all the developers set their grid size to the same value, as the steps will always snap to the new grid points as soon as you do anything with them.