03. Hello World Example
PLEASE NOTE: This tutorial is for a pre-5.0 version of PDI. If you are on PDI 5.0 or later, please use https://help.pentaho.com/Documentation.
Although this will be a simple example, it will introduce you to some of the fundamentals of PDI:
Working with the Spoon tool
Transformations
Steps and Hops
Predefined variables
Previewing and Executing from Spoon
Executing Transformations from a terminal window with the Pan tool.
Overview
Let's suppose that you have a CSV file containing a list of people, and want to create an XML file containing greetings for each of them.
CSV File Contents:
last_name,name
Suarez,Maria
Guimaraes,Joao
Rush,Jennifer
Ortiz,Camila
Rodriguez,Carmen
da Silva,Zoe
Desired Output:
<Rows>
<row>
<msg>Hello, Maria!</msg>
</row>
<row>
<msg>Hello, Joao!</msg>
</row>
<row>
<msg>Hello, Jennifer!</msg>
</row>
<row>
<msg>Hello, Camila!</msg>
</row>
<row>
<msg>Hello, Carmen!</msg>
</row>
<row>
<msg>Hello, Zoe!</msg>
</row>
</Rows>
A Transformation is made of Steps, linked by Hops. These Steps and Hops form paths through which data flows.
Preparing the environment
Before starting a Transformation, create a Tutorial folder, where you'll save all the files for this tutorial. Then create a CSV file like the one shown above, and save it in the Tutorial folder as list.csv.
Transformation walkthrough
The proposed task will be accomplished in three subtasks:
Creating a new Transformation
Designing the basic flow of the transformation, by adding steps and hops
Configuring the steps for the dataset and the desired actions
Creating a new Transformation
Click New, then select Transformation. Alternatively you can go to the File menu, then select New, then Transformation. You can also just press Ctrl-N.
Click Save, and save it into the Tutorial folder with the name hello. The transformation will be stored as a hello.ktr file.
Designing the basic flow of the transformation, by adding steps and hops
A Step is the minimal unit inside a Transformation. A wide variety of Steps are available, grouped into categories like Input and Output, among others. Each Step is designed to accomplish a specific function, such as generating a random number or inserting rows into a database table.
A Hop is a graphical representation of data flowing between two Steps, with an origin and a destination. The data that flows through that Hop constitutes the Output Data of the origin Step, and the Input Data of the destination Step. A Hop has only one origin and one destination, but more than one Hop could leave a Step. When that happens, the Output Data could be distributed among the outgoing hops, or copied entirely to each outgoing hop. Likewise, more than one Hop can reach a Step. In those instances, the Step has to have the ability to merge the Input from the different Steps in order to create the Output.
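The difference between distributing and copying rows across several outgoing Hops can be sketched in plain JavaScript. This is only an illustration, not PDI code: the helper names distributeRows and copyRows are hypothetical, and the round-robin order used for distribution is an assumption, since the text above only says the rows are distributed among the hops.

```javascript
// Hypothetical helpers; PDI handles this internally when a Step has several outgoing Hops.

// Distribute: each row goes to exactly one hop (round-robin order is assumed here).
function distributeRows(rows, hopCount) {
  const hops = Array.from({ length: hopCount }, () => []);
  rows.forEach((row, i) => hops[i % hopCount].push(row));
  return hops;
}

// Copy: every hop receives its own copy of every row.
function copyRows(rows, hopCount) {
  return Array.from({ length: hopCount }, () => rows.map((row) => ({ ...row })));
}

const sampleRows = [{ name: "Maria" }, { name: "Joao" }, { name: "Jennifer" }];

const distributed = distributeRows(sampleRows, 2);
const copied = copyRows(sampleRows, 2);

// With two hops: distribution yields 2 and 1 rows, copying yields 3 and 3 rows.
console.log(distributed.map((hop) => hop.length));
console.log(copied.map((hop) => hop.length));
```

With distribution the total number of rows stays the same across all hops; with copying it is multiplied by the number of hops.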
Our Transformation has to do the following:
Read the CSV file
Build the greetings message
Save the greetings in the XML file
For each of these items you'll use a different Step, according to the next diagram:
In this example, each task will be done in a single step, due to the simplicity of the requirements. For more complex transformations, it may take many more steps to achieve the desired result.
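To make the data flow concrete, here is a rough stand-alone JavaScript sketch of what the three Steps do to the rows. It is not how PDI implements them: the function names readCsv, addGreeting and toXml are made up for illustration, the CSV parsing is a naive split on commas, and no XML escaping is done, which happens to be enough for the sample names.

```javascript
// Plain JavaScript illustration of the three-step flow; none of this is PDI API.
const csvText = [
  "last_name,name",
  "Suarez,Maria",
  "Guimaraes,Joao",
  "Rush,Jennifer",
  "Ortiz,Camila",
  "Rodriguez,Carmen",
  "da Silva,Zoe",
].join("\n");

// Step 1 (CSV file input): treat the first line as the header, then turn each
// remaining line into a row object keyed by the header names.
function readCsv(text) {
  const [header, ...lines] = text.trim().split("\n");
  const fields = header.split(",");
  return lines.map((line) => {
    const values = line.split(",");
    return Object.fromEntries(fields.map((field, i) => [field, values[i]]));
  });
}

// Step 2 (Modified JavaScript Value): add a msg field to each row.
function addGreeting(row) {
  return { ...row, msg: "Hello, " + row.name + "!" };
}

// Step 3 (XML Output): write only the msg field of each row.
// Assumption: values contain no characters that need XML escaping.
function toXml(rows) {
  const body = rows
    .map((row) => "<row>\n<msg>" + row.msg + "</msg>\n</row>")
    .join("\n");
  return "<Rows>\n" + body + "\n</Rows>";
}

const xml = toXml(readCsv(csvText).map(addGreeting));
console.log(xml);
```

Each function consumes the output of the previous one, which mirrors how a Hop hands the Output Data of one Step to the next Step as its Input Data.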
Here's how to start the Transformation:
To the left of the workspace is the Steps Palette. Select the Input category.
Drag the CSV file input icon onto the workspace on the right.
Select the Scripting category.
Drag the Modified JavaScript Value icon to the workspace.
Select the Output category.
Drag the XML Output icon to the workspace.
Now you will link the CSV file input with the Modified JavaScript Value by creating a Hop:
Select the first Step.
Hold the Shift key and drag the icon onto the second Step.
Link the Modified JavaScript Value with the XML Output via this same process.
Specifying Step behavior
Every Step has a configuration window. These windows vary according to the functionality of the Steps and the category to which they belong. The Step Name can be set within the configuration window, making it easier to understand what each Step will do. A Step Description is also available, which allows you to clarify the purpose of the Step for documentation purposes.
Configuring the CSV file input Step
Double-click on the CSV file input Step.
The configuration window for the Step will appear. Here you'll indicate the file location, file format (e.g. delimiters and enclosure characters) and column metadata (e.g. column name and data type).
Change the Step name to one that better represents this Step's function. In this case, type name list.
For the Filename field, click Browse and select the input file.
Click Get Fields to add the list of column names of the input file to the grid. By default, the Step assumes that the file has headers (the Header row present checkbox is checked).
The grid now has the names of the columns of your file: last_name and name, and should look like this:
Switch lazy conversion off.
Click Preview to ensure that the file will be read as expected. A window showing data from the file will appear.
Click OK to finish defining the CSV file input Step.
Configuring the Modified JavaScript Value Step
Double-click on the Modified JavaScript Value Step.
The Step configuration window will appear. This is different from the previous Step config window in that it allows you to write JavaScript code. You will use it to build the message "Hello, " concatenated with each of the names.
Name this Step Greetings.
The main area of the configuration window is for coding. To the left, there is a tree with a set of available functions that you can use in the code. In particular, the last two branches have the input and output fields, ready to use in the code. In this example there are two fields: last_name and name. Write the following code:
var msg = 'Hello, ' + name + "!";
At the bottom you can type any variable created in the code. In this case, you have created a variable named msg. Since you need to send this message to the output file, you have to write the variable name in the grid. This should be the result: