03. Hello World
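Hello World Example
This example is simple, but it illustrates several basic PDI concepts:
- Working with Spoon
- Transformations
- Steps and Hops
- Predefined variables
- Previewing and executing from Spoon
- Executing Transformations from a terminal window with the Pan tool
Overview
Suppose you have a CSV file containing a list of people and want to convert it into an XML file that adds a greeting for each person.
The contents of the CSV file are as follows: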
last_name, name
김, 철수
박, 금자
최, 민성
윤, 영서
서, 승현
조, 영희
The CSV above will become the following XML:
<Rows>
  <row>
    <msg>Hello, 철수!</msg>
  </row>
  <row>
    <msg>Hello, 금자!</msg>
  </row>
  <row>
    <msg>Hello, 민성!</msg>
  </row>
  <row>
    <msg>Hello, 영서!</msg>
  </row>
  <row>
    <msg>Hello, 승현!</msg>
  </row>
  <row>
    <msg>Hello, 영희!</msg>
  </row>
</Rows>
Creating a file that adds a greeting to each name is the goal of this first Transformation.
A Transformation is made of Steps linked by Hops. These Steps and Hops represent the flow of data. A Transformation can therefore be described as data-flow oriented.
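To make the goal concrete, here is a minimal sketch in plain JavaScript of the same CSV-to-XML result, produced outside PDI. It is only an illustration: csvToGreetingsXml is a hypothetical helper, the header names last_name and name are assumed, and no XML escaping is done.

```javascript
// Sketch only: mimics the end result of the Transformation outside PDI.
// csvToGreetingsXml is a hypothetical helper, not a PDI function.
// Assumes a header row with a column called "name"; does no XML escaping.
function csvToGreetingsXml(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  const header = lines[0].split(',').map((h) => h.trim());
  const nameIndex = header.indexOf('name');

  // One <row> element per data line, holding only the greeting.
  const rows = lines.slice(1).map((line) => {
    const fields = line.split(',').map((f) => f.trim());
    return '  <row>\n    <msg>Hello, ' + fields[nameIndex] + '!</msg>\n  </row>';
  });

  return ['<Rows>', ...rows, '</Rows>'].join('\n');
}

const sampleCsv = 'last_name, name\n김, 철수\n박, 금자';
console.log(csvToGreetingsXml(sampleCsv));
```

In the Transformation itself, each of the three stages in this function (read, build the greeting, write) becomes its own Step.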
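Preparing the environment
Before starting the Transformation, create a Tutorial folder in a location that is convenient for development. You'll save all the files for this tutorial there. Save the CSV file shown above in that folder as list.csv.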
Transformation walkthrough
This task is divided into three subtasks:
- Creating the Transformation
- Constructing the skeleton of the Transformation using Steps and Hops
- Specifying the behavior of each Step, in order
Creating the Transformation
- Click New and select Transformation. Alternatively, open the File menu, select New, and then select Transformation. You can also press Ctrl-N.
- In the view navigator on the left, double-click Transformation 1, or right-click it and select Settings. Alternatively, press Ctrl-T.
- A window where you can set the Transformation properties will appear. For now, fill in only the Transformation name and description, and click OK.
- Save the Transformation in the Tutorial folder with the name hello. This creates the hello.ktr file.
Constructing the skeleton of the Transformation using Steps and Hops
A Step is the minimal unit inside a Transformation. A wide variety of Steps are available, grouped into categories like Input and Output, among others. Each Step is designed to accomplish a specific function, such as reading a parameter or normalizing a dataset.
A Hop is a graphical representation of data flowing between two Steps, with an origin and a destination. The data that flows through that Hop constitutes the Output Data of the origin Step, and the Input Data of the destination Step. A Hop has only one origin and one destination, but more than one Hop could leave a Step. When that happens, the Output Data can be copied or distributed to every destination. Likewise, more than one Hop can reach a Step. In those instances, the Step has to have the ability to merge the Input from the different Steps in order to create the Output.
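The difference between copying and distributing rows across several outgoing Hops can be sketched as follows. This is an illustration in plain JavaScript, not PDI code: copyRows and distributeRows are hypothetical names, and the round-robin order used for distribution is an assumption of the sketch.

```javascript
// Sketch only: models the two ways a Step's output rows can reach several Hops.
// copyRows and distributeRows are hypothetical names, not PDI APIs.

// Copy: every destination receives every row.
function copyRows(rows, destinationCount) {
  return Array.from({ length: destinationCount }, () => rows.slice());
}

// Distribute: each row goes to exactly one destination.
// Round-robin order is an assumption of this sketch.
function distributeRows(rows, destinationCount) {
  const destinations = Array.from({ length: destinationCount }, () => []);
  rows.forEach((row, i) => destinations[i % destinationCount].push(row));
  return destinations;
}

const rows = ['철수', '금자', '민성', '영서'];
console.log(copyRows(rows, 2));
console.log(distributeRows(rows, 2));
```

With copying, both destinations see all four names. With distributing, each destination sees only its share of them.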
A Transformation has to do the following:
- Read the CSV file
- Build the greetings
- Save the greetings in the XML file
For each of these items you'll use a different Step: CSV file input, Modified JavaScript Value, and XML Output.
In this example, the correspondence between tasks and Steps is one-to-one because the Transformation is very simple. It isn't always that way, though.
Here's how to start the Transformation:
- To the left of the workspace is the Steps Palette. Select the Input category.
- Drag the CSV file input icon onto the workspace on the right.
- Select the Scripting category.
- Drag the Modified JavaScript Value icon to the workspace.
- Select the Output category.
- Drag the XML Output icon to the workspace.
Now you will link the CSV file input with the Modified Java Script Value by creating a Hop:
- Select the first Step.
- Hold the Shift key and drag the icon onto the second Step.
- Link the Modified Java Script Value with the XML Output via this same process.
Specifying Step behavior
Every Step has a configuration window. These windows vary according to the functionality of the Steps and the category to which they belong. However, every configuration window has a Step Name field, which holds a representative name for the Step inside the Transformation. Step Description allows you to clarify the purpose of the Step.
Configuring the CSV file input Step
- Double-click on the CSV file input Step.
- The configuration window belonging to this kind of Step will appear. Here you'll indicate the location, format and content of the input file.
- Replace the default name with one that is more representative of this Step's function. In this case, type name list as the Step name.
- In the Filename field, type the name and location of the input file.
Note: Just to the right of the text box is a symbol with a red dollar sign. This means that you can use variables as well as plain text in that field. A variable can be written manually as ${name_of_the_variable} or selected from the variable window, which you can access by pressing Ctrl-Spacebar. This window shows both predefined and user-defined variables, but since you haven't created any variables yet, right now you'll only see the predefined ones. Among those, select:
${Internal.Transformation.Filename.Directory}
Next to the name of the variable, type a slash and the name of the file you created:
${Internal.Transformation.Filename.Directory}/list.csv
At runtime the variable will be replaced by its value, which will be the path where the Transformation was saved. The Transformation will search for the file list.csv in that location.
- Click Get Fields to add the list of column names of the input file to the grid. By default, the Step assumes that the file has headers (the Header row present checkbox is checked).
Note: The Get Fields button is present in most Steps' configuration windows. Its purpose is to load a grid with data from external sources or previous Steps. Even when the fields can be written manually, this button gives you a shortcut when there are many available fields and you want to use all or almost all of them.
- The grid now lists the column names of your file: last_name and name.
- Switch lazy conversion off.
- Click Preview to ensure that the file will be read as expected. A window showing data from the file will appear.
- Click OK to finish defining the Step CSV file input.
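The variable replacement described for the Filename field can be pictured with a small stand-in written in plain JavaScript. This is a simplified sketch, not PDI's implementation: resolveVariables is a hypothetical function, the directory value is an assumed example, and leaving unknown variables untouched is a choice made for the sketch.

```javascript
// Sketch only: a simplified stand-in for how ${...} placeholders get resolved.
// resolveVariables and the directory value below are hypothetical.
function resolveVariables(text, variables) {
  return text.replace(/\$\{([^}]+)\}/g, (match, name) =>
    // Unknown variables are left as written (a choice made for this sketch).
    Object.prototype.hasOwnProperty.call(variables, name) ? variables[name] : match
  );
}

const variables = {
  // Assumed example value: the folder where hello.ktr was saved.
  'Internal.Transformation.Filename.Directory': '/home/PentahoUser/Tutorial',
};

console.log(
  resolveVariables('${Internal.Transformation.Filename.Directory}/list.csv', variables)
);
```

Because the path is built from the Transformation's own location, the Tutorial folder can be moved without editing the Step.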
Configuring the Modified JavaScript Value Step
- Double-click on the Modified JavaScript Value Step.
- The Step configuration window will appear. This is different from the previous Step config window in that it allows you to write JavaScript code. You will use it to build the message "Hello, " concatenated with each of the names.
- Name this Step Greetings.
- The main area of the configuration window is for coding. To the left, there is a tree with a set of available functions that you can use in the code. In particular, the last two branches have the input and output fields, ready to use in the code. In this example there are two fields: last_name and name. Write the following code:
// Build the greeting from the incoming name field
var msg = 'Hello, ' + name.getString() + '!';
Note: The text name.getString() can be written manually, or by double-clicking on the text in the function tree.
- At the bottom you can list any variable created in the code. In this case, you have created a variable named msg. Since you need to send this message to the output file, you have to write the variable name, msg, in the grid.
Warning: Don't confuse these JavaScript variables with PDI variables. They are not the same.
Note: Modified is not an adjective for JavaScript, but for the Step. You are not dealing with a variant of JavaScript - it is the Step itself that is modified. It is an enhanced version of the original Step, which you found in previous versions of PDI.
- Click OK to finish configuring the Modified JavaScript Value Step.
- Select the Step you just configured. To check that the new field will leave this Step, you will now look at its Input and Output Fields. Input Fields are the data columns that reach a Step. Output Fields are the data columns that leave a Step. Some Steps simply transform the input data; in that case, the input and output fields are usually the same. Other Steps add fields to the Output (Calculator, for example). Still others filter or combine data so that the Output has fewer fields than the Input (Group by, for example).
- Right-click the Step to bring up a context menu.
- Select Show Input Fields. You'll see that the Input Fields are last_name and name, which come from the CSV file input Step.
- Select Show Output Fields. You'll see that not only do you have the existing fields, but also the new msg field.
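The relationship between Input Fields and Output Fields for this Step can be sketched in plain JavaScript. Plain objects stand in for PDI rows here, and greetingsStep is a hypothetical name; inside the real Step the field is read with name.getString() instead.

```javascript
// Sketch only: models a row passing through the Greetings Step.
// Plain objects stand in for PDI rows; greetingsStep is a hypothetical name.
function greetingsStep(inputRow) {
  // Same expression as the Step's script, using a plain string
  // in place of name.getString().
  const msg = 'Hello, ' + inputRow.name + '!';
  // Existing fields pass through unchanged; msg is appended.
  return { ...inputRow, msg };
}

const inputRow = { last_name: '김', name: '철수' };
const outputRow = greetingsStep(inputRow);

console.log(Object.keys(inputRow));  // input fields
console.log(Object.keys(outputRow)); // output fields
```

The output row carries the two original fields plus msg, which is what Show Output Fields reports.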
Configuring the XML Output Step
- Double-click the XML Output Step. The configuration window for this kind of Step will appear. Here you're going to set the name and location of the output file, and establish which of the fields you want to include. You may include all or some of the fields that reach the Step.
- Name the Step File with Greetings.
- In the File box write:
${Internal.Transformation.Filename.Directory}/Hello.xml
- Click Get Fields to fill the grid with the three input fields. In the output file you only want to include the message, so delete name and last_name.
- Save the Transformation again.
How does it work?
When you execute a Transformation, almost all Steps are executed simultaneously. The Transformation executes asynchronously; the rows of data flow through the Steps at their own pace. Each processed row flows to the next Step without waiting for the others. In real-world Transformations, forgetting this characteristic can be a significant source of unexpected results.
At this point, Hello World is almost completely configured. A Transformation reads the input file, then creates messages for each row via the JavaScript code, and then the message is sent to the output file. This is a small example with very few rows of names, so it is difficult to notice the asynchronous execution in action. Keep in mind, however, that it's possible that at the same time a name is being written in the output file, another is leaving the first Step of the Transformation.
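The row-at-a-time flow can be illustrated with JavaScript generators. This is only a rough model: PDI runs its Steps in parallel, whereas this single-threaded sketch merely shows that one row can reach the output before later rows have been read. All function names are hypothetical.

```javascript
// Sketch only: generators pass rows downstream one at a time.
// PDI runs Steps in parallel; this single-threaded model only shows
// that a row can reach the output before later rows have been read.
const log = [];

function* readNames(names) {
  for (const name of names) {
    log.push('read ' + name);
    yield { name };
  }
}

function* buildGreetings(rows) {
  for (const row of rows) {
    log.push('greet ' + row.name);
    yield { ...row, msg: 'Hello, ' + row.name + '!' };
  }
}

function writeAll(rows) {
  for (const row of rows) {
    log.push('write ' + row.msg);
  }
}

writeAll(buildGreetings(readNames(['철수', '금자'])));
console.log(log.join('\n'));
```

The log shows the first name being read, greeted, and written before the second name is read at all, rather than each stage finishing the whole file first.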
Verify, preview and execute
- Before executing the Transformation, check that everything is properly configured by clicking Verify. Spoon will verify that the Transformation is syntactically correct, and look for unreachable Steps and nonexistent connections. If everything is in order (it should be if you followed the instructions), you are ready to preview the output.
- Select the JavaScript Step and then click the Preview button. A preview dialog will appear.
- As you can see, Spoon suggests that you preview the selected Step. Click QuickLaunch. After that, you will see a window with a sample of the output of the JavaScript Step. If the output is what you expected, you're ready to execute the Transformation.
- Click Run.
- Spoon will show a window where you can set, among other information, the parameters for the execution and the logging level. Click Launch.
- A new window tab will appear in the Job window. This is the log tab, which contains a log of the current execution.
The log tab has two sections: an upper part and a lower part.
In the upper part you can see the executed operations for each Step of the Transformation. In particular, pay attention to these:
- Read: the number of rows coming from previous Steps.
- Written: the number of rows leaving from this Step toward the next.
- Input: the number of rows read from a file or table.
- Output: the number of rows written to a file or table.
- Errors: errors in the execution. If there are errors, the whole row will become red.
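How these counters relate to one another across the three Steps can be sketched as follows. This is an illustrative tally in plain JavaScript, not PDI output: the counter object and runChain are hypothetical, and the sketch assumes the header line is not counted as a row, which may differ from what Spoon reports.

```javascript
// Sketch only: tallies the counters described above for a three-Step chain.
// The counter object shape and runChain are hypothetical; the header line
// is assumed not to count as a row.
function newCounters() {
  return { read: 0, written: 0, input: 0, output: 0, errors: 0 };
}

function runChain(names) {
  const csvInput = newCounters();
  const greetings = newCounters();
  const xmlOutput = newCounters();

  for (const name of names) {
    csvInput.input += 1;    // row read from the file
    csvInput.written += 1;  // row sent to the next Step
    greetings.read += 1;    // row received from the previous Step
    greetings.written += 1; // row sent on with the msg field added
    xmlOutput.read += 1;    // row received from the previous Step
    xmlOutput.output += 1;  // row written to the file
  }
  return { csvInput, greetings, xmlOutput };
}

console.log(runChain(['철수', '금자', '민성', '영서', '승현', '영희']));
```

In this model only the first Step has a nonzero Input and only the last has a nonzero Output, while the middle Step just reads and writes rows.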
In the lower portion of the window, you will see the execution step by step. The detail will depend on the log level established. If you pay attention to this detail, you will see the asynchronicity of the execution. The last line of the text will be:
Spoon - The transformation has finished!!
If there weren't error messages in the text, open the newly generated Hello.xml file and check its content.
Pan
Pan allows you to execute Transformations from a terminal window. The script is pan.bat on Windows, or pan.sh on other platforms, and it's located in the installation folder. If you run the script without any options, you'll see a description of Pan with a list of available options.
To execute your Transformation, try the simplest command:
Pan /file <Jobs_path>/hello.ktr /norep
- /norep tells Pan not to connect to the repository.
- /file precedes the name of the file that contains the Transformation.
- <Jobs_path> is the full path to the Tutorial folder, for example:
C:/Pentaho/Tutorial
or
/home/PentahoUser/Tutorial
The other options are run with default values.
After you enter this command, the Transformation will be executed in the same way as it was inside Spoon. In this case, the log will be written to the terminal unless you specify a file to write to. The format of the log text will vary a little, but the information will be basically the same as what you saw in the graphical environment.