May 25, 2006
Submitted by James Dixon, Pentaho Chief Geek

This technical tip shows how to configure Kettle so that input files such as XML, CSV and Excel files can be loaded from a Pentaho solution folder.

...

As always, you will want to gather the necessary resources before you start the hands-on part of this article. This tip will NOT work with PCI (Pentaho Pre-Configured Install, the Pentaho demo) versions prior to release milestone 1.1.6.

  • Pentaho Pre-Configured Install, version 1.1.6 build 279 or later
  • Pentaho Getting Started Guide, version 1.1.6 build 279 or later
  • Kettle, version 2.2.2 or later
  • Our sample XML file, the CD collection (cdcollection.xml). You can download that file here: right-click the link, choose your browser's Save As... option, and save the file to your hard drive in a spot you will remember.

In order to keep this tip short and to the point, I'll assume you have a working knowledge of Kettle. It is intuitive to use, so if you are not familiar with it, you can get up to speed rather quickly.

...

  1. Copy the cdcollection.xml file into the samples/etl directory of your PCI's solution folders, so that it ends up at <pentaho-demo>/pentaho-solutions/samples/etl/cdcollection.xml.
  2. Launch Kettle's Spoon application using the spoon.bat (Windows users) or the spoon.sh (*nix users) file in the root of the Kettle installation.
  3. In the tree on the left pane, locate the XML Input step under Base step types | Input.  Drag an XML Input step from the tree in the left pane to the right working pane.
  4. Double-click on the XML Input step in the right working pane to bring up the XML Input step properties dialog.
  5. Click the Browse button to locate the cdcollection.xml file in the Pentaho solution folders. Once you have selected the file, you will see the path to it in the File textbox.
  6. Next, we want to replace the path to the root of the solution folders with the environment variable pentaho.solutionpath, so that when we move this solution to another server (as is likely in a real-world scenario), the path to the data file remains relative to the solution and won't need to be changed. To do this, click the Variable button and select pentaho.solutionpath from the popup list. Notice that %%pentaho.solutionpath%% (${pentaho.solutionpath} on *nix) has been prepended to the path to the XML file.
  7. Now edit the path so that %%pentaho.solutionpath%% replaces the root portion of the path to the solution files, and change all backslashes to forward slashes. In our example, the new path would look like this:
    Code Block
    
    %%pentaho.solutionpath%%samples/etl/cdcollection.xml
    
  8. We change the slashes because this path is read by both Spoon and the Pentaho server, so it is safest to use '/' as the path separator: it works equally well on Windows, Linux and OS X, whereas '\' only works on Windows.
  9. Click the Add button to add the path to your xml file to the Selected Files list.

  10. Switch to the Content tab. Here we specify the location of the node in the XML file that holds the repeating data, which will become the rows of our result set. In the cdcollection.xml file, the cd node under the catalog node holds this repeating data. In the Location list, add the catalog element first, then add the cd element second.

  11. Switch to the Fields tab. Click the Get Fields button. If all has gone well, you should see the Field list populated with 4 fields - Title1, Artist1, Price1 and Category1.
  12. Click the Preview Rows button. Your transformation is working successfully if you get a popup dialog filled with the CD collection data. If you don't, go back and carefully verify each step again.
  13. Click OK to close the properties dialog.
  14. Finally, export the new transformation to your Pentaho solution folders. From the File menu, choose Export to XML, and save the transformation as cdcollection_transform.xml in the <pentaho-demo>/pentaho-solutions/samples/etl directory.
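
As a rough illustration of what the configured step does, the Python sketch below resolves the %%pentaho.solutionpath%% (or ${pentaho.solutionpath}) placeholder and then walks the catalog/cd repeating node, producing one row per cd element. It is not Kettle's implementation: the function names are hypothetical, and it assumes each field corresponds to a child element of the cd node with the same name, which you should check against the actual contents of cdcollection.xml.

```python
import re
import xml.etree.ElementTree as ET

# Matches both placeholder styles Spoon may insert: %%name%% and ${name}.
_VAR_PATTERN = re.compile(r"%%([\w.]+)%%|\$\{([\w.]+)\}")


def resolve_variables(path, variables):
    """Replace %%name%% and ${name} placeholders with values from `variables`.

    Unknown names are left as-is so that a missing variable is easy to spot.
    """
    def substitute(match):
        name = match.group(1) or match.group(2)
        return variables.get(name, match.group(0))

    return _VAR_PATTERN.sub(substitute, path)


def read_repeating_rows(xml_source, location, fields):
    """Yield one dict per repeating element found at `location`.

    `location` lists element names from the root down, e.g. ["catalog", "cd"],
    mirroring the Location list on the XML Input step's Content tab.
    `xml_source` may be a file path or an open file object.
    ASSUMPTION: each field is a child element of the repeating node.
    """
    root = ET.parse(xml_source).getroot()
    if root.tag != location[0]:
        raise ValueError(f"expected root <{location[0]}>, got <{root.tag}>")
    # The remaining location steps form a relative ElementTree path;
    # "." selects the root itself when the location has only one step.
    relative_path = "/".join(location[1:]) or "."
    for node in root.findall(relative_path):
        # findtext returns None when a child element is missing.
        yield {field: node.findtext(field) for field in fields}
```

For example, with pentaho.solutionpath set to a hypothetical /opt/pentaho-demo/pentaho-solutions/, resolve_variables turns the path from step 7 into /opt/pentaho-demo/pentaho-solutions/samples/etl/cdcollection.xml. Because the resolved path uses '/' throughout, Python's file functions also accept it on Windows.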

...

  1. To finish up, we will reuse the sample ETL action sequence that comes with the PCI.
  2. Make a copy of the SampleTransformation.xaction file and name that copy xml_input.xaction. You can find the SampleTransformation.xaction file in the <pentaho-demo>/pentaho-solutions/samples/etl directory.
  3. Make a copy of the SampleTransformation.properties file and name that copy xml_input.properties. You can find the SampleTransformation.properties file in the <pentaho-demo>/pentaho-solutions/samples/etl directory.
  4. Open the xml_input.xaction file in your favorite text editor.
  5. At the top of the file, change the value of the <name> node to be xml_input.xaction. It should look like this:
    Code Block
    
    <name>xml_input.xaction</name>
    
  6. Under the resources/transformation-file/solution-file nodes, change the value of the location node to cdcollection_transform.xml. It should look like this:
    Code Block
    
    <location>cdcollection_transform.xml</location>
    
  7. Under the component-definition node, change the value of the importstep node to "XML Input", without the quotes. This is the name of the step we created in our transformation. If you changed the step name in the transformation, then also change it here.  It should look similar to this:
    Code Block
    
    <importstep>XML Input</importstep>
    
  8. Save and close the xml_input.xaction file.
  9. Open the xml_input.properties file in your favorite text editor.
  10. Change the value of the title property to "2. XML Input Example", without the quotes.
  11. Change the value of the description property to "How to configure Kettle so that input files such as XML, CSV and Excel files can be loaded from a Pentaho solution folder.", without the quotes.
  12. Save and close the xml_input.properties file.
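
If you set up this sample more than once, the copy-and-edit steps can be scripted. The Python sketch below is one way to do it, under several assumptions: the xaction has no XML namespace, <name> is a direct child of its root element, and the properties file uses simple key=value lines. The function names (make_xml_input_sample and its helpers) are hypothetical. ElementTree rewrites the whole xaction file, so comments and the original formatting may be lost; edit by hand if you need to keep them.

```python
import re
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

TITLE = "2. XML Input Example"
DESCRIPTION = ("How to configure Kettle so that input files such as XML, CSV and "
               "Excel files can be loaded from a Pentaho solution folder.")


def edit_xaction(path, transform_file, step_name):
    """Set <name>, the transformation-file location and every <importstep>.

    ASSUMPTION: no XML namespace; <name> is a direct child of the root.
    """
    tree = ET.parse(path)
    root = tree.getroot()
    name = root.find("name")
    location = root.find("resources/transformation-file/solution-file/location")
    if name is None or location is None:
        raise ValueError(f"{path}: expected <name> and resources/.../location nodes")
    name.text = Path(path).name
    location.text = transform_file
    for importstep in root.iter("importstep"):
        importstep.text = step_name
    # Rewrites the file; comments and original layout are not preserved.
    tree.write(path, encoding="UTF-8", xml_declaration=True)


def set_properties(path, values):
    """Rewrite existing `key=value` lines for the given keys (none are added)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines):
        key = re.split(r"\s*[=:]\s*", line, maxsplit=1)[0].strip()
        if key in values:
            lines[i] = f"{key}={values[key]}"
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def make_xml_input_sample(etl_dir):
    """Copy SampleTransformation.* to xml_input.* and apply the edits."""
    etl = Path(etl_dir)
    xaction = etl / "xml_input.xaction"
    properties = etl / "xml_input.properties"
    shutil.copy(etl / "SampleTransformation.xaction", xaction)
    shutil.copy(etl / "SampleTransformation.properties", properties)
    edit_xaction(xaction, "cdcollection_transform.xml", "XML Input")
    set_properties(properties, {"title": TITLE, "description": DESCRIPTION})
```

You would call it as make_xml_input_sample("<pentaho-demo>/pentaho-solutions/samples/etl"), substituting your real installation path. If you renamed the step in your transformation, pass that name instead of "XML Input".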

...

  1. First, make sure your PCI is up and running. If you don't know how to get it started, or are unsure as to whether it's already running, see the Pentaho Getting Started Guide for how to successfully start the server.
  2. Next, make sure that the solution folders used by your PCI contain your xml_input.xaction file.
  3. Navigate through the sample pages to the ETL samples. From the Samples home page, go to "A Collection of Samples and Examples" | "Extraction, Transformation and Loading with Kettle".
  4. You should see the link to your new action sequence, labeled "XML Input Example". If you don't, try refreshing your solution repository by navigating to Content and Settings, and clicking the Publish link for the Solution Repository.
  5. Click the "XML Input Example" link. You should see your CD collection data in a new browser window.

Our example deployment of this solution is just one way you could deploy it. You can execute this action sequence in this PCI or in any other Pentaho server using the web service, the Java API or the user interface. The Pentaho server sets the pentaho.solutionpath variable automatically, so no configuration is necessary on the server.
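
To run the action sequence from outside the user interface, you need a URL that names the solution, path and action. The Python sketch below builds one. It assumes the ViewAction servlet and its solution, path and action parameters as used by the 1.x demo, and ServiceAction for the web service variant; treat those names, and the http://localhost:8080/pentaho base URL, as assumptions to check against your own server. action_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def action_url(base, solution, path, action, servlet="ViewAction"):
    """Build a URL asking a Pentaho server to run an action sequence.

    ASSUMPTION: the servlet name ("ViewAction", or "ServiceAction" for the
    web service) and the solution/path/action parameter names match your
    server version.
    """
    # urlencode percent-encodes values and keeps the given parameter order.
    query = urlencode({"solution": solution, "path": path, "action": action})
    return f"{base.rstrip('/')}/{servlet}?{query}"
```

For example, action_url("http://localhost:8080/pentaho", "samples", "etl", "xml_input.xaction") returns http://localhost:8080/pentaho/ViewAction?solution=samples&path=etl&action=xml_input.xaction. When you move the solution to another server, only the base URL changes; because the server supplies pentaho.solutionpath, the transformation's file path needs no edits.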

...