Introduction to Spoon
What is Spoon?
Kettle is an acronym for "Kettle E.T.T.L. Environment." Kettle is designed to help you with your ETTL needs, which include the Extraction, Transformation, Transportation and Loading of data.
Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the Kettle tools — Pan and Kitchen. Pan is a data transformation engine that performs a multitude of functions such as reading, manipulating, and writing data to and from various data sources. Kitchen is a program that executes jobs designed by Spoon in XML or in a database repository. Jobs are usually scheduled in batch mode to be run automatically at regular intervals.
Note: For a complete description of Pan or Kitchen, see the Pan and Kitchen user guides.
Transformations and Jobs can describe themselves using an XML file or can be put in a Kettle database repository. Pan or Kitchen can then read the data to execute the steps described in the transformation or to run the job. In summary, Pentaho Data Integration makes data warehouses easier to build, update, and maintain.
Installing the...
Note: What are developers installing in these steps? A plug-in? Can we be more specific here and provide information as to reasons behind the installation?
Follow the instructions below to install the...
- Install the Sun Microsystems Java Runtime Environment version 1.4 or higher. You can download a JRE for free at http://www.javasoft.com/.
- Unzip the zip-file: Kettle-3.0.zip in a directory of your choice.
- Under Unix-like environments (Solaris, Linux, MacOS, for example), you must make the shell scripts executable. Execute these commands to make all shell scripts in the Kettle directory executable:
cd Kettle chmod +x *.sh
Launching Spoon
The scripts below allow you to launch Spoon on different platforms:
- Spoon.bat: launch Spoon on the Windows platform.
- spoon.sh: launch Spoon on a Unix-like platform such as Linux, Apple OSX, Solaris
To make a shortcut under the Windows platform, an icon is provided. Use "spoon.ico" to set the correct icon. Point the shortcut to the Spoon.bat file.
Supported platforms
The Spoon GUI is supported on the following platforms:
- Microsoft Windows: all platforms since Windows 95, including Vista
- Linux GTK: on i386 and x86_64 processors, works best on Gnome
- Apple's OSX: works both on PowerPC and Intel machines
- Solaris: using a Motif interface (GTK optional)
- AIX: using a Motif interface
- HP-UX: using a Motif interface (GTK optional)
- FreeBSD: preliminary support on i386, not yet on x86_64
Known Issues
Below is a list of known issues associated with Spoon.
Linux
Occasional JVM crashes running SuSE Linux and KDE. Running under Gnome presents no problems. (detected on SUSE Linux 10.1 but earlier versions also have the same problem)
FreeBSD
Problems with drag and drop. Use the right click pop up menu on the canvas as a workaround.
Note: (Insert new step)
Check the Tracker lists at http://kettle.javaforge.com for up-to-date information about
recently discovered issues.
Screen shots
Note: Introductory text here
The Main tree in the upper-left panel of Spoon allows you to browse connections associated with the jobs and transformations you have open. When designing a transformation, the Core Objects palate in the lower left-panel contains the available steps used to build your transformation including input, output, lookup, transform, joins, scripting steps and more. When designing a job, the Core objects palate contains the available job entries. The Core Objects bar contains a variety of job entry types. These items are described in detail in the following chapters: [Database Connections], [Hops], [Transformation Steps], [Job Entries], [Graphical View].
Command line options
Below are the command line options that you can use when starting the Spoon application:
-file=filename
The option below runs the specified transformation (.ktr : Kettle Transformation).
-logfile=Logging Filename
The option below allows you to specify the location of the log file. The default is the standard output.
-level=Logging Level
The level option sets the log level for the transformation being run. Below are the possible values:
Nothing: |
Do not display any output |
Error: |
Only display errors |
Minimal: |
Use minimal logging |
Basic: |
This is the default basic logging level |
Detailed: |
Give detailed logging output |
Debug: |
Show detailed output for debugging purposes. |
Rowlevel: |
Detailed logging at a row level. Warning - this generates a lot of data. |
Connect to the repository with name "Repository name" as shown below:
-rep=Repository name
Note: You must specify the options -user, -pass and -trans described below. The repository details are loaded from the file repositories.xml in the local directory or in the Kettle directory:
$HOME/.kettle/ or C:\Documents and Settings\<username>\.kettle on Windows.
Below is the user name with which you want to connect to the repository.
-user=Username
Below is the password used to connect to the repository.
-pass=Password
Use the option below to select the transformation to run from the repository.
-trans=Transformation Name
Use the option below to select the job to run from the repository.
-job=Job Name
Importants:
- On Windows, Pentaho advises you to use the /option:value format to avoid command line parsing problems by the MS-DOS shell.
- Fields in italic represent the values that the options use.
- Use quotes or double quotes if there are spaces in option values; quotes ensure that option values stay together.
Note: Examine the examples below for more information (what examples?)
Repository
Spoon allows you to store transformation and job files to the local file system or in the Kettle repository. The Kettle repository can be housed in any common relational database. To load a transformation from a database repository, you must connect to this repository. Define a database connection to the repository when you start Spoon, as shown below:
The information associated with repositories is stored in "repositories.xml". This file is located in the hidden directory ".kettle" in your default home directory. On Windows, the file is located in C:\Documents and Settings\<username>\.kettle
Note: The complete path and filename associated with repositories.xml file is displayed on the Spoon console.
If you don't want Welcome dialog to appear each time Spoon starts up, disable the 'Present this dialog at startup' check box or use the Options dialog under the Edit / Options menu. See also 2.14. Options.
Note: The default password for the administrator is admin. Pentaho strongly recommends that you change this default password using the Repository Explorer or the "Repository/Edit User" menu.
Repository Auto-Login
You can have Spoon automatically log into the repository by setting the following environment variables: KETTLE_REPOSITORY, KETTLE_USER and KETTLE_PASSWORD.
Important: Because logging automatically poses a security risk, Pentaho strongly recommends that you always lock your computer to prevent unauthorized access to the repository.
License
Beginning with version 2.2.0, Kettle was released into the public domain under the LGPL license. Please refer to Appendix A for the full text of this license.
Note: Pentaho Data Integration is referred to as "Kettle" below.
Copyright (C) 2006 Pentaho Corporation Kettle is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. Kettle is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the Kettle distribution; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Definitions
Transformation Definitions
The table below contains a list of transformation definitions:
Transformation |
Description |
---|---|
Value |
Values are part of a row and can contain any type of data: strings, floating point numbers, unlimited precision BigNumbers, integers, dates or boolean values |
Row |
A row exists of 0 or more values |
Input Stream |
A stack of rows that enters a step |
Hop |
A graphical representation of one or more data streams between two steps; a hop always represents the output stream for one step and the input stream for another — the number of streams is equal to the copies of the destination step (one or more) |
Note |
Descriptive text that can be added to a transformation |
Job Definitions
The table below contains a list of job definitions:
Job |
Description |
---|---|
Job Entry |
A part of a job that performs a specific task |
Hop |
A graphical representation of one or more data streams between two steps; a hop always represents the output stream for one step and the input stream for another — the number of streams is equal to the copies of the destination step (one or more) |
Note |
Descriptive text that that can be added to a job |
Toolbar
The icons on the toolbar of the main screen are from left to right:
Icon |
Description |
---|---|
|
Create a new job or transformation |
|
Open transformation/job from file if you're not connected to a repository or from the repository if you are connected to one. |
|
Save the transformation/job to a file or to the repository. |
|
Save the transformation/job under a different name or filename. |
|
Open the print dialog. |
|
Run transformation/job: runs the current transformation from XML file or repository. |
|
Preview transformation: runs the current transformation from memory. You can preview the rows that are produced by selected steps. |
|
Run the transformation in debug mode allowing you to troubleshoot execution errors. |
|
Replay the processing of a transformation for a certain date and time. This will cause certain steps (Text File Input and Excel Input) to only process rows that failed to be interpreted correctly during the run on that particular date and time. |
|
Verify transformation: Spoon runs a number of checks for every step to see if everything is going to run as it should. |
|
Run an impact analysis: what impact does the transformation have on the used databases. |
|
Generate the SQL that is needed to run the loaded transformation. |
|
Launches the database explorer allowing you to preview data, run SQL queries, generate DDL and more. |
Options
Kettle options allow you to customize properties associated with the behavior and look and feel of the graphical user interface. Examples include startup options such as whether or not to display tips and the Kettle Welcome Page, and user interface options such as fonts and colors. To access the options dialog, select Edit|Options... from the menu bar.
General Tab
Feature |
Description |
---|---|
Maximum Undo Level |
Sets the maximum number of steps that can be undone (or redone) by Spoon |
Default number of lines in preview dialog |
Allows you to change the default number of rows that are requested from a step during transformation previews |
Maximum nr of lines in the logging windows |
Specifies the maximum limit of rows to display in the logging window |
Show tips at startup? |
Sets the display of tips at startup |
Show welcome page at startup? |
Controls whether or not to display the Welcome page when launching Spoon |
Use database cache? |
Spoon caches information that is stored on the source and target databases. In some instances, caching causes incorrect results when you are making database changes. To prevent errors you can disable the cache altogether instead of clearing the cache every time. |
Note: Spoon automatically clears the database cache when you launch DDL (Data Definition Language) statements towards a database connection; however, when using third party tools, clearing the database cache manually may be necessary.
Feature |
Description |
---|---|
Open last file at startup? |
Loads the last transformation you used (opened or saved) from XML or repository automatically |
Auto save changed files? |
Automatically saves a changed transformation before running |
Only show the active file in the main tree? |
Reduces the number of transformation and job items in the main tree on the left by only showing the currently active file |
Only save used connections to XML? |
Limits the XML export of a transformation to the used connections in that transformation. This is helpful while exchanging sample transformations to avoid having all defined connections to be included. |
Ask about replacing existing connections on open/import? |
Requests permission before replacing existing database connections during import |
Replace existing connections on open/import? |
This is the action that's being taken when there is no dialog shown. (see previous option) (huh?) |
Show "Save" dialog? |
Allows you to turn off the confirmation dialogs you receive when a transformation has been changed |
Automatically split hops? |
Turns off the confirmation dialogs you get when you want to split a hop (see also 7.4. Splitting A Hop) |
Show "copy or distribute" dialog? |
Turns off the warning message that appears when you link a step to multiple outputs. This warning message describes the two options for handling multiple outputs:
|
Show repository dialog at startup? |
Controls whether or not the repositories dialog shows up at startup. |
Ask user when exiting? |
Controls whether or not to display the confirmation dialog when a user chooses to exit the application. |
Clear custom parameters (steps/plug-ins) |
Clears all parameters and flags that were set in the plug-in or step dialogs. |
Display tool tips? |
This option controls whether or not to display tool tips for the buttons on the main tool bar. |
Look and Feel tab
Note: Add introductory text here...
Feature |
Description |
---|---|
Fixed width font |
The font that is used in the dialog boxes, trees, input fields, and so on |
Font on workspace |
The font that is used on the graphical view |
Font for notes |
The font to use in notes that are displayed in the Graphical View |
Background color |
Sets the background color in Spoon. It affects all dialogs too |
Workspace background color |
Sets the background color in the Graphical View of Spoon |
Tab color |
The color that is being used to indicate tabs that are active/selected. |
Icon size in workspace |
Affects the size of the icons in the graph window. The original size of an icon is 32x32 pixels. The best results (graphically) are probably at sizes 16,24,32,48,64 and other multiples of 32. |
Line width on workspace |
Affects the line width of the hops on the Graphical View and the border around the step. |
Shadow size on workspace |
If this size is larger then 0, a shadow of the steps, hops, and notes is drawn on the canvas, making it look like the transformation floats above the canvas. |
Dialog middle percentage |
By default, a parameter is drawn at 35% of the width of the dialog, counted from the left. You can change this with this parameter. May be useful in cases where you use unusually large fonts. |
Canvas anti-aliasing? |
Some platforms like Windows, OSX and Linux support anti-aliasing through GDI, Carbon or Cairo. Check this to enable smoother lines and icons in your graph view. If you enable this and your environment doesn't work any more afterwards, change the value for option "EnableAntiAliasing" to "N" in file $HOME/.kettle/.spoonrc (C:\Documents and Settings\<user>\.kettle\.spoonrc on Windows) |
Use look of OS? |
Checking this on Windows allows you to use the default system settings for fonts and colors in Spoon. On other platforms, this is always the case. |
Show branding graphics |
Enabling this option will draw Pentaho Data Integration branding graphics on the canvas and in the left hand side "expand bar". |
Preferred Language |
Here you can specify the default language setting. If a certain text hasn't been translated into this locale, Kettle will fall back to the fail over locale. |
Alternative Language |
Because the original language in which Kettle was written is English, it's best to set this locale to English. |
Search Metadata
This option will search in any available fields, connectors or notes of all loaded jobs and transformations for the string specified in the Filter field. The metadata search returns a detailed result set showing the location of any search hits. This feature is accessed by choosing Edit|Search metadata from the menu bar.
Set environment variable
The Set Environment Variable feature allows you to create and set environment variables for the current user session explicitly. This is a useful feature when designing transformations for testing variable substitutions that are normally set dynamically by another job or transformation.
This feature is accessible by choosing Edit|Set Environment Variable from the menu bar.
Note: This page also displays when you run a transformation that use undefined variables. This allows you to define them right before execution time.
Show environment variables
This feature displays the current list of environment variables and their values. It is accessed by selecting the Edit|Show environment variables option from the menu bar.
Execution log history
If you have configured your Job or Transformation to store log information in a database table, you can view the log information from previous executions by right-clicking on the job or transformation in the Main Tree and selecting 'Open History View'. A view similar to the one below appears:
Note: todo: replace screenshot when PDI-224 is fixed
Note: The log history for a job or transformation also opens by default each next time you execute the file.
Replay
The Replay feature allows you to re-run a transformation that failed. Replay functionality is implemented for Text File Input and Excel input. It allows you to send files that had errors back to the source and have the data corrected. Only the lines that failed before processed during the replay if a .line file is present. The Replay feature uses the date in the filename of the .line file to match the entered replay date.
Generate mapping against target step
In cases where you have a fixed target table, map the fields from the stream to their corresponding fields in the target output table. Use a Select Values step in your transformation. The 'Generate mapping against target' option provides you with an easy-to-use dialog for defining these mappings that automatically creates the resulting Select Values step that can be dropped into your transformation flow prior to the table output step.
To access the 'Generate mapping against target' option right click in the table output step.
After defining your mappings, select OK and the Select Values step containing your mappings will appear on the workspace. Attach the mapping step into your transformation just before the table output step.
Generate mappings example
Below is an example of a simple transformation in which we want to generate mappings to our target output table:
- Begin by right-clicking on the Table output step and selecting 'Generate mappings against target'.
- Add all necessary mappings using the Generate Mapping dialog shown above and click OK. You will now see a Table output mapping step has been added to the canvas.
- Drag the generated Table output Mapping step into your transformation flow prior to the table output step:
Safe mode
Use safe mode when you are mixing rows from many sources to ensure that all rows have the same layout in all conditions. The safe mode option is available in the Spoon logging window or on the Execute a Transformation/Job window. When running in safe mode, the transformation checks every row that passes and ensures that all layouts are identical. If a row is does not have the same layout as the first row, an error is generated and reported.
Note: This option is also available in Pan.
Welcome Screen
The Welcome page displays the first time you launch Spoon 3.0; it provides you with links to additional information about Pentaho Data Integration. You can disable Welcome page in Spoon options (Edit|Options).