Database lookup
Description
The Database lookup step allows you to look up values in a database table. Lookup values are added as new fields onto the stream.
Options
The following table describes the available options for configuring the database lookup:
Option |
Description |
---|---|
Step name |
Name of the step; this name has to be unique in a single transformation. |
Connection |
Database connection to the lookup table. |
Lookup schema |
Database schema containing the lookup table. |
Lookup Table |
Name of the database table used for the lookup. |
Enable cache? |
Enables caching of database lookups. This means that once a key (or group of key) has been looked up, the looked up values are stored, and returned again the next time this key (or group of key) is being looked up (without incurring the cost of a database call).
|
Cache size in rows |
The size of the cache (number of rows), 0 means cache everything. |
Load all data from table |
Pre-loads the cache with all the data present in the lookup table. This may improve performance by avoiding database calls.
|
Keys to look up table |
The keys and conditions to perform the database lookup.
|
Values to return table |
The fields from the lookup table to add to the output stream. |
Do not pass the row if the lookup fails |
Enable to avoid passing a row when lookup fails. Within the SQL syntax, enabling this would be an INNER JOIN, otherwise it would be an OUTER JOIN. |
Fail on multiple results? |
Enable to force the step to fail if the lookup returns multiple results. |
Order by |
If the lookup query returns multiple results, the ORDER BY clause helps you to select the record to take. For example, ORDER BY would allow you to pick the customer with the highest sales volume in a specified state. |
Get Fields |
Click to return a list of available fields from the input stream(s) of the step. |
Get lookup fields |
Click to return a list of available fields from the lookup table that can be added to the step's output stream. |
Note on the Load all data from table option: As an example: when storing a value of type "CHAR(3)", many databases will store "ab" using 3 characters: "ab " (notice the blank space). When you do a "SELECT * FROM my_lookup_table WHERE key_column = 'ab'", the database is smart and adds a blank space before applying the where clause (it looks for 'ab ' instead of 'ab').
The problem occurs when you use "load all data from table" to pre-load the cache: at startup, the cache is built using all the values present in the database, so we store 'ab ' in the cache. Later, looking up 'ab' fails, because the trailing space matters in java equality.
Note that this problem does not occur when using the cache, if "load all data from table" is disabled. Indeed, the first lookup for 'ab' would find no entry in the cache, call the database (which is smart enough to handle the trailing whitespace problem), get the correct result, and store it in the java cache under the 'ab' key. Thus, the next lookup for 'ab' will find the appropriate result in the cache :)