The JDBC Client
The JDBC driver uses the following class:
org.pentaho.di.core.jdbc.ThinDriver
The URL is in the following format:
jdbc:pdi://hostname:port/kettle?option=value&option=value
For Carte, this is an example:
jdbc:pdi://localhost:8084/kettle?debugtrans=/tmp/jdbc.ktr
For the Data Integration server:
jdbc:pdi://localhost:9080/kettle?webappname=pentaho-di
This example corresponds to the Carte configuration file shown above.
The following standard options are available:
- webappname : the name of the web application (typically pentaho-di on the DI server). Make sure to specify this option in the "Options" section of the Kettle database connection dialog if you want to connect from PDI to a PDI server (Carte or the DI server);
- proxyhostname : the proxy server for the HTTP connection(s)
- proxyport : the port of the proxy server
- nonproxyhosts : the hosts (comma-separated) for which no proxy should be used
- debugtrans : the optional name of a file in which the generated transformation will be stored for debugging purposes (example: /tmp/debug.ktr)
- debuglog : set to "true" to have the logging text of the remote SQL transformation written to the general logging channel once execution is finished.
Parameters for the service transformation can be set with the following format: PARAMETER_name=value (in other words, the parameter name prefixed with "PARAMETER_")
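As a sketch, connecting with plain JDBC might look like the following. The hostname, port, credentials, and the PARAMETER_REGION parameter are hypothetical placeholders, not names defined by the driver:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ThinDriverExample {

    // Assembles a thin-driver URL from its parts; all values passed in
    // are placeholders chosen by the caller.
    static String buildUrl(String host, int port, String webAppName) {
        String url = "jdbc:pdi://" + host + ":" + port + "/kettle";
        if (webAppName != null) {
            url += "?webappname=" + webAppName;
        }
        return url;
    }

    // Opens a connection; this requires a running Carte or DI server.
    static Connection connect(String url, String user, String password) throws Exception {
        // Explicit registration is only needed on pre-JDBC-4 setups.
        Class.forName("org.pentaho.di.core.jdbc.ThinDriver");
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        // DI server example; a service-transformation parameter is passed
        // by prefixing its name with "PARAMETER_".
        String url = buildUrl("localhost", 9080, "pentaho-di")
                + "&PARAMETER_REGION=EMEA"; // hypothetical parameter
        System.out.println(url);
    }
}
```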
SQL Support
Support for SQL is minimal at the moment.
The following constructs are supported; please consider everything else unsupported:
- SELECT:
- * is expanded to include all fields
- COUNT(field)
- COUNT(*)
- COUNT(DISTINCT field)
- DISTINCT <fields>
- IIF( condition, true-value or field, false-value or field)
- CASE WHEN condition THEN true-value ELSE false-value END
- Aggregates: SUM, AVG, MIN, MAX
- Aliases, both with the "AS" keyword and separated by one or more spaces, for example SUM(sales) AS "Total Sales" or SUM(sales) TotalSales
- Constant expressions are possible, see below in the literals section.
- Calculations, on the other hand, are not yet possible; perform them in the service transformation for now.
- FROM
- Strictly one service name, aliasing is possible
- You can omit the service name to query from an empty row, or you can query from dual; for example, "SELECT 1" and "SELECT 1 FROM dual" are equivalent.
- You can specify a schema (default is Kettle) but it is currently ignored. It will be translated to a namespace in the near future.
- WHERE
- nested brackets
- AND, OR, NOT if preceded by brackets, for example: NOT ( A = 5 OR C = 3 )
- precedence taken into account
- Literals (String, Integer)
- PARAMETER('parameter-name')='value' (always evaluates to TRUE in the condition)
- =
- <
- >
- <=, =<
- >=, =>
- <>
- LIKE (standard % and ? wildcards are converted to .* and . regular expressions)
- REGEX (matches regular expression)
- IS NULL
- IS NOT NULL
- IN ( value, value, value, ... )
- You can put a condition on the IIF expression or its alias if one is used (please use identical string literals for the expressions)
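The LIKE wildcard conversion mentioned above can be illustrated with a small sketch. This is not the driver's actual code, just the % → .* and ? → . mapping it describes, assuming all other characters are matched literally:

```java
import java.util.regex.Pattern;

public class LikeSketch {

    // Converts a LIKE pattern to a regular expression, mapping
    // % to .* and ? to . and quoting everything else literally.
    static String likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        System.out.println("ACME Corp".matches(likeToRegex("ACME%"))); // true
        System.out.println("Bob".matches(likeToRegex("B?b")));         // true
    }
}
```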
- GROUP BY
- Group on fields, not IIF() function
- HAVING
- Conditions should be placed on the aggregate construct, not the alias
- Please use identical strings for the expressions; the algorithm is not yet that smart. In other words, if you use "COUNT( * )" in the SELECT clause, you should use the identical "COUNT( * )" expression in the HAVING clause, not "COUNT(*)" or any other variant of it.
- You can place HAVING conditions on aggregates that do not appear in the SELECT clause.
- ORDER BY
- You can order on any column, whether or not it appears in the result
- You can order on IIF or CASE WHEN expressions.
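Putting the supported constructs together, a query like the one below should be accepted. The service name sales and the fields region and amount are hypothetical; the run method is a sketch and needs a reachable Carte or DI server:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DataServiceQuery {

    // A query combining aggregates, GROUP BY, HAVING and ORDER BY;
    // "sales", "region" and "amount" are hypothetical names. Note that
    // the HAVING expression is spelled identically to the SELECT one.
    static final String SQL =
          "SELECT region, SUM(amount) AS \"Total Sales\", COUNT(*) "
        + "FROM sales "
        + "WHERE NOT ( region IS NULL OR amount < 0 ) "
        + "GROUP BY region "
        + "HAVING COUNT(*) > 10 "
        + "ORDER BY \"Total Sales\"";

    // Runs the query; requires a running Carte or DI server.
    static void run(String url, String user, String password) throws Exception {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(SQL)) {
            while (rs.next()) {
                System.out.println(rs.getString("region") + ": " + rs.getDouble(2));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(SQL);
    }
}
```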
Literals:
- Strings are enclosed in single quotes; a single quote inside a string is escaped by doubling it.
- Dates have square brackets around them and the following formats are supported: [yyyy/MM/dd HH:mm:ss.SSS], [yyyy/MM/dd HH:mm:ss] and [yyyy/MM/dd]
- Number and BigNumber values should have no grouping symbol and use . as the decimal symbol (example: 123.45)
- Integers contain only digits
- Boolean values can be TRUE or FALSE
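As a small sketch, a WHERE clause combining these literal formats might look like this; the field names are hypothetical:

```java
public class LiteralExample {

    // A condition using each literal form: a string with an escaped
    // single quote, a date in square brackets, a number, and a boolean.
    static final String WHERE =
          "WHERE customer = 'O''Brien' "
        + "AND created >= [2024/01/01] "
        + "AND price > 123.45 "
        + "AND active = TRUE";

    public static void main(String[] args) {
        System.out.println(WHERE);
    }
}
```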
Limitations
Besides the obviously long list of limitations in the SQL support, there are a few noteworthy things:
- Grouping is done with a "Memory Group By" step, which keeps all groups in memory. If you expect a large number of groups, watch the memory consumption on the server. The "Memory Group By" step is used to avoid a costly sort of the data.
- It is not possible to specify the same field twice in the same SELECT clause (for whatever reason you might want to do that).
- Calculations and functions such as string concatenation are not (yet) supported.