Vertica Bulk Loader
Description
This step allows you to load data into a Vertica database table. It uses the VerticaCopyStream class in the Vertica JDBC driver (documented on the Vertica site in the Programmer's Guide > Vertica Client Libraries). Using the VerticaCopyStream class with a COPY statement is the highest performance method offered by Vertica to stream bulk data into a table.
Prerequisites
- The Vertica JDBC driver must be in the PDI classpath.
- You must be connecting to a Vertica cluster version 5 or higher. Both Vertica CE and Vertica EE are supported.
- The host machine running the transformation using this step must be able to connect to all nodes of the Vertica cluster via TCP.
- The Vertica user configured in the connection must either be a superuser, or have USAGE privilege on the schema and INSERT privilege on the table.
Options
Option |
Description |
---|---|
Step name |
The name of this step as it appears in the transformation workspace. |
Connection |
Drop-down menu which allows you to select the appropriate Vertica database connection. Note: List is populated only once a database connection has already been established; if no database connection has been established, this list will be blank. You must select a connection with the type "Vertica 5+". |
Target schema |
The name of the schema containing the table in which the data will be inserted. |
Target table |
Name of the target table. |
Specify database fields |
If this option is selected, the user can indicate the specific input fields to be inserted in the table as well as provide a mapping to the column name in the table. |
Insert direct to ROS |
This option determines whether the data will be loaded using the WOS (if the option is not selected) or ROS (if the option is selected).
|
Abort on error |
If this option is selected, any rows of data which are rejected by Vertica will cause an error which will abort the entire transformation, potentially rolling back the current transaction. |
Exceptions log file |
Specifies the filename or absolute path for the load exceptions file. In Vertica 6 and higher, the path can refer to a storage location. Please see the Vertica SQL Reference Manual > SQL Statements > COPY for details on the required privileges. |
Rejected data log file |
Specifies the filename or absolute path in which to write rejected rows. In Vertica 6 and higher, the path can refer to a storage location. Please see the Vertica SQL Reference Manual > SQL Statements > COPY for details on the required privileges. |
Stream name |
If this option is specified, the string will be used as the COPY load stream identifier which can be seen when monitoring load streams via the LOAD_STREAMS system table. The default stream name is <table_name>_STDIN |
Database fields
If the "Specify database fields" option is selected, this table is enabled. With the "Get fields" button you can load the fields from the stream. With the "Enter field mapping" button you can edit the mapping from the stream to the table fields when the table already exists.
Option |
Description |
---|---|
Table field |
Name of the column in the table to store this field. |
Stream field |
Name of the field in the stream. |
SQL button
The user can press the SQL button to review and optionally execute the CREATE TABLE and CREATE PROJECTION statements that match the data coming into this step. Please note that the projection is created using no segmentation or partitioning which will result in a full copy of the data residing on every node on the Vertica cluster. This is useful for testing, but the Vertica database administrator should revise the table's projections before being used in production.
Metadata Injection Support (7.x and later)
All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.