Cassandra Output
PLEASE NOTE: This documentation applies to an earlier version. For the most recent documentation, visit the Pentaho Enterprise Edition documentation site.
Description
The Cassandra Output step writes data to a Cassandra column family (table).
Options
Connection Tab
Option | Definition |
---|---|
Step name | The name of this step as it appears in the transformation workspace. |
Cassandra host | Host name for the connection. |
Cassandra port | Port number for the connection. |
Socket timeout | Optional connection timeout, specified in milliseconds. |
Username | User name for authenticating against the target keyspace and/or column family (table). |
Password | Password for authenticating against the target keyspace and/or column family (table). |
Keyspace | Name of the keyspace (database). |
Write Options Tab
The Cassandra Output step provides a number of options that control what data is written to the target Cassandra keyspace and how it is written. This tab specifies the target column family (table) and the write consistency level.
Important: Cassandra Output does not check the types of incoming columns against matching columns in the Cassandra metadata. Incoming values are formatted into appropriate string values for use in a textual CQL INSERT statement according to PDI's field metadata. If the resulting values cannot be parsed by the Cassandra column validator for a particular column, an error results.
Cassandra Output converts PDI's dense row format into sparse data by ignoring incoming field values that are null.
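As a rough illustration of the two points above (formatting values into a textual INSERT and dropping null fields), the sketch below builds a CQL INSERT statement from a single row. The function names `format_value` and `build_insert`, the simplified quoting rules, and the `users` table with its columns are assumptions made for illustration; this is not the step's actual implementation.

```python
def format_value(value):
    """Format a Python value as a CQL literal (simplified, illustrative quoting)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: wrap in single quotes and double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(table, row):
    """Build a textual INSERT, skipping fields whose value is None (null)."""
    present = {name: value for name, value in row.items() if value is not None}
    if not present:
        raise ValueError("row has no non-null fields to insert")
    columns = ", ".join(present)
    values = ", ".join(format_value(v) for v in present.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


# Hypothetical dense row: 'email' is null, so it is left out of the statement.
row = {"id": 42, "name": "O'Brien", "email": None, "active": True}
statement = build_insert("users", row)
print(statement)
```

Because the null `email` field never appears in the statement, no column is written for it, which is what makes the stored row sparse.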
Option | Definition |
---|---|
Column family (table) | The column family to which incoming rows are written. |
Get column family names button | Populates the drop-down box with the names of all column families that exist in the specified keyspace. |
Consistency level | Specifies an explicit write consistency level. Valid values are ZERO, ONE, ANY, QUORUM, and ALL. The Cassandra default is ONE. |
The Show schema button at the lower right-hand side of the UI opens a dialog that shows metadata for the specified column family.
Schema Options Tab
Option | Definition |
---|---|
Host for schema updates | The Cassandra schema host name. |
Port for schema updates | The Cassandra schema port number. |
Create column family | If checked, enables the step to create the named column family if it does not already exist. |
Table creation WITH clause | Use to specify additions to the table creation WITH clause. |
Truncate column family | If checked, deletes any existing data from the named column family before inserting incoming rows. |
Update column family metadata | If checked, updates the column family metadata with information on incoming fields not already present. If unchecked, unknown incoming fields are ignored unless the Insert fields not in column metadata option is enabled. |
Insert fields not in column metadata | If checked, inserts incoming fields that are not present in the column family metadata, using the default column family validator. This option has no effect if Update column family metadata is selected. |
Use compression | If checked, compresses (gzip) the text of each BATCH INSERT statement before transmitting it to the node. |
CQL to execute before inserting first row | Opens a CQL editor for entering statements to execute before the first row is inserted. |
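To give a feel for what the Use compression option does to a batch, the sketch below gzips the text of a batch statement and restores it. The table name `users`, the generated rows, and the variable names are hypothetical, and how the step frames or transmits the compressed bytes is not shown here; this only demonstrates gzip applied to statement text.

```python
import gzip

# Hypothetical batch of 50 similar INSERT statements.
inserts = [
    f"INSERT INTO users (id, name) VALUES ({i}, 'user{i}');" for i in range(50)
]
batch = "BEGIN BATCH\n" + "\n".join(inserts) + "\nAPPLY BATCH;"

raw = batch.encode("utf-8")
compressed = gzip.compress(raw)

# Decompressing gives back the exact statement text.
restored = gzip.decompress(compressed).decode("utf-8")

print(len(raw), "bytes before compression")
print(len(compressed), "bytes after compression")
```

Batches of similar INSERT statements are highly repetitive, so they tend to compress well; very small batches may see little benefit.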
More Details about Updating Column Family Metadata
Selecting the Update column family metadata option updates the column family metadata with information on incoming fields not already present. If this option is not selected, any unknown incoming fields are ignored unless the Insert fields not in column metadata option is enabled. If the latter is enabled, any incoming fields that are not present in the column family metadata are inserted using the default column family validator. The Insert fields not in column metadata option has no effect if Update column family metadata is selected.
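The interaction between the two options can be summarized as a small decision function. This is a sketch of the behavior as described in this section, not code from the step; the function name `handle_unknown_field` and the returned labels are made up for illustration.

```python
def handle_unknown_field(update_metadata, insert_unknown):
    """Describe what happens to an incoming field missing from the column family metadata.

    update_metadata: state of "Update column family metadata"
    insert_unknown:  state of "Insert fields not in column metadata"
    """
    if update_metadata:
        # Takes precedence; the other option has no effect in this case.
        return "update metadata"
    if insert_unknown:
        return "insert with default validator"
    return "ignore"


for update in (True, False):
    for insert in (True, False):
        print(update, insert, "->", handle_unknown_field(update, insert))
```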
Note that Cassandra Output does not check the types of incoming columns against matching columns in the Cassandra metadata. Incoming values are formatted into appropriate string values for use in a textual CQL INSERT statement according to PDI's field metadata. If the resulting values cannot be parsed by the Cassandra column validator for a particular column, an error results.
Pre-Insert CQL
Cassandra Output can execute an arbitrary set of CQL statements before inserting the first incoming PDI row. This is useful, among other things, for creating or dropping secondary indexes on columns. Clicking the CQL to execute before inserting first row button opens a CQL editor. You can enter multiple CQL statements as long as each is terminated by a semicolon.
Pre-insert CQL statements are executed after any column family metadata updates for new incoming fields, and before the first row is inserted. This allows indexes to be created for columns corresponding to new incoming fields.
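The semicolon rule can be illustrated with a small splitter that breaks a pre-insert script into individual statements. This is a sketch under stated assumptions: the function name `split_cql`, the index names, and the `users` table are hypothetical, and the step's own parsing may differ (for example, this version does not handle CQL comments).

```python
def split_cql(script):
    """Split a script into statements on semicolons outside single-quoted strings."""
    statements = []
    current = []
    in_string = False
    for ch in script:
        if ch == "'":
            # A doubled quote ('') toggles twice, so the state stays correct.
            in_string = not in_string
        if ch == ";" and not in_string:
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    trailing = "".join(current).strip()
    if trailing:
        raise ValueError("statement not terminated by a semicolon: " + trailing)
    return statements


# Hypothetical pre-insert script creating two secondary indexes.
script = """
CREATE INDEX user_state_idx ON users (state);
CREATE INDEX user_city_idx ON users (city);
"""
for statement in split_cql(script):
    print(statement)
```

Rejecting an unterminated trailing statement mirrors the requirement that every statement end with a semicolon.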
Metadata Injection Support (7.x and later)
All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.