Configure Cassandra Output
Cassandra Output is an output step that enables data to be written to a Cassandra column family (table) as part of an ETL transformation.
Option | Definition |
---|---|
Step name | The name of this step as it appears in the transformation workspace. |
Cassandra host | The host name of the Cassandra node to connect to. |
Cassandra port | The port number to connect to on the Cassandra host. |
Username | The user name for authenticating against the target keyspace and/or column family (table). |
Password | The password for authenticating against the target keyspace and/or column family (table). |
Keyspace | The name of the keyspace (database). |
Show schema | Opens a dialog box that shows metadata for the specified column family. |
Configure Column Family and Consistency Level
This tab specifies the column family (table) to write to and how rows are written to it: the write consistency level, whether the column family is created or truncated first, how unknown incoming fields are handled, and how inserts are batched.
Important: Cassandra Output does not check the types of incoming columns against matching columns in the Cassandra metadata. Incoming values are formatted into appropriate string values for use in a textual CQL INSERT statement according to PDI's field metadata. If a resulting value cannot be parsed by the Cassandra column validator for a particular column, an error results.
Cassandra Output converts PDI's dense row format into sparse data by ignoring incoming field values that are null.
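The two behaviors above (formatting values as text for an INSERT statement, and skipping null fields) can be illustrated with a minimal Python sketch. This is not the step's actual implementation: the function names `cql_literal` and `build_insert`, the table name `users`, and the simplified quoting rules are all assumptions made for illustration.

```python
def cql_literal(value):
    """Format a Python value as a CQL text literal (simplified, illustrative)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: wrap in single quotes and double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(table, row):
    """Build a textual INSERT, omitting fields whose value is None (null)."""
    present = {name: v for name, v in row.items() if v is not None}
    columns = ", ".join(present)
    values = ", ".join(cql_literal(v) for v in present.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values})"


# Hypothetical incoming row: the null "email" field is left out entirely.
row = {"id": 42, "name": "O'Brien", "email": None}
print(build_insert("users", row))
# prints: INSERT INTO users (id, name) VALUES (42, 'O''Brien')
```

Because no type check is made against the Cassandra metadata, a value formatted this way is rejected only when Cassandra's validator for that column fails to parse it.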
Option | Definition |
---|---|
Column family (table) | The column family to which the incoming rows should be written. |
Get column family names button | Populates the drop-down box with the names of all column families that exist in the specified keyspace. |
Consistency level | Specifies an explicit write consistency level. Valid values are: ZERO, ONE, ANY, QUORUM and ALL. The Cassandra default is ONE. |
Create column family | If checked, the step creates the named column family if it does not already exist. |
Truncate column family | If checked, deletes any existing data from the named column family before inserting incoming rows. |
Update column family metadata | If checked, updates the column family metadata with information on incoming fields not already present. If this option is not selected, any unknown incoming fields are ignored unless the Insert fields not in column metadata option is enabled. |
Insert fields not in column metadata | If checked, inserts any incoming fields that are not present in the column family metadata, with respect to the default column family validator. This option has no effect if Update column family metadata is selected. |
Commit batch size | The number of rows to buffer before executing a BATCH INSERT CQL statement. |
Use compression | If checked, compresses (gzip) the text of each BATCH INSERT statement before transmitting it to the node. |
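To make the Commit batch size and Use compression options concrete, the following Python sketch groups INSERT statements into batches and gzip-compresses the text of one batch. It is an illustration under stated assumptions, not the step's code: the helper `batch_statements`, the `users` table, and the exact `BEGIN BATCH ... APPLY BATCH` layout are hypothetical, and the statement text the step actually sends may differ.

```python
import gzip


def batch_statements(inserts, batch_size):
    """Group INSERT statements into BATCH statements of at most batch_size rows."""
    for start in range(0, len(inserts), batch_size):
        chunk = inserts[start:start + batch_size]
        body = "\n".join(stmt + ";" for stmt in chunk)
        yield f"BEGIN BATCH\n{body}\nAPPLY BATCH;"


# Five hypothetical rows with a commit batch size of 2 give batches of 2, 2 and 1.
inserts = [f"INSERT INTO users (id) VALUES ({i})" for i in range(5)]
batches = list(batch_statements(inserts, batch_size=2))
print(len(batches))  # prints: 3

# Compress the batch text before sending; the receiver reverses this step.
payload = gzip.compress(batches[0].encode("utf-8"))
restored = gzip.decompress(payload).decode("utf-8")
print(restored == batches[0])  # prints: True
```

A larger batch size means fewer round trips to the node but more rows held in memory per statement. Compression tends to pay off as batches grow, since the statement text is highly repetitive.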
Pre-insert CQL
Cassandra Output gives you the option of executing an arbitrary set of CQL statements prior to inserting the first incoming PDI row. This is useful for creating or dropping secondary indexes on columns. Pre-insert CQL statements are executed after any column family metadata updates for new incoming fields, and before the first row is inserted. This enables indexes to be created for columns corresponding to new incoming fields.
Option | Definition |
---|---|
CQL to execute before inserting first row | Opens the CQL editor, where you can enter one or more semicolon-separated CQL statements to execute before the first row is inserted. |
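As an example of what might be entered in the CQL editor, the Python sketch below holds a two-statement script that drops and recreates a secondary index, and splits it on semicolons into individual statements. The index name `users_email_idx`, the table `users`, the column `email`, and the helper `split_cql` are hypothetical, and the naive split is only an assumption about how a semicolon-separated script could be broken up, not a description of the step's parser.

```python
def split_cql(script):
    """Split a semicolon-separated script into individual statements.

    Naive: does not handle semicolons inside quoted string literals.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


# Hypothetical pre-insert script: rebuild a secondary index before loading rows.
pre_insert_cql = """
DROP INDEX users_email_idx;
CREATE INDEX users_email_idx ON users (email);
"""

for statement in split_cql(pre_insert_cql):
    print(statement)
# prints:
# DROP INDEX users_email_idx
# CREATE INDEX users_email_idx ON users (email)
```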