Cassandra Output



Configure Cassandra Output
Cassandra Output is an output step that enables data to be written to a Cassandra column family (table) as part of an ETL transformation.

Option

Definition

Step name

The name of this step as it appears in the transformation workspace.

Cassandra host

Input field for the host name of the Cassandra node to connect to.

Cassandra port

Input field for the port number of the Cassandra node to connect to.

Username

Input field for the user name used to authenticate to the target keyspace and/or column family (table).

Password

Input field for the password used to authenticate to the target keyspace and/or column family (table).

Keyspace

Input field for the keyspace (database) name.

Show schema

Opens a dialog box that shows metadata for the specified column family.




Configure Column Family and Consistency Level
This tab specifies where and how incoming rows are written: the target column family (table), the write consistency level, and the options that control metadata updates and batched inserts.
Important: Cassandra Output does not check the types of incoming columns against the matching columns in the Cassandra metadata. Incoming values are formatted into appropriate string values for use in a textual CQL INSERT statement according to PDI's field metadata. If the resulting values cannot be parsed by the Cassandra column validator for a particular column, an error results.
Cassandra Output converts PDI's dense row format into sparse data by ignoring incoming field values that are null.
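The dense-to-sparse conversion described above can be illustrated with a minimal Python sketch. This is not PDI's implementation: the helper names `cql_literal` and `sparse_insert`, the `users` table, and the quoting rules are assumptions chosen for illustration only.

```python
from typing import Any, Mapping


def cql_literal(value: Any) -> str:
    """Format a value as a textual CQL literal (illustrative rules only)."""
    if isinstance(value, bool):  # check bool first, since bool is also an int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Everything else is treated as text: quote it and double embedded quotes.
    return "'" + str(value).replace("'", "''") + "'"


def sparse_insert(table: str, row: Mapping[str, Any]) -> str:
    """Build an INSERT statement that omits every field whose value is null."""
    present = {name: value for name, value in row.items() if value is not None}
    columns = ", ".join(present)
    values = ", ".join(cql_literal(value) for value in present.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values})"


# A dense row: every field is present, but "email" is null.
row = {"id": 42, "name": "O'Brien", "email": None, "active": True}
statement = sparse_insert("users", row)
print(statement)
```

Because the values are only formatted as text, nothing in this sketch verifies that `42` suits the type of the `id` column. As in the step itself, a mismatch would surface only when Cassandra's column validator rejects the statement.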

Option

Definition

Column family (table)

Input field for the column family to which incoming rows are written.

Get column family names button

Populates the drop-down box with names of all the column families that exist in the specified keyspace.

Consistency level

Input field for specifying an explicit write consistency level. Valid values are ZERO, ONE, ANY, QUORUM, and ALL. The Cassandra default is ONE.

Create column family

If checked, enables the step to create the named column family if it does not already exist.

Truncate column family

If checked, deletes any existing data from the named column family before inserting incoming rows.

Update column family metadata

If checked, updates the column family metadata with information on incoming fields that are not already present. If this option is not selected, any unknown incoming fields are ignored unless the Insert fields not in column metadata option is enabled.

Insert fields not in column metadata

If checked, inserts any incoming fields that are not present in the column family metadata, with respect to the default column family validator. This option has no effect if Update column family metadata is selected.

Commit batch size

Allows you to specify how many rows to buffer before executing a BATCH INSERT CQL statement.

Use compression

If checked, compresses (gzip) the text of each BATCH INSERT statement before transmitting it to the node.
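The interaction between Commit batch size and Use compression can be sketched as follows. This is a hedged illustration, not the step's actual code: the helper names `wrap_batch` and `batched`, the sample table `t`, and the exact batch text (`BEGIN BATCH ... APPLY BATCH;`) are assumptions, and the statement text that PDI really generates may differ.

```python
import gzip
from typing import Iterable, Iterator, List


def wrap_batch(inserts: List[str]) -> str:
    """Wrap buffered INSERT statements in a single CQL batch statement."""
    body = "\n".join(stmt + ";" for stmt in inserts)
    return f"BEGIN BATCH\n{body}\nAPPLY BATCH;"


def batched(inserts: Iterable[str], batch_size: int) -> Iterator[str]:
    """Buffer rows and emit one batch statement per `batch_size` inserts."""
    buffer: List[str] = []
    for stmt in inserts:
        buffer.append(stmt)
        if len(buffer) == batch_size:
            yield wrap_batch(buffer)
            buffer = []
    if buffer:  # flush the final, possibly smaller, batch
        yield wrap_batch(buffer)


inserts = [f"INSERT INTO t (id) VALUES ({i})" for i in range(5)]
batch_texts = list(batched(inserts, batch_size=2))

# With compression enabled, each batch text is gzipped before transmission.
payloads = [gzip.compress(text.encode("utf-8")) for text in batch_texts]
print(len(batch_texts), "batches")
```

A larger batch size means fewer round trips to the node but more rows held in memory before each write.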




Pre-insert CQL
Cassandra Output gives you the option of executing an arbitrary set of CQL statements prior to inserting the first incoming PDI row. This is useful for creating or dropping secondary indexes on columns. Pre-insert CQL statements are executed after any column family metadata updates for new incoming fields, and before the first row is inserted. This enables indexes to be created for columns corresponding to new incoming fields.

Option

Definition

CQL to execute before inserting first row

Opens the CQL editor, where you can enter one or more semicolon-separated CQL statements to execute before the first row is inserted.
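As an illustration of what the editor might contain, the sketch below holds two semicolon-separated statements that drop and recreate a secondary index, then splits them into individual statements. The index name `users_email_idx`, the `users` table, and the `split_cql` helper are hypothetical. The naive split on `;` is an assumption that would break on semicolons inside string literals, and it is not a description of how the step parses the text.

```python
from typing import List


def split_cql(script: str) -> List[str]:
    """Split a semicolon-separated CQL script into non-empty statements."""
    return [part.strip() for part in script.split(";") if part.strip()]


# Hypothetical pre-insert script: rebuild a secondary index before loading.
pre_insert_cql = """
DROP INDEX users_email_idx;
CREATE INDEX users_email_idx ON users (email);
"""

statements = split_cql(pre_insert_cql)
for stmt in statements:
    print(stmt)
```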