How to read data from a column family in Cassandra using a graphical tool.

By the end of this guide you should understand how data can be read from Cassandra and written to many places. The data we are going to use contains data about the flow of visitors to a web site.

Intro Video

https://www.youtube.com/watch?v=A4bUoBNOmWU

Prerequisites

In order to follow along with this how-to guide you will need the following:

Cassandra

A single-node local cluster is sufficient for these exercises but a larger and/or remote configuration will work as well. You will need to know the address and port that Cassandra is running on and have a user id and password for the server (if applicable).
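Before opening Spoon, it can help to confirm that the address and port you plan to use are reachable. The following is a minimal sketch, not part of the original guide: the helper name `cassandra_port_open` is hypothetical, and the host and port are assumptions (a local node on 9160, the default Thrift RPC port for Cassandra releases of this era). It only tests the TCP connection and does not check credentials.

```python
import socket


def cassandra_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Assumed values: a local single-node cluster on the default
    # Thrift RPC port. Replace them with your own host and port.
    host, port = "localhost", 9160
    state = "reachable" if cassandra_port_open(host, port) else "not reachable"
    print(f"Cassandra at {host}:{port} is {state}")
```

If the check reports that the port is not reachable, verify that Cassandra is running and that the port matches your server configuration.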

These guides were developed using the Apache Cassandra distribution version 1.0.3. You can find Apache Cassandra downloads here: http://cassandra.apache.org/download/

Pentaho Data Integration

A desktop installation of the Kettle design tool called 'Spoon'. Download here.

Data

  1. To follow this guide you need to have a populated column family. If you do not have any data in Cassandra yet you can use the Write Data To Cassandra guide to add some data to your Cassandra installation.
  2. Add an index on the 'url' column for the 'PageSuccessions' column family. Using the cassandra-cli command line, enter:

    Code Block
    use Demo;
    
    update keyspace Demo;
    
    update column family PageSuccessions
      with column_metadata = [
        {column_name : 'Count',
        validation_class : LongType},
        {column_name : 'nextUrl',
        validation_class : UTF8Type},
        {column_name : 'url',
        validation_class : UTF8Type,
        index_type : KEYS}];
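
The index on 'url' is what makes an equality filter on that column possible later in this guide. As a rough mental model only (this is not how Cassandra implements it), a secondary index can be pictured as a map from a column value to the row keys that hold that value. In the sketch below the row keys and column values are made up, and `build_index` and `lookup` are hypothetical helper names; only the column names come from the metadata above.

```python
from collections import defaultdict

# Hypothetical rows keyed by row key. The column names match the
# PageSuccessions metadata above; the values are invented.
rows = {
    "k1": {"url": "--firstpage--", "nextUrl": "/home", "Count": 12},
    "k2": {"url": "/home", "nextUrl": "/products", "Count": 7},
    "k3": {"url": "--firstpage--", "nextUrl": "/about", "Count": 3},
}


def build_index(rows, column):
    """Map each value of `column` to the list of row keys holding it."""
    index = defaultdict(list)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].append(key)
    return index


def lookup(rows, index, value):
    """Return the rows whose indexed column equals `value`."""
    return [rows[key] for key in index.get(value, [])]


url_index = build_index(rows, "url")
for row in lookup(rows, url_index, "--firstpage--"):
    print(row["nextUrl"], row["Count"])
```

The lookup goes straight from the value to the matching row keys instead of scanning every row, which is the behaviour the index is meant to provide for queries that filter on 'url'.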
    
    

Step-By-Step Instructions

Setup

Start Cassandra if it is not running.

Create a Data Transformation

  1. Start Spoon on your desktop. Once it is running choose 'File' -> 'New' -> 'Transformation' from the menu system or click on the 'New file' icon on the toolbar and choose the 'Transformation' option.

    Speed Tip: You can download the completed Kettle transformation, read_from_cassandra.ktr.

  2. Add a Cassandra Input Step: We are going to read data from Cassandra, so expand the 'Big Data' section of the Design palette and drag a 'Cassandra Input' step onto the transformation canvas.
    [Image]
  3. Edit the Cassandra Input Step: Double-click on the 'Cassandra Input' step to edit its properties. Enter this information:
    1. Cassandra host, Cassandra port, Username and Password: the connection information for your Cassandra installation.
    2. Keyspace: 'Demo' or another keyspace if you want.
    3. Enter the CQL:

      Code Block
      SELECT * FROM PageSuccessions WHERE url = '--firstpage--';

      Or a different query if you want.

    The window should look like this:
    [Image]
    Click 'OK' to close the window.

  4. Preview the Data: With the 'Cassandra Input' step selected, click on the Preview toolbar button (the green arrow with the magnifying glass) or right-click on the step and choose 'Preview'. The 'Transformation debug dialog' will open. Click on 'Quick Launch'. You should see the data returned by the Cassandra query.
    [Image]
    Congratulations! You've read data from Cassandra. Close the preview window.
  5. Add an Output Step: Expand the 'Output' section of the Design palette. You can see that there are different output options: files, databases, and applications. There are more output options in the 'Bulk loading' section. For this example we will write to a text file, but you can experiment with other output destinations if you want. Drag a 'Text file output' step from the palette onto the canvas.
    [Image]
  6. Connect the Input and Output Steps: Hover the mouse over the 'Cassandra Input' step and a tooltip will appear.
    [Image]
    Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Text file output' step. Your canvas should look like this:
    [Image]
  7. Edit the Text File Output Step: Double-click on the 'Text file output' step to edit its properties. Click on the 'Browse' button to select a destination for the file.
    [Image]
  8. Define the Output Fields: Click on the 'Fields' tab, then click on the 'Get Fields' button. The table of fields will be populated based on the metadata of the fields coming out of the 'Cassandra Input' step.
    [Image]
    Click on 'OK' to close the 'Text file output' window.
  9. Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'read_from_cassandra.ktr' into a folder of your choice.
  10. Run the Transformation: Choose 'Action' -> 'Run' from the menu system or click on the green run button on the transformation toolbar. An 'Execute a transformation' window will open. Click on the 'Launch' button. An 'Execution Results' panel will open at the bottom of the Spoon window and it will show you the progress of the transformation as it runs. After a few seconds the transformation should finish successfully:
    [Image]
    If any errors occurred the transformation step that failed will be highlighted in red and you can use the 'Logging' tab to view error messages.
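
For readers who think in code, the data flow built in the steps above (an input step that filters rows by 'url', connected to an output step that writes delimited text) can be sketched in plain Python. This is an illustration only and does not use Kettle or Cassandra: the sample rows, the `key` field, the ';' separator, and the function names `cassandra_input` and `text_file_output` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical sample rows standing in for the PageSuccessions column family.
SAMPLE_ROWS = [
    {"key": "k1", "url": "--firstpage--", "nextUrl": "/home", "Count": 12},
    {"key": "k2", "url": "/home", "nextUrl": "/products", "Count": 7},
    {"key": "k3", "url": "--firstpage--", "nextUrl": "/about", "Count": 3},
]


def cassandra_input(rows, url):
    """Stand-in for the input step: yield rows matching WHERE url = <url>."""
    for row in rows:
        if row["url"] == url:
            yield row


def text_file_output(rows, out, fields, delimiter=";"):
    """Stand-in for the output step: write a header, then one line per row."""
    writer = csv.DictWriter(out, fieldnames=fields, delimiter=delimiter,
                            lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()  # an in-memory "file" keeps the demo side-effect free
    fields = ["key", "url", "nextUrl", "Count"]  # assumed field list
    source = cassandra_input(SAMPLE_ROWS, "--firstpage--")
    written = text_file_output(source, buffer, fields)
    print(buffer.getvalue(), end="")
    print(f"{written} rows written")
```

Rows stream from the generator into the writer one at a time, which mirrors how a hop passes rows from one step to the next.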

Check The Results

  1. If your transformation ran successfully you can open the text file you created to see the data written there.
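
If you prefer to check the file programmatically, a short script can report the column names and the number of data rows. This is a sketch under assumptions: it expects the file to start with a header line and to use ';' as the separator, so adjust both to match the settings you chose in the 'Text file output' step. The function name `summarize_output` is hypothetical, and the demo runs on a throwaway file rather than on your real output.

```python
import csv
import os
import tempfile


def summarize_output(path, delimiter=";"):
    """Return (header, data_row_count) for a delimited text file with a header line."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])  # empty list if the file has no lines
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    # Demo on a temporary file; point `path` at your own output file instead.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("url;nextUrl;Count\n--firstpage--;/home;12\n")
        path = tmp.name
    header, row_count = summarize_output(path)
    print(header, row_count)
    os.remove(path)
```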

Summary

During this guide you learned how to read data from a Cassandra column family and write it to a text file using Kettle's graphical design tool. You can use this procedure to read data from Cassandra and write it to many different destinations.

Other guides in this series cover how to sort and group Cassandra data, create reports, and combine data from Cassandra with data from other sources.
