How to read data from a data source (flat file) and write it to a column family in Cassandra using a graphical tool.

By the end of this guide you should understand how data can be read from many different data sources and written to Cassandra. The data we are going to use describes the flow of visitors to a web site.

Intro Video

https://www.youtube.com/watch?v=E5nSQt4gdWI

Prerequisites

In order to follow along with this how-to guide you will need the following:

Cassandra

A single-node local cluster is sufficient for these exercises but a larger and/or remote configuration will work as well. You will need to know the address and port that Cassandra is running on and have a user id and password for the server (if applicable).
These guides were developed using the Apache Cassandra distribution version 1.0.3.

You can find Apache Cassandra downloads here: http://cassandra.apache.org/download/

Kettle

A desktop installation of the Kettle design tool called 'Spoon'. Download here.

Sample Files

The sample data file for this guide is called page_successions.txt.zip.

Step-By-Step Instructions

Setup

Start Cassandra if it is not running.
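If you are not sure whether Cassandra is already running, a quick TCP connection test against the address and port you noted in the prerequisites can tell you. The following is a minimal Python sketch, assuming the Thrift client port 9160 that Cassandra 1.0.x uses by default; the helper name `cassandra_port_open` is hypothetical. A successful connection only shows that something is listening on that port, not that it is a healthy Cassandra node.

```python
import socket


def cassandra_port_open(host: str = "localhost", port: int = 9160, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Assumption: 9160 is the default Thrift (rpc_port) port in Cassandra 1.0.x;
    change it if your cassandra.yaml uses a different value.
    """
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host name not resolvable.
        return False


if __name__ == "__main__":
    if cassandra_port_open():
        print("Something is listening on localhost:9160; Cassandra is probably running.")
    else:
        print("Nothing is listening on localhost:9160; start Cassandra first.")
```

Checking the port rather than looking for a process keeps the sketch independent of how Cassandra was installed, and it also works for a remote node if you pass its host name.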

Create a Cassandra Keyspace

Using the Cassandra command line interface (CLI), create a keyspace to use for this exercise.

  1. To start the Cassandra CLI, type the following at a command line in the Cassandra home directory:
    Code Block
    bin/cassandra-cli --host localhost
  2. Once the Cassandra CLI has started, type:
    Code Block
    create keyspace Demo;

Create a Data Transformation

  1. Start Spoon on your desktop. Once it is running choose 'File' -> 'New' -> 'Transformation' from the menu system or click on the 'New file' icon on the toolbar and choose the 'Transformation' option.

    Speed Tip: You can download the completed Kettle transformation, populate_cassandra_page_successions.ktr.

  2. Add a Text File Input Step: We are going to read data from a text file, so expand the 'Input' section of the Design palette and drag a 'Text file input' step onto the transformation canvas.
    [Screenshot]
    Notice that there are many other inputs we could have used, such as databases (including Hive), applications, and specific file formats. Under the Hadoop section there are other inputs, including HDFS, HBase, and MapReduce.
  3. Select the File: Double-click on the 'Text file input' step to edit its properties. Click on the 'Browse' button on the right side of the dialog to select a file. Locate the page_successions.txt file. Click on the 'Add' button to add the file to the selected files list. The dialog should look something like this:
    [Screenshot]
  4. Create Data Fields: Click on the 'Fields' tab, then click the 'Get Fields' button. Click 'OK' to sample 100 lines. You will see the 'Scan results' window. When you close the 'Scan results' window you will see the fields filled in for you:
    [Screenshot]
  5. Preview Data: Click on the 'Preview Rows' button and accept 1000 as the number of rows to preview. You will see a table of preview data read from the text file:
    [Screenshot]
  6. Add a Cassandra Output Step: Close the preview window and click on 'OK' on the 'Text file input' window. On the design palette, expand the 'Big Data' section and drag a 'Cassandra Output' step onto the transformation canvas. Your canvas should look like this:
    [Screenshot]
  7. Connect the Input and Output Steps: Hover the mouse over the 'Text file input' step and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Cassandra Output' step. Your canvas should look like this:
    [Screenshot]
  8. Edit the Cassandra Output Step: Double-click on the 'Cassandra Output' step to edit its properties. Enter this information:
    1. Cassandra host, Cassandra port, Username and Password: the connection information for your Cassandra installation.
    2. Keyspace: the name of the keyspace you created earlier, 'Demo'.
    3. Column family (table): enter 'PageSuccessions'.
    4. Incoming field to use as the key: click on the 'Get Fields' button to populate the drop-down list, then choose the field 'key' from the list.
    5. Create column family: Checked. This will create the column family if it does not exist.
    6. Truncate column family: Checked. This will empty the PageSuccessions column family before adding the incoming data.
    7. Update column family meta data: Checked. This will make the column family metadata consistent with the fields of data.
      When you are done, your 'Cassandra Output' window should look like this (your connection information may be different):
      [Screenshot]
      Click 'OK' to close the window.
  9. Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'populate_cassandra_page_successions.ktr' in a folder of your choice.

  10. Run the Transformation: Choose 'Action' -> 'Run' from the menu system or click on the green run button on the transformation toolbar. An 'Execute a transformation' window will open. Click on the 'Launch' button. An 'Execution Results' panel will open at the bottom of the Spoon window and show you the progress of the transformation as it runs. After a few seconds the transformation should finish successfully:
    [Screenshot]
    If any errors occurred the transformation step that failed will be highlighted in red and you can use the 'Logging' tab to view error messages.
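Conceptually, the 'Cassandra Output' step takes each incoming row, uses the field chosen as 'Incoming field to use as the key' as the row key, and writes the remaining fields as columns of that row. The Python sketch below models that mapping in memory so you can reason about what ends up in PageSuccessions; it is an illustration, not Kettle's implementation. The field names `key`, `url`, `nextUrl` and `Count` are taken from the result shown in the 'Check the Cassandra Column Family' section, and the `~^~` separator in the row key is inferred from the example key '/about~^~/events'. Treat those, the byte-order column sorting, and the helper names `row_to_columns`, `load_rows` and `sorted_columns` as assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory stand-in for a column family:
# row key -> {column name -> value}
ColumnFamily = Dict[str, Dict[str, str]]


def row_to_columns(row: Dict[str, str], key_field: str = "key") -> Tuple[str, Dict[str, str]]:
    """Split one incoming row into (row key, columns).

    The key field becomes the row key; every other field becomes
    a column named after the field.
    """
    row_key = row[key_field]
    columns = {name: value for name, value in row.items() if name != key_field}
    return row_key, columns


def load_rows(cf: ColumnFamily, rows: Iterable[Dict[str, str]], truncate: bool = True) -> None:
    """Write rows into `cf`, optionally emptying it first (like 'Truncate column family')."""
    if truncate:
        cf.clear()
    for row in rows:
        row_key, columns = row_to_columns(row)
        # Cassandra writes are upserts: writing to an existing row adds or overwrites columns.
        cf.setdefault(row_key, {}).update(columns)


def sorted_columns(cf: ColumnFamily, row_key: str) -> List[Tuple[str, str]]:
    """Return a row's columns ordered by the UTF-8 bytes of their names.

    Assumption: a bytes/UTF-8 comparator, under which 'Count' sorts before
    'nextUrl' because uppercase ASCII letters come before lowercase ones.
    """
    return sorted(cf[row_key].items(), key=lambda item: item[0].encode("utf-8"))


if __name__ == "__main__":
    page_successions: ColumnFamily = {}
    sample = [{"key": "/about~^~/events", "url": "/about", "nextUrl": "/events", "Count": "12"}]
    load_rows(page_successions, sample)
    for name, value in sorted_columns(page_successions, "/about~^~/events"):
        print(f"=> (column={name}, value={value})")
```

Note that the key field itself does not reappear as a column in this model, which matches the three columns (Count, nextUrl, url) returned by the CLI in the next section.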

Check the Cassandra Column Family

  1. Using the Cassandra CLI, type:
    Code Block
    use Demo;
    get PageSuccessions['/about~^~/events'];
  2. You should see a result like this:
    Code Block
    => (column=Count, value=12, timestamp=1325632014571099)
    => (column=nextUrl, value=/events, timestamp=1325632014571101)
    => (column=url, value=/about, timestamp=1325632014571100)
    Returned 3 results.
    Elapsed time: 4 msec(s).
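If you want to check the loaded data from a script rather than by eye, the lines printed by `get` are regular enough to parse. The Python sketch below assumes the `=> (column=..., value=..., timestamp=...)` layout shown in this example output; it is a text-scraping convenience, not a Cassandra client API, and the function name `parse_cli_get_output` is hypothetical. It also assumes column names and values contain no commas.

```python
import re
from typing import Dict, Tuple

# Matches lines such as: => (column=Count, value=12, timestamp=1325632014571099)
# Assumption: column names and values contain no ',' characters.
CLI_COLUMN_LINE = re.compile(
    r"^\s*=>\s*\(column=(?P<column>[^,]+),\s*value=(?P<value>[^,]*),\s*timestamp=(?P<ts>\d+)\)\s*$"
)


def parse_cli_get_output(text: str) -> Dict[str, Tuple[str, int]]:
    """Map each column name to (value, timestamp); all other lines are ignored."""
    columns: Dict[str, Tuple[str, int]] = {}
    for line in text.splitlines():
        match = CLI_COLUMN_LINE.match(line)
        if match:
            columns[match.group("column")] = (match.group("value"), int(match.group("ts")))
    return columns


if __name__ == "__main__":
    output = """=> (column=Count, value=12, timestamp=1325632014571099)
=> (column=nextUrl, value=/events, timestamp=1325632014571101)
=> (column=url, value=/about, timestamp=1325632014571100)
Returned 3 results.
Elapsed time: 4 msec(s)."""
    parsed = parse_cli_get_output(output)
    print(parsed["Count"][0])  # prints 12
```

Values come back as strings because the CLI output carries no type information; convert them (for example with `int()`) where you know the column's type.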

This data tells you that visitors went from the About page to the Events page 12 times during the timeframe covered by the data.
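The Count column is a page-succession count: how often one page was immediately followed by another. The guide does not say how page_successions.txt was produced, so the following Python sketch only illustrates the idea, assuming you start from hypothetical per-visit click paths. The names `count_successions` and `succession_key` are hypothetical, and the `~^~` separator is inferred from the example row key '/about~^~/events'.

```python
from collections import Counter
from typing import Iterable, List


def succession_key(url: str, next_url: str, sep: str = "~^~") -> str:
    """Build a row key like '/about~^~/events' (separator inferred from the example key)."""
    return f"{url}{sep}{next_url}"


def count_successions(visits: Iterable[List[str]]) -> Counter:
    """Count how often each page is immediately followed by another page within a visit."""
    counts: Counter = Counter()
    for pages in visits:
        # zip pairs each page with the page viewed right after it.
        for url, next_url in zip(pages, pages[1:]):
            counts[succession_key(url, next_url)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical click paths, one list per visit.
    visits = [
        ["/", "/about", "/events"],
        ["/about", "/events", "/about", "/events"],
        ["/events"],
    ]
    for key, count in count_successions(visits).most_common():
        print(key, count)
```

Counting pairs within each visit, rather than across the whole log, avoids treating the last page of one visit and the first page of the next as a succession.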

Summary

In this guide you learned how to populate a Cassandra column family using Kettle's graphical design tool. You can use this tool to load data into Cassandra from many data sources.

Other guides in this series cover how to sort and group Cassandra data, create reports, and combine data from Cassandra with data from other sources.
