How to read data from a column family in Cassandra using a graphical tool.
By the end of this guide you should understand how data can be read from Cassandra and written to many different destinations. The sample data we are going to use describes the flow of visitors to a web site.
Prerequisites
In order to follow along with this how-to guide you will need the following:
Cassandra
A single-node local cluster is sufficient for these exercises but a larger and/or remote configuration will work as well. You will need to know the address and port that Cassandra is running on and have a user id and password for the server (if applicable).
These guides were developed using the Apache Cassandra distribution version 1.0.3. You can find Apache Cassandra downloads here: http://cassandra.apache.org/download/
Pentaho Data Integration
A desktop installation of the Kettle design tool called 'Spoon'.
Data
- To follow this guide you need to have a populated column family. If you do not have any data in Cassandra yet you can use the Write Data To Cassandra guide to add some data to your Cassandra installation.
Add an index on the 'url' column for the 'PageSuccessions' column family. Using the cassandra-cli command line, enter:
Code Block
use Demo;
update keyspace Demo;
update column family PageSuccessions with column_metadata = [
  {column_name : 'Count', validation_class : LongType},
  {column_name : 'nextUrl', validation_class : UTF8Type},
  {column_name : 'url', validation_class : UTF8Type, index_type : KEYS}];
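The index on 'url' matters because the query used later in this guide filters on that column rather than on the row key. As a rough conceptual picture only (this is not how Cassandra stores a KEYS index internally), an index can be thought of as a map from each column value to the row keys that hold it. The sample rows and the helper names `build_index` and `lookup` below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like the PageSuccessions column family:
# row key -> columns. The values are invented for illustration.
rows = {
    "row1": {"url": "--firstpage--", "nextUrl": "/home", "Count": 12},
    "row2": {"url": "/home", "nextUrl": "/products", "Count": 7},
    "row3": {"url": "--firstpage--", "nextUrl": "/about", "Count": 3},
}


def build_index(rows, column):
    """Map each value of `column` to the list of row keys that hold it."""
    index = defaultdict(list)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].append(key)
    return index


def lookup(rows, index, value):
    """Return the rows whose indexed column equals `value`."""
    return [rows[key] for key in index.get(value, [])]


url_index = build_index(rows, "url")
first_pages = lookup(rows, url_index, "--firstpage--")
print([row["nextUrl"] for row in first_pages])
```

With the index in place, an equality lookup on 'url' only touches the matching row keys instead of scanning every row.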
Step-By-Step Instructions
Setup
Start Cassandra if it is not running.
Create a Data Transformation
Start Spoon on your desktop. Once it is running, choose 'File' -> 'New' -> 'Transformation' from the menu system, or click on the 'New file' icon on the toolbar and choose the 'Transformation' option.
Speed Tip: You can download the Kettle Transform ... already completed.
- Add a Cassandra Input Step: We are going to read data from Cassandra, so expand the 'Big Data' section of the Design palette and drag a 'Cassandra Input' step onto the transformation canvas.
- Edit the Cassandra Input Step: Double-click on the 'Cassandra Input' step to edit its properties. Enter this information:
- Cassandra host, Cassandra port, Username and Password: the connection information for your Cassandra installation.
- Keyspace: 'Demo' or another keyspace if you want.
Enter the CQL:
Code Block
SELECT * FROM PageSuccessions WHERE url = '--firstpage--';
Or a different query if you want. The window should look like this:
Click 'OK' to close the window.
- Preview the Data: With the 'Cassandra Input' step selected, click on the Preview toolbar button (the green arrow with the magnifying glass) or right-click on the step and choose 'Preview'. The 'Transformation debug dialog' will open. Click on 'Quick Launch'. You should see the data returned by the Cassandra query.
Congratulations! You've read data from Cassandra. Close the preview window.
- Add an Output Step: Expand the 'Output' section of the Design palette. You can see that there are different output options: files, databases, and applications. There are more output options in the 'Bulk loading' section. For this example we will write to a text file, but you can experiment with other output destinations if you want. Drag a 'Text file output' step from the palette onto the canvas.
- Connect the Input and Output Steps: Hover the mouse over the 'Cassandra Input' step and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Text file output' step. Your canvas should look like this:
- Edit the Text File Output Step: Double-click on the 'Text file output' step to edit its properties. Click on the 'Browse' button to select a destination for the file.
- Define the Output Fields: Click on the 'Fields' tab, then click on the 'Get Fields' button. The table of fields will be populated based on the metadata of the fields coming out of the 'Cassandra Input' step.
Click on 'OK' to close the 'Text file output' window.
- Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'read_from_cassandra.ktr' into a folder of your choice.
- Run the Transformation: Choose 'Action' -> 'Run' from the menu system or click on the green run button on the transformation toolbar. An 'Execute a transformation' window will open. Click on the 'Launch' button. An 'Execution Results' panel will open at the bottom of the Spoon window and it will show you the progress of the transformation as it runs. After a few seconds the transformation should finish successfully.
If any errors occurred the transformation step that failed will be highlighted in red and you can use the 'Logging' tab to view error messages.
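The saved 'read_from_cassandra.ktr' file is an XML document, so it can also be inspected outside Spoon. The sketch below assumes the usual layout of step elements (each with a name and a type) and hops listed under an order element. The trimmed sample document and the step type identifiers in it are illustrative assumptions, not a copy of the file Spoon writes, and `summarize` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for a saved transformation file.
# A real .ktr contains many more elements; the type ids are assumptions.
SAMPLE_KTR = """<transformation>
  <order>
    <hop><from>Cassandra Input</from><to>Text file output</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Cassandra Input</name><type>CassandraInput</type></step>
  <step><name>Text file output</name><type>TextFileOutput</type></step>
</transformation>"""


def summarize(root):
    """Return (steps, hops) as lists of (name, type) and (from, to) pairs."""
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
    return steps, hops


# For a real file, use: root = ET.parse("read_from_cassandra.ktr").getroot()
steps, hops = summarize(ET.fromstring(SAMPLE_KTR))
for name, step_type in steps:
    print(f"step: {name} ({step_type})")
for source, target in hops:
    print(f"hop: {source} -> {target}")
```

A listing like this can help confirm that the input step is connected to the output step when a run produces an empty file.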
Check The Results
- If your transformation ran successfully you can open the text file you created to see the data written there.
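The output file can also be checked programmatically. This is a minimal sketch, assuming the 'Text file output' step wrote a header row and used a semicolon as the field separator; adjust the delimiter to match the settings on the step's 'Content' tab. The sample text, its column values, and the helper name `read_rows` are made up for illustration, and the sketch assumes 'Count' was written as a plain integer.

```python
import csv
import io


def read_rows(handle, delimiter=";"):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter=delimiter))


# Hypothetical sample standing in for the file written by the transformation.
sample = io.StringIO(
    "url;nextUrl;Count\n"
    "--firstpage--;/home;12\n"
    "--firstpage--;/about;3\n"
)

rows = read_rows(sample)
total = sum(int(row["Count"]) for row in rows)
print(f"{len(rows)} rows, total count {total}")
```

To read the real file, open it with `open(path, newline="")` and pass the handle to `read_rows` in place of the sample.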
Summary
During this guide you learned how to read data from a Cassandra column family and write it to a text file using Kettle's graphical design tool. You can use this procedure to read data from Cassandra and write it to many different destinations.
Other guides in this series cover how to sort and group Cassandra data, create reports, and combine data from Cassandra with data from other sources.