How to create a report that uses data from a column family in Cassandra using graphic tools.
By the end of this guide you should understand how data can be read from Cassandra and used in a report. The data we are going to use contains data about the flow of visitors to a web site. This guide shows how to create a report that shows the most popular landing pages for the sample web site.
Prerequisites
In order to follow along with this how-to guide you will need the following:
Cassandra
A single-node local cluster is sufficient for these exercises but a larger and/or remote configuration will work as well. You will need to know the address and port that Cassandra is running on and have a user id and password for the server (if applicable).
These guides were developed using the Apache Cassandra distribution version 1.0.3.
You can find Apache Cassandra downloads here: http://cassandra.apache.org/download/
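Before starting Spoon it can help to confirm that the Cassandra address and port you plan to use are reachable. The following is a minimal sketch that only checks whether a TCP connection can be opened; it does not authenticate or run a query. The host 'localhost' and port 9160 (the default Thrift port in Cassandra 1.0.x) are assumptions, so substitute the values for your own installation. The function name `can_connect` is made up for this example.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed values: replace with the address and port of your Cassandra server.
    print(can_connect("localhost", 9160))
```

If this prints False, check that Cassandra is running and that the host and port match your server configuration before continuing.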
Pentaho Data Integration
A desktop installation of the Kettle design tool called 'Spoon'. Download here.
Pentaho Report Designer
Pentaho Report Designer (PRD) is a desktop tool for creating highly formatted reports that can be exported to many popular formats. Reports created with PRD can be published to a Pentaho BI Server so they can be accessed using a browser.
Data
To follow this guide you need to have a populated column family. If you do not have any data in Cassandra yet you can use the Write Data To Cassandra guide to add some data to your Cassandra installation.
Step-By-Step Instructions
We will create the report using two tools. First we will use Spoon to create a data transformation that selects data from Cassandra and sorts it into descending order. Then we will use PRD to create a report using the data transformation as its data source.
Setup
Start Cassandra if it is not running.
Create a Data Transformation
- Start Spoon on your desktop. Once it is running choose 'File' -> 'New' -> 'Transformation' from the menu system or click on the 'New file' icon on the toolbar and choose the 'Transformation' option.
- Add a Cassandra Input Step: We are going to read data from Cassandra, so expand the 'Big Data' section of the Design palette and drag a 'Cassandra Input' step onto the transformation canvas.
- Edit the Cassandra Input Step: Double-click on the Cassandra Input step to edit its properties. Enter this information:
- Cassandra host, Cassandra port, Username and Password: the connection information for your Cassandra installation.
- Keyspace: 'Demo' or another keyspace if you want.
- CQL: SELECT * FROM PageSuccessions WHERE url = '--firstpage--';
Or a different query if you want.
The window should look like this:
Click 'OK' to close the window.
- Preview the Data: With the 'Cassandra Input' step selected click on the Preview toolbar button (the green arrow with the magnifying glass) or right-click on the step and choose 'Preview'. The 'Transformation debug dialog' will open. Click on 'Quick Launch'. You should see the data returned by the Cassandra query.
Congratulations! You've read data from Cassandra. Close the preview window.
- Add a sort step: Cassandra Query Language (CQL) does not allow us to sort data so we will sort it using Spoon. Expand the 'Transform' section of the design palette. Drag a 'Sort rows' step from the palette onto the canvas.
- Connect the Input and Sort Steps: We connect the steps before editing the 'Sort rows' step so that the sorting step knows what fields are available to be sorted. Hover the mouse over the 'Cassandra Input' step and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Sort rows' step. Your canvas should look like this:
- Edit the sort step: Double-click on the 'Sort rows' step. Change the 'Sort size' to 100000. Using the dropdown list for 'Fieldname', select 'Count'. Using the dropdown list for 'Ascending', select 'N'.
- Preview the Sorted Data: With the 'Sort rows' step selected click on the Preview toolbar button (the green arrow with the magnifying glass) or right-click on the step and choose 'Preview'. The 'Transformation debug dialog' will open. Click on 'Quick Launch'. You should see the data returned by the sorting step.
Notice the data is sorted into descending order by the Count field. Close the preview window.
- Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'top_landing_pages.ktr' into a folder of your choice.
- Run the Transformation: Choose 'Action' -> 'Run' from the menu system or click on the green run button on the transformation toolbar. An 'Execute a transformation' window will open. Click on the 'Launch' button. An 'Execution Results' panel will open at the bottom of the Spoon window and it will show you the progress of the transformation as it runs. After a few seconds the transformation should finish successfully.
If any errors occurred the transformation step that failed will be highlighted in red and you can use the 'Logging' tab to view error messages.
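To make the logic of the transformation concrete, here is a minimal Python sketch of what the two steps do: the query keeps only rows whose url is '--firstpage--', and the 'Sort rows' step orders them by Count in descending order. This is not how Spoon is implemented. The sample rows are made up for illustration, and the sketch assumes each row carries 'url', 'nextUrl', and 'Count' fields as used elsewhere in this guide.

```python
from operator import itemgetter

# Hypothetical sample rows; real rows come from the PageSuccessions column family.
rows = [
    {"url": "--firstpage--", "nextUrl": "/home", "Count": 120},
    {"url": "--firstpage--", "nextUrl": "/pricing", "Count": 45},
    {"url": "/home", "nextUrl": "/pricing", "Count": 30},
    {"url": "--firstpage--", "nextUrl": "/blog", "Count": 310},
]


def top_landing_pages(rows):
    """Keep landing-page rows and sort them by Count, largest first."""
    # Equivalent of: WHERE url = '--firstpage--'
    landing = [row for row in rows if row["url"] == "--firstpage--"]
    # Equivalent of the 'Sort rows' step with Fieldname 'Count', Ascending 'N'
    return sorted(landing, key=itemgetter("Count"), reverse=True)


for row in top_landing_pages(rows):
    print(row["nextUrl"], row["Count"])
```

With the sample rows above, '/blog' comes first, followed by '/home' and '/pricing'; the row whose url is '/home' is filtered out because it is not a first page visit.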
Start Pentaho Report Designer
When PRD starts click on the 'Report Wizard' button or choose 'File' -> 'Report Wizard...' from the menu.
- Select a Template: On the 'Look and Feel' stage of the wizard select a report template and click on 'Next'.
- Add a Data Source: On the 'Data Source' stage click on the '+' icon in the top right to add a new data source.
- Choose the Data Source Type: From the 'Choose Type' list click on 'Pentaho Data Integration'.
- Add a Query: In the 'Pentaho Data Integration Data Source' window click on the '+' icon to add a new query. A default query, called 'Query 1', is added. Change the name of the query to 'Top Landing Pages'. Then click on the 'Browse' button and select the 'top_landing_pages.ktr' file created above. Finally select the 'Sort rows' step.
If you want you can click the 'Preview' button to see the data generated by the 'Sort rows' step. Click on 'OK' to close the 'Pentaho Data Integration Data Source' window.
- Select the Query: In the 'Report Design Wizard' click on the 'Top Landing Pages' query to select it and then click on the 'Next' button.
- Layout the Fields: In the 'Layout Step' of the wizard drag 'nextUrl' and 'Count' to the 'Selected Items' box. This will position these two fields as two columns in the report.
Click on the 'Next' button.
- Format the Fields: In this step you can change the formatting of the fields. Click on the 'nextUrl' field to highlight it. Change the Display Name to 'Web Page'. Then click on 'Count' to highlight it. Change the data format to '#,###;(#,###)' and select 'Sum' from the 'Aggregation' list.
- Finish the Wizard: Click on 'Finish'. The wizard will close and you will see your report in design mode.
- Change the Titles: Double-click on the report title and change it to 'Top Landing Pages'. Double-click on the first subtitle and change it to 'In Descending order'. Double-click on the second subtitle and remove the text. Notice that there are a lot of style properties you can set for these report elements.
- Preview the Report: Click on the preview icon (the eye towards the top left) to preview your report. Notice that the rows of data are sorted into descending order. You can page through the report. At the end of the last page you will see the total for the page counts.
Click on the design icon (where the preview icon used to be) to return to design mode.
- Run or Export the Report: Click on the run button (green arrow on the toolbar) or choose 'File' -> 'Export' from the menu system. Select the output format for your report. If you used the run button the report will run and the appropriate application will be opened to view the report. If you chose the export option you will be prompted for the location and name of the file that will be exported.
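The data format and 'Sum' aggregation chosen in the 'Format the Fields' step can be illustrated with a short sketch. It assumes the pattern '#,###;(#,###)' follows the java.text.DecimalFormat convention, where the part before the semicolon formats positive numbers with thousands separators and the part after it wraps negative numbers in parentheses. The helper `format_count` and the sample counts are made up for this example, and the sketch always uses a comma as the separator, whereas the report may use the separator of your locale.

```python
def format_count(value: int) -> str:
    """Mimic the pattern '#,###;(#,###)' for whole numbers."""
    text = f"{abs(value):,}"  # thousands separators, e.g. 1,234,567
    # Negative values are shown in parentheses instead of with a minus sign.
    return f"({text})" if value < 0 else text


# Hypothetical page counts, one per report row.
counts = [1310, 120, 45]

# The 'Sum' aggregation adds up the Count column for the report total.
total = sum(counts)

print(format_count(total))
print(format_count(-1500))
```

In the report, the total produced by the 'Sum' aggregation is the value shown at the end of the last page.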
Check The Results
- Using the run or export options you will be able to create and view PDF, Excel, HTML, and other file types. Try these options and check that the exported files contain the expected data.
Summary
During this guide you learned how to read data from a Cassandra column family and use it as the data source for a report.
Other guides in this series cover how to sort and group Cassandra data, and how to combine data from Cassandra with data from other sources.