
How to create a report that uses data from a column family in Cassandra using graphical tools.

By the end of this guide you should understand how data can be read from Cassandra and used in a report. The sample data describes the flow of visitors through a web site. This guide shows how to create a report of the most popular landing pages for that site.

Intro Video

https://www.youtube.com/watch?v=54vPHs-lHzk

Prerequisites

In order to follow along with this how-to guide you will need the following:

...

  1. Start Spoon on your desktop. Once it is running choose 'File' -> 'New' -> 'Transformation' from the menu system or click on the 'New file' icon on the toolbar and choose the 'Transformation' option.
  2. Add a Cassandra Input Step: We are going to read data from Cassandra, so expand the 'Big Data' section of the Design palette and drag a 'Cassandra Input' step onto the transformation canvas.
  3. Edit the Cassandra Input Step: Double-click on the Cassandra Input step to edit its properties. Enter this information:
    1. Cassandra host, Cassandra port, Username and Password: the connection information for your Cassandra installation.
    2. Keyspace: 'Demo' or another keyspace if you want.
    3. CQL:

      SELECT * FROM PageSuccessions WHERE url = '--firstpage--';

      Or a different query if you want.
      The window should look like this:
      Click 'OK' to close the window.

  4. Preview the Data: With the 'Cassandra Input' step selected click on the Preview toolbar button (the green arrow with the magnifying glass) or right-click on the step and choose 'Preview'. The 'Transformation debug dialog' will open. Click on 'Quick Launch'. You should see the data returned by the Cassandra query.

    Congratulations! You've read data from Cassandra. Close the preview window.
  5. Add a sort step: Cassandra Query Language (CQL) does not let us sort the results by an arbitrary column such as Count, so we will sort the data using Spoon. Expand the 'Transform' section of the design palette. Drag a 'Sort rows' step from the palette onto the canvas.
  6. Connect the Input and Sort Steps: We connect the steps before editing the 'Sort rows' step so that the sorting step knows what fields are available to be sorted. Hover the mouse over the 'Cassandra Input' step and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Sort rows' step. Your canvas should look like this:
  7. Edit the sort step: Double-click on the 'Sort rows' step. Change the 'Sort size' to 100000. Using the dropdown list for 'Fieldname', select 'Count'. Using the dropdown list for 'Ascending', select 'N'.
  8. Preview the Sorted Data: With the 'Sort rows' step selected click on the Preview toolbar button (the green arrow with the magnifying glass) or right-click on the step and choose 'Preview'. The 'Transformation debug dialog' will open. Click on 'Quick Launch'. You should see the data returned by the sorting step.

    Notice the data is sorted into descending order by the Count field. Close the preview window.
  9. Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'top_landing_pages.ktr' into a folder of your choice.
  10. Run the Transformation: Choose 'Action' -> 'Run' from the menu system or click on the green run button on the transformation toolbar. An 'Execute a transformation' window will open. Click on the 'Launch' button. An 'Execution Results' panel will open at the bottom of the Spoon window and it will show you the progress of the transformation as it runs. After a few seconds the transformation should finish successfully:
    If any errors occurred the transformation step that failed will be highlighted in red and you can use the 'Logging' tab to view error messages.
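To make the logic of the transformation concrete, here is a minimal Python sketch of what the two steps do: keep the rows whose url is '--firstpage--', then sort them by Count in descending order. It does not connect to Cassandra or use Pentaho. The sample rows and the function name `top_landing_pages` are hypothetical, and only the field names url, nextUrl and Count come from this guide.

```python
from operator import itemgetter

# Hypothetical sample rows standing in for the PageSuccessions column family.
# The real data is not shown in this guide; these values are for illustration only.
rows = [
    {"url": "--firstpage--", "nextUrl": "/home", "Count": 120},
    {"url": "--firstpage--", "nextUrl": "/products", "Count": 450},
    {"url": "/home", "nextUrl": "/products", "Count": 75},
    {"url": "--firstpage--", "nextUrl": "/blog", "Count": 300},
]


def top_landing_pages(all_rows):
    """Mimic the 'Cassandra Input' filter followed by the 'Sort rows' step."""
    # Equivalent of: WHERE url = '--firstpage--'
    landing = [row for row in all_rows if row["url"] == "--firstpage--"]
    # Equivalent of 'Sort rows' on Count with Ascending = 'N'
    return sorted(landing, key=itemgetter("Count"), reverse=True)


result = top_landing_pages(rows)
for row in result:
    print(row["nextUrl"], row["Count"])
```

The sort is done outside the query on purpose, mirroring the transformation: the query only selects the landing-page rows and the ordering is applied afterwards.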

...

  1. Select a Template: On the 'Look and Feel' stage of the wizard select a report template and click on 'Next'.
  2. Add a Data Source: On the 'Data Source' stage click on the '+' icon in the top right to add a new data source.
  3. Choose the Data Source Type: From the 'Choose Type' list click on 'Pentaho Data Integration'.
  4. Add a Query: In the 'Pentaho Data Integration Data Source' window click on the '+' icon to add a new query. A default query, called 'Query 1' is added. Change the name of the query to 'Top Landing Pages'. Then click on the 'Browse' button and select the 'top_landing_pages.ktr' file created above. Finally select the 'Sort rows' step.

    If you want you can click the 'Preview' button to see the data generated by the 'Sort rows' step. Click on 'OK' to close the 'Pentaho Data Integration Data Source' window.
  5. Select the Query: In the 'Report Design Wizard' click on the 'Top Landing Pages' query to select it and then click on the 'Next' button.
  6. Lay Out the Fields: In the 'Layout Step' of the wizard drag 'nextUrl' and 'Count' to the 'Selected Items' box. This will position these two fields as two columns in the report.

    Click on the 'Next' button.
  7. Format the Fields: In this step you can change the formatting of the fields. Click on the 'nextUrl' field to highlight it. Change the Display Name to 'Web Page'. Then click on 'Count' to highlight it. Change the data format to '#,###;(#,###)' and select 'Sum' from the 'Aggregation' list.
  8. Finish the Wizard: Click on 'Finish'. The wizard will close and you will see your report in design mode.
  9. Change the Titles: Double-click on the report title and change it to 'Top Landing Pages'. Double-click on the first subtitle and change it to 'In Descending order'. Double-click on the second subtitle and remove the text. Notice that there are a lot of style properties you can set for these report elements.
  10. Preview the Report: Click on the preview icon (the eye towards the top left) to preview your report. Notice that the rows of data are sorted into descending order. You can page through the report. At the end of the last page you will see the total for the page counts.

    Click on the design icon (where the preview icon used to be) to return to design mode.
  11. Run or Export the Report: Click on the run button (green arrow on the toolbar) or choose 'File' -> 'Export' from the menu system. Select the output format for your report. If you used the run button the report will run and the appropriate application will be opened to view the report. If you chose the export option you will be prompted for the location and name of the file that will be exported.
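The data format '#,###;(#,###)' and the 'Sum' aggregation from the formatting step can be illustrated with a short Python sketch. It assumes the pattern follows the usual positive;negative convention of Java-style number formats: thousands separators for positive values, and the same digits wrapped in parentheses for negative values. The helper `format_count` and the sample counts are hypothetical, and the sketch handles whole numbers only.

```python
def format_count(value: int) -> str:
    """Approximate the '#,###;(#,###)' pattern for whole numbers.

    Assumption: the part before ';' formats positive values with thousands
    separators, and the part after ';' wraps negative values in parentheses.
    """
    digits = f"{abs(value):,}"
    return f"({digits})" if value < 0 else digits


# Hypothetical page counts, already sorted in descending order.
counts = [450, 300, 120]

# Equivalent of choosing 'Sum' in the 'Aggregation' list: a total after the last row.
total = sum(counts)

for count in counts:
    print(format_count(count))
print("Total:", format_count(total))
```

Page counts should never be negative, so the parenthesized form is unlikely to appear in this report. It is shown only to explain the second half of the pattern.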

...