How to read data from a column family in Cassandra using a graphical tool.

By the end of this guide you should understand how data can be read from Cassandra and written to a variety of destinations. The sample data describes the flow of visitors through a web site.

Intro Video

Video: https://www.youtube.com/watch?v=A4bUoBNOmWU

Prerequisites

In order to follow along with this how-to guide you will need the following:

...

  1. To follow this guide you need to have a populated column family. If you do not have any data in Cassandra yet you can use the Write Data To Cassandra guide to add some data to your Cassandra installation.
  2. Add an index on the 'url' column for the 'PageSuccessions' column family. Using the cassandra-cli command line, enter:

    Code Block
    
    use Demo;
    
    update keyspace Demo;
    
    update column family PageSuccessions
      with column_metadata = [
        {column_name : 'Count',
        validation_class : LongType},
        {column_name : 'nextUrl',
        validation_class : UTF8Type},
        {column_name : 'url',
        validation_class : UTF8Type,
        index_type : KEYS}];
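
To see why the index on 'url' matters, the sketch below models a KEYS index as a simple lookup table from column value to row keys, which is what allows an equality query such as url = '--firstpage--' to be answered without scanning every row. This is a toy in-memory illustration, not how Cassandra implements its indexes; the sample rows, row keys, and the function names build_index and lookup are invented for the example.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the 'PageSuccessions' column family.
# Row keys and values are invented for illustration only.
rows = {
    "row1": {"url": "--firstpage--", "nextUrl": "/home", "Count": 12},
    "row2": {"url": "/home", "nextUrl": "/products", "Count": 7},
    "row3": {"url": "--firstpage--", "nextUrl": "/about", "Count": 3},
}


def build_index(rows, column):
    """Map each value of `column` to the list of row keys that hold it."""
    index = defaultdict(list)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].append(key)
    return index


def lookup(rows, index, value):
    """Return the rows whose indexed column equals `value`."""
    return [rows[key] for key in index.get(value, [])]


url_index = build_index(rows, "url")
first_pages = lookup(rows, url_index, "--firstpage--")
for row in first_pages:
    print(row["url"], row["nextUrl"], row["Count"])
```

Only equality lookups are modelled here, which matches the kind of WHERE clause used later in this guide.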
    
    

...

Create a Data Transformation

  1. Start Spoon on your desktop. Once it is running choose 'File' -> 'New' -> 'Transformation' from the menu system or click on the 'New file' icon on the toolbar and choose the 'Transformation' option.

    Tip
    titleSpeed Tip

    You can download the completed Kettle transformation, read_from_cassandra.ktr.

  2. Add a Cassandra Input Step: We are going to read data from Cassandra, so expand the 'Big Data' section of the Design palette and drag a 'Cassandra Input' step onto the transformation canvas.
  3. Edit the Cassandra Input Step: Double-click on the 'Cassandra Input' step to edit its properties. Enter this information:
    1. Cassandra host, Cassandra port, Username and Password: the connection information for your Cassandra installation.
    2. Keyspace: 'Demo' or another keyspace if you want.
    3. Enter the CQL:

      Code Block
      
      SELECT * FROM PageSuccessions where url = '--firstpage--';
      
      

      Or a different query if you want.
      The window should look like this:
      Click 'OK' to close the window.

  4. Preview the Data: With the 'Cassandra Input' step selected, click on the Preview toolbar button (the green arrow with the magnifying glass) or right-click on the step and choose 'Preview'. The 'Transformation debug dialog' will open. Click on 'Quick Launch'. You should see the data returned by the Cassandra query.


    Congratulations! You've read data from Cassandra. Close the preview window.
  5. Add an Output Step: Expand the 'Output' section of the Design palette. You can see that there are different output options: files, databases, and applications. There are more output options in the 'Bulk loading' section. For this example we will write to a text file, but you can experiment with other output destinations if you want. Drag a 'Text file output' step from the palette onto the canvas.
  6. Connect the Input and Output Steps: Hover the mouse over the 'Cassandra Input' step and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Text file output' step. Your canvas should look like this:
  7. Edit the Text File Output Step: Double-click on the 'Text file output' step to edit its properties. Click on the 'Browse' button to select a destination for the file.
  8. Define the Output Fields: Click on the 'Fields' tab, then click on the 'Get Fields' button. The table of fields will be populated based on the metadata of the fields coming out of the 'Cassandra Input' step.

    Click on 'OK' to close the 'Text file output' window.
  9. Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'read_from_cassandra.ktr' into a folder of your choice.
  10. Run the Transformation: Choose 'Action' -> 'Run' from the menu system or click on the green run button on the transformation toolbar. An 'Execute a transformation' window will open. Click on the 'Launch' button. An 'Execution Results' panel will open at the bottom of the Spoon window and it will show you the progress of the transformation as it runs. After a few seconds the transformation should finish successfully:
    If any errors occurred the transformation step that failed will be highlighted in red and you can use the 'Logging' tab to view error messages.
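
Once the transformation has been saved, it can also be run outside Spoon with Pan, Kettle's command-line runner for transformations. The sketch below builds such a command from Python and is only an illustration: the installation path /opt/data-integration/pan.sh and the save location /tmp/read_from_cassandra.ktr are assumptions, and the helper names pan_command and run_transformation are made up for this example.

```python
import subprocess


def pan_command(pan_script, ktr_file, log_level="Basic"):
    """Build the argument list for running a transformation with Pan."""
    return [str(pan_script), f"-file={ktr_file}", f"-level={log_level}"]


def run_transformation(pan_script, ktr_file, log_level="Basic"):
    """Run the transformation and return Pan's exit code (0 means success)."""
    completed = subprocess.run(pan_command(pan_script, ktr_file, log_level))
    return completed.returncode


# Hypothetical paths: adjust both to your own installation and save location.
command = pan_command("/opt/data-integration/pan.sh", "/tmp/read_from_cassandra.ktr")
print(" ".join(command))
```

Calling run_transformation with the same two paths would start the run; a non-zero return value indicates that the transformation failed, and Pan's console output then plays the role of the 'Logging' tab in Spoon.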

...