Reporting on HBase Data in MapR

How to create a report that sources data from HBase.

In order to report on HBase data in a parameterized fashion, you will need to perform the following steps:

  • Create a PDI transformation that returns the list of distinct IP addresses to be used as the picklist for a report parameter.

  • Create a PDI transformation that returns the weblog data for a selected IP address. This will be the primary data source for the report.

  • Create a report that uses the PDI transformations for the parameter list and the report data.
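
The relationship between the two data sources can be sketched in plain Python. This is only an illustration of the parameterized flow, not how PDI or Report Designer implement it: the in-memory `weblogs` rows, the `pageviews` field, and both function names are hypothetical stand-ins for the HBase table and the two transformations.

```python
# Hypothetical rows standing in for the 'weblogs' HBase table.
weblogs = [
    {"client_ip": "10.0.0.2", "year": "2011", "pageviews": 4},
    {"client_ip": "10.0.0.1", "year": "2011", "pageviews": 7},
    {"client_ip": "10.0.0.2", "year": "2012", "pageviews": 1},
]


def parameter_picklist(rows):
    """First data source: the distinct IP addresses offered to the user."""
    return sorted({row["client_ip"] for row in rows})


def report_rows(rows, selected_ip):
    """Second data source: only the rows for the IP address the user picked."""
    return [row for row in rows if row["client_ip"] == selected_ip]


choices = parameter_picklist(weblogs)
selected = choices[0]  # the report user would pick this value from the list
data = report_rows(weblogs, selected)
```

The report first runs the picklist source to populate the parameter, then passes the selected value to the second source to fetch the rows it displays.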

Prerequisites

In order to follow along with this how-to guide, you will need the following:

  • MapR

  • Pentaho Data Integration

  • HBase

  • Pentaho Report Designer

  • Loading Data into HBase guide completed

Sample Files

There are no sample files for this guide. The Loading Data into MapR HBase Guide must be completed prior to starting this guide as it loads the sample HBase table data.

Step-By-Step Instructions

Setup

Start MapR if it is not already running.

Create an HBase Parameter Picklist PDI Transformation

In this task you will create a PDI transformation to get a list of distinct IP Addresses from HBase. This transformation will later be used as the data source for a report parameter.

  1. Start PDI on your desktop. Once it is running, choose 'File' -> 'New' -> 'Transformation' from the menu system, or click on the 'New file' icon on the toolbar and choose the 'Transformation' option.

  2. Add an HBase Input Step: You are going to read data from an HBase table, so expand the 'Big Data' section of the Design palette and drag an 'HBase Input' node onto the transformation canvas. Your transformation should look like:


  3. Edit the HBase Input Step: Double-click on the 'HBase Input' node to edit its properties. Do the following:

    1. Zookeeper host(s) and Zookeeper port: Enter your Zookeeper connection information. For local single node clusters use 'localhost' and port '5181'.

    2. Create a Key Only Mapping: Switch to the 'Create/Edit mappings' tab and enter the following.

      1. HBase table name: Select 'weblogs'

      2. Mapping name: Enter 'key_only'

      3. Alias: Enter 'key'

      4. Key: Select 'Y'

      5. Type: Select 'String'

      6. Click Save Mapping
        When you are done your window should look like this:


    3. Configure Query: Switch to the 'Configure Query' tab and do the following.

      1. HBase table name: Click the 'Get mapped table names' button, then select 'weblogs'.

      2. Mapping name: Click the 'Get mappings for the specified table' button, then select 'key_only'.

      3. Click the 'Get Key/Fields Info' button.
        When you are done your window should look like:


        Click 'OK' to close the window.

  4. Add a Split Fields Step: The key field is currently formatted as client_ip|year and needs to be split into two separate fields, so expand the 'Transform' section of the Design palette and drag a 'Split Fields' node onto the transformation canvas. Your transformation should look like:


  5. Connect the Input and Split Fields steps: Hover the mouse over the 'HBase Input' node and a tooltip will appear.

    Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Split Fields' node. Your canvas should look like this:

  6. Edit the Split Fields Step: Double-click on the 'Split Fields' node to edit its properties. Do the following:

    1. Field to split: Select 'key'.

    2. Delimiter: Enter '|'

    3. Fields: Add one row for each part of the key, naming the new fields 'client_ip' and 'year'.


      When you are done your window should look like:


      Click 'OK' to close the window.

  7. Add a Sort Rows Step: You need to sort the HBase data, so expand the 'Transform' section of the Design palette and drag a 'Sort rows' node onto the transformation canvas. Your transformation should look like:


  8. Connect the Split Fields and Sort steps: Hover the mouse over the 'Split Fields' node and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Sort rows' node. Your canvas should look like this:


  9. Edit the Sort Step: Double-click on the 'Sort rows' node to edit its properties. Enter this information:

    1. Check 'Only pass unique rows? (verifies keys only)'

    2. Fields: Add 'client_ip' sorted in ascending order.
      When you are done your window should look like this:


      Click 'OK' to close the window.

  10. Add a Dummy Step: You need a component for the report to select its data from, so expand the 'Flow' section of the Design palette and drag a 'Dummy (do nothing)' node onto the transformation canvas. Your transformation should look like:


  11. Connect the Sort and Dummy steps: Hover the mouse over the 'Sort rows' node and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Dummy (do nothing)' node. Your canvas should look like this:


  12. Edit the Dummy Step: Double-click on the 'Dummy (do nothing)' node to edit its properties. Set the Step name to 'Output'. When you are done your window should look like:


    Click 'OK' to close the window.

  13. Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'hbase_ip_list.ktr' into a folder of your choice.
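
To make the data flow of the finished transformation concrete, the steps above can be approximated in a few lines of Python. This is a minimal sketch of the logic only, not PDI code: the sample row keys and the `ip_picklist` function are hypothetical, and the sort is assumed to be a plain string comparison because the key is mapped as a String.

```python
def ip_picklist(row_keys):
    """Approximate Split Fields -> Sort rows (unique) for 'client_ip|year' keys."""
    # Split Fields: break each key on '|' into client_ip and year.
    rows = []
    for key in row_keys:
        client_ip, year = key.split("|", 1)
        rows.append({"client_ip": client_ip, "year": year})

    # Sort rows: ascending on client_ip, compared as strings.
    rows.sort(key=lambda row: row["client_ip"])

    # 'Only pass unique rows': keep the first row seen for each sort key.
    unique = []
    for row in rows:
        if not unique or unique[-1]["client_ip"] != row["client_ip"]:
            unique.append(row)
    return unique


# Hypothetical keys in the client_ip|year format used by the weblogs table.
sample_keys = ["10.0.0.2|2011", "10.0.0.1|2012", "10.0.0.2|2012", "10.0.0.1|2011"]
picklist = [row["client_ip"] for row in ip_picklist(sample_keys)]
```

Because uniqueness is checked on the sort key alone, each IP address appears once in the output regardless of how many years it has data for, which is what the report parameter needs.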