Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migration of unmigrated content due to installation of a new plugin

unmigrated-inline-wiki-markup
{scrollbar}

...

Excerpt

...

How

...

to

...

use

...

a

...

PDI

...

transformation

...

to

...

extract

...

data

...

from

...

HBase

...

and

...

load

...

it

...

into

...

a

...

RDBMS

...

table.

...

The

...

new

...

RDBMS

...

table

...

will

...

contain

...

the

...

count

...

of

...

page

...

views

...

by

...

IP

...

address

...

and

...

month.

...

:=}
Info
title
Note

For

brevity's

sake,

this

transformation

will

only

contain

three

steps:

HBase

Input,

Split

Fields

and

Table

Output.

In

practice,

the

full

expressiveness

of

the

PDI

transformation

semantic

is

available.

Further,

PDI

supports

bulk

loading

for

many

RDBMS

and

that

would

be

a

viable,

and

common,

alternative

to

using

a

Table

Output

approach.

{info} h1. Prerequisites In order to follow along with this

Prerequisites

In order to follow along with this how-to

...

guide

...

you

...

will

...

need

...

the

...

following:

...

Sample Files

There are no sample files for this guide. The Loading Data into HBase guide must be completed prior to starting this guide as it loads the sample data.

Step-By-Step

...

Instructions

Setup

Start Hadoop if it is not already running.

Start HBase if it is not already running.

Include Page
Create RDBMS Connection
Create RDBMS Connection

Create a Transformation to Extract Data from HBase

In this task you will create a transformation to extract data from HBase and load into a RDBMS table.

Tip
titleSpeed Tip

You can download the Kettle Transform hbase_to_rdbms.ktr

]

already

completed{tip} \\ # *Add a HBase Input Step:* You are going to extract data from HBase, so expand the 'Big Data' section of the Design palette and drag a 'HBase Input' node onto the transformation canvas. Your transformation should look like: !BAD:Common Images^AddHBaseInput.PNG|width=300,height=206!\\ \\ # *Edit the HBase Input* *Step*: Double-click on the 'HBase Input' node to edit its properties. Do the following: ## Zookeeper host(s) and Zookeeper port: Your HBase Zookeeper connection information. For a local single node cluster your host is 'localhost' and your port is '2181'. ## HBase table name: Click 'Get mapped table names' and select 'weblogs'. ## Mapping name: Click 'Get mappings for the specified table' and select 'pageviews'. ## Click the 'Get Key/Fields Info' button to populate the grid \\ When you are done your window should look like: !EditHBaseInput.PNG|width=624,height=408! Click 'OK' to close the window. \\ \\ # *Add a Split Fields Step:* You need to split the key field which is client_ip\|year into two fields, so expand the 'Transform' section of the Design palette and drag a 'Split Fields' node onto the transformation canvas. Your transformation should look like: !BAD:Common Images^AddSplitFields.png|width=350,height=353!\\ # *Connect the Input and Split Fields steps*: Hover the mouse over the 'HBase Input' node and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Split Fields' node. Your canvas should look like this: !BAD:Common Images^ConnectHBaseInputandSplitFields.png!\\ \\ # *Edit the Split Fields Step*: Double-click on the 'Split Fields' node to edit its properties. Do the following: ## Field to split: Select 'key'. ## Delimiter: Enter '\|'. ## Fields: Add the following: | New field | Type | | client_ip | String | | year | Integer | \\ When you are done your window should look like: !BAD:Common Images^ConfigureSplitFields.png|width=297,height=194! Click 'OK' to close the window. \\ \\ # *Add a Table Output Step:* You want to write the values to a RDBMS, so expand the 'Output' section of the Design palette and drag a 'Table Output' node onto the transformation canvas. Your transformation should look like: !BAD:Common Images^AddTableOutputHBase.png|width=389,height=376!\\ \\ # *Connect the Split Fields and Table Output steps*: Hover the mouse over the 'Split Fields' node and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Table Output' node. Your canvas should look like this: !BAD:Common Images^ConnectSplitFieldsandTableOutput.png!\\ \\ # *Edit the Table Output* *Step*: Double-click on the 'Table Output' node to edit its properties. Do the following: ## Connection: Select 'RDBMS' ## Target Table: Enter 'aggregate_hbase' ## Check 'Truncate table' so you can re-run this transformation. ## Click the 'SQL' button to create the table in your target database. !BAD:Common Images^HBaseSimpleSQLEditor.png|width=236,height=197! ## Click the 'Execute' button to run the SQL. !BAD:Common Images^HBaseSQLResults.png|width=240,height=301! The 'Results of the SQL statements' window will appear telling you if the SQL was successfully executed or give you troubleshooting information if the SQL failed. ## Click 'OK' to close the window. ## Click 'Close' to close the 'Simple SQL editor' window. \\ When you are done your window should look like: !BAD:Common Images^ConfigureTableOutputHBase.png! Click 'OK' to close the window. \\ \\ # *Save the Transformation*: Choose 'File' \-> 'Save as...' from the menu system. Save the transformation as

completed


  1. Add a HBase Input Step: You are going to extract data from HBase, so expand the 'Big Data' section of the Design palette and drag a 'HBase Input' node onto the transformation canvas. Your transformation should look like:
    Image Added

  2. Edit the HBase Input Step: Double-click on the 'HBase Input' node to edit its properties. Do the following:
    1. Zookeeper host(s) and Zookeeper port: Your HBase Zookeeper connection information. For a local single node cluster your host is 'localhost' and your port is '2181'.
    2. HBase table name: Click 'Get mapped table names' and select 'weblogs'.
    3. Mapping name: Click 'Get mappings for the specified table' and select 'pageviews'.
    4. Click the 'Get Key/Fields Info' button to populate the grid
      When you are done your window should look like:
      Image Added
      Click 'OK' to close the window.

  3. Add a Split Fields Step: You need to split the key field which is client_ip|year into two fields, so expand the 'Transform' section of the Design palette and drag a 'Split Fields' node onto the transformation canvas. Your transformation should look like:
    Image Added
  4. Connect the Input and Split Fields steps: Hover the mouse over the 'HBase Input' node and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Split Fields' node. Your canvas should look like this:
    Image Added

  5. Edit the Split Fields Step: Double-click on the 'Split Fields' node to edit its properties. Do the following:
    1. Field to split: Select 'key'.
    2. Delimiter: Enter '|'.
    3. Fields: Add the following:

      New field

      Type

      client_ip

      String

      year

      Integer


      When you are done your window should look like:
      Image Added
      Click 'OK' to close the window.

  6. Add a Table Output Step: You want to write the values to a RDBMS, so expand the 'Output' section of the Design palette and drag a 'Table Output' node onto the transformation canvas. Your transformation should look like:
    Image Added

  7. Connect the Split Fields and Table Output steps: Hover the mouse over the 'Split Fields' node and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Table Output' node. Your canvas should look like this:
    Image Added

  8. Edit the Table Output Step: Double-click on the 'Table Output' node to edit its properties. Do the following:
    1. Connection: Select 'RDBMS'
    2. Target Table: Enter 'aggregate_hbase'
    3. Check 'Truncate table' so you can re-run this transformation.
    4. Click the 'SQL' button to create the table in your target database.
      Image Added
    5. Click the 'Execute' button to run the SQL.
      Image Added
      The 'Results of the SQL statements' window will appear telling you if the SQL was successfully executed or give you troubleshooting information if the SQL failed.
    6. Click 'OK' to close the window.
    7. Click 'Close' to close the 'Simple SQL editor' window.
      When you are done your window should look like:
      Image Added
      Click 'OK' to close the window.

  9. Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'hbase_to_rdbms.ktr'

...

  1. into

...

  1. a

...

  1. folder

...

  1. of

...

  1. your

...

  1. choice.

...



  1. Run the Transformation: Choose 'Action'

...

  1. ->

...

  1. 'Run'

...

  1. from

...

  1. the

...

  1. menu

...

  1. system

...

  1. or

...

  1. click

...

  1. on

...

  1. the

...

  1. green

...

  1. run

...

  1. button

...

  1. on

...

  1. the

...

  1. transformation

...

  1. toolbar.

...

  1. A

...

  1. 'Execute

...

  1. a

...

  1. transformation'

...

  1. window

...

  1. will

...

  1. open.

...

  1. Click

...

  1. on

...

  1. the

...

  1. 'Launch'

...

  1. button.

...

  1. An

...

  1. 'Execution

...

  1. Results'

...

  1. panel

...

  1. will

...

  1. open

...

  1. at

...

  1. the

...

  1. bottom

...

  1. of

...

  1. the

...

  1. PDI

...

  1. window

...

  1. and

...

  1. it

...

  1. will

...

  1. show

...

  1. you

...

  1. the

...

  1. progress

...

  1. of

...

  1. the

...

  1. transformation

...

  1. as

...

  1. it

...

  1. runs.

...

  1. After

...

  1. several

...

  1. seconds

...

  1. the

...

  1. transformation

...

  1. should

...

  1. finish

...

  1. successfully:

...

  1. Image Added
    If any errors occurred the job step that failed will be highlighted in red and you can use the 'Logging' tab to view error messages.

Check RDBMS for HBase Aggregated Table

  1. Explore the Database: Choose 'Tools' -> 'Database' -> 'Explore' from the menu system.

  2. Select the Connection: In the 'Make your selection' window select 'RDBMS' and click 'OK'.
    Image Added

  3. Preview the Table: Expand RDBMS -> Tables. Right click on 'aggregate_hbase' and select 'Preview first 100'
    Image Added

Summary

During this guide you learned how to use a PDI transformation to extract data from HBase and load into an RDBMS table.

Wiki Markup
{scrollbar}