How to unit test the mapper and reducer transformations that make up a Pentaho MapReduce job.

Unit testing mapper and reducer code before it runs on a cluster can generate significant development time savings. Using the PDI development environment to debug and performance test mapper and reducer code is more productive than poring over Hadoop logs! Pentaho recommends that you unit test your Pentaho MapReduce transformations locally before running them on the cluster.

The general technique is to stub the input data to the mapper (or reducer) transformation with a File Input or generated rows of data, then execute the transformation in preview mode to ensure that it is processing correctly. In the steps that follow you will create a stub file. Alternatively, in situations where the key field is not important or the original file contains the key field you may be able to read your original file from Hadoop via a Hadoop File Input step.

Prerequisites

In order to follow along with this guide you will need the following:

  • Pentaho Data Integration

Sample Files

None

Sample Code

This guide uses the weblog_parse_mapper.ktr from the Using Pentaho MapReduce to Parse Weblog Data in MapR guide. If you have completed that guide, you should already have this mapper. Otherwise, click on the link above to download it.

Step-By-Step Instructions

Setup

None

Create Test File

In this task you will create a test file that you will use to unit test your transformation.

  1. In a text editor, create a new file in key, tab, value format, matching the input your transformation would receive. For reducer transformations, the keys must be in sorted order and each line should contain only one value. To supply multiple values for a key, repeat the key on multiple lines.
    For this guide use:

    1	0.0.0.0 - - [01/Jan/2011:12:00:00 -0500] "GET /test.html HTTP/1.1" 200 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19"
    175	1.1.1.1 - - [02/Jan/2011:01:38:30 -0700] "POST /test/test2.html HTTP/1.1" 200 0 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070719 CentOS/1.5.0.12-3.el5.centos Firefox/1.5.0.12"

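If you would rather generate the test file than type it by hand, the following is a minimal Python sketch of one way to do so. The function name `write_stub_file`, the `ROWS` list, and the `for_reducer` flag are all hypothetical names introduced for illustration; they are not part of PDI. The sketch assumes that plain string ordering of keys is an acceptable stand-in for the sort order your reducer would see on the cluster, which may not hold for non-text key types.

```python
# Hypothetical (key, value) pairs: the two sample weblog lines used in this guide.
ROWS = [
    ("1",
     '0.0.0.0 - - [01/Jan/2011:12:00:00 -0500] "GET /test.html HTTP/1.1" 200 0 "-" '
     '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) '
     'Gecko/2010031422 Firefox/3.0.19"'),
    ("175",
     '1.1.1.1 - - [02/Jan/2011:01:38:30 -0700] "POST /test/test2.html HTTP/1.1" 200 0 "-" '
     '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070719 '
     'CentOS/1.5.0.12-3.el5.centos Firefox/1.5.0.12"'),
]


def write_stub_file(path, rows, for_reducer=False):
    """Write rows as key<TAB>value lines, one value per line.

    When for_reducer is True the rows are sorted by key first (plain string
    ordering, an assumption), so that repeated keys end up on adjacent lines.
    """
    if for_reducer:
        # sorted() is stable, so values for the same key keep their order.
        rows = sorted(rows, key=lambda kv: kv[0])
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for key, value in rows:
            # A tab or newline inside a field would break the two-field layout.
            if "\t" in key or "\t" in value or "\n" in key or "\n" in value:
                raise ValueError(f"field contains a tab or newline: {key!r}")
            f.write(f"{key}\t{value}\n")
    return path
```

Calling `write_stub_file("weblog_test.txt", ROWS)` (the file name is arbitrary) would produce a file equivalent to the two-line example above. Pass `for_reducer=True` when stubbing reducer input.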

Unit Test Transformation

In this task you will unit test the transformation. The same steps may be used for both mapper and reducer transformations.

  1. Start PDI on your desktop. Once it is running choose 'File' -> 'Open', browse to and select the 'weblog_parse_mapper.ktr', then click 'OK'.

  1. Add a Text File Input Step: You will provide the mapper with an alternate input step, so expand the 'Input' section of the Design palette and drag a 'Text File Input' step onto the canvas.
    NOTE: You could also use a Hadoop File Input step to pull the test file from the Hadoop cluster.

  2. Connect the Text File Input and Regex Evaluation Steps: Hover the mouse over the 'Text File Input' node and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Regex Evaluation' node. Click 'OK' on the warning message. Your canvas should look like this:
    [Screenshot]

  3. Edit the Text File Input Step: Double-click on the 'Text File Input' node to edit its properties. Enter this information:
    1. File or directory: Browse to the test file you created earlier.
    2. Click the Add button.
      When you are done your window should look like this:
      [Screenshot]

  4. Configure the File Content: Switch to the 'Content' tab and enter the following information:
    1. Separator: Clear and click 'Insert TAB'
    2. Uncheck 'Header'
    3. Format: Select 'Mixed'

  5. Configure the Fields: Switch to the 'Fields' tab and enter the following information:
    1. Create a field with Name 'key' and Type 'String'
    2. Create a field with Name 'value' and Type 'String'

  6. Disable the Map/Reduce Input Hop: Disable the hop between 'Map/Reduce Input' and 'Regex Evaluation' by clicking on it. The hop will turn gray.

  7. Run the Unit Test: Select the 'User Defined Java Expression' step, then right-click it and select 'Preview'. A 'Transformation debug dialog' will appear; click 'Quick Launch'. The results of the transformation will appear in the 'Examine preview data' window.
    Click 'Close' to close the window.

Re-Configure Transformation to Run as MapReduce

  8. Disable the Text File Input Hop: Disable the hop between 'Text File Input' and 'Regex Evaluation' by clicking on it. It will turn gray.

  9. Enable Map/Reduce Input Hop: Enable the hop between 'Map/Reduce Input' and 'Regex Evaluation' by clicking on it. It will turn black.

  10. Save the Transformation
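
If the preview shows unexpected field values, it can help to confirm outside PDI that the test file really splits the way the 'Text File Input' step was configured on the 'Content' and 'Fields' tabs: tab separator, no header row, and exactly two string fields, 'key' and 'value'. The Python sketch below is one way to run that check. The helper names `read_stub_file` and `keys_are_sorted` are hypothetical, and the sorted-key check again assumes plain string ordering.

```python
def read_stub_file(path):
    """Return a list of (key, value) string pairs read from a key<TAB>value file.

    Mirrors the step configuration used in this guide: tab separator, no header
    row, two string fields. Both Unix and Windows line endings are accepted.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            parts = line.split("\t")
            if len(parts) != 2:
                raise ValueError(
                    f"line {lineno}: expected 2 tab-separated fields, "
                    f"found {len(parts)}"
                )
            rows.append((parts[0], parts[1]))
    return rows


def keys_are_sorted(rows):
    """True if keys appear in ascending string order, as reducer input requires."""
    keys = [key for key, _ in rows]
    return keys == sorted(keys)
```

For a mapper test file only `read_stub_file` matters. For a reducer test file, also check that `keys_are_sorted(read_stub_file(path))` is true before previewing.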

Summary

In this guide you learned how to unit test Pentaho MapReduce transformations. Pentaho recommends unit testing your transformations this way, because debugging with Hadoop logs is both complex and time-consuming.
