This section contains a series of how-tos that demonstrate the integration between Pentaho and MapR using a sample weblog dataset. The how-tos are organized by function, with each set explaining various techniques for loading, transforming, extracting and reporting on data within a MapR cluster. You are encouraged to perform the how-tos in order, as often the output of one is used as the input of another. However, if you would like to jump to a how-to in the middle of the flow, instructions for preparing input data are provided.

Pre-Requisites

To perform all of the how-tos in this section, you will need the components listed below. Since not every how-to uses every component (e.g. HBase, Hive, Report Designer), specific component requirements are identified within each how-to. This section enumerates all of the components, with some additional configuration and installation tips.

MapR

A single-node local cluster is sufficient for these exercises, but a larger and/or remote configuration will also work. You will need to know the addresses and ports for MapR. These guides were developed using the MapR M3 distribution, version 1.2. You can find MapR downloads here: http://mapr.com/download
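Before starting, it can help to confirm that the MapR addresses and ports you collected are reachable from the machine running the Pentaho tools. The following is a minimal sketch, not part of the original how-tos: the service labels, host names and port numbers are placeholders, so substitute the values from your own cluster configuration.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder endpoints (assumptions): replace with your cluster's values.
    endpoints = {
        "service-a": ("localhost", 7222),
        "service-b": ("localhost", 9001),
    }
    for name, (host, port) in endpoints.items():
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port} is {status}")
```

A successful TCP connection only shows that something is listening on that port; it does not verify that the service behind it is the expected one or that it is healthy.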

Pentaho Data Integration

PDI will be the primary development environment for the how-tos. You will need version [TODO ]. You can download the software here: [TODO ]

Pentaho Hadoop Distribution

A Hadoop node distribution of the Pentaho Data Integration (PDI) tool. Pentaho Hadoop Distribution (referred to as PHD from this point on) allows you to execute Pentaho MapReduce jobs on the MapR cluster. You can find instructions to download and install the software here: [TODO ]

Pentaho Report Designer

A desktop installation of the Pentaho Report Designer tool with the PDI jars in its lib directory. You must copy all jars from PDI's libext directory and its subfolders, with the exception of the JDBC folder, into Report Designer's lib directory.
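The jar copy described above can be scripted. The following is a minimal sketch under stated assumptions: the two directory paths in the usage section are hypothetical, the jars are copied flat into the destination, and files already present in the destination are left untouched unless `overwrite=True` is passed (a cautious choice made for this sketch, not something the how-tos specify).

```python
import shutil
from pathlib import Path


def copy_pdi_jars(libext_dir, report_lib_dir, excluded_folder="JDBC", overwrite=False):
    """Copy every .jar under libext_dir (including subfolders) into report_lib_dir,
    skipping anything located inside a folder named excluded_folder.

    Returns the list of jar file names that were copied.
    """
    libext = Path(libext_dir)
    dest = Path(report_lib_dir)
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for jar in sorted(libext.rglob("*.jar")):
        if not jar.is_file():
            continue
        # Folder names between libext and the jar itself.
        parent_folders = jar.relative_to(libext).parts[:-1]
        if any(part.lower() == excluded_folder.lower() for part in parent_folders):
            continue
        target = dest / jar.name
        if target.exists() and not overwrite:
            continue  # keep the jar that is already in the destination
        shutil.copy2(jar, target)
        copied.append(jar.name)
    return copied


if __name__ == "__main__":
    # Hypothetical install locations: adjust to your environment.
    pdi_libext = Path("data-integration/libext")
    report_lib = Path("report-designer/lib")
    if pdi_libext.is_dir():
        for name in copy_pdi_jars(pdi_libext, report_lib):
            print("copied", name)
    else:
        print(f"{pdi_libext} not found; edit the paths above")
```

Back up Report Designer's lib directory before running anything like this, since mixing jar versions is hard to undo by hand.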

Hive

A MapR-supported version of Hive. Hive is a MapReduce abstraction layer that provides SQL-like access to MapR data. You can find instructions to install Hive for MapR here: http://mapr.com/doc/display/MapR/Hive

HBase

A MapR-supported version of HBase. HBase is a NoSQL database that leverages MapR's CLDB storage.

You can find instructions to install HBase for MapR here: http://mapr.com/doc/display/MapR/HBase

Sample Data

The how-tos in this guide were built with sample weblog data. The following files are used and/or generated by the how-tos in this guide. Each how-to explains which file(s) it requires.

File Name              Description
...                    Unparsed, raw weblog data
weblogs_parse.txt      Tab-delimited, parsed weblog data
weblogs_hive.txt       Tab-delimited, aggregated weblog data for a Hive weblogs_agg table
weblogs_aggregate.txt  Tab-delimited, aggregated weblog data
weblogs_hbase.txt      Prepared data for HBase load
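
If you jump into the middle of the flow, a quick look at a tab-delimited sample file can confirm it is the one you expect before loading it. The following is a minimal sketch: the helper name `peek_tab_delimited` is made up for illustration, and since this page does not list the columns of each file, the sketch only prints the first rows and their field counts.

```python
import csv
import os
from itertools import islice


def peek_tab_delimited(path, limit=5):
    """Return the first `limit` rows of a tab-delimited file as lists of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quote characters as ordinary data, which is
        # usually what you want for raw log fields.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(islice(reader, limit))


if __name__ == "__main__":
    sample = "weblogs_parse.txt"  # assumed to be in the current directory
    if os.path.exists(sample):
        for row in peek_tab_delimited(sample):
            print(len(row), "fields:", row)
    else:
        print(f"{sample} not found in the current directory")
```

If the field count varies from row to row, the file is probably not the parsed, tab-delimited version.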
