
This section contains a series of how-tos that demonstrate the integration between Pentaho and MapR using a sample weblog dataset. The how-tos are organized by topic, with each set explaining various techniques for loading, transforming, extracting and reporting on data within a MapR cluster. You are encouraged to perform the how-tos in order, as the output of one is sometimes used as the input of another. However, if you would like to jump to a how-to in the middle of the flow, instructions for preparing its input data are provided.

MapR Topics


Pre-Requisites

To perform all of the how-tos in this section, you will need the following components. Not every how-to uses every component (e.g. HBase, Hive, Report Designer), so the specific component requirements are identified within each how-to. This section lists all of the components, along with some additional configuration and installation tips.

MapR

A single-node local cluster is sufficient for these exercises, but a larger and/or remote configuration will also work. You will need to know the addresses and ports of your MapR cluster. These guides were developed using the MapR M3 distribution, version 1.2. You can find MapR downloads here: http://mapr.com/download

Kettle

A desktop installation of the Kettle design tool, called Spoon. Download it here; configuration instructions are here.

Pentaho Hadoop Distribution

A Hadoop node distribution of the Pentaho Data Integration (PDI) tool. The Pentaho Hadoop Distribution (referred to as PHD from this point on) allows you to execute Pentaho MapReduce jobs on the MapR cluster. Download it here; configuration instructions are here.

Pentaho Report Designer

Pentaho Report Designer (PRD) is a desktop tool for creating highly formatted reports that can be exported to many popular formats. Reports created with PRD can be published to a Pentaho BI Server so that they can be accessed from a web browser. Download it here; configuration instructions are here.

Hive

A MapR-supported version of Hive. Hive is a MapReduce abstraction layer that provides SQL-like access to data stored in MapR.

You can find instructions to install Hive for MapR here: http://mapr.com/doc/display/MapR/Hive


HBase

A MapR-supported version of HBase. HBase is a NoSQL database that runs on the MapR filesystem.

You can find instructions to install HBase for MapR here: http://mapr.com/doc/display/MapR/HBase


Sample Data

The how-tos in this guide were built with sample weblog data. The following files are used and/or generated by the how-tos in this guide. Each how-to explains which file(s) it requires.

File Name                    Content
weblogs_rebuild.txt.zip      Unparsed, raw weblog data
weblogs_parse.txt.zip        Tab-delimited, parsed weblog data
weblogs_hive.txt.zip         Tab-delimited, aggregated weblog data for a Hive weblogs_agg table
weblogs_aggregate.txt.zip    Tab-delimited, aggregated weblog data
weblogs_hbase.txt.zip        Prepared data for HBase load
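Each sample file is distributed as a zip archive, and several of them are tab-delimited. Before loading a file into MapR, it can help to peek at the first few rows to confirm the data looks as expected. The Python sketch below is not part of the how-tos: the function name `preview_weblog_zip` is hypothetical, and it assumes each archive holds a single text data file and that the tab-delimited files use no quoting, so quote characters are kept as ordinary data.

```python
import csv
import io
import zipfile
from typing import List


def preview_weblog_zip(path: str, max_rows: int = 5) -> List[List[str]]:
    """Return up to max_rows tab-separated rows from a zipped weblog file.

    Assumes the archive holds one data file (its first non-directory member).
    """
    rows: List[List[str]] = []
    with zipfile.ZipFile(path) as archive:
        # Skip directory entries; assume the first remaining member is the data file.
        members = [name for name in archive.namelist() if not name.endswith("/")]
        if not members:
            raise ValueError(f"no files found in {path}")
        # Decode bytes as UTF-8, replacing undecodable bytes rather than failing.
        with archive.open(members[0]) as raw, io.TextIOWrapper(
            raw, encoding="utf-8", errors="replace", newline=""
        ) as text:
            # QUOTE_NONE: treat quote characters as ordinary data, split only on tabs.
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                if len(rows) >= max_rows:
                    break
                rows.append(row)
    return rows
```

For example, `preview_weblog_zip("weblogs_parse.txt.zip")` returns the first five rows as lists of field strings. Comparing the lengths of those lists is a quick way to spot lines that do not split into the expected number of fields. This check is not needed for weblogs_rebuild.txt.zip, which contains unparsed data.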