This section contains a series of how-tos that demonstrate the integration between Pentaho and MapR using a sample weblog dataset. The how-tos are organized by function, with each set explaining various techniques for loading, transforming, extracting and reporting on data within a MapR cluster. You are encouraged to perform the how-tos in order, because the output of one is often used as the input of another. However, if you would like to jump to a how-to in the middle of the flow, instructions for preparing input data are provided.
Pre-Requisites
In order to perform all of the how-tos in this section, you will need the following. Since not every how-to uses every component (e.g. HBase, Hive, Report Designer), specific component requirements will be identified within each how-to. This section enumerates all of the components with some additional configuration and installation tips.
MapR
A single-node local cluster is sufficient for these exercises, but a larger and/or remote configuration will also work. You will need to know the addresses and ports for MapR. These guides were developed using the MapR M3 distribution version 1.2. You can find MapR downloads here: ...
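Because the how-tos depend on knowing the addresses and ports for MapR, it can help to confirm that an endpoint is reachable from the machine running PDI before you start. The following is a minimal sketch, not part of the original guide. The function name `is_port_open` and the host and port values are placeholders chosen for illustration; substitute the values for your own cluster.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder endpoints (assumptions): replace with your cluster's
    # actual addresses and ports.
    endpoints = {
        "example MapR service": ("localhost", 7222),
    }
    for name, (host, port) in endpoints.items():
        status = "reachable" if is_port_open(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port} is {status}")
```

A successful TCP connection only shows that something is listening on that port; it does not verify that the service behind it is configured correctly.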
Pentaho Data Integration
PDI will be the primary development environment for the how-tos. You will need version [TODO]. You can download the software here: [TODO]
Pentaho Hadoop Distribution
A Hadoop node distribution of the Pentaho Data Integration (PDI) tool. Pentaho Hadoop Distribution (referred to as PHD from this point on) allows you to execute Pentaho MapReduce jobs on the MapR cluster. You can find instructions to download and install the software here: [TODO]
Pentaho Report Designer
A desktop installation of the Pentaho Report Designer tool with the PDI jars in the lib directory. You must copy all jars from PDI's libext directory and its subfolders, with the exception of the JDBC folder, into Report Designer's lib directory.
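The jar copy described above can be scripted. The following is a minimal sketch, not an official Pentaho utility. The function name `copy_pdi_jars` and the example paths are hypothetical; the sketch assumes the excluded folder is literally named `JDBC` (matched case-insensitively) and that jars from subfolders should land directly in the lib directory.

```python
import shutil
from pathlib import Path


def copy_pdi_jars(libext_dir: Path, report_designer_lib: Path,
                  skip_folder: str = "JDBC") -> list[Path]:
    """Copy every .jar under libext_dir (recursively) into report_designer_lib,
    skipping anything inside a folder named skip_folder (case-insensitive)."""
    report_designer_lib.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(libext_dir.rglob("*.jar")):
        # Folder names between libext_dir and the jar file itself.
        parent_parts = jar.relative_to(libext_dir).parts[:-1]
        if any(part.lower() == skip_folder.lower() for part in parent_parts):
            continue  # leave the JDBC folder out, as the text instructs
        target = report_designer_lib / jar.name
        shutil.copy2(jar, target)  # a jar with the same name is overwritten
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Hypothetical install locations (assumptions): adjust to your system.
    pdi_libext = Path("data-integration/libext")
    prd_lib = Path("report-designer/lib")
    for path in copy_pdi_jars(pdi_libext, prd_lib):
        print("copied", path.name)
```

Because the subfolder structure is flattened, two jars with the same file name in different subfolders would overwrite each other; check the printed list if that matters for your installation.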
Hive
A MapR-supported version of Hive. Hive is a Map/Reduce abstraction layer that provides SQL-like access to MapR data. You can find instructions to install Hive for MapR here: http://mapr.com/doc/display/MapR/Hive
HBase
A MapR-supported version of HBase. HBase is a NoSQL database that leverages MapR's CLDB storage.
You can find instructions to install HBase for MapR here: http://mapr.com/doc/display/MapR/HBase
Sample Data
The how-tos in this guide were built with sample weblog data. The following files are used and/or generated by the how-tos in this guide. Each specific how-to will explain which file(s) it requires.
| File Name | Content |
|---|---|
|  | Unparsed, raw weblog data |
| weblogs_parse.txt | Tab-delimited, parsed weblog data |
| weblogs_hive.txt | Tab-delimited, aggregated weblog data for a Hive weblogs_agg table |
| weblogs_aggregate.txt | Tab-delimited, aggregated weblog data |
| webogs_hbase.txt | Prepared data for HBase load |
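
Several of the sample files are described as tab-delimited, so a quick way to inspect one before using it in a how-to is to read its first few records. The following is a minimal sketch under stated assumptions: the function name `preview_tab_delimited` is hypothetical, the file is assumed to be UTF-8 text with one record per line, and no particular column layout is assumed because the guide does not describe one here.

```python
import csv
from pathlib import Path


def preview_tab_delimited(path: Path, max_rows: int = 5) -> list[list[str]]:
    """Return the first max_rows records of a tab-delimited text file."""
    rows = []
    with path.open(newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quote characters in the log data as plain text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(rows) >= max_rows:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Assumes weblogs_parse.txt has been downloaded to the working directory.
    for fields in preview_tab_delimited(Path("weblogs_parse.txt")):
        print(len(fields), fields)
```

Printing the field count per record makes it easy to spot rows that were split incorrectly before loading the file into the cluster.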