This section contains a series of how-tos that demonstrate the integration between Pentaho and MapR using a sample weblog dataset. The how-tos are organized by topic, with each set explaining various techniques for loading, transforming, extracting, and reporting on data within a MapR cluster. You are encouraged to perform the how-tos in order, as the output of one is sometimes used as the input of another. However, if you would like to jump to a how-to in the middle of the flow, instructions for preparing input data are provided.
MapR Topics
Pre-Requisites
In order to perform all of the how-tos in this section, you will need the following. Since not every how-to uses every component (e.g. HBase, Hive, Report Designer), specific component requirements are identified within each how-to. This section enumerates all of the components, along with some additional configuration and installation tips.
MapR
A single-node local cluster is sufficient for these exercises, but a larger and/or remote configuration will also work. You will need to know the addresses and ports for MapR. These guides were developed using the MapR M3 distribution, version 1.2. You can find MapR downloads here:
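Because the how-tos connect to MapR services over the network, it can help to confirm that the addresses and ports you noted are reachable from the machine running Spoon before you start. The following is a minimal sketch using only the Python standard library. The script name `check_ports.py` and any `host:port` values you pass to it are hypothetical placeholders, not MapR defaults. It only checks that a TCP connection opens; it does not verify that the service behind the port is a healthy MapR component.

```python
import socket
import sys


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def parse_endpoint(text: str) -> tuple[str, int]:
    """Split a 'host:port' string into (host, port)."""
    host, _, port = text.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {text!r}")
    return host, int(port)


if __name__ == "__main__":
    # Pass the addresses and ports of your own MapR services, for example:
    #   python check_ports.py node1.example.com:1234
    # (hypothetical host and port; substitute your cluster's values).
    args = sys.argv[1:]
    if not args:
        print("usage: check_ports.py host:port [host:port ...]")
    for arg in args:
        host, port = parse_endpoint(arg)
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} is {status}")
```

Run it once from the Spoon machine and once from a PHD node. A "NOT reachable" result usually points to a firewall, a wrong port, or a service that is not running.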
Kettle
A desktop installation of the Kettle design tool, called Spoon. Download it here; configuration instructions are here.
Pentaho Hadoop Distribution
A Hadoop node distribution of the Pentaho Data Integration (PDI) tool. Pentaho Hadoop Distribution (referred to as PHD from this point on) allows you to execute Pentaho MapReduce jobs on the MapR cluster. Download it here; configuration instructions are here.
Pentaho Report Designer
Pentaho Report Designer (PRD) is a desktop tool for creating highly formatted reports that can be exported to many popular formats. Reports created with PRD can be published to a Pentaho BI Server so they can be accessed using a browser. Download it here; configuration instructions are here.
Hive
A MapR-supported version of Hive. Hive is a MapReduce abstraction layer that provides SQL-like access to MapR data.
You can find instructions to install Hive for MapR here: http://mapr.com/doc/display/MapR/Hive
HBase
A MapR-supported version of HBase. HBase is a NoSQL database that leverages the MapR filesystem.
You can find instructions to install HBase for MapR here: http://mapr.com/doc/display/MapR/HBase
Sample Data
The how-tos in this guide were built with sample weblog data. The following files are used and/or generated by the how-tos in this guide. Each specific how-to explains which file(s) it requires.
| File Name | Content |
|---|---|
| | Unparsed, raw weblog data |
| | Tab-delimited, parsed weblog data |
| | Tab-delimited, aggregated weblog data for a Hive weblogs_agg table |
| | Tab-delimited, aggregated weblog data |
| | Prepared data for HBase load |
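Several of the sample files are tab-delimited. Their column layouts are not described on this page, so the minimal Python sketch below assumes no field names. It reads each line as a list of fields and counts how many fields each row has, which is a quick way to spot malformed rows before loading a file into Hive or HBase. The function names are hypothetical. Treating quote characters literally (`csv.QUOTE_NONE`) is an assumption about the weblog data, chosen because fields such as user agents may contain stray quotes.

```python
import csv
import sys
from collections import Counter
from typing import Iterator, List, TextIO


def read_tab_delimited(stream: TextIO) -> Iterator[List[str]]:
    """Yield each non-blank line of a tab-delimited stream as a list of fields."""
    # QUOTE_NONE: quote characters are kept as ordinary text, not parsed.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # csv.reader yields [] for blank lines; skip them
            yield row


def field_counts(stream: TextIO) -> Counter:
    """Map 'number of fields' -> 'number of rows with that many fields'."""
    return Counter(len(row) for row in read_tab_delimited(stream))


if __name__ == "__main__":
    # Pass the path(s) of the tab-delimited sample files you downloaded.
    for path in sys.argv[1:]:
        with open(path, newline="", encoding="utf-8", errors="replace") as fh:
            print(path, dict(field_counts(fh)))
```

A well-formed file should report a single field count. Several distinct counts suggest rows that will not line up with a fixed-column Hive table or HBase load.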