{scrollbar}
{excerpt} How to read data from a Hive table, transform it, and write it to a Hive table within the workflow of a PDI job.{excerpt}

h1. Prerequisites

In order to follow along with this how-to guide, you will need the following:
* Hadoop
* Pentaho Data Integration
* Hive

h1. Sample Files

The source data for this guide will reside in a Hive table called weblogs. If you have previously completed the [Loading Data into Hive] guide, you can skip to [#Create a Database Connection to Hive]. If not, you will need the following data file and must perform the [#Create a Hive Table] instructions before proceeding.
The sample data file needed for the [#Create a Hive Table] instructions is:
| File Name | Content |
| [weblogs_parse.txt.zip|Transforming Data within Hive in MapR^weblogs_parse.zip] | Tab-delimited, parsed weblog data |
NOTE: If you have previously completed the [Using Pentaho MapReduce to Parse Weblog Data] guide, then the necessary files will already be in the proper location.
This file should be placed in the /weblogs/parse directory of the CLDB using the following commands.
{code}
hadoop fs -mkdir /weblogs
hadoop fs -mkdir /weblogs/parse
hadoop fs -put weblogs_parse.txt /weblogs/parse/part-00000
{code}
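If you have not yet created the weblogs table, the [Loading Data into Hive] guide gives the full instructions. As a minimal sketch only, a tab-delimited table definition along these lines matches the parsed weblog file placed above; the column names and types here are assumptions, so take the authoritative definition from that guide:
{code}
-- Hedged sketch: column names/types are assumptions; see the
-- Loading Data into Hive guide for the authoritative definition.
CREATE TABLE weblogs (
  client_ip         STRING,
  full_request_date STRING,
  day               STRING,
  month             STRING,
  month_num         INT,
  year              STRING,
  hour              STRING,
  minute            STRING,
  second            STRING,
  timezone          STRING,
  http_verb         STRING,
  uri               STRING,
  http_status_code  STRING,
  bytes_returned    STRING,
  referrer          STRING,
  user_agent        STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

-- Load the parsed file placed above into the table.
LOAD DATA INPATH '/weblogs/parse/part-00000' INTO TABLE weblogs;
{code}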
\\

h1. Step-By-Step Instructions


h2. Setup

* Start Hadoop if it is not already running.
* Start Hive Server if it is not already running.
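Exactly how you start these services depends on your distribution. On a MapR cluster the warden typically manages them for you; on a plain Apache install, commands along these lines are common (script names and paths are assumptions, so adjust for your cluster):
{code}
# Hedged sketch for a plain Apache Hadoop/Hive install; on MapR the
# warden typically manages these services for you.
start-dfs.sh                   # start the HDFS daemons (assumes the Hadoop sbin scripts are on PATH)
hive --service hiveserver &    # start the legacy Hive server (HiveServer1); use
                               # hive --service hiveserver2 if you run HiveServer2
{code}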
{anchor:Create a Hive Table}

{include:Include Transforming Data within Hive}
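The included page above walks through building the PDI job itself. As a rough, illustrative sketch of the kind of HiveQL such a job executes (the weblogs_agg table and its columns below are assumptions for illustration, not the included page's exact script), the transformation reads from the weblogs table, aggregates, and writes the result back to another Hive table:
{code}
-- Hedged sketch: weblogs_agg and its columns are illustrative assumptions.
CREATE TABLE IF NOT EXISTS weblogs_agg (
  client_ip STRING,
  year      STRING,
  month     STRING,
  month_num INT,
  pageviews BIGINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

-- Read from the weblogs table, transform (aggregate), write back to Hive.
INSERT OVERWRITE TABLE weblogs_agg
SELECT client_ip, year, month, month_num, COUNT(*) AS pageviews
FROM weblogs
GROUP BY client_ip, year, month, month_num;
{code}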