{scrollbar}
{excerpt}How to read data from a Hive table, transform it, and write it to a Hive table within the workflow of a PDI job.{excerpt}

h1. Prerequisites

In order to follow along with this how-to guide you will need the following:
* Hadoop
* Pentaho Data Integration
* Hive

h1. Sample Files

The source data for this guide will reside in a Hive table called weblogs. If you have previously completed the [Loading Data into Hive] guide, then you can skip to [#Create a Database Connection to Hive]. If not, then you will need the following data file and perform the [#Create a Hive Table] instructions before proceeding.

The sample data file needed for the [#Create a Hive Table] instructions is:
|| File Name || Content ||
| [weblogs_parse.txt.zip|Transforming Data within Hive in MapR^weblogs_parse.zip] | Tab-delimited, parsed weblog data |

NOTE: If you have previously completed the [Using Pentaho MapReduce to Parse Weblog Data] guide, then the necessary files will already be in the proper location.

This file should be placed in the /weblogs/parse directory of the CLDB using the following commands.
{code}
hadoop fs -mkdir /weblogs
hadoop fs -mkdir /weblogs/parse
hadoop fs -put weblogs_parse.txt /weblogs/parse/part-00000
{code}

h1. Step-By-Step Instructions

h2. Setup

Start Hadoop if it is not already running.

Start Hive Server if it is not already running.

{anchor:Create a Hive Table}
{include:Include Transforming Data within Hive}
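The actual table definition is supplied by the included [#Create a Hive Table] instructions above. If that page is not available to you, the following is a minimal sketch of how a table over the tab-delimited parsed weblog data might be created and loaded; the column names shown here are illustrative assumptions, not the guide's full weblog schema.
{code}
-- Minimal sketch only: column names are illustrative, not the guide's actual schema.
CREATE TABLE weblogs (
    client_ip      STRING,
    request_date   STRING,
    uri            STRING,
    bytes_returned STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

-- Load the parsed file that was copied to /weblogs/parse above.
LOAD DATA INPATH '/weblogs/parse/part-00000' INTO TABLE weblogs;
{code}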