...

The sample data file needed for this guide is:

File Name: weblogs_parse.txt (from the weblogs_parse.txt.zip attachment)

Content: Unparsed, raw weblog data

...

Speed Tip: You can download the already-completed Kettle job load_hive.kjb.

  1. Start PDI on your desktop. Once it is running, choose 'File' -> 'New' -> 'Job' from the menu, or click the 'New file' icon on the toolbar and choose the 'Job' option.
  2. Add a Start Job Entry: You need to tell PDI where to start the job, so expand the 'General' section of the Design palette and drag a 'Start' job entry onto the job canvas. Your canvas should look like:
    (Screenshot: job canvas with the 'Start' entry.)
  3. Add a Copy File Job Entry: You will need to copy the parsed file into the Hive table, so expand the 'File Management' section of the Design palette and drag a 'Copy Files' job entry onto the job canvas. Your canvas should look like:
    (Screenshot: job canvas with the 'Start' and 'Copy Files' entries.)
  4. Connect the Start and Copy Files job entries: Hover the mouse over the 'Start' job entry and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Copy Files' node. Your canvas should look like:
    (Screenshot: job canvas with the 'Start' and 'Copy Files' entries connected.)
  5. Edit the Copy Files Job Entry: Double-click on the 'Copy Files' job entry to edit its properties. Enter this information:
    1. File/Folder source: maprfs://<CLDB>:<PORT>/weblogs/parse
      When running PDI on the same machine as the MapR cluster, use maprfs:///weblogs/parse; the CLDB and port are not required.
      <CLDB> is the server name of the machine running the MapR CLDB.
      <PORT> is the port the MapR CLDB is running on.
    2. File/Folder destination: maprfs://<CLDB>:<PORT>/user/hive/warehouse/weblogs
      When running PDI on the same machine as the MapR cluster, use maprfs:///user/hive/warehouse/weblogs; the CLDB and port are not required.
      <CLDB> is the server name of the machine running the MapR CLDB.
      <PORT> is the port the MapR CLDB is running on.
    3. Wildcard (RegExp): Enter 'part-.*'
    4. Click the 'Add' button to add the files to the list of files to copy.

When you are done, your window should look like this (your folder path may be different):

Click 'OK' to close the window.
Note that you could also use this step to load a local file into Hive; the file does not already have to be in MapR.
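To see what the 'Copy Files' entry is doing, here is a rough command-line sketch. The grep line illustrates locally which files the 'part-.*' wildcard selects from a typical MapReduce output directory; the commented hadoop fs command is an assumed rough equivalent of the job entry (not part of the guide, and it presumes a configured client on a cluster node):

```shell
# MapReduce output directories typically hold part files plus a _SUCCESS
# marker; the 'part-.*' wildcard selects only the part files:
printf '%s\n' part-00000 part-00001 _SUCCESS | grep -E 'part-.*'
# prints part-00000 and part-00001; _SUCCESS is not matched

# Assumed rough equivalent of the Copy Files job entry, run from a cluster
# node with a configured 'hadoop' client (hypothetical, for illustration):
# hadoop fs -cp maprfs:///weblogs/parse/part-* maprfs:///user/hive/warehouse/weblogs/
```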

...

  1. Open the Hive Shell: Open the Hive shell so you can query the weblogs table by entering 'hive' at the command line.
  2. Query Hive for Data: Verify the data has been loaded to Hive by querying the weblogs table.
    select * from weblogs limit 10;
  3. Close the Hive Shell: You are done with the Hive Shell for now, so close it by entering 'quit;' in the Hive Shell.
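The three steps above can also be collapsed into a single non-interactive command. This is a sketch, not part of the guide; it assumes the 'hive' CLI is installed and configured on the node you run it from:

```shell
# Run the verification query without opening an interactive Hive shell
# (assumes a configured 'hive' CLI on a cluster node):
hive -e 'select * from weblogs limit 10;'
```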

...