Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Wiki Markup
{scrollbar}
{excerpt}How to create a report that sources data from Hive.{excerpt}

h1. Prerequisites

In order follow along with this how-to guide you will need the following:
* MapR
* Pentaho Data Integration
* Hive
* Report Designer
* RDBMS Database

h1. Sample Files

The sample data file needed for this guide is:
| File Name | Content |
| [How To's^weblogs_hive.txt.zip]\\ | Tab-delimited, aggregated weblog data for a Hive weblogs_agg table |
\\
NOTE:  If you have already completed the [Transforming Data within Hive in MapR] guide, then the necessary Hive table, weblog_aggs, will already be created and populated.

h1. Step-By-Step Instructions


h2. Setup

Start MapR if it is not already running.
Start Hive Server if it is not already running.

h2. Create a Hive Table

NOTE: This task may be skipped if you have completed [Transforming Data within Hive in MapR] guide.

# *Open the Hive Shell*: Open the Hive shell so you can manually create a Hive table by entering 'hive' at the command line.
# *Create the Table in Hive:* You need a hive table to load the data to, so enter the following in the hive shell.
{code:sql}
create table weblogs_agg (
    client_ip    string,
    year    string,
    month    string,
    month_num    int,
    pageviews    bigint
)
row format delimited
fields terminated by '\t';
{code}
# *Close the Hive Shell*: You are done with the Hive Shell for now, so close it by entering 'quit;' in the Hive Shell.
# *Load the Table:* Load the Hive table by running the following commands:
{code}
hadoop fs -put weblog_hive.txt /user/hive/warehouse/weblogs_agg/

{code}

h2. Create a Hive Report

In this task you will create a report that uses JDBC and HiveQL to report on data from MapR Hive.
# *Start Report Designer on your desktop.* Once it is running choose 'File' \-> 'Report Wizard' \-> from the menu system.
{tip:title=Speed Tip}You can download the Report definition [^HiveReportReporting on Hive Data in MapR^HiveReport.prpt] already completed{tip}
# *Select a Template:* Report Wizard will automatically layout and do some basic formatting of your report, so select a template of your choice in the select box.  When you are done your screen should look like:
!worddav0c46e9dc56fae64269431deb52ad2076.png|height=388,width=530!
Click 'Next' to go to the next screen.
\\
\\
# *Create a Data Source:* You need to create the Hive query to select data for this report, so click the plus !worddav3ae20df9e3d0beb377e3379dfc837427.png|height=28,width=28! button and do the following:
## Choose Type 'JDBC'
!worddavde72ff7851cff1df88c4dd0efc3b18bd.png|height=334,width=168!
## Select 'Sample Query' in the Available Queries box and delete it by clicking the !worddavd2236327c7744da4ab9d826d274b33c3.png|height=19,width=20! .
## Above the Connection box create a new connection by clicking the !worddav16760cf056fcda5f0bd0180db9ab25b8.png|height=22,width=21! .
## *Create a Database Connection:* Click the !worddav16760cf056fcda5f0bd0180db9ab25b8.png|height=22,width=21! above the connection box and do the following:
### Connection Name: Enter 'Hive'.
### Connection Type: Select 'Hadoop Hive'.
### Host Name and Port Number: Your Hive connection information.  For local single node clusters use 'localhost' and port '10000'.
### Database Name: Enter 'default'
### Leave User Name and Password empty.
### Test the connection by pressing the 'Test' button.
!worddavaae73a5752ce5ea2065ab838a9b0d175.png|height=138,width=242!\\
When you are done your window should look like:
\\ !worddave35debfeed0b77cb7e0223675a74240c.png|height=472,width=506!\\
Click 'OK' to close the window.
\\
## In the Connections box select 'Hive'.
## *Create a New Query:* You need a query to select the data from Hive for the report, so click the !worddav16760cf056fcda5f0bd0180db9ab25b8.png|height=22,width=21! above the 'Available Queries' box and do the following:
### Query Name: Enter 'Page Views'.
### Query: Enter the following
{code:sql}
Select year, month,month_num, sum(pageviews) as pageviews
From weblogs_agg
Group by year, month,month_num
Order by year, month_num
{code}
\\
When you are done your window should look like:
\\ !worddav5668023a92b2f71457516110eea268e2.png|height=285,width=542!\\
## *Create a Second Query:* You need a second query to get the list of unique IP Addresses so you can use them in your report parameters, so click the !worddav16760cf056fcda5f0bd0180db9ab25b8.png|height=22,width=21! above the 'Available Queries' box and do the following:
### Query Name: Enter 'IP Addresses'.
### Query: Enter the following
{code:sql}
Select distinct client_ip from weblogs_agg
{code}
When you are done your window should look like:
!worddav81dfc073e494bda256aa360e4ed2920e.png|height=254,width=485!
Click 'OK' to close the window.
## Highlight the 'Page Views' query in the Report Design Wizard window.
\\
Your Report Design Wizard window should now look like:
!worddavac6c3f943997bd9a0921cf17453adcf7.png|height=344,width=469!
Click 'Next' to go to the next screen.
\\
\\
# *Set up Report Layout:* You need to tell Report Design Wizard how to lay out your report, so do the following:
## Group Items By: Add 'year'
## Selected Items: Add 'month' and 'pageviews' in that order.
\\
When you are done your window should look like:
!worddavabb9bbb8d0978221bd2390e01c91982b.png|height=354,width=484!\\
Click 'Finish' to complete to the design wizard.
\\
\\
# *Create a Parameter:* You need to create a parameter so you will be able to select a specific IP address when running the report, so in the menu system select 'Data' \-> 'Add Parameter' and do the following:
## Select the 'JDBC (Hive)' data source.
## Name: Enter 'paramIPAddress'.
## Label: Enter 'IP Address'
## Value Type: Select 'String'
## Check 'Mandatory'
## Display Type: 'Drop Down'
## Query: Select 'IP Addresses'
## Value: Select 'client_ip'
## Display Name: Select 'client_ip'
\\
When you are done your window should look like:
!worddav80190193767c9570328d95fc6d60d65f.png|height=325,width=488!\\
Click 'OK' to close the window.
\\
\\
# *Add Parameter to Query:* You need to add the parameter you just created to your Page Views query, so in the 'Data' pane expand 'JDBC: Hive', right click on 'Page Views' and select 'Edit Query'.
!worddav4d58961bc071b706dafa390919506c8e.png|height=192,width=191!\\
Then do the following:
## In the Query box add the following line between the From and Group by lines:
\\
Where client_ip = $\{paramIPAddress\}
\\
Note there must be a space before the $.
\\
## When you are done your entire query will look like:
\\
{code:sql}
Select year, month,month_num, sum(pageviews) as pageviews
From weblogs_agg
Where client_ip = ${paramIPAddress}
Group by year, month,month_num
Order by year, month_num
{code}
Click 'OK' to close the window.
\\
\\
# *Save the Report*: Choose 'File' \-> 'Save as...' from the menu system. Save the report as 'hive_report.prpt' into a folder of your choice.
\\
\\
# *Preview the Report*: Choose 'File' \-> 'Preview' \-> 'Print Preview' from the menu system. A 'Print Preview' window will open. Select an 'IP Address' of your choice from the drop down.  After a few seconds the report results will appear:
!worddavf0e2275b9067ef601d4c9bd318fa37a0.png|height=494,width=494!\\
\\
# Do any desired formatting to the report.

h1. Summary

In this guide you learned how to construct a report that sources data from Hive{anchor:_GoBack}.
{scrollbar}