Wiki Markup |
---|
{scrollbar} {excerpt}How to create a report that sources data from Hive.{excerpt} h1. Prerequisites In order follow along with this how-to guide you will need the following: * MapR * Pentaho Data Integration * Hive * Report Designer * RDBMS Database h1. Sample Files The sample data file needed for this guide is: | File Name | Content | | [How To's^weblogs_hive.txt.zip]\\ | Tab-delimited, aggregated weblog data for a Hive weblogs_agg table | \\ NOTE: If you have already completed the [Transforming Data within Hive in MapR] guide, then the necessary Hive table, weblog_aggs, will already be created and populated. h1. Step-By-Step Instructions h2. Setup Start MapR if it is not already running. Start Hive Server if it is not already running. h2. Create a Hive Table NOTE: This task may be skipped if you have completed [Transforming Data within Hive in MapR] guide. # *Open the Hive Shell*: Open the Hive shell so you can manually create a Hive table by entering 'hive' at the command line. # *Create the Table in Hive:* You need a hive table to load the data to, so enter the following in the hive shell. {code:sql} create table weblogs_agg ( client_ip string, year string, month string, month_num int, pageviews bigint ) row format delimited fields terminated by '\t'; {code} # *Close the Hive Shell*: You are done with the Hive Shell for now, so close it by entering 'quit;' in the Hive Shell. # *Load the Table:* Load the Hive table by running the following commands: {code} hadoop fs -put weblog_hive.txt /user/hive/warehouse/weblogs_agg/ {code} h2. Create a Hive Report In this task you will create a report that uses JDBC and HiveQL to report on data from MapR Hive. # *Start Report Designer on your desktop.* Once it is running choose 'File' \-> 'Report Wizard' \-> from the menu system. {tip:title=Speed Tip}You can download the Report definition [^HiveReportReporting on Hive Data in MapR^HiveReport.prpt] already completed{tip} # *Select a Template:* Report Wizard will automatically layout and do some basic formatting of your report, so select a template of your choice in the select box. When you are done your screen should look like: !worddav0c46e9dc56fae64269431deb52ad2076.png|height=388,width=530! Click 'Next' to go to the next screen. \\ \\ # *Create a Data Source:* You need to create the Hive query to select data for this report, so click the plus !worddav3ae20df9e3d0beb377e3379dfc837427.png|height=28,width=28! button and do the following: ## Choose Type 'JDBC' !worddavde72ff7851cff1df88c4dd0efc3b18bd.png|height=334,width=168! ## Select 'Sample Query' in the Available Queries box and delete it by clicking the !worddavd2236327c7744da4ab9d826d274b33c3.png|height=19,width=20! . ## Above the Connection box create a new connection by clicking the !worddav16760cf056fcda5f0bd0180db9ab25b8.png|height=22,width=21! . ## *Create a Database Connection:* Click the !worddav16760cf056fcda5f0bd0180db9ab25b8.png|height=22,width=21! above the connection box and do the following: ### Connection Name: Enter 'Hive'. ### Connection Type: Select 'Hadoop Hive'. ### Host Name and Port Number: Your Hive connection information. For local single node clusters use 'localhost' and port '10000'. ### Database Name: Enter 'default' ### Leave User Name and Password empty. ### Test the connection by pressing the 'Test' button. !worddavaae73a5752ce5ea2065ab838a9b0d175.png|height=138,width=242!\\ When you are done your window should look like: \\ !worddave35debfeed0b77cb7e0223675a74240c.png|height=472,width=506!\\ Click 'OK' to close the window. \\ ## In the Connections box select 'Hive'. ## *Create a New Query:* You need a query to select the data from Hive for the report, so click the !worddav16760cf056fcda5f0bd0180db9ab25b8.png|height=22,width=21! above the 'Available Queries' box and do the following: ### Query Name: Enter 'Page Views'. ### Query: Enter the following {code:sql} Select year, month,month_num, sum(pageviews) as pageviews From weblogs_agg Group by year, month,month_num Order by year, month_num {code} \\ When you are done your window should look like: \\ !worddav5668023a92b2f71457516110eea268e2.png|height=285,width=542!\\ ## *Create a Second Query:* You need a second query to get the list of unique IP Addresses so you can use them in your report parameters, so click the !worddav16760cf056fcda5f0bd0180db9ab25b8.png|height=22,width=21! above the 'Available Queries' box and do the following: ### Query Name: Enter 'IP Addresses'. ### Query: Enter the following {code:sql} Select distinct client_ip from weblogs_agg {code} When you are done your window should look like: !worddav81dfc073e494bda256aa360e4ed2920e.png|height=254,width=485! Click 'OK' to close the window. ## Highlight the 'Page Views' query in the Report Design Wizard window. \\ Your Report Design Wizard window should now look like: !worddavac6c3f943997bd9a0921cf17453adcf7.png|height=344,width=469! Click 'Next' to go to the next screen. \\ \\ # *Set up Report Layout:* You need to tell Report Design Wizard how to lay out your report, so do the following: ## Group Items By: Add 'year' ## Selected Items: Add 'month' and 'pageviews' in that order. \\ When you are done your window should look like: !worddavabb9bbb8d0978221bd2390e01c91982b.png|height=354,width=484!\\ Click 'Finish' to complete to the design wizard. \\ \\ # *Create a Parameter:* You need to create a parameter so you will be able to select a specific IP address when running the report, so in the menu system select 'Data' \-> 'Add Parameter' and do the following: ## Select the 'JDBC (Hive)' data source. ## Name: Enter 'paramIPAddress'. ## Label: Enter 'IP Address' ## Value Type: Select 'String' ## Check 'Mandatory' ## Display Type: 'Drop Down' ## Query: Select 'IP Addresses' ## Value: Select 'client_ip' ## Display Name: Select 'client_ip' \\ When you are done your window should look like: !worddav80190193767c9570328d95fc6d60d65f.png|height=325,width=488!\\ Click 'OK' to close the window. \\ \\ # *Add Parameter to Query:* You need to add the parameter you just created to your Page Views query, so in the 'Data' pane expand 'JDBC: Hive', right click on 'Page Views' and select 'Edit Query'. !worddav4d58961bc071b706dafa390919506c8e.png|height=192,width=191!\\ Then do the following: ## In the Query box add the following line between the From and Group by lines: \\ Where client_ip = $\{paramIPAddress\} \\ Note there must be a space before the $. \\ ## When you are done your entire query will look like: \\ {code:sql} Select year, month,month_num, sum(pageviews) as pageviews From weblogs_agg Where client_ip = ${paramIPAddress} Group by year, month,month_num Order by year, month_num {code} Click 'OK' to close the window. \\ \\ # *Save the Report*: Choose 'File' \-> 'Save as...' from the menu system. Save the report as 'hive_report.prpt' into a folder of your choice. \\ \\ # *Preview the Report*: Choose 'File' \-> 'Preview' \-> 'Print Preview' from the menu system. A 'Print Preview' window will open. Select an 'IP Address' of your choice from the drop down. After a few seconds the report results will appear: !worddavf0e2275b9067ef601d4c9bd318fa37a0.png|height=494,width=494!\\ \\ # Do any desired formatting to the report. h1. Summary In this guide you learned how to construct a report that sources data from Hive{anchor:_GoBack}. {scrollbar} |
Page Comparison
Manage space
Manage content
Integrations