Comment: Migrated to Confluence 4.0

...

  1. Open the Hive Shell: Enter 'hive' at the command line to open the Hive shell so you can manually create a Hive table.

  2. Create the Table in Hive: You need a Hive table to load the data into, so enter the following in the Hive shell:
    create table weblogs (
        client_ip         string,
        full_request_date string,
        day               string,
        month             string,
        month_num         int,
        year              string,
        hour              string,
        minute            string,
        second            string,
        timezone          string,
        http_verb         string,
        uri               string,
        http_status_code  string,
        bytes_returned    string,
        referrer          string,
        user_agent        string)
    row format delimited
    fields terminated by '\t';
    
  3. Close the Hive Shell: You are done with the Hive shell for now, so close it by entering 'quit;'.

  4. Load the Table: Load the Hive table by copying the data file into the table's warehouse directory with the following command:
    hadoop fs -put part-00000.txt /user/hive/warehouse/weblogs/
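
Because the weblogs table is tab-delimited with 16 columns, it can help to check the local data file before copying it into HDFS. The following is only a sketch under stated assumptions: the helper name count_bad_rows is hypothetical and not part of the original tutorial, and part-00000.txt is assumed to be in the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not from the original tutorial): print how many lines
# of a tab-delimited file do not have the expected number of fields.
# Usage: count_bad_rows FILE EXPECTED_FIELDS
count_bad_rows() {
    awk -F '\t' -v n="$2" 'NF != n { bad++ } END { print bad + 0 }' "$1"
}

# The weblogs table has 16 columns, so every line of part-00000.txt
# should contain 16 tab-separated fields.
if [ -f part-00000.txt ]; then
    echo "rows with unexpected field count: $(count_bad_rows part-00000.txt 16)"
fi
```

A result of 0 suggests every row matches the table layout. Hive does not reject rows with too few or too many fields at load time; missing columns typically read back as NULL, so a check like this can catch a mismatch early. After the put, running a query such as 'select count(*) from weblogs;' in the Hive shell is one way to confirm that the rows are visible.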

Create Hive Database Connection

Create RDBMS Connection

Create a Transformation to Extract Data from Hive

...