...

  1. Start PDI on your desktop. Once it is running, choose 'File' -> 'New' -> 'Transformation' from the menu, or click the 'New file' icon on the toolbar and choose the 'Transformation' option.
    Speed Tip: You can download the completed Kettle transformation aggregate_mapper.ktr.

  2. Add a Map/Reduce Input Step: You are going to read data into the transformation from MapReduce, so expand the 'Big Data' section of the Design palette and drag a 'Map/Reduce Input' node onto the transformation canvas. Your transformation should look like:


  3. Edit the Map/Reduce Input Step: Double-click on the 'Map/Reduce Input' node to edit its properties. Enter this information:
    1. Key Field Type: Enter String
    2. Value Field Type: Enter String
      When you are done your 'Map/Reduce Input' window should look like this:

      Click 'OK' to close the window.

  4. Add a Split Fields Step: You need to split the incoming records on tab to get the individual fields in the record, so expand the 'Transform' section of the Design palette and drag a 'Split Fields' node onto the transformation canvas. Your transformation should look like:


  5. Connect the Input and Split Fields Steps: Hover the mouse over the 'Map/Reduce Input' node and a tooltip will appear.
    Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Split Fields' node. Your canvas should look like this:


  6. Edit the Split Fields Step: Double-click on the 'Split Fields' node to edit its properties. Enter this information:
    1. Field to split: Select 'value'
    2. Delimiter: The delimiter is a tab character. Enter '$[09]' (09 is the hexadecimal representation of the ASCII tab character). Alternatively, press tab in a text editing application, then copy and paste the resulting tab into the 'Delimiter' parameter.
    3. Fields: Enter the following fields, each with 'Type' set to 'String':
      1. client_ip
      2. full_request_date
      3. day
      4. month
      5. month_num
      6. year
      7. hour
      8. minute
      9. second
      10. timezone
      11. http_verb
      12. uri
      13. http_status_code
      14. bytes_returned
      15. referrer
      16. user_agent
        When you are done your 'Split Fields' window should look like this:

        Click 'OK' to close the window.
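As a rough illustration of what the 'Split Fields' step does with each incoming value, the split can be sketched in plain Java. This is only a sketch, not PDI's implementation: the class name `SplitFieldsSketch`, the method `splitValue`, the sample record, and the choice to map missing trailing fields to null are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitFieldsSketch {
    // Field names from the 'Fields' list above, in the same order.
    static final String[] FIELD_NAMES = {
        "client_ip", "full_request_date", "day", "month", "month_num", "year",
        "hour", "minute", "second", "timezone", "http_verb", "uri",
        "http_status_code", "bytes_returned", "referrer", "user_agent"
    };

    // Splits one tab-delimited record into named String fields.
    static Map<String, String> splitValue(String value) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] parts = value.split("\t", -1);
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < FIELD_NAMES.length; i++) {
            // Assumption for this sketch: fields missing from the record become null.
            fields.put(FIELD_NAMES[i], i < parts.length ? parts[i] : null);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample record, made up for illustration.
        String record = String.join("\t",
            "10.0.0.1", "25/Jul/2011:12:00:00", "25", "Jul", "07", "2011",
            "12", "00", "00", "-0500", "GET", "/index.html",
            "200", "1024", "-", "Mozilla/5.0");
        Map<String, String> fields = splitValue(record);
        System.out.println(fields.get("client_ip") + " "
            + fields.get("year") + " " + fields.get("month_num"));
    }
}
```

Every field stays a String here, matching the 'Type' setting in the field list.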

  7. Add a User Defined Java Expression Step: You need to concatenate the client_ip, year, and month together to create the key field, so expand the 'Scripting' section of the Design palette and drag a 'User Defined Java Expression' node onto the transformation canvas. Your transformation should look like:


  8. Connect the Split Fields and User Defined Java Expression Steps: Hover the mouse over the 'Split Fields' node and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'User Defined Java Expression' node. Your canvas should look like this:


  9. Edit the User Defined Java Expression Step: Double-click on the 'User Defined Java Expression' node to edit its properties. Do the following:
    1. Create a new field 'new_key' with Type 'String' and the following Java expression:
      client_ip + '\t' + year + '\t' + month_num
      Note that '\t' is the Java escape sequence for a tab character. Alternatively, you can copy and paste a literal tab character between the quotes instead of typing \t, but do not use both.
    2. Create a new field 'new_value' with Type 'Integer' and the Java expression '1'.
      When you are done your window should look like:

      Click 'OK' to close the window.
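To see what the two expressions in this step produce for a single record, they can be tried in a standalone Java class. This is a minimal sketch under stated assumptions: the class name `KeyExpressionSketch`, the helper `newKey`, and the sample values are hypothetical, and the step itself evaluates the expression inside PDI rather than in a class like this.

```java
public class KeyExpressionSketch {
    // Same expression as the 'new_key' field: '\t' is a char literal,
    // and String + char concatenation yields a String in Java.
    static String newKey(String client_ip, String year, String month_num) {
        return client_ip + '\t' + year + '\t' + month_num;
    }

    // Same constant as the 'new_value' field: each record counts once.
    static final long NEW_VALUE = 1L;

    public static void main(String[] args) {
        // Hypothetical sample values, made up for illustration.
        String key = newKey("10.0.0.1", "2011", "07");
        // Show the tabs as '|' so they are visible on a console.
        System.out.println(key.replace('\t', '|') + " -> " + NEW_VALUE);
        // prints "10.0.0.1|2011|07 -> 1"
    }
}
```

Because every mapped record carries the value 1, a later reduce stage can sum the values per key to count requests per client IP, year, and month.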

  10. Add a Map/Reduce Output Step: You need to write the new key and new value to the output, so expand the 'Big Data' section of the Design palette and drag a 'Map/Reduce Output' node onto the transformation canvas. Your transformation should look like:


  11. Connect the Java Expression and Output Steps: Hover the mouse over the 'User Defined Java Expression' node and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Map/Reduce Output' node. Your canvas should look like this:


  12. Edit the Output Step: Double-click on the 'Map/Reduce Output' node to edit its properties. Enter the following information:
    1. Key field: Select 'new_key'
    2. Value field: Select 'new_value'
      When you are done your window should look like:

      Click 'OK' to close the window.

  13. Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'aggregate_mapper.ktr' into a folder of your choice.

...