...

How to use Pentaho to ingest a Mainframe file into HDFS, and then use MapReduce to process it into delimited records.

The steps in this guide include:

Installing a plugin from the Marketplace
Ingesting a Mainframe file into HDFS
Developing a PDI Transformation as a Mapper Task
Developing a PDI Job to Orchestrate all of the components
Executing and Reviewing Output

Note
This how-to guide was built using Pentaho Data Integration 5.0.4

Prerequisites

In order to follow along with this how-to guide you will need the following:

Hadoop
Pentaho Data Integration
Pentaho configured properly for your Hadoop distribution; see Configuring Pentaho for your Hadoop Distro and Version (Pentaho Suite Version 5.1)
Complete the Using Pentaho MapReduce to Parse Weblog Data example
LegStar z/OS File reader Plugin installed through the Marketplace (see below)

...

3. Edit the z/OS File Input step. For z/OS filename, select your Mainframe file. If you are using the sample data, this is mf-files/ZOS.FCUSTDAT.bin. This file contains variable-length records, so check the variable length box as well. The sample file uses the IBM01140 character set; for your own files, you will need to know which code page your EBCDIC file uses: http://en.wikipedia.org/wiki/EBCDIC
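
To illustrate what the z/OS File Input step has to handle, here is a minimal Python sketch (not how the LegStar plugin is implemented) that splits a variable-length file into records and decodes text with Python's cp1140 codec (alias ibm1140), which implements IBM01140. It assumes each record is prefixed by a standard 4-byte IBM Record Descriptor Word (RDW) holding the record length; the function names are hypothetical.

```python
import struct
from typing import Iterator


def read_rdw_records(data: bytes) -> Iterator[bytes]:
    """Split a variable-length (RECFM=V) byte stream into record payloads.

    Assumption: every record begins with a 4-byte Record Descriptor Word (RDW):
    a 2-byte big-endian length that includes the RDW itself, then 2 bytes of zeros.
    """
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated RDW at offset {offset}")
        (length,) = struct.unpack_from(">H", data, offset)
        if length < 4 or offset + length > len(data):
            raise ValueError(f"bad record length {length} at offset {offset}")
        yield data[offset + 4 : offset + length]  # payload without the RDW
        offset += length


def decode_ebcdic(payload: bytes, codepage: str = "cp1140") -> str:
    """Decode EBCDIC text; Python's cp1140 codec implements IBM01140."""
    return payload.decode(codepage)


if __name__ == "__main__":
    # Two made-up records, each prefixed with an RDW.
    payloads = ["HELLO".encode("cp1140"), "Z/OS".encode("cp1140")]
    stream = b"".join(struct.pack(">HH", len(p) + 4, 0) + p for p in payloads)
    for payload in read_rdw_records(stream):
        print(decode_ebcdic(payload))  # HELLO, then Z/OS
```

Only character fields decode cleanly this way; binary and packed-decimal fields also need the field layout, which is what the COBOL copybook supplies.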

4. Configure the COBOL copybook. Go to the COBOL tab to provide your copybook. The copybook is used to "translate" the Mainframe file into fields. If you are using the sample files, click Import COBOL and browse to copybooks/CUSTDATCC. Note that in the file browser you must change the file-type drop-down to All files for the copybook to show up in the list. You should now see that your copybook has been loaded.
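
Conceptually, a copybook gives the byte length and encoding of each field in a record. The Python sketch below uses a small hypothetical layout (it is not the contents of CUSTDATCC) to show how a display-numeric field, a text field, and a packed-decimal (COMP-3) field are cut out of one EBCDIC record. The layout, field names, and functions are illustrative assumptions, not the plugin's API.

```python
from decimal import Decimal
from typing import Dict, List, Tuple, Union

# Hypothetical layout (NOT the real CUSTDATCC copybook), standing in for:
#   05 CUSTOMER-ID    PIC 9(6).              -> 6 bytes of EBCDIC digits
#   05 CUSTOMER-NAME  PIC X(20).             -> 20 bytes of EBCDIC text
#   05 BALANCE        PIC S9(5)V99 COMP-3.   -> 4 bytes packed decimal, 2 implied decimals
LAYOUT: List[Tuple[str, str, int, int]] = [
    # (field name, kind, length in bytes, implied decimal places)
    ("CustomerId", "display", 6, 0),
    ("CustomerName", "text", 20, 0),
    ("Balance", "comp3", 4, 2),
]


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode IBM packed decimal: two digits per byte, sign in the final nibble."""
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    sign = nibbles.pop()  # C = positive, D = negative, F = unsigned
    if sign not in (0x0C, 0x0D, 0x0F) or any(n > 9 for n in nibbles):
        raise ValueError(f"not valid packed decimal: {raw.hex()}")
    value = Decimal(int("".join(map(str, nibbles)))).scaleb(-scale)
    return -value if sign == 0x0D else value


def parse_record(record: bytes, codepage: str = "cp1140") -> Dict[str, Union[int, str, Decimal]]:
    """Cut one EBCDIC record into named fields according to LAYOUT."""
    fields: Dict[str, Union[int, str, Decimal]] = {}
    offset = 0
    for name, kind, length, scale in LAYOUT:
        raw = record[offset : offset + length]
        if len(raw) != length:
            raise ValueError(f"record too short for field {name}")
        if kind == "comp3":
            fields[name] = unpack_comp3(raw, scale)
        elif kind == "display":
            fields[name] = int(raw.decode(codepage))  # unsigned zoned digits only
        else:
            fields[name] = raw.decode(codepage)  # trailing padding is kept
        offset += length
    return fields


if __name__ == "__main__":
    sample = (
        "000042".encode("cp1140")
        + "JANE DOE".ljust(20).encode("cp1140")
        + bytes([0x12, 0x34, 0x56, 0x7D])  # -12345.67 packed
    )
    print(parse_record(sample))
```

Note that text fields keep their fixed-width padding after parsing, which is why trimming and formatting come up again when the output value is built.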

...

13. Create Value. Double-click the Concat Fields step to edit it. Change the step name to "Create value", set the Target Field Name to "newvalue", and set the Separator to a pipe "|". Then click the Get Fields button; all fields in your stream are added. Remove key and newkey, which should leave 9 fields in total.
To remove the extra spaces carried over from the original Mainframe file, make the following adjustments:

CustomerId: Set Format to #.#
CustomerName: Set Trim Type to "both"
CustomerAddress: Set Trim Type to "both"
TransactionNbr: Set Format to #.#
Tx Amount: Set Format to #.#
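
The Create value step behaves roughly like the following Python sketch: trim text fields, reformat numeric fields, and join everything except key and newkey with a pipe. The row and its values are hypothetical, and format_hash_dot_hash is only an approximation of Java's DecimalFormat "#.#" pattern (at most one fractional digit, no leading zeros, no trailing ".0"), not an exact port. For simplicity it trims every text field and formats every numeric field, whereas the step above sets these options per field.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Dict, List, Union

Value = Union[str, int, Decimal]


def format_hash_dot_hash(value: Decimal) -> str:
    """Rough stand-in for DecimalFormat("#.#"); an approximation only."""
    # Round to at most one fractional digit (HALF_EVEN is DecimalFormat's default mode).
    text = format(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN), "f")
    # "#" marks an optional digit, so a trailing ".0" is dropped.
    return text[:-2] if text.endswith(".0") else text


def create_value(row: Dict[str, Value], fields: List[str], sep: str = "|") -> str:
    """Concatenate the chosen fields with a separator, like the Concat Fields step."""
    parts = []
    for name in fields:
        v = row[name]
        if isinstance(v, (int, Decimal)):
            parts.append(format_hash_dot_hash(Decimal(v)))  # Format: #.#
        else:
            parts.append(str(v).strip())  # Trim Type: both
    return sep.join(parts)


if __name__ == "__main__":
    # Hypothetical row; the real stream has 9 value fields plus key and newkey.
    row = {
        "key": 0,
        "newkey": "42",
        "CustomerId": Decimal("000042"),
        "CustomerName": "JANE DOE            ",
        "Tx Amount": Decimal("12.50"),
    }
    fields = [name for name in row if name not in ("key", "newkey")]
    print(create_value(row, fields))  # 42|JANE DOE|12.5
```

Because "#.#" keeps at most one fractional digit, an amount such as 7.25 comes out as 7.2 in this sketch; if your amounts carry cents, check whether that rounding is acceptable.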

14. Add MapReduce Output. Now that we have our new key and new value, we're ready for these to be returned by our mapper. Add the MapReduce Output step, and create a hop from Create value to MapReduce Output.
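
Each row reaching MapReduce Output becomes one key/value pair handed back to Hadoop. Assuming the job writes its results with Hadoop's default TextOutputFormat and no reducer alters the pairs, each pair ends up as one line of the form key, tab, value. The Python sketch below mimics that; the mapper and text_output_lines functions and the sample row are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def mapper(rows: Iterable[Dict[str, object]]) -> Iterator[Tuple[str, str]]:
    """Emit one (newkey, newvalue) pair per row, as the MapReduce Output step does."""
    for row in rows:
        yield str(row["newkey"]), str(row["newvalue"])


def text_output_lines(pairs: Iterable[Tuple[str, str]], sep: str = "\t") -> Iterator[str]:
    """Render pairs like TextOutputFormat's default: key, separator (tab), value."""
    for key, value in pairs:
        yield f"{key}{sep}{value}"


if __name__ == "__main__":
    # Hypothetical row coming out of the Create value step.
    rows = [{"newkey": "42", "newvalue": "42|JANE DOE|12.5"}]
    for line in text_output_lines(mapper(rows)):
        print(line)
```

Hadoop writes such lines to part files in the job's output directory (part-m-00000 and so on for a map-only job, part-r-00000 and so on when a reducer runs).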

...

Version	Old Version 12	New Version Current
Changes made by	Former user	Former user
Saved on	May 14, 2014	Feb 12, 2015