
...

...

...

...

Warning

PLEASE NOTE: This documentation applies to Pentaho 8.1 and earlier. For Pentaho 8.2 and later, see User Defined Java Class on the Pentaho Enterprise Edition documentation site.

Description

This step allows you to enter a User Defined Java Class (UDJC) that drives the functionality of a complete step. In essence, this step allows you to program your own plugin inside a step.

...

If you need other imports, you have to add them yourself at the very top of your code, for example:

Code Block

import java.util.*;

Another thing to note is that Janino, essentially a Java bytecode generator, only supports a subset of the Java 1.5 specification. For a complete list of its features and limitations, see the Janino homepage. At the time of writing, the most apparent limitation is the absence of generics.
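If your Janino version lacks generics, collection code has to be written in the pre-Java 5 style. The following plain-Java sketch (RawTypesSketch and totalLength() are illustrative names, not part of Kettle) uses raw collection types and explicit casts instead of type parameters. It compiles with javac, which only emits unchecked warnings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Pre-generics style: raw collection types plus explicit casts
// instead of List<String>.
class RawTypesSketch {

    // Sums the lengths of all strings in the list without using generics.
    static int totalLength(List names) {
        int total = 0;
        Iterator it = names.iterator();
        while (it.hasNext()) {
            String name = (String) it.next(); // explicit cast replaces the type parameter
            total += name.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List names = new ArrayList(); // raw type: no <String>
        names.add("alice");
        names.add("bob");
        System.out.println(totalLength(names)); // prints 8
    }
}
```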

...

Most of the time, working with input and output fields is the most important thing you'll be doing in your UDJC code. As such, there are several ways to manipulate fields. To start with, let's look at the description of the input row:

Code Block

RowMetaInterface inputRowMeta = getInputRowMeta();

The "inputRowMeta" object contains the metadata of the input row: all the fields, with their data types, lengths, names, format masks and much more. You can use it to look up input fields, among other things. For example, if you want to look for a field named "customer", use the following code:

Code Block

ValueMetaInterface customer = inputRowMeta.searchValueMeta("customer");

Because looking up fields by name is slow if you do it for every row that passes through a transformation, we advise you to look up field indexes in advance, in a "first" block like this (in the processRow() method):

Code Block

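if (first) {
  // yearIndex is assumed to be declared as a class member: int yearIndex;
  yearIndex = getInputRowMeta().indexOfValue(getParameter("YEAR"));
  if (yearIndex < 0) {
    throw new KettleException("Year field not found in the input row, check parameter 'YEAR'!");
  }
  first = false; // make sure the lookup runs only once
}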

To get the Integer value (a Java Long) contained in the field "year", you can then use the following construct:

Code Block

Object[] r = getRow();
// ...
Long year = getInputRowMeta().getInteger(r, yearIndex);

To make this process easier you can use a shortcut in this form:

Code Block

Long year = get(Fields.In, "year").getInteger(r); 
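Outside of Kettle, the lookup-once pattern can be modelled in plain Java. In this sketch the class LookupOnceSketch, its yearOf() method and the lookups counter are hypothetical; a List of field names stands in for RowMetaInterface, and List.indexOf() stands in for indexOfValue(). It shows that the name search runs once, while every row is then read by position.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a UDJC step: the "year" index is resolved once,
// on the first row, and then reused for every later row.
class LookupOnceSketch {
    private final List<String> fieldNames; // plays the role of the input row metadata
    private boolean first = true;          // mirrors the UDJC "first" flag
    private int yearIndex = -1;
    int lookups = 0;                       // counts how often the name search runs

    LookupOnceSketch(String... fieldNames) {
        this.fieldNames = Arrays.asList(fieldNames);
    }

    // Returns the "year" value of one row, searching by name only on the first call.
    Object yearOf(Object[] row) {
        if (first) {
            yearIndex = fieldNames.indexOf("year"); // linear name search, done once
            lookups++;
            if (yearIndex < 0) {
                throw new IllegalStateException("Year field not found in the input row");
            }
            first = false;
        }
        return row[yearIndex]; // cheap positional access for every row
    }

    public static void main(String[] args) {
        LookupOnceSketch step = new LookupOnceSketch("id", "year");
        System.out.println(step.yearOf(new Object[] {1L, 2024L})); // prints 2024
        System.out.println(step.yearOf(new Object[] {2L, 2025L})); // prints 2025
        System.out.println("lookups = " + step.lookups);           // prints lookups = 1
    }
}
```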

...

IMPORTANT: The Java data types that you get from previous steps always correspond to the Kettle data types described on the PDI Rows Of Data page.
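A frequent consequence is that a Kettle Integer field arrives as a java.lang.Long, so casting it to Integer fails. The plain-Java sketch below lists the mapping for the core data types as we understand it from the PDI Rows Of Data page (newer types such as Timestamp are left out); the class and map names are illustrative only.

```java
import java.math.BigDecimal;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Java classes carried in Object[] rows for the core Kettle data types.
class KettleTypeMapSketch {
    static final Map<String, Class<?>> JAVA_TYPE_FOR = new LinkedHashMap<>();
    static {
        JAVA_TYPE_FOR.put("String", String.class);
        JAVA_TYPE_FOR.put("Number", Double.class);
        JAVA_TYPE_FOR.put("Integer", Long.class);
        JAVA_TYPE_FOR.put("BigNumber", BigDecimal.class);
        JAVA_TYPE_FOR.put("Date", Date.class);
        JAVA_TYPE_FOR.put("Boolean", Boolean.class);
        JAVA_TYPE_FOR.put("Binary", byte[].class);
    }

    public static void main(String[] args) {
        Object[] r = {"2024", 2024L};
        // A Kettle Integer is a java.lang.Long, so cast to Long, not Integer.
        Long year = (Long) r[1];
        System.out.println(JAVA_TYPE_FOR.get("Integer").isInstance(year)); // prints true
    }
}
```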

Output fields

You can define all the new fields you want in the output of the step in the "Fields" section of the step's dialog.

Doing this automatically calculates the layout of the output row metadata and stores it in "data.outputRowMeta", which in turn allows you to create the output row. If the step writes as many (or fewer) rows as it reads, you can simply resize the row you get on input:

Code Block

Object[] outputRowData = RowDataUtil.resizeArray(r, data.outputRowMeta.size());

or, more memorably:

Code Block

Object[] outputRowData = createOutputRow(r, data.outputRowMeta.size());

If you write the same input row to the output more than once, make sure to create a separate copy for each output row; otherwise subsequent steps may modify the same Object[] several times at once:

Code Block

Object[] outputRowData = RowDataUtil.createResizedCopy(r, data.outputRowMeta.size());
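The danger of sharing one array can be shown in plain Java. This sketch assumes that a resize reuses the input array when it is already large enough, which is how we understand RowDataUtil.resizeArray() to behave; resizeInPlace() and resizedCopy() are hypothetical stand-ins, not Kettle methods.

```java
import java.util.Arrays;

// Hypothetical model of why each emitted row needs its own array.
class RowCopySketch {

    // Reuses the input array when it already has room (assumed resizeArray behaviour).
    static Object[] resizeInPlace(Object[] row, int size) {
        return row.length >= size ? row : Arrays.copyOf(row, size);
    }

    // Always allocates a fresh array, like a resized copy.
    static Object[] resizedCopy(Object[] row, int size) {
        return Arrays.copyOf(row, size);
    }

    public static void main(String[] args) {
        Object[] input = {"A", null}; // slot 1 is free for the new output field

        // Emitting the same input twice without copying: both "rows" are one array.
        Object[] out1 = resizeInPlace(input, 2);
        Object[] out2 = resizeInPlace(input, 2);
        out1[1] = "first";
        out2[1] = "second";
        System.out.println(out1[1]); // prints second: out1 was overwritten

        // With copies, each emitted row keeps its own value.
        Object[] copy1 = resizedCopy(input, 2);
        Object[] copy2 = resizedCopy(input, 2);
        copy1[1] = "first";
        copy2[1] = "second";
        System.out.println(copy1[1]); // prints first
    }
}
```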

...

Using the index you can set a value like this:

Code Block

outputRowData[getInputRowMeta().size()] = easterDate(year.intValue());

or like this with the shortcut:

Code Block

get(Fields.Out, "easter").setValue(outputRowData, easterDate(year.intValue()));
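The sample's easterDate() helper is not reproduced here. As a hypothetical stand-in, the sketch below computes Easter Sunday with the Anonymous Gregorian (Meeus/Jones/Butcher) algorithm and returns a java.util.Date, the Java type behind a Kettle Date field; the actual sample may use a different method.

```java
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical easterDate(int) helper using the Anonymous Gregorian algorithm.
class EasterSketch {

    // Returns {month, day} of Easter Sunday in the Gregorian calendar (month is 1-based).
    static int[] easterMonthDay(int year) {
        int a = year % 19;
        int b = year / 100;
        int c = year % 100;
        int d = b / 4;
        int e = b % 4;
        int f = (b + 8) / 25;
        int g = (b - f + 1) / 3;
        int h = (19 * a + b - d - g + 15) % 30;
        int i = c / 4;
        int k = c % 4;
        int l = (32 + 2 * e + 2 * i - h - k) % 7;
        int m = (a + 11 * h + 22 * l) / 451;
        int n = h + l - 7 * m + 114;
        return new int[] {n / 31, n % 31 + 1};
    }

    // Wraps the result in java.util.Date (Calendar months are 0-based).
    static Date easterDate(int year) {
        int[] md = easterMonthDay(year);
        return new GregorianCalendar(year, md[0] - 1, md[1]).getTime();
    }

    public static void main(String[] args) {
        int[] md = easterMonthDay(2024);
        System.out.println(md[0] + "/" + md[1]); // prints 3/31
    }
}
```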

...

In this example, taken from the sample transformation "samples/transformations/User Defined Java Class - Calculate the date of Easter.ktr" in your Kettle distribution, we have a parameter called YEAR that is referenced with the getParameter() method, for example:

Code Block

getParameter("YEAR")

At runtime this returns the String value "year", that is, the name of the input field to use.

Processing rows

The processRow() method is the heart of the step.  This method is called by the transformation in a tight loop and will continue until false is returned.  A very simple example that calculates firstname+" "+lastname and stores it into a "name" field is this:

Code Block

String firstnameField;
String lastnameField;
String nameField;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // Let's look up parameters only once, for performance reasons.
    //
    if (first) {
      firstnameField = getParameter("FIRSTNAME_FIELD");
      lastnameField = getParameter("LASTNAME_FIELD");
      nameField = getParameter("NAME_FIELD");
      first=false;
    }

    // First, get a row from the default input hop
    //
    Object[] r = getRow();

    // If the row object is null, we are done processing.
    //
    if (r == null) {
      setOutputDone();
      return false;
    }

    // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
    // enough to handle any new fields you are creating in this step.
    //
    Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());

    String firstname = get(Fields.In, firstnameField).getString(r);
    String lastname = get(Fields.In, lastnameField).getString(r);

    // Set the value in the output field
    //
    String name = firstname+" "+lastname;
    get(Fields.Out, nameField).setValue(outputRow, name);

    // putRow will send the row on to the default output hop.
    //
    putRow(data.outputRowMeta, outputRow);

    return true;
}
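The calling contract can be illustrated without Kettle. In this plain-Java miniature (all names are hypothetical), a driver loop calls processRow() until it returns false, getRow() returns null once the input is exhausted, and each output row gets one extra slot for the concatenated name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical miniature of the processRow() contract.
class ProcessRowLoopSketch {
    private final Iterator<Object[]> input;
    final List<Object[]> output = new ArrayList<>();
    boolean outputDone = false;

    ProcessRowLoopSketch(List<Object[]> rows) {
        this.input = rows.iterator();
    }

    // Returns the next input row, or null at end of input.
    Object[] getRow() {
        return input.hasNext() ? input.next() : null;
    }

    boolean processRow() {
        Object[] r = getRow();
        if (r == null) {           // no more rows: mark output done and stop
            outputDone = true;
            return false;
        }
        Object[] outputRow = Arrays.copyOf(r, r.length + 1); // one extra slot for "name"
        outputRow[r.length] = r[0] + " " + r[1];             // firstname + " " + lastname
        output.add(outputRow);                               // stands in for putRow()
        return true;
    }

    // The tight loop that drives the step.
    static ProcessRowLoopSketch run(List<Object[]> rows) {
        ProcessRowLoopSketch step = new ProcessRowLoopSketch(rows);
        while (step.processRow()) {
            // keep calling until processRow() returns false
        }
        return step;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"Ada", "Lovelace"});
        rows.add(new Object[] {"Alan", "Turing"});
        ProcessRowLoopSketch step = run(rows);
        System.out.println(step.output.get(1)[2]); // prints Alan Turing
    }
}
```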

...

Because the getRow() method returns the first available row from any input stream (either a regular input stream or an info stream), and the typical reason to use info steps is that the input row metadata and the info row metadata differ, the adopted approach is to read all data from the info stream before calling getRow(). (See the example, or issues PDI-8738 and PDI-8740.)

Code Block

if (first) {
  first = false;

  /* TODO: Your code here. (Using info fields)

  FieldHelper infoField = get(Fields.Info, "info_field_name");

  RowSet infoStream = findInfoRowSet("info_stream_tag");

  Object[] infoRow = null;

  int infoRowCount = 0;

  // Read all rows from the info step before calling getRow(), which returns the first row from any
  // input rowset. Because the rowMeta of the info and input steps differs, getRow() can lead to errors.
  while ((infoRow = getRowFrom(infoStream)) != null) {

    // do something with info data
    infoRowCount++;
  }
  */
}

Object[] r = getRow();

if (r == null) {
  setOutputDone();
  return false;
}
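The same read-the-info-stream-first idea can be modelled in plain Java: two queues stand in for the info and main row sets, and a HashMap built on the first call serves as the lookup table. Class, field and method names are hypothetical; only the ordering mirrors the template.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: the info stream carries (code, name) rows, a different
// layout from the main stream's (person, code) rows, so it is drained first.
class InfoStreamSketch {
    private final Deque<Object[]> infoStream;
    private final Deque<Object[]> mainStream;
    private final Map<Object, Object> namesByCode = new HashMap<>();
    private boolean first = true;
    final List<Object[]> output = new ArrayList<>();

    InfoStreamSketch(List<Object[]> infoRows, List<Object[]> mainRows) {
        infoStream = new ArrayDeque<>(infoRows);
        mainStream = new ArrayDeque<>(mainRows);
    }

    boolean processRow() {
        if (first) {
            first = false;
            Object[] infoRow;
            // Read every info row before the first main row is touched.
            while ((infoRow = infoStream.poll()) != null) {
                namesByCode.put(infoRow[0], infoRow[1]);
            }
        }
        Object[] r = mainStream.poll(); // from here on, only main-stream rows
        if (r == null) {
            return false; // end of input
        }
        output.add(new Object[] {r[0], namesByCode.get(r[1])});
        return true;
    }

    public static void main(String[] args) {
        List<Object[]> info = new ArrayList<>();
        info.add(new Object[] {"DE", "Germany"});
        info.add(new Object[] {"FR", "France"});
        List<Object[]> people = new ArrayList<>();
        people.add(new Object[] {"Alice", "FR"});
        InfoStreamSketch step = new InfoStreamSketch(info, people);
        while (step.processRow()) {
            // loop until processRow() returns false
        }
        System.out.println(step.output.get(0)[1]); // prints France
    }
}
```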

...

When reading variables that point to transformation parameters, the UDJC behaves differently depending on where the getVariable() function is called: in the init() method, everything works fine; in the initializer of a class member variable, the variable is not resolved, by design (see PDI-8963).

Code Block

private final String par = getVariable("somePar"); // DOES NOT resolve correctly
private String par2 = null;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
   logBasic("Parameter value="+par+"[MEMBER INIT]");
   logBasic("Parameter value="+par2+"[INIT FUNCTION]");
   setOutputDone();
   return false;
}

public boolean init(StepMetaInterface stepMetaInterface, StepDataInterface stepDataInterface) {
   par2 = getVariable("somePar"); // WORKS FINE
   return parent.initImpl(stepMetaInterface, stepDataInterface);
}
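The underlying cause is ordinary Java initialization order: field initializers run while the object is being constructed, before anything can be injected into it. The plain-Java model below assumes that Kettle supplies variable values only after the UDJC object exists; InitOrderSketch and setVariable() are hypothetical and do not reproduce Kettle's actual mechanism.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the PDI-8963 ordering: variable values are supplied only
// after the object has been constructed, so a field initializer reads them too early.
class InitOrderSketch {
    private final Map<String, String> variables = new HashMap<>(); // initialized before "par"

    final String par = getVariable("somePar"); // runs during construction: null
    String par2 = null;

    String getVariable(String name) {
        return variables.get(name);
    }

    // Stands in for the engine wiring up variables after construction.
    void setVariable(String name, String value) {
        variables.put(name, value);
    }

    // Runs after the variables are set, so the value resolves.
    void init() {
        par2 = getVariable("somePar");
    }

    public static void main(String[] args) {
        InitOrderSketch step = new InitOrderSketch(); // field initializers run here
        step.setVariable("somePar", "hello");
        step.init();
        System.out.println(step.par + " / " + step.par2); // prints null / hello
    }
}
```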

...

It is necessary to implement progress logging yourself, because you decide which counters to report: lines read, written, output, updated, etc. Other steps log like so:

Code Block

    putRow( data.outputMeta, r );

    if ( checkFeedback( getLinesOutput() ) ) {
      if ( log.isBasic() ) {
        logBasic( "Have I got rows for you! " + getLinesOutput() );
      }
    }
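The purpose of checkFeedback() is throttling: progress is logged every N rows rather than on every row. The sketch below is a hypothetical re-implementation under the assumption that it fires whenever the counter is a positive multiple of the transformation's feedback size; the class name and the feedback size of 5000 are arbitrary example values.

```java
// Hypothetical model of checkFeedback(): report progress only every feedbackSize lines.
class FeedbackSketch {

    static boolean checkFeedback(long lines, long feedbackSize) {
        return feedbackSize > 0 && lines > 0 && lines % feedbackSize == 0;
    }

    public static void main(String[] args) {
        long linesOutput = 0;
        int messages = 0;
        for (int row = 0; row < 12000; row++) {
            linesOutput++; // one more row counted (whichever counter you choose to report)
            if (checkFeedback(linesOutput, 5000)) {
                System.out.println("Have I got rows for you! " + linesOutput);
                messages++;
            }
        }
        System.out.println(messages + " progress messages"); // prints 2 progress messages
    }
}
```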

...