What's new in PDI version 3.2



Introduction


When you compare this release to the previous one, you will see that the changes are more evolutionary than revolutionary.  Even so, there has been a large number of changes for a minor version increase.  The main focus of the release is once again stability and usability.  We went through a long list of pet peeves and common misunderstandings and simply solved them, either through new features or by modifying existing ones.  On top of that, we worked on the clustering side to make that mode "cloud-ready" with dynamic clustering, making it more solid and adding features along the way.

Once again, many many thanks go to our large community of Kettle enthusiasts for all the help they provided to make this release another success.

General changes

  • Visual changes

    • Hop color scheme with mini-icons and tooltips (note: tooltips not currently available on OS X)

      • After running with errors : show error icons (note: error details tooltip not currently available on OS X)

      • Visual feedback : reading from info steps

      • Visual feedback : writing to target steps that run in multiple copies

      • New step categories

      • Step filter in step tree tool bar

    • Long standing bugs attack

    • Long standing wish list attack

    • Resource Exporter to export transformations or complete jobs including their used resources (sub-transformations and sub-jobs) to a single ZIP file.

    • Named Parameters

      • Jobs and transformations can now define parameters with default values that will be available at runtime as variables.  This makes it easy to have dynamic configuration of a job/transformation from the command line (e.g. specifying a date range to process with the default being yesterday)

    • Dynamic clustering

      • Instead of having to configure all of the slaves that a transformation will be executed on in clustered mode, you can run Carte slaves in dynamic mode, configuring them to register with a master (or multiple masters) when they start up.  The clustered transformation is configured with a list of the masters it can run on.  When the transformation is executed, it will go down the list of masters, attempting to submit the transformation to each one until it is accepted.  That master will then execute the transformation using all of the currently available slaves that are registered to it.
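
The master-failover behaviour described above can be sketched roughly as follows. This is an illustrative Python outline, not the actual Carte API; the master names and the submit callback are invented for the example:

```python
# Sketch of dynamic-clustering submission: walk the configured master
# list and hand the transformation to the first master that accepts it.
# That master then fans the work out to whatever slaves are currently
# registered with it (not shown here).

def submit_to_cluster(masters, transformation, submit):
    """Return the first master that accepts the transformation;
    raise if none of the configured masters accepts it."""
    for master in masters:
        if submit(master, transformation):  # master accepted the work
            return master
    raise RuntimeError("no master accepted the transformation")

# Toy stand-in for a real submission call: only "master-b" is reachable.
accepting = {"master-b"}
chosen = submit_to_cluster(
    ["master-a", "master-b"],
    "load_sales.ktr",
    lambda master, trans: master in accepting,
)
print(chosen)  # master-b
```

The point of the loop is that the transformation definition only needs an ordered list of candidate masters, not a fixed slave topology; slaves come and go by registering with a master at startup.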

Step changes

New steps

  • Analytic Query : get information from previous/first rows

  • User Defined Java Expression : evaluate Java expressions, in-line compiled for maximum performance

  • Formula step : promoted from a plug-in to a native step

  • Synchronize after merge : performs updates, inserts or deletes in a database depending on a flag in the incoming data

  • SalesForce Input : reads information from SalesForce (promoted from a plug-in to a native step)

  • Replace in string : replace values in strings

  • Strings cut : cut strings down to size

  • If field value is null : ... then set default values per type or per field

  • Mail : send e-mails all over the globe

  • Process files: Copy, move or delete files

  • Identify last row in a stream : sets a flag when the last row in a stream was reached

  • Credit card validator : validates a credit card number, extracts information

  • Mail validator : validates an e-mail address

  • Reservoir sampling : promoted from a plug-in to a native step

  • Univariate statistics : promoted from a plug-in to a native step + upgrade

  • LucidDB Bulk loader : high performance bulk loader for the LucidDB column database

  • Unique Rows by Hashset : Allows de-duping a stream without having to sort it first. Requires enough memory to be able to store each set of unique keys.
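
The idea behind the Unique Rows by Hashset step can be sketched as follows. This is an illustrative Python outline, not PDI code; the rows and field names are invented:

```python
def unique_rows_by_hashset(rows, key_fields):
    """Emit only the first row seen for each distinct key combination.
    Unlike sort-based de-duping, the input order is preserved and no
    sort is needed, but every distinct key set must fit in memory."""
    seen = set()
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            yield row

rows = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "a"},  # duplicate, dropped
    {"id": 2, "name": "b"},
]
print(list(unique_rows_by_hashset(rows, ["id", "name"])))
```

This is the classic trade-off the step's description hints at: you save the sort, but memory use grows with the number of distinct keys rather than staying constant.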

Updated steps

  • Table Output: ability to specify the fields to insert

  • Calculator : all sorts of new calculations, string manipulations, etc.

  • Java script values : ability to replace values + improved script testing

  • Database lookup : pre-load cache option (load all values in memory)

  • Dimension Lookup/Update:

    • Cache pre-load (load all dimension entries in memory)

    • Support for alternative start of date range scenarios

    • Support for timestamp columns (last update/insert/both)

    • Support for current version column

  • Combination Lookup/Update : support for last update timestamp column

  • Data validator:

    • New option to report all errors, not only the first

    • Ability to read data from another step

  • Group By: added support for cumulative sum and average

  • Text File Input: New option to pass through additional fields from previous step (removing the need to do a Cartesian join)

  • Mapping:

    • Inherit all variables from parent transformation

    • Allow setting of variables in mapping

    • Allow preview of mapping output

    • Improved logging

    • New "Open mapping" option in the transformation graph right-click menu
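
The cumulative sum and average support added to the Group By step boils down to keeping a running total as rows stream through. A minimal sketch (illustrative Python, not PDI internals):

```python
def cumulative(values):
    """For each input value, return the running sum and running average
    up to and including that row, the way a cumulative aggregation
    computes them row by row."""
    total = 0.0
    out = []
    for count, value in enumerate(values, start=1):
        total += value
        out.append((total, total / count))
    return out

print(cumulative([2, 4, 6]))  # [(2.0, 2.0), (6.0, 3.0), (12.0, 4.0)]
```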