What's new in PDI 4.0

Index

Introduction
General changes
Step changes
Job entry changes
Databases
Repository
Internationalization
Community and codebase

Introduction

PDI 4.0 is a nicely balanced release: a rare mix of many new features, engine stability, and 100% backward compatibility with your existing jobs and transformations.

Once again, many many thanks go to our large community of Kettle enthusiasts for all the help they provided to make this release another success.

General changes

Visual changes

Mouse-over
More intuitive menus
New welcome screen
Hop creation
Improved error handling configuration
New perspectives support for Agile BI visualisations, modelling, scheduling, etc.

Running jobs in Spoon

Drill down into running job entries
Visual indicators of running and completed job entries with success and failure mini-icons
Mousing over a completion mini-icon shows details of the execution results
Log capturing of completed job entries

Running transformations in Spoon

Drill down into running transformation job entries and mappings
Row input/output sniff testing: see which rows are passing through
Remote input/output sniff testing on a Carte server

New logging architecture

Reduced memory consumption
Incremental log updates
Global log buffer size limit for long running jobs/transformations
Interval logging
Auto clean-up of old log records
Log record time-outs
Log record lineage
Log record colour coding in Spoon (blue and red for error lines)
Step logging
Job entry logging
Execution lineage logging
Renaming individual columns
Global configuration options for all log tables

New plug-in architecture

Unified plug-in architecture
Easier deployment and packaging
Step, job entry, partitioner, database type, Spoon perspective, life-cycle, ...: all pluggable

New repository plug-in architecture

Allowing for 3rd party repositories like the Pentaho Unified Enterprise Repository
Removed dependencies on the relational database repository (which is still supported)
Added support for repositories capable of team-development (file locking)
Added support for repositories capable of fine-grained security
Added support for repositories capable of storing and retrieving revision history

Step changes

New steps

SAP Input: Reads data from an SAP R/3 application server (needs sapjco3.jar, which is not included in PDI)
Data Grid: Allows you to enter static rows of data for reference or testing purposes
OLAP Input: Reads data from an OLAP server using olap4j over XML/A: Mondrian, Palo, SSAS, SAP B/W
Salesforce Delete, Insert, Update, Upsert
Add fields changing sequence: A sequence that is reset whenever the values in a set of fields change (group sequence)
User Defined Java Class: create your own plugin on the fly in a step (coming out of incubation)
Send information using Syslog: Sends a message to a Syslog server (see http://en.wikipedia.org/wiki/Syslog)
Java Filter: Filters rows based on a user-defined Java expression
Memory Group By: For smaller groups, keeps the intermediate statistical results in memory, which gives faster results
LucidDB streaming bulk loader
Teradata FastLoad bulk loader
Experimental steps added: Get table names, Email messages input, ...
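
The group sequence idea behind "Add fields changing sequence" can be illustrated outside PDI with a few lines of awk. This is only a sketch, not PDI code: the function name group_seq is made up for this example, the key is assumed to be the first whitespace-separated column, and each row is compared with the previous row only. Whether the PDI step behaves identically in every detail is an assumption.

```shell
# Hypothetical illustration, not PDI code: append a counter to each row and
# restart it at 1 whenever the value in the first column changes.
group_seq() {
  awk '
    NR == 1 || $1 != key { seq = 0; key = $1 }  # key changed: reset the sequence
    { seq++; print $0, seq }                    # number the row within its group
  '
}

printf 'A x\nA y\nB z\nA w\n' | group_seq
```

Because only neighbouring rows are compared, the last "A" row starts a new sequence at 1. Sort the input on the key first if every key should get a single run of numbers.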

Updated steps

TODO

Job entry changes

New job entries

Send information using Syslog
Check DB connections

Updated job entries

TODO

Databases

New plugin architecture
...
TODO

Repository

New repository plug-in architecture
New Pentaho Unified Enterprise Repository type
New File repository type
New repository explorer
...

Internationalization

TODO:

Community and codebase

Codebase

Even though we try our best to refactor and simplify the codebase all the time, there is no denying that it keeps growing.
Right before every release we run the following command:

find . -name "*.java" -exec wc -l {} \; | awk '{ sum+=$1 } END { print sum }'

This is what it gave us over the last releases:

Version	Lines of code	Increase	% inc.
2.1.4	160,000
2.2.2	177,450	17,450	10.9%
2.3.0	213,489	36,039	20.3%
2.4.0	256,030	42,541	19.9%
2.5.0	292,241	36,211	14.1%
3.0.0	348,575	56,334	19.3%
3.1.0	456,772	108,197	31.0%
3.2.0	529,277	72,505	15.9%
4.0.0	607,180	77,903	14.7%
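
The Increase and % inc. columns follow directly from the line counts: each increase is the difference from the previous release, and the percentage is that increase divided by the previous count. As a minimal sketch, the arithmetic can be reproduced with awk. The function name loc_growth and the "version count" input format (no thousands separators) are assumptions made for this example, not part of any PDI tooling.

```shell
# Hypothetical helper: reads "version lines_of_code" pairs, one per line,
# and prints version, lines, increase and percentage increase (tab-separated).
loc_growth() {
  awk '
    NR == 1 { printf "%s\t%d\n", $1, $2; prev = $2; next }  # first release: no baseline
    {
      inc = $2 - prev                                       # growth since previous release
      printf "%s\t%d\t%d\t%.1f%%\n", $1, $2, inc, 100 * inc / prev
      prev = $2
    }
  '
}

printf '3.2.0 529277\n4.0.0 607180\n' | loc_growth
```

For the 4.0.0 row this yields an increase of 77903 lines and 14.7%, which matches the table.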

Libraries

The total library portfolio of Pentaho Data Integration consists of these libraries:

Filename	Description	Dependency
kettle-core.jar	A small set of core classes and utilities for the Kettle environments	none
kettle-db.jar	Contains database related classes	kettle-core
kettle-engine.jar	The transformation and job runtime engines	kettle-core, kettle-db
kettle-ui-swt.jar	The UI classes, Spoon, dialogs, etc.	kettle-core, kettle-db, kettle-engine

Matt Casters - Okegem/Belgium - March 29th 2010