Index
- #Introduction
- #General changes
- #Step changes
- #Job entry changes
- #Databases
- #Repository
- #Internationalization
- #Community and codebase
Introduction
PDI 4.0 is a nicely balanced release: a rare mix of many new features combined with engine stability and 100% backward compatibility for your existing jobs and transformations.
Once again, many, many thanks go to our large community of Kettle enthusiasts for all the help they provided in making this release another success.
General changes
- Visual changes
- Mouse-over
- More intuitive menus
- New welcome screen
- Hop creation
- New logging architecture
- Reduced memory consumption
- Incremental log updates
- A cap on log memory consumption for long-running jobs/transformations
- Interval logging
- Log record time-outs
- Log record lineage
- Log record color coding
- New plugin architecture
- Unified plugin architecture
- Easier deployment and packaging
- Step, job entry, partitioner, database type, Spoon perspective, life-cycle, ...: all pluggable (see the sketch after this list)
- New repository plugin architecture
- Allowing for 3rd party repositories like the Pentaho Unified Enterprise Repository
- ...
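To make the idea of a unified plugin architecture concrete, here is a small, purely illustrative Java sketch of a registry that treats steps, job entries, partitioners and the other plugin kinds the same way. The class and method names below (UnifiedPluginRegistry, PluginKind, PluginDescriptor) are invented for this example and are not the Kettle API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: one registry for every kind of plugin (step, job entry, ...). */
public class UnifiedPluginRegistry {

  /** The kinds of plugins the platform knows about. */
  public enum PluginKind { STEP, JOB_ENTRY, PARTITIONER, DATABASE_TYPE, SPOON_PERSPECTIVE, LIFECYCLE }

  /** A minimal plugin descriptor: id, human-readable name and the implementing class. */
  public static class PluginDescriptor {
    final String id;
    final String name;
    final Class<?> mainClass;

    PluginDescriptor(String id, String name, Class<?> mainClass) {
      this.id = id;
      this.name = name;
      this.mainClass = mainClass;
    }
  }

  private final Map<PluginKind, Map<String, PluginDescriptor>> plugins =
      new HashMap<PluginKind, Map<String, PluginDescriptor>>();

  /** Register any plugin kind through the same call. */
  public void register(PluginKind kind, PluginDescriptor descriptor) {
    Map<String, PluginDescriptor> byId = plugins.get(kind);
    if (byId == null) {
      byId = new HashMap<String, PluginDescriptor>();
      plugins.put(kind, byId);
    }
    byId.put(descriptor.id, descriptor);
  }

  /** Look up a plugin by kind and id, e.g. to instantiate a step at runtime. */
  public PluginDescriptor find(PluginKind kind, String id) {
    Map<String, PluginDescriptor> byId = plugins.get(kind);
    return byId == null ? null : byId.get(id);
  }

  /** List everything of one kind, e.g. to populate the step palette in the UI. */
  public List<PluginDescriptor> list(PluginKind kind) {
    Map<String, PluginDescriptor> byId = plugins.get(kind);
    return byId == null ? new ArrayList<PluginDescriptor>() : new ArrayList<PluginDescriptor>(byId.values());
  }
}
```

In practice each plugin kind also carries its own metadata (category, icon, dialog class) and can ship in its own jar, which the registry above omits for brevity.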
Step changes
New steps
- SAP Input: Reads data from an SAP R/3 application server (requires the SAP Java Connector jar, sapjco.jar, which is not included in PDI)
- Data Grid : Allows you to enter static rows of data for reference or testing purposes
- OLAP Input: Reads data from an OLAP server using olap4j over XML/A: Mondrian, Palo, SSAS, SAP BW
- Salesforce Delete, Insert, Update, Upsert
- Add fields changing sequence: a sequence that gets reset when the values in a set of fields change (a group sequence)
- User Defined Java Class: write your own step plugin on the fly, inside a step (coming out of incubation); see the sketch after this list
- Send information using Syslog: Send a message to a Syslog server. http://en.wikipedia.org/wiki/Syslog
- Java Filter: Filters rows based on a user-defined Java expression
- Memory Group By: for smaller numbers of groups you can keep the intermediate aggregation results in memory, leading to faster results
- Farrago streaming bulk loader
- Teradata Fastload Bulk loader
- Experimental steps added: Get table names, Email messages input, ...
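The User Defined Java Class step deserves a small example. The sketch below shows the kind of code that goes into the step's code area: it reads a string field, upper-cases it and writes it to a new output field. It follows the usual processRow() skeleton of that step, but the helper calls (getRow, createOutputRow, get(Fields.In, ...)) and the field names used here (name, name_upper) are illustrative; check the step's built-in code snippets for the exact API in your version.

```java
// Sketch of User Defined Java Class step code (it goes in the step's code area,
// not in a standalone .java file). Field names "name" and "name_upper" are examples.
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
  Object[] r = getRow();            // read one input row
  if (r == null) {                  // no more input: signal that we are done
    setOutputDone();
    return false;
  }

  // Create an output row wide enough for all declared output fields.
  Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());

  // Read the incoming "name" field, transform it, and write the result
  // to the "name_upper" field declared on the step's Fields tab.
  String name = get(Fields.In, "name").getString(r);
  get(Fields.Out, "name_upper").setValue(outputRow, name.toUpperCase());

  putRow(data.outputRowMeta, outputRow);  // hand the row to the next step
  return true;                            // call processRow() again for the next row
}
```

The new output field (name_upper in this sketch) would typically be declared on the step's Fields tab so that it ends up in data.outputRowMeta.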
Updated steps
- TODO
Job entry changes
New job entries
- Send information using Syslog
- Check DB connections
Updated job entries
- TODO
Databases
- New plugin architecture
- ...
- TODO
Repository
- New repository plugin architecture
- New Pentaho Unified Enterprise Repository type
- New File repository type
- New repository explorer
- ...
Internationalization
TODO:
Community and codebase
Codebase
Even though we do our best to refactor and simplify the codebase all the time, there is no denying that it keeps growing.
Right before every release we run the following command:
find . -name "*.java" -exec wc -l {} \; | awk '{ sum+=$1 } END { print sum }'
This is what that gave us over the last releases:
| Version | Lines of code | Increase |
|---|---|---|
| 2.1.4 | 160,000 | |
| 2.2.2 | 177,450 | 17,450 |
| 2.3.0 | 213,489 | 36,039 |
| 2.4.0 | 256,030 | 42,541 |
| 2.5.0 | 292,241 | 36,211 |
| 3.0.0 | 348,575 | 56,334 |
| 3.1.0 | 456,772 | 108,197 |
| 3.2.0 | 529,277 | 72,505 |
| 4.0.0 | 607,180 | 77,903 |
Libraries
The total library portfolio of Pentaho Data Integration consists of these libs:
| Filename | Description | Dependencies |
|---|---|---|
| kettle-core.jar | A small set of core classes and utilities for the Kettle environments | none |
| kettle-db.jar | Contains database-related classes | kettle-core |
| kettle-engine.jar | The transformation and job runtime engines | kettle-core, kettle-db |
| kettle-ui-swt.jar | The UI classes: Spoon, dialogs, etc. | kettle-core, kettle-db, kettle-engine |
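As a rough illustration of how these libraries stack up when you embed the engine, the following minimal sketch runs a transformation from Java with only kettle-core, kettle-db and kettle-engine on the classpath (no UI jar needed). It uses the Kettle 4 embedding pattern (KettleEnvironment.init, TransMeta, Trans); the file path is a placeholder, and you should verify the calls against the javadoc of your exact version.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

/** Minimal embedding sketch: run a .ktr file using only the engine jars. */
public class RunTransformation {
  public static void main(String[] args) throws Exception {
    // Initialize the Kettle environment (plugins, variables, logging).
    KettleEnvironment.init();

    // Load the transformation definition from an XML file (placeholder path).
    TransMeta transMeta = new TransMeta("/path/to/example.ktr");

    // Create and execute the runtime transformation.
    Trans trans = new Trans(transMeta);
    trans.execute(null); // no command-line arguments
    trans.waitUntilFinished();

    if (trans.getErrors() > 0) {
      throw new RuntimeException("The transformation finished with errors.");
    }
  }
}
```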
Matt Casters - Okegem/Belgium - March 29th 2010