What's new in PDI 4.0
Index
- Introduction
- General changes
- Step changes
- Job entry changes
- Repository
- Databases
- Community and codebase
Introduction
PDI 4.0 is a nicely balanced release: a rare mix of many new features, engine stability, and 100% backward compatibility with your existing jobs and transformations.
Once again, many many thanks go to our large community of Kettle enthusiasts for all the help they provided to make this release another success.
General changes
Visual changes
- Mouse-over
- More intuitive menus
- New welcome screen
- Hop creation
- Improved error handling configuration
- New perspectives support for Agile BI visualisations, modelling, scheduling, etc.
Running jobs in Spoon
- Drill down into running job entries
- Visual indicators of running and completed job entries with success and failure mini-icons
- Mousing over a completion mini-icon shows details of the execution results
- Log capturing of completed job entries
Running transformations in Spoon
- Drill down into running transformation job entries and mappings
- Row input/output sniff testing: see what rows are passing
- Remote input/output sniff testing on a Carte server
New logging architecture
- Reduced memory consumption
- Incremental log updates
- Global log buffer size limit for long running jobs/transformations
- Interval logging
- Auto clean-up of old log records
- Log record time-outs
- Log record lineage
- Log record colour coding in Spoon (blue and red for error lines)
- Step logging
- Job entry logging
- Execution lineage logging
- Renaming individual columns
- Global configuration options for all log tables
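To make three of the logging ideas above concrete (a global buffer size limit, log record time-outs, and incremental log updates), here is a minimal sketch in Python. It is not the Kettle implementation: the class name `LogBuffer`, its parameters, and its methods are all hypothetical and chosen only for illustration.

```python
import time
from collections import deque


class LogBuffer:
    """Hypothetical central log buffer (illustration only, not Kettle code)."""

    def __init__(self, max_lines=5000, max_age_seconds=None, clock=time.monotonic):
        # A deque with maxlen silently drops the oldest line once the
        # buffer is full, which gives a global size limit.
        self._lines = deque(maxlen=max_lines)
        self._max_age = max_age_seconds
        self._clock = clock
        self._next_id = 0

    def append(self, text):
        """Store one log line and return its increasing line id."""
        self._next_id += 1
        self._lines.append((self._next_id, self._clock(), text))
        return self._next_id

    def purge_expired(self):
        """Drop lines older than max_age_seconds (the record time-out)."""
        if self._max_age is None:
            return
        cutoff = self._clock() - self._max_age
        while self._lines and self._lines[0][1] < cutoff:
            self._lines.popleft()

    def lines_since(self, last_seen_id):
        """Return only (id, text) pairs newer than last_seen_id."""
        self.purge_expired()
        return [(i, text) for i, _, text in self._lines if i > last_seen_id]


if __name__ == "__main__":
    buf = LogBuffer(max_lines=3)
    for message in ["start", "row 1", "row 2", "done"]:
        last = buf.append(message)
    print(buf.lines_since(0))     # everything still held in the buffer
    print(buf.lines_since(last))  # nothing new since the last line
```

A viewer that remembers the last line id it displayed only has to fetch the lines added since then, which is the idea behind incremental updates; memory stays bounded because old lines leave the buffer either by size or by age.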
New plug-in architecture
- Unified plug-in architecture
- Easier deployment and packaging
- Step, job entry, partitioner, database type, Spoon perspective, life-cycle, ...: all pluggable
New repository plug-in architecture
- Allowing for 3rd party repositories like the Pentaho Unified Enterprise Repository
- Removed dependencies on the relational database repository (still supported though)
- Added support for repositories capable of team-development (file locking)
- Added support for repositories capable of fine-grained security
- Added support for repositories capable of storing and retrieving revision history
Step changes
New steps
- SAP Input: Reads data from an SAP R/3 application server (needs the SAP JCo library, sapjco3.jar, which is not included in PDI)
- Data Grid: Allows you to enter static rows of data for reference or testing purposes
- OLAP Input: Reads data from an OLAP server using olap4j over XML/A (Mondrian, Palo, SSAS, SAP BW)
- Salesforce Delete, Insert, Update, Upsert
- Add fields changing sequence: A sequence that is reset whenever the values in a set of fields change (a group sequence)
- User Defined Java Class: create your own plugin on the fly in a step (coming out of incubation)
- Send information using Syslog: Sends a message to a Syslog server (see http://en.wikipedia.org/wiki/Syslog)
- Java Filter: Filters rows based on a user-defined Java expression
- Memory Group By: For smaller groups, keeps the intermediate aggregation results in memory, which gives faster results
- LucidDB streaming bulk loader
- Teradata FastLoad bulk loader
- Experimental steps added: Get table names, Email messages input, ...
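The group sequence behind the "Add fields changing sequence" step in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not the step's actual code: the function name `add_group_sequence` and its parameters (`group_fields`, `start`, `increment`, `out_field`) are assumptions made for this example.

```python
def add_group_sequence(rows, group_fields, start=1, increment=1, out_field="seq"):
    """Yield a copy of each row with a counter that restarts whenever the
    values of group_fields differ from those of the previous row."""
    previous_key = object()  # sentinel that never equals a real key
    counter = start
    for row in rows:
        key = tuple(row[field] for field in group_fields)
        if key != previous_key:
            # The watched fields changed: reset the sequence.
            counter = start
            previous_key = key
        else:
            counter += increment
        yield {**row, out_field: counter}


if __name__ == "__main__":
    rows = [
        {"country": "BE", "city": "Okegem"},
        {"country": "BE", "city": "Gent"},
        {"country": "NL", "city": "Utrecht"},
    ]
    for row in add_group_sequence(rows, ["country"]):
        print(row)
```

Note that the comparison is only against the previous row, so the input normally has to be sorted on the watched fields if you want one uninterrupted sequence per group.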
Updated steps
- TODO
Job entry changes
New job entries
- Send information using Syslog
- Check DB connections
Updated job entries
- TODO
Databases
- New plug-in architecture
- ...
- TODO
Repository
- New repository plug-in architecture
- New Pentaho Unified Enterprise Repository type
- New File repository type
- New repository explorer
- ...
Internationalization
TODO:
Community and codebase
Codebase
Even though we try our best to re-factor and simplify the codebase all the time, there is no denying that the codebase keeps growing.
Right before every release we run the following command:
find . -name "*.java" -exec wc -l {} \; | awk '{ sum+=$1 } END { print sum }'
This is what that gave us over the last releases:
| Version | Lines of code | Increase | % inc. |
|---|---|---|---|
| 2.1.4 | 160,000 | | |
| 2.2.2 | 177,450 | 17,450 | 10.9% |
| 2.3.0 | 213,489 | 36,039 | 20.3% |
| 2.4.0 | 256,030 | 42,541 | 19.9% |
| 2.5.0 | 292,241 | 36,211 | 14.1% |
| 3.0.0 | 348,575 | 56,334 | 19.3% |
| 3.1.0 | 456,772 | 108,197 | 31.0% |
| 3.2.0 | 529,277 | 72,505 | 15.9% |
| 4.0.0 | 607,180 | 77,903 | 14.7% |
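The Increase and % inc. columns follow directly from the line counts. As a small sketch, the Python snippet below recomputes them from the figures in the table; the helper name `growth` is made up for this example, and the percentages are rounded to one decimal.

```python
# Lines of Java code per release, copied from the table above.
releases = [
    ("2.1.4", 160_000),
    ("2.2.2", 177_450),
    ("2.3.0", 213_489),
    ("2.4.0", 256_030),
    ("2.5.0", 292_241),
    ("3.0.0", 348_575),
    ("3.1.0", 456_772),
    ("3.2.0", 529_277),
    ("4.0.0", 607_180),
]


def growth(rows):
    """Return (version, lines, increase, percent) for every release
    after the first, each compared with the release before it."""
    result = []
    for (_, previous), (version, lines) in zip(rows, rows[1:]):
        increase = lines - previous
        percent = round(100.0 * increase / previous, 1)
        result.append((version, lines, increase, percent))
    return result


if __name__ == "__main__":
    for version, lines, increase, percent in growth(releases):
        print(f"{version}: {lines:,} lines, +{increase:,} ({percent}%)")
```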
Libraries
The library portfolio of Pentaho Data Integration consists of these libraries:
| Filename | Description | Dependency |
|---|---|---|
| kettle-core.jar | A small set of core classes and utilities for the Kettle environments | none |
| kettle-db.jar | Contains database related classes | kettle-core |
| kettle-engine.jar | The transformation and job runtime engines | kettle-core, kettle-db |
| kettle-ui-swt.jar | The UI classes, Spoon, dialogs, etc. | kettle-core, kettle-db, kettle-engine |
Matt Casters - Okegem/Belgium - March 29th 2010