Index
- #Introduction
- #General changes
- #Step changes
- #Job entry changes
- #Databases
- #Repository
- #Internationalization
- #Community and codebase
Introduction
PDI 4.0 is a nicely balanced release: a rare mix of many new features combined with engine stability and 100% backward compatibility for your existing jobs and transformations.
Once again, many, many thanks go to our large community of Kettle enthusiasts for all the help they provided in making this release another success.
General changes
- Visual changes
- Mouse-over
- More intuitive menus
- New welcome screen
- Hop creation
- New logging architecture
- Reduced memory consumption
- Incremental log updates
- A cap on log memory consumption for long-running jobs/transformations
- Interval logging
- Log record time-outs
- Log record lineage
- Log record color coding
- New plugin architecture
- Unified plugin architecture
- Easier deployment and packaging
- Step, job entry, partitioner, database type, Spoon perspective, life-cycle, ...: all pluggable (see the sketch after this list)
- New repository plugin architecture
- Allowing for 3rd party repositories like the Pentaho Unified Enterprise Repository
- ...
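To make the idea of a unified plugin architecture concrete, here is a small, purely illustrative Java sketch of a registry that treats steps, job entries, partitioners and the other plugin kinds the same way. The class and method names below (UnifiedPluginRegistry, PluginKind, PluginDescriptor) are invented for this example and are not the Kettle API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: one registry for every kind of plugin (step, job entry, ...). */
public class UnifiedPluginRegistry {

  /** The kinds of plugins the platform knows about. */
  public enum PluginKind { STEP, JOB_ENTRY, PARTITIONER, DATABASE_TYPE, SPOON_PERSPECTIVE, LIFECYCLE }

  /** A minimal plugin descriptor: id, human-readable name and the implementing class. */
  public static class PluginDescriptor {
    final String id;
    final String name;
    final Class<?> mainClass;

    PluginDescriptor(String id, String name, Class<?> mainClass) {
      this.id = id;
      this.name = name;
      this.mainClass = mainClass;
    }
  }

  private final Map<PluginKind, Map<String, PluginDescriptor>> plugins =
      new HashMap<PluginKind, Map<String, PluginDescriptor>>();

  /** Register any plugin kind through the same call. */
  public void register(PluginKind kind, PluginDescriptor descriptor) {
    Map<String, PluginDescriptor> byId = plugins.get(kind);
    if (byId == null) {
      byId = new HashMap<String, PluginDescriptor>();
      plugins.put(kind, byId);
    }
    byId.put(descriptor.id, descriptor);
  }

  /** Look up a plugin by kind and id, e.g. to instantiate a step at runtime. */
  public PluginDescriptor find(PluginKind kind, String id) {
    Map<String, PluginDescriptor> byId = plugins.get(kind);
    return byId == null ? null : byId.get(id);
  }

  /** List everything of one kind, e.g. to populate the step palette in the UI. */
  public List<PluginDescriptor> list(PluginKind kind) {
    Map<String, PluginDescriptor> byId = plugins.get(kind);
    return byId == null ? new ArrayList<PluginDescriptor>() : new ArrayList<PluginDescriptor>(byId.values());
  }
}
```

In practice each plugin kind also carries its own metadata (category, icon, dialog class) and can ship in its own jar, which the registry above omits for brevity.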
Step changes
New steps
- SAP Input: Reads data from an SAP R/3 application server (requires the SAP Java Connector jar, sapjco.jar, which is not included in PDI)
- Data Grid : Allows you to enter static rows of data for reference or testing purposes
- OLAP Input: Reads data from an OLAP server using olap4j over XML/A: Mondrian, Palo, SSAS, SAP BW
- Salesforce Delete, Insert, Update, Upsert
- Add fields changing sequence: a sequence that gets reset when the values in a set of fields change (a group sequence)
- User Defined Java Class: write your own step plugin on the fly, inside a step (coming out of incubation); see the sketch after this list
- Send information using Syslog: Send a message to a Syslog server. http://en.wikipedia.org/wiki/Syslog
- Java Filter: Filters rows based on a user-defined Java expression
- Memory Group By: for smaller numbers of groups you can keep the intermediate aggregation results in memory, leading to faster results
- Farrago streaming bulk loader
- Teradata Fastload Bulk loader
- Experimental steps added: Get table names, Email messages input, ...
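The User Defined Java Class step deserves a small example. The sketch below shows the kind of code that goes into the step's code area: it reads a string field, upper-cases it and writes it to a new output field. It follows the usual processRow() skeleton of that step, but the helper calls (getRow, createOutputRow, get(Fields.In, ...)) and the field names used here (name, name_upper) are illustrative; check the step's built-in code snippets for the exact API in your version.

```java
// Sketch of User Defined Java Class step code (it goes in the step's code area,
// not in a standalone .java file). Field names "name" and "name_upper" are examples.
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
  Object[] r = getRow();            // read one input row
  if (r == null) {                  // no more input: signal that we are done
    setOutputDone();
    return false;
  }

  // Create an output row wide enough for all declared output fields.
  Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());

  // Read the incoming "name" field, transform it, and write the result
  // to the "name_upper" field declared on the step's Fields tab.
  String name = get(Fields.In, "name").getString(r);
  get(Fields.Out, "name_upper").setValue(outputRow, name.toUpperCase());

  putRow(data.outputRowMeta, outputRow);  // hand the row to the next step
  return true;                            // call processRow() again for the next row
}
```

The new output field (name_upper in this sketch) would typically be declared on the step's Fields tab so that it ends up in data.outputRowMeta.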
Updated steps
- TODO
Job entry changes
New job entries
- Send information using Syslog
- Check DB connections
Updated job entries
- TODO
Databases
- New plugin architecture
- ...
- TODO
Repository
- New repository plugin architecture
- New Pentaho Unified Enterprise Repository type
- New File repository type
- New repository explorer
- ...
Internationalization
TODO:
Community and codebase
Codebase
Even though we do our best to refactor and simplify the codebase all the time, there is no denying that it keeps growing.
Right before every release we run the following command:
find . -name "*.java" -exec wc -l {} \; | awk '{ sum+=$1 } END { print sum }'
This is what that gave us over the last releases:
| Version | Lines of code | Increase |
|---|---|---|
| 2.1.4 | 160,000 | |
| 2.2.2 | 177,450 | 17,450 |
| 2.3.0 | 213,489 | 36,039 |
| 2.4.0 | 256,030 | 42,541 |
| 2.5.0 | 292,241 | 36,211 |
| 3.0.0 | 348,575 | 56,334 |
| 3.1.0 | 456,772 | 108,197 |
| 3.2.0 | 529,277 | 72,505 |
| 4.0.0 | 607,180 | 77,903 |
Libraries
The total library portfolio of Pentaho Data Integration consists of these libs:
| Filename | Description | Dependencies |
|---|---|---|
| kettle-core.jar | A small set of core classes and utilities for the Kettle environments | none |
| kettle-db.jar | Contains database-related classes | kettle-core |
| kettle-engine.jar | The transformation and job runtime engines | kettle-core, kettle-db |
| kettle-ui-swt.jar | The UI classes: Spoon, dialogs, etc. | kettle-core, kettle-db, kettle-engine |
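As a rough illustration of how these libraries stack up when you embed the engine, the following minimal sketch runs a transformation from Java with only kettle-core, kettle-db and kettle-engine on the classpath (no UI jar needed). It uses the Kettle 4 embedding pattern (KettleEnvironment.init, TransMeta, Trans); the file path is a placeholder, and you should verify the calls against the javadoc of your exact version.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

/** Minimal embedding sketch: run a .ktr file using only the engine jars. */
public class RunTransformation {
  public static void main(String[] args) throws Exception {
    // Initialize the Kettle environment (plugins, variables, logging).
    KettleEnvironment.init();

    // Load the transformation definition from an XML file (placeholder path).
    TransMeta transMeta = new TransMeta("/path/to/example.ktr");

    // Create and execute the runtime transformation.
    Trans trans = new Trans(transMeta);
    trans.execute(null); // no command-line arguments
    trans.waitUntilFinished();

    if (trans.getErrors() > 0) {
      throw new RuntimeException("The transformation finished with errors.");
    }
  }
}
```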
Matt Casters - Okegem/Belgium - March 29th 2010