Introduction


PDI 4.0 is a nicely balanced release: a rare mix of many new features, engine stability, and 100% backward compatibility for your existing jobs and transformations.

Once again, many, many thanks go to our large community of Kettle enthusiasts for all the help they provided in making this release another success.

General changes

  • Visual changes
    • Mouse-over
    • More intuitive menus
    • New welcome screen
    • Hop creation
  • New logging architecture
    • Reduced memory consumption
    • Incremental log updates
    • Maximum memory consumption for long running jobs/transformations
    • Interval logging
    • Log record time-outs
    • Log record lineage
    • Log record color coding
  • New plugin architecture
    • Unified plugin architecture
    • Easier deployment and packaging
    • Step, job entry, partitioner, database type, Spoon perspective, life-cycle, ...: all pluggable
  • New repository plugin architecture
    • Allowing for 3rd party repositories like the Pentaho Unified Enterprise Repository
  • ...

Step changes

New steps

  • SAP Input: reads data from an SAP R/3 application server (requires sapjco.jar, which is not included in PDI)
  • Data Grid: allows you to enter static rows of data for reference or testing purposes
  • OLAP Input: reads data from an OLAP server using olap4j over XML/A (Mondrian, Palo, SSAS, SAP BW)
  • Salesforce Delete, Insert, Update, Upsert
  • Add fields changing sequence: a sequence that is reset whenever the values in a set of fields change (a group sequence)
  • User Defined Java Class: create your own plugin on the fly in a step (coming out of incubation)
  • Send information using Syslog: sends a message to a Syslog server (see http://en.wikipedia.org/wiki/Syslog)
  • Java Filter: filters rows based on a user-defined Java expression
  • Memory Group By: for smaller groups, keeps the intermediate statistical results in memory, which gives faster results
  • Farrago streaming bulk loader
  • Teradata FastLoad bulk loader
  • Experimental steps added: Get table names, Email messages input, ...

Updated steps

  • TODO

Job entry changes

New job entries

  • Send information using Syslog
  • Check DB connections

Updated job entries

  • TODO

Databases

  • New plugin architecture
  • ...
  • TODO

Repository

  • New repository plugin architecture
  • New Pentaho Unified Enterprise Repository type
  • New File repository type
  • New repository explorer
  • ...

Internationalization

TODO:

Community and codebase

Codebase

Even though we try our best to refactor and simplify the codebase all the time, there is no denying that it keeps growing.
Right before every release we run the following command:

find . -name "*.java" -exec wc -l {} \; | awk '{ sum+=$1 } END { print sum }'

This is what that gave us over the last releases:
 

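| Version | Lines of code | Increase |
|---------|---------------|----------|
| 2.1.4   | 160,000       |          |
| 2.2.2   | 177,450       | 17,450   |
| 2.3.0   | 213,489       | 36,039   |
| 2.4.0   | 256,030       | 42,541   |
| 2.5.0   | 292,241       | 36,211   |
| 3.0.0   | 348,575       | 56,334   |
| 3.1.0   | 456,772       | 108,197  |
| 3.2.0   | 529,277       | 72,505   |
| 4.0.0   | 607,180       | 77,903   |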
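As a rough sketch (not the exact script used for the releases), the count and the Increase column above can be reproduced with two small shell functions. The names count_java_lines and loc_increase are hypothetical and only serve as illustration. The first gives the same total as the find command above but hands many files to a single cat call instead of starting wc once per file. The second expects "version lines" pairs on standard input, with the line counts written without thousands separators.

```shell
# Hypothetical helper: total number of lines in all *.java files below a
# directory (defaults to the current directory). Like the original command,
# this counts every line, including blank lines and comments.
count_java_lines() {
  find "${1:-.}" -type f -name "*.java" -exec cat {} + | wc -l | tr -d '[:space:]'
}

# Hypothetical helper: reads "version lines" pairs from standard input and
# prints each pair followed by the increase over the previous row.
# The first row has no predecessor, so a dash is printed instead.
loc_increase() {
  awk 'NR == 1 { printf "%s %d -\n", $1, $2 }
       NR > 1  { printf "%s %d %d\n", $1, $2, $2 - prev }
       { prev = $2 }'
}
```

For example, running count_java_lines in the source root prints the total, and piping the lines "3.2.0 529277" and "4.0.0 607180" into loc_increase reports an increase of 77903 for 4.0.0, matching the table.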


Libraries

The total library portfolio of Pentaho Data Integration consists of these libs:

| Filename          | Description                                                           | Dependency                            |
|-------------------|-----------------------------------------------------------------------|---------------------------------------|
| kettle-core.jar   | A small set of core classes and utilities for the Kettle environments | none                                  |
| kettle-db.jar     | Contains database related classes                                     | kettle-core                           |
| kettle-engine.jar | The transformation and job runtime engines                            | kettle-core, kettle-db                |
| kettle-ui-swt.jar | The UI classes, Spoon, dialogs, etc.                                  | kettle-core, kettle-db, kettle-engine |



Matt Casters - Okegem/Belgium - March 29th 2010
