Auditing and Operational Metadata
September 1, 2006
Submitted by Matt Castors, Chief of Data Integration, Pentaho
See also Setting up Logging for PDI Transformations and Jobs in the Knowledge Base.
This Kettle tip was requested by one of the Kettle users and is about auditing.
The need for auditing and operational meta-data usually comes up after a number of transformations and jobs have been written and the whole collection has been running for a while.
Usually, simple questions come first and more complex ones come later, for example:
Did all my transformations run without error last night?
How many rows have been processed by a certain transformation last Friday?
What is the date range since the last time the transformation ran successfully?
Fortunately, these kinds of questions can be answered by setting up a logging table in your transformation or job.
Transformation audit trail
Setting up a logging table in your transformation is easy in the Transformation dialog. (CTRL-T or menu Transformation/Settings... )
Make sure you enter a name, even if you are using XML to store the transformation and then go to the "logging" tab:
The meaning of the "Use batch-ID?" check-box is that every run of the transformation will get a unique ID in the logging table. This is required if you want to use the history view.(see below)
The use "logfield to store logging in?" check-box means that the transformation stores the output of the logging also in a "large" text field in the logging table. Usually this is a CLOB/LONGTEXT type of field. This feature is very useful too in combination with the history view. (see below)
If you want to create the logging table, select the "SQL" button:
When you have created this logging table, a record will be inserted/updated (depending on the presence of the batch-id) into this table BEFORE and AFTER each run of the transformation.
In our case, after a couple of runs, the table looks like this:
A couple of remarks:
- From the status column we learn that if we stop a transformation, the status becomes "stop", otherwise we will find "end".
- If there is an error, we will see that the column "errors" contains a number larger than 0, usually just one. (we stop at the first error)
- The "startdate"/"enddate" columns contains the date-range "since the last time that the transformation ran without error".
-->NOTE: If you want to get a hold of this date range, use a "Get System Info" step and select options "Start of date range" and "End of date range". - The logdate is the date that the last record was inserted into the transformation and as such the time the transformation stopped running. (the ending date)
- The replay date is the date you can use to "replay" this transformation and is effectively the time that the transformation was executed. (execution date) Of-course, if you want, you can build reports on this, but we also have introduced a convenient "History view" in Spoon and Chef that will allow you to examine these log records with greater ease.
This history view is updated/refreshed when you load or execute a transformation:
If you click on a logging entry in the history view table, the corresponding logging text will be shown below.
Job audit trail
Hey, Matt, what about Jobs? Well, I'm please to announce that we have implemented this history view as well for jobs. This allows you to check on the successful completion of your job as well.
If you enter the appropriate details in the job settings dialog (CTRL-J or menu Job/Settings...) you will find the following recognizable options:
The audit functionality of the job log is the same, but make sure to set up 2 different tables for transformations and logs as the content is comparable but not 100% the same.
Let us know what you think of these new features and how we can improve on them in the future.
Until next time, enjoy Kettle, Matt