Report groups
Reports used in real-world business are more than simple lists of rows and columns. To gain information from the presented data, the report data is sorted and condensed into separate sections, which correspond to the logical structure of the report. Therefore, report engines introduced the concept of groups to support the structuring of report data.
With groups, a set of rows (the report data set) is subdivided into an ordered collection of smaller subsets of rows. All rows of a group instance share a common set of attributes. In the domain of relational databases, these attributes are one or more columns, identified by name; within a group instance, all of those columns hold the same values. The set of attributes that identifies a group is called the group key.
Groups in reporting can usually be mapped to 'is-part-of' relations in the real world. Employees could be grouped by the department they work in, or by the first letter of their last name.
The Control Break Algorithm
Reporting can be a very resource-intensive process. Data has to be queried from databases and other sources, the data must be processed and sorted, intermediate results must be computed, and finally the report output must be generated. Even with today's powerful computers, performance is key to customer satisfaction. To fulfill these performance requirements, the engine must not waste time building a global view of the report data unless absolutely necessary.
For performance reasons, group memberships are computed at runtime with an algorithm called the 'control-break algorithm'. Understanding this algorithm is the key to successful reporting, as it heavily influences how the report engine behaves and how groups are built up. The control-break algorithm does not build a global view over all group instances. To detect the end of one group and the start of a new one, it compares the current row with the previously read row of data. If the value of any of the group's key columns differs, the current group must be finished. If the report processing has not reached the last row, a new group instance is opened immediately. This new group remains active until either the end of the report data has been reached or the group key values change.
Some Definitions
A group is a contiguous set of rows, where all rows of the group share the same group key.
A group instance is identified by a group key. The key consists of one or more attributes; all entities in that group have the same values stored in these attributes. When using databases, entity sets are usually represented by tables and attributes by the table's columns.
A sub group is a group whose key contains all attributes of its parent's key plus at least one more attribute. Therefore, a sub group's rowset is a subset of its parent's rowset: all rows of the sub group are fully contained in the parent's rowset.
A group whose key has no attributes is called the default group. This group has a single instance, which always spans the complete report data set.
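The definitions above can be illustrated with a small sketch. It assumes that a row is an array of column values and models a group key as the list of values a row holds in the key columns. The class `GroupKeys` and its methods are hypothetical illustrations, not part of the Pentaho Reporting API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the definitions; not a Pentaho Reporting API.
final class GroupKeys {
  private GroupKeys() {}

  // The group key of a row: the values the row holds in the key columns.
  static List<Object> keyOf(Object[] row, int... keyColumns) {
    List<Object> key = new ArrayList<>();
    for (int column : keyColumns) {
      key.add(row[column]);
    }
    return key;
  }

  // A sub group's key contains every column of its parent's key plus at least
  // one more (assumes the column indices in each array are distinct).
  static boolean isSubGroupKey(int[] parentColumns, int[] childColumns) {
    for (int parent : parentColumns) {
      if (Arrays.stream(childColumns).noneMatch(child -> child == parent)) {
        return false;
      }
    }
    return childColumns.length > parentColumns.length;
  }

  public static void main(String[] args) {
    Object[] john = {"Denver", "Sales", "John Doe"};
    Object[] arthur = {"Denver", "Marketing", "Arthur Dent"};
    // Same location: both rows share the key of a 'location' group.
    System.out.println(keyOf(john, 0).equals(keyOf(arthur, 0)));        // true
    // Different departments: a 'location, department' key tells them apart.
    System.out.println(keyOf(john, 0, 1).equals(keyOf(arthur, 0, 1)));  // false
    // The default group has an empty key, which every row shares.
    System.out.println(keyOf(john).equals(keyOf(arthur)));              // true
    System.out.println(isSubGroupKey(new int[] {0}, new int[] {0, 1})); // true
    System.out.println(isSubGroupKey(new int[] {0}, new int[] {1}));    // false
  }
}
```

Because the default group's key is empty, it is equal for every row, which is why that group never breaks and spans the whole data set.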
The single-level control-break algorithm as pseudo code
OPEN file
READ record
PRINT ReportHeader
WHILE NOT END-OF-FILE DO
  VAR groupKey := GET-CURRENT-GROUP-KEY
  PRINT GroupHeader
  WHILE (NOT END-OF-FILE AND groupKey IS-EQUAL-TO GET-CURRENT-GROUP-KEY) DO
    PRINT ItemBand
    READ record
  DONE
  PRINT GroupFooter
DONE
PRINT ReportFooter
CLOSE file
And in Java (the complete example is in the CVS):
public static void main(final String[] args)
{
  // initData(), isEndOfFile() and the print*() helpers are part of the complete example
  final Object[][] data = initData();
  // cursor at the first row ...
  // in this example, modifying 'position' is equivalent to a 'read' operation
  int position = 0;
  printReportHeader(data, position);
  while (!isEndOfFile(data, position))
  {
    // remember the group key of the first row of the new group;
    // if the group key changes, we have to do a control break ...
    final Object groupKey = data[position][CONTINENT_COL];
    printGroupHeader(data, position);
    while (!isEndOfFile(data, position)
        && java.util.Objects.equals(groupKey, data[position][CONTINENT_COL]))
    {
      // print the items ...
      printItems(data, position);
      // now 'read' the next row of data ...
      position += 1;
    }
    // the footer is printed with the last row of the finished group
    printGroupFooter(data, position - 1);
  }
  printReportFooter(data, position - 1);
}
From this algorithm, we can derive some basic principles about groups as they are used in Pentaho Reporting.
- Data used in group keys must be sorted according to the group hierarchy definition.
As only neighbouring rows are compared against each other, rows sharing a group key end up in the same group instance only if they are kept together as direct neighbours in the report data set.
- A group with no attributes defined has a single instance, which spans the complete report data set.
- As long as there is data available, a new group instance is opened immediately after the previous instance has been closed. It is not possible to print the item band without an open group.
- The order of the attributes within a group definition is not important. The algorithm only checks whether at least one attribute's value has changed; it does not matter which one.
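These principles can be checked with a minimal single-level control break over an in-memory table. The sketch assumes rows are arrays of column values; the class `ControlBreak` is a hypothetical illustration. Like the Java example above, it compares each row with the key of the currently open group instance and opens a new instance whenever any key column differs.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal single-level control break; a sketch, not Pentaho Reporting's implementation.
final class ControlBreak {
  private ControlBreak() {}

  // Returns the key of each group instance, in the order in which they are opened.
  static List<List<Object>> groupInstances(Object[][] rows, int... keyColumns) {
    List<List<Object>> instances = new ArrayList<>();
    int position = 0;
    while (position < rows.length) {
      // Open a new instance with the key of the current row (group header).
      List<Object> groupKey = keyOf(rows[position], keyColumns);
      instances.add(groupKey);
      // Consume rows (item band) until any key column changes: the control break.
      while (position < rows.length && groupKey.equals(keyOf(rows[position], keyColumns))) {
        position += 1;
      }
      // The group footer would be printed here.
    }
    return instances;
  }

  private static List<Object> keyOf(Object[] row, int[] keyColumns) {
    List<Object> key = new ArrayList<>();
    for (int column : keyColumns) {
      key.add(row[column]);
    }
    return key;
  }

  public static void main(String[] args) {
    Object[][] unsorted = {{"Denver"}, {"New York"}, {"Denver"}};
    Object[][] sorted = {{"Denver"}, {"Denver"}, {"New York"}};
    System.out.println(groupInstances(unsorted, 0)); // [[Denver], [New York], [Denver]]
    System.out.println(groupInstances(sorted, 0));   // [[Denver], [New York]]
    // No key columns: a single instance spanning all rows (the default group).
    System.out.println(groupInstances(sorted));      // [[]]
    Object[][] twoColumns = {{"Denver", "Sales"}, {"Denver", "Marketing"}, {"New York", "Marketing"}};
    // The order of the key columns does not change the number of instances.
    System.out.println(groupInstances(twoColumns, 0, 1).size()); // 3
    System.out.println(groupInstances(twoColumns, 1, 0).size()); // 3
  }
}
```

Unsorted input splits 'Denver' into two instances, an empty key yields the single default group, and swapping the key columns leaves the result unchanged.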
By treating each group instance as a new data set, we can nest multiple control-break runs inside each other. These multi-level control breaks add some new behavioural constraints to the engine.
- A sub group can only be opened after its parent group has been opened.
Sub group processing starts as soon as the group header of the parent group has been printed. Immediately after the sub group processing finishes, the parent group's footer gets printed. The processing flow cannot reach the sub group without passing through the parent group.
- A sub group must cease control as soon as one of the parent group's attributes changes.
A control break in one of the parent groups closes all sub groups. As soon as the parent group has generated a new group instance, new instances of the sub groups are opened as well. (In Pentaho Reporting, sub groups must contain all the group attributes of their parent.)
- Adding an attribute which has a constant value to the group definition does not alter the number or order of the generated group instances.
This allows you to insert artificial group levels into the report by referencing static or non-existing fields (which always evaluate to 'null') in addition to the real group fields. That little trick can be used to print more than one group header or footer.
- A group can only have one directly attached sub group. Building trees of groups is impossible with that algorithm.
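The multi-level constraints can be sketched by letting each level run its own control break inside the open instance of its parent. In this hypothetical `NestedControlBreak` class, `levels[i]` lists the columns that level `i` adds to its parent's key, so every sub group key contains all of the parent's attributes; the event strings stand in for printing bands.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a multi-level control break; not Pentaho Reporting's implementation.
final class NestedControlBreak {
  private NestedControlBreak() {}

  // levels[i] lists the columns added at group level i. The key of level i also
  // contains all columns of the outer levels, as a sub group's key must.
  static List<String> run(Object[][] rows, int[][] levels) {
    List<String> events = new ArrayList<>();
    events.add("report header");
    if (rows.length > 0 && levels.length > 0) {
      processLevel(rows, levels, 0, 0, events);
    }
    events.add("report footer");
    return events;
  }

  // Opens instances of 'level' until the data ends or the parent's key changes;
  // returns the position of the first row that was not consumed.
  private static int processLevel(Object[][] rows, int[][] levels, int level, int position,
                                  List<String> events) {
    List<Object> parentKey = keyOf(rows[position], levels, level);
    while (position < rows.length && parentKey.equals(keyOf(rows[position], levels, level))) {
      List<Object> groupKey = keyOf(rows[position], levels, level + 1);
      events.add("start " + level + " " + groupKey);
      if (level + 1 < levels.length) {
        // The sub group can only run while this instance is open.
        position = processLevel(rows, levels, level + 1, position, events);
      } else {
        while (position < rows.length && groupKey.equals(keyOf(rows[position], levels, level + 1))) {
          events.add("item " + position);
          position += 1;
        }
      }
      events.add("end " + level + " " + groupKey);
    }
    return position;
  }

  // The key of level depth - 1: the values of all columns of levels 0 .. depth - 1.
  private static List<Object> keyOf(Object[] row, int[][] levels, int depth) {
    List<Object> key = new ArrayList<>();
    for (int i = 0; i < depth; i++) {
      for (int column : levels[i]) {
        key.add(row[column]);
      }
    }
    return key;
  }

  public static void main(String[] args) {
    // Column 1 keeps the value "x", but the sub group still breaks when column 0 changes.
    Object[][] rows = {{"A", "x"}, {"B", "x"}};
    run(rows, new int[][] {{0}, {1}}).forEach(System.out::println);
    // report header, start 0 [A], start 1 [A, x], item 0, end 1 [A, x], end 0 [A],
    // start 0 [B], start 1 [B, x], item 1, end 1 [B, x], end 0 [B], report footer
  }
}
```

Inserting a column with a constant value as an extra level, for example `{{0}, {2}, {1}}` where column 2 is constant, only adds an artificial group level with one instance per parent instance; the innermost instances stay the same in number and order.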
Working with groups: Examples
The effects of the control-break algorithm are best understood by looking at an example. Let's take the following table as the data source.
Table Basics of reporting.1. Initial data set
| Location | Department | Employee      | Salary  |
|----------|------------|---------------|---------|
| Denver   | Sales      | John Doe      | 100.000 |
| Denver   | Sales      | Jane Doe      | 100.000 |
| Denver   | Marketing  | Arthur Dent   | 125.000 |
| New York | Marketing  | Adam Johnson  | 125.000 |
| New York | Marketing  | Eve Lynn      | 145.000 |
| New York | Management | J.D. Salinger | 500.000 |
When grouping the table by the 'department' column, we'll get the following three group instances.
Table Basics of reporting.2. Data set grouped by 'Department'
| Location | Department | Employee      | Salary  | Notes                              |
|----------|------------|---------------|---------|------------------------------------|
|          |            |               |         | group start for 'department group' |
| Denver   | Sales      | John Doe      | 100.000 |                                    |
| Denver   | Sales      | Jane Doe      | 100.000 |                                    |
|          |            |               |         | group end for 'department group'   |
|          |            |               |         | group start for 'department group' |
| Denver   | Marketing  | Arthur Dent   | 125.000 |                                    |
| New York | Marketing  | Adam Johnson  | 125.000 |                                    |
| New York | Marketing  | Eve Lynn      | 145.000 |                                    |
|          |            |               |         | group end for 'department group'   |
|          |            |               |         | group start for 'department group' |
| New York | Management | J.D. Salinger | 500.000 |                                    |
|          |            |               |         | group end for 'department group'   |
Within each group instance, the value of the 'department' column is the same for all rows. We get one group instance for each department. Note that the data is already ordered by department: all rows of a department are direct neighbours (although the departments are not in alphabetical order).
Now, let's use the 'location' column as the group key. We now get two group instances, 'Denver' and 'New York'.
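Applied to the sample data, a single-level control break reproduces the three department instances listed above. The sketch assumes the data is held in a `String[][]` with the columns location, department, employee and salary; the class `SampleDataGroups` and its `groupBy` method are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Single-level control break over the sample data; a hypothetical sketch.
final class SampleDataGroups {
  static final int LOCATION = 0;
  static final int DEPARTMENT = 1;

  // The initial data set: location, department, employee, salary.
  static final String[][] DATA = {
    {"Denver", "Sales", "John Doe", "100.000"},
    {"Denver", "Sales", "Jane Doe", "100.000"},
    {"Denver", "Marketing", "Arthur Dent", "125.000"},
    {"New York", "Marketing", "Adam Johnson", "125.000"},
    {"New York", "Marketing", "Eve Lynn", "145.000"},
    {"New York", "Management", "J.D. Salinger", "500.000"},
  };

  private SampleDataGroups() {}

  // Returns one "key=rowCount" entry per group instance, in report order.
  static List<String> groupBy(String[][] rows, int column) {
    List<String> instances = new ArrayList<>();
    int position = 0;
    while (position < rows.length) {
      String groupKey = rows[position][column];
      int count = 0;
      // Stay in this instance until the key value differs from the group's key.
      while (position < rows.length && Objects.equals(groupKey, rows[position][column])) {
        count += 1;
        position += 1;
      }
      instances.add(groupKey + "=" + count);
    }
    return instances;
  }

  public static void main(String[] args) {
    System.out.println(groupBy(DATA, DEPARTMENT)); // [Sales=2, Marketing=3, Management=1]
    System.out.println(groupBy(DATA, LOCATION));   // [Denver=3, New York=3]
  }
}
```

The same call with the location column yields the two instances used in the next example.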
Table Basics of reporting.3. Data set grouped by 'Location'
| Location | Department | Employee      | Salary  | Notes                              |
|----------|------------|---------------|---------|------------------------------------|
|          |            |               |         | group start for 'location group'   |
| Denver   | Sales      | John Doe      | 100.000 |                                    |
| Denver   | Sales      | Jane Doe      | 100.000 |                                    |
| Denver   | Marketing  | Arthur Dent   | 125.000 |                                    |
|          |            |               |         | group end for 'location group'     |
|          |            |               |         | group start for 'location group'   |
| New York | Marketing  | Adam Johnson  | 125.000 |                                    |
| New York | Marketing  | Eve Lynn      | 145.000 |                                    |
| New York | Management | J.D. Salinger | 500.000 |                                    |
|          |            |               |         | group end for 'location group'     |
Of course, we can combine groups to create multi-level reports. The control-break algorithm allows only one group definition per level. That means we cannot have a report that has top-level groupings for 'location' and 'department' at the same time. But we are able to subdivide groups in an ordered way.
For example, we can first group by 'location' and, in a second step, group each location by its departments. This two-level grouping produces the following layout:
Table Basics of reporting.4. Data set grouped by 'Location' and subgrouped by 'department'
| Location | Department | Employee      | Salary  | Notes                              |
|----------|------------|---------------|---------|------------------------------------|
|          |            |               |         | group start for 'location group'   |
|          |            |               |         | group start for 'department group' |
| Denver   | Sales      | John Doe      | 100.000 |                                    |
| Denver   | Sales      | Jane Doe      | 100.000 |                                    |
|          |            |               |         | group end for 'department group'   |
|          |            |               |         | group start for 'department group' |
| Denver   | Marketing  | Arthur Dent   | 125.000 |                                    |
|          |            |               |         | group end for 'department group'   |
|          |            |               |         | group end for 'location group'     |
|          |            |               |         | group start for 'location group'   |
|          |            |               |         | group start for 'department group' |
| New York | Marketing  | Adam Johnson  | 125.000 |                                    |
| New York | Marketing  | Eve Lynn      | 145.000 |                                    |
|          |            |               |         | group end for 'department group'   |
|          |            |               |         | group start for 'department group' |
| New York | Management | J.D. Salinger | 500.000 |                                    |
|          |            |               |         | group end for 'department group'   |
|          |            |               |         | group end for 'location group'     |
The order of the groups is important. It makes a difference whether one first groups by location and then by department, or first by department and then by location. As you can see, a sub group is always part of its parent group. When a sub group's headers are printed, the parent group's headers have already been fully processed. For footers, the footer of the sub group is always printed before the parent group's footer gets on the paper.
Now we switch the group order: the 'department' group is the top-level group, followed by the 'location' sub group.
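The nesting shown in the table above can be traced with two nested control-break loops. In this sketch (the class `TwoLevelGroups` and its event strings are hypothetical), the inner loop also checks the outer column, because the inner key contains the parent's attributes. Without that check, the 'Marketing' rows of Denver and New York would wrongly fall into one department instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Two nested control breaks over the sample data; a sketch, not Pentaho Reporting's code.
final class TwoLevelGroups {
  static final int LOCATION = 0;
  static final int DEPARTMENT = 1;
  static final int EMPLOYEE = 2;

  // The initial data set: location, department, employee, salary.
  static final String[][] DATA = {
    {"Denver", "Sales", "John Doe", "100.000"},
    {"Denver", "Sales", "Jane Doe", "100.000"},
    {"Denver", "Marketing", "Arthur Dent", "125.000"},
    {"New York", "Marketing", "Adam Johnson", "125.000"},
    {"New York", "Marketing", "Eve Lynn", "145.000"},
    {"New York", "Management", "J.D. Salinger", "500.000"},
  };

  private TwoLevelGroups() {}

  static String name(int column) {
    return column == LOCATION ? "location" : "department";
  }

  // Records group start/end and item events for an outer and an inner group column.
  static List<String> trace(String[][] rows, int outer, int inner) {
    List<String> events = new ArrayList<>();
    int position = 0;
    while (position < rows.length) {
      String outerKey = rows[position][outer];
      events.add("start " + name(outer) + " " + outerKey);
      while (position < rows.length && Objects.equals(outerKey, rows[position][outer])) {
        String innerKey = rows[position][inner];
        events.add("  start " + name(inner) + " " + innerKey);
        // The inner key includes the outer column, so a change there also ends this instance.
        while (position < rows.length
            && Objects.equals(outerKey, rows[position][outer])
            && Objects.equals(innerKey, rows[position][inner])) {
          events.add("    item " + rows[position][EMPLOYEE]);
          position += 1;
        }
        events.add("  end " + name(inner) + " " + innerKey);
      }
      events.add("end " + name(outer) + " " + outerKey);
    }
    return events;
  }

  public static void main(String[] args) {
    // Location first, department second: the nesting shown in the table above.
    trace(DATA, LOCATION, DEPARTMENT).forEach(System.out::println);
  }
}
```

Calling `trace(DATA, DEPARTMENT, LOCATION)` produces the reversed nesting instead, with four location instances.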
Table Basics of reporting.5. Data set grouped by 'department' and subgrouped by 'location'
| Location | Department | Employee      | Salary  | Notes                              |
|----------|------------|---------------|---------|------------------------------------|
|          |            |               |         | group start for 'department group' |
|          |            |               |         | group start for 'location group'   |
| Denver   | Sales      | John Doe      | 100.000 |                                    |
| Denver   | Sales      | Jane Doe      | 100.000 |                                    |
|          |            |               |         | group end for 'location group'     |
|          |            |               |         | group end for 'department group'   |
|          |            |               |         | group start for 'department group' |
|          |            |               |         | group start for 'location group'   |
| Denver   | Marketing  | Arthur Dent   | 125.000 |                                    |
|          |            |               |         | group end for 'location group'     |
|          |            |               |         | group start for 'location group'   |
| New York | Marketing  | Adam Johnson  | 125.000 |                                    |
| New York | Marketing  | Eve Lynn      | 145.000 |                                    |
|          |            |               |         | group end for 'location group'     |
|          |            |               |         | group end for 'department group'   |
|          |            |               |         | group start for 'department group' |
|          |            |               |         | group start for 'location group'   |
| New York | Management | J.D. Salinger | 500.000 |                                    |
|          |            |               |         | group end for 'location group'     |
|          |            |               |         | group end for 'department group'   |
As we can see, whenever the department changes, the location group is closed and then reopened once the department group has generated a new instance. This is why 'Denver' and 'New York' each appear in two separate location group instances.
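The reopened location groups can also be found by comparing each row with the previously read row, as the control-break algorithm does. A break in the outer group forces a break in the inner group, even if the inner value did not change. The hypothetical helper below assumes exactly two group levels and returns the value of every inner group instance that gets opened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Row-to-previous-row control break for two group levels; a hypothetical sketch.
final class InnerGroupInstances {
  // The initial data set: location, department, employee, salary.
  static final String[][] DATA = {
    {"Denver", "Sales", "John Doe", "100.000"},
    {"Denver", "Sales", "Jane Doe", "100.000"},
    {"Denver", "Marketing", "Arthur Dent", "125.000"},
    {"New York", "Marketing", "Adam Johnson", "125.000"},
    {"New York", "Marketing", "Eve Lynn", "145.000"},
    {"New York", "Management", "J.D. Salinger", "500.000"},
  };

  private InnerGroupInstances() {}

  // Returns the inner column's value for every inner group instance that is opened.
  static List<String> innerInstances(String[][] rows, int outer, int inner) {
    List<String> opened = new ArrayList<>();
    for (int position = 0; position < rows.length; position++) {
      boolean firstRow = position == 0;
      // A control break in the parent group ...
      boolean outerBreak = firstRow
          || !Objects.equals(rows[position][outer], rows[position - 1][outer]);
      // ... always closes the sub group and opens a new instance of it.
      boolean innerBreak = outerBreak
          || !Objects.equals(rows[position][inner], rows[position - 1][inner]);
      if (innerBreak) {
        opened.add(rows[position][inner]);
      }
    }
    return opened;
  }

  public static void main(String[] args) {
    System.out.println(innerInstances(DATA, 0, 1)); // [Sales, Marketing, Marketing, Management]
    System.out.println(innerInstances(DATA, 1, 0)); // [Denver, Denver, New York, New York]
  }
}
```

With the department as the outer group, four location instances are opened instead of two, because each location is reopened after a department change.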