Reports used in real-world business are more than simple lists of rows and columns. To make the presented data informative, the report data is sorted and condensed into separate sections, which correspond to the logical structure of the report. Report engines therefore introduced the concept of groups to support the structuring of report data.

Reporting can be a very resource-intensive process. Data has to be queried from databases and other sources, the data must be processed and sorted, intermediate results must be computed, and finally the report output must be generated. Even with today's powerful computers, performance is key to customer satisfaction. To fulfill these performance requirements, it is unacceptable to waste time building a global view of the report data unless absolutely necessary.

For performance reasons, group memberships are computed at runtime with an algorithm called the 'control-break algorithm'. Understanding this algorithm is the key to successful reporting, as it heavily influences how the report engine behaves and how groups are built up. The control-break algorithm does not build a global view over all group instances. To detect the end of one group and the start of a new one, it compares the current row with the previously read row of data. If the data in one of the group's key columns is different, the current group must be finished. If the report processing has not reached the last row, a new group instance is opened immediately. This new group remains active until either the end of the report data has been reached or the group key values change.

Some Definitions

A group is a contiguous set of rows, where all rows of the group share the same group key.

A group instance is identified by a group key. The key consists of one or more attributes, where all entities in that group have the same values stored in the specified attributes. When using databases, entity sets are usually represented by tables and attributes are represented by the table's columns.
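The key comparison described above can be sketched as follows. This is only an illustration, not code from the report engine: the class name GroupKeySketch, the method keyChanged and the sample rows are hypothetical, and rows are assumed to be plain Object arrays with the key attributes given as column indexes.

```java
import java.util.Objects;

// Hypothetical sketch: a group key made up of one or more columns.
final class GroupKeySketch {

    // Returns true if at least one key column differs between the two rows,
    // which means the current row can no longer belong to the same group.
    static boolean keyChanged(Object[] previous, Object[] current, int[] keyColumns) {
        for (int column : keyColumns) {
            // Objects.equals also handles null values in a key column
            if (!Objects.equals(previous[column], current[column])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed sample rows: continent, country, city
        Object[] a = {"Europe", "Germany", "Berlin"};
        Object[] b = {"Europe", "Germany", "Hamburg"};
        Object[] c = {"Europe", "France", "Paris"};
        int[] key = {0, 1}; // group key = (continent, country)

        System.out.println(keyChanged(a, b, key)); // false: same key, same group
        System.out.println(keyChanged(b, c, key)); // true: the country changed
    }
}
```

Only the key columns take part in the comparison; the city column differs between the first two rows without ending the group.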

...

Code Block

public static void main (final String[] args)
{
  final Object[][] data = initData();

  // cursor at the first row ...
  // in this example modifying the 'position' is equal to a 'read' operation
  int position = 0;

  printReportHeader(data, position);

  while (isEndOfFile(data, position) == false)
  {
    // remember the group key of the first row of the new group;
    // if the group key changes, we have to do a control break ...
    final Object groupKey = data[position][CONTINENT_COL];
    printGroupHeader(data, position);

    while ((isEndOfFile(data, position) == false) &&
           (data[position][CONTINENT_COL].equals(groupKey)))
    {
      // print the items ...
      printItems(data, position);

      // now 'read' the next row of data ...
      position += 1;
    }

    // the group key changed or the data ended: close the current group
    printGroupFooter(data, position - 1);
  }

  printReportFooter(data, position - 1);
}

...

Data used in group keys must be sorted according to the group hierarchy definition.
As only neighbouring rows are compared against each other, this algorithm will consider rows to be part of the current group if and only if all rows of a particular group key instance are kept together as direct neighbours in the report data set.
A group with no attributes defined will have a single instance, which spans the complete report data set.
As long as there is data available, a new group instance will be opened immediately after the previous instance has been closed. It is not possible to print the item band without having an open group.
The order of the attributes within a group definition is not important. For the algorithm it only matters whether at least one attribute's value has changed, not which one changed.
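The effect of sorting on the rules above can be made visible with a small counting sketch. It is an illustration under assumptions, not part of the report engine: the class GroupInstanceCounter, its method and the sample data are hypothetical, and an empty data set is assumed to produce no group instance at all.

```java
import java.util.Objects;

// Hypothetical sketch: counts the group instances a control-break pass would open.
final class GroupInstanceCounter {

    static int countGroupInstances(Object[][] rows, int[] keyColumns) {
        if (rows.length == 0) {
            return 0; // assumption: no data means no group instance
        }
        int instances = 1; // the first row always opens a group
        for (int i = 1; i < rows.length; i++) {
            for (int column : keyColumns) {
                // only the direct neighbour is inspected, never earlier rows
                if (!Objects.equals(rows[i - 1][column], rows[i][column])) {
                    instances++;
                    break; // one changed attribute is enough for a control break
                }
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        Object[][] sorted = {
            {"Africa", "Cairo"}, {"Africa", "Lagos"},
            {"Europe", "Berlin"}, {"Europe", "Paris"}
        };
        Object[][] unsorted = {
            {"Africa", "Cairo"}, {"Europe", "Berlin"},
            {"Africa", "Lagos"}, {"Europe", "Paris"}
        };
        int[] continentKey = {0};

        System.out.println(countGroupInstances(sorted, continentKey));   // 2
        System.out.println(countGroupInstances(unsorted, continentKey)); // 4
        System.out.println(countGroupInstances(sorted, new int[] {}));   // 1
    }
}
```

The same four rows yield two group instances when sorted by continent and four when the continents alternate, while a group without attributes always spans the whole data set.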

...

A sub group can only be opened after its parent group has been opened.
Sub group processing starts as soon as the group header of the parent group has been printed. Immediately after the processing finishes, the parent group's footer gets printed. The processing flow cannot reach the sub group without passing through the parent group.
A sub group must cede control as soon as one of the parent group's attributes changes.
A control break in one of the parent groups will close all sub groups. As soon as the parent group has generated a new group instance, new instances of the sub groups will be opened as well. (In Pentaho Reporting, sub groups must contain all the group attributes of their parent.)
Adding an attribute, which has a constant value, to the group definition will not alter the number or order of the generated group instances.
This allows you to insert artificial group levels into the report by referencing static or non-existing fields (which always evaluate to 'null') in addition to the real group fields. That little trick can be used to print more than one group header or footer.
A group can only have one directly attached sub group. Building trees of groups is impossible with that algorithm.
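The nesting rules above can be sketched with two group levels. This is a simplified illustration, not the engine's implementation: the class NestedGroupSketch, the column constants and the sample data are hypothetical. Following the Pentaho Reporting note, the sub group key contains the parent's attribute (continent) plus its own (country).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: a parent group (continent) with one sub group (continent + country).
final class NestedGroupSketch {
    static final int CONTINENT = 0;
    static final int COUNTRY = 1;
    static final int CITY = 2;

    // Returns the sequence of header, item and footer events as strings.
    static List<String> process(String[][] data) {
        List<String> events = new ArrayList<>();
        int position = 0;
        while (position < data.length) {
            String continent = data[position][CONTINENT];
            events.add("open " + continent);
            // the sub group is only reachable inside an open parent group
            while (position < data.length
                    && Objects.equals(data[position][CONTINENT], continent)) {
                String country = data[position][COUNTRY];
                events.add("open " + continent + "/" + country);
                // a change in the parent's attribute also ends the sub group
                while (position < data.length
                        && Objects.equals(data[position][CONTINENT], continent)
                        && Objects.equals(data[position][COUNTRY], country)) {
                    events.add("item " + data[position][CITY]);
                    position++;
                }
                events.add("close " + continent + "/" + country);
            }
            events.add("close " + continent);
        }
        return events;
    }

    public static void main(String[] args) {
        String[][] data = {
            {"Europe", "France", "Paris"},
            {"Europe", "Germany", "Berlin"},
            {"Europe", "Germany", "Hamburg"},
            {"Asia", "Japan", "Tokyo"}
        };
        for (String event : process(data)) {
            System.out.println(event);
        }
    }
}
```

In the printed trace every sub group footer appears before the footer of its parent, and a new continent opens a fresh sub group instance right after the parent header.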

...

The effects of the control-break algorithm are best understood by looking at an example. Let's take the following table as the data source.

...
