Partitioned Fact Tables Design Discussion
This is a discussion of the Partitioned Fact Tables feature. It does not represent current behavior of Mondrian and if implemented, the feature may differ from how it is described in this discussion.
-- jhyde
Introduction
Peter Fopma writes:
I would like to put up some thoughts for discussion for a concept for partitioned tables.
Use case scenarios we expect
- every month/quarter/year, product (or similar) is in a single table
- for dimensions splitting a partition a lower or upper bound can be specified (as long as there is an order to the members)
- for dimensions splitting a partition a range can be specified
- ranges of a dimension can overlap for different partitions
- data in overlapping ranges can either be disjoint or not (eg. disjoint: the fact tables contain data from separate data sources. Not disjoint: The cube is created from overlapping partial cubes)
As a precondition the partitions must all have exactly the same columns (name and type). [I agree - Julian Hyde]
In BNF the definition of partitions would look something like this:
<Facttable> ::= <Table> | <Partitions>; <Partitions> ::= "<Partitions", [<Disjoint>], ">", {(<Partition> | <PartitionPattern> | <PartitionExclude)}, "</Partitions>"; <Disjoint> ::= "disjoint=" , ("true" | "false"); (* Default disjoint=false *) <Partition> ::= "<Partition", <Name>, <Cache>, ">", <Table>, <Ranges>, "</Partition>"; <Name> ::= "name=", <Identifier>; <Cache> ::= "cache=", ("true" | "false"); <Table> ::= "<Table" <Name>, <Schema> "/>"; <Schema> ::= "schema=", <Identifier>; <Ranges> ::= "<Ranges>" {<Range>}, "</Ranges>"; <Range> ::= "<Range", <Dimension>, ">" {<Rangemembers>}, "</Range>" <Dimension> ::= "dimension=", <Identifier>; <Rangemember> ::= "<RangeMember", (<LowerBoundMember> | <UpperBoundMember> | <BoundMember> | <Member>), "/>"; <LowerBoundMember> ::= "bound=lower", <Member>; <UpperBoundMember> ::= "bound=upper", <Member>; <BoundMember> ::= "bound=lower", <Member>, "bound=upper", <Member>; <Member> ::= "member=", <MemberName> <MemberName> ::= "[", <Identifier>, "]", [".", <MemberName>] <Identifier> ::= Any Characters <PartitionPattern> ::= "<PartitionPattern", <Pattern>, ">", <MappingFunction>, "</PartitionPattern>"; <Pattern> ::= "pattern=", <Identifier>; <PartitionExclude> ::= "<PartitionExclude", (<Pattern> | <Name>), "/>"; <MappingFunction> ::= "<MappingFunction name=", <Identifier>, "className=", <Identifier>, "/>";
Cache
The cache attribute is used to indicate whether or not the data of a partition should be cached. If data in a partition is changing it is useful to disable caching data from that partition.
Overlapping ranges, disjoint data
Disjoint or not can only be set for all partitions. Otherwise it would have to be specified for all possible combinations of partitions which would grow exponentially with the number of partitions. When the data is disjoint all partitions which contain the queried data must be aggregated. In the case of data being not disjoint only one partition needs to be queried.
Julian Hyde writes:
Note that 'disjoint' means 'not overlapping'; I think you may be using the term in the reverse sense.
For an example of a disjoint set of partitioned fact tables, look to the FoodMart schema. Foodmart's Sales cube is based on 3 partitioned fact tables:
- sales_fact_1997 year=1997
- sales_fact_1998 year=1998 and month != 12
- sales_fact_dec_1998 year=1998 and month=12
These conditions are disjoint - there is no value of year or month that satisfies more than one condition - but are not exhaustive - there are some values of year and month which do not fit into any fact table.
Lastly, partition tables will be handled by mondrian's SQL generation layer, which is not aware of members, only column values.
Here is how Foodmart's table would look using column-value-based predicates:
<Partitions disjoint="true"> <Partition> <Table name="sales_fact_1997"/> <Predicate> <Op name="="> <Column table="sales_fact_1997" name="the_year"/> <Value>1997</Value> </Op> </Predicate> </Partition> <Partition> <Table name="sales_fact_1998"/> <Predicate> <Op name="and"> <Op name="="> <Column table="sales_fact_1998" name="the_year"/> <Value>1998</Value> </Op> <Op name="!="> <Column table="sales_fact_1998" name="the_month"/> <Value>12</Value> </Op> </Op> </Predicate> </Partition> <Partition> <Table name="sales_fact_dec_1998"/> <Predicate> <Op name="and"> <Op name="="> <Column table="sales_fact_1998" name="the_year"/> <Value>1998</Value> </Op> <Op name="="> <Column table="sales_fact_1998" name="the_month"/> <Value>12</Value> </Op> </Op> </Predicate> </Partition> </Partitions>
(I added a new element, Partition, to combine the table and the predicate, so that I did not need to change the <Table> tag.)
With this scheme, implementing the 'disjoint' attribute will be straightforward: on loading the schema, mondrian should examine the conditions and make sure that they are logically disjoint. The 'disjoint' attribute is just a sanity check for the schema designer: I don't think it will improve runtime performance.
By the way, the predicates do not deal with the problem that a partition table contains records that do not match the predicate. E.g. if sales_fact_1997 contains rows for 1996. Mondrian should assume that sales_fact_1997 contains only rows for 1997 and therefore should not generate 'where the_year = 1997', and so if there are 1996 rows in the table mondrian will give wrong results. This is justified for performance reasons. We might add a schema validation utility that checks that partition table contents match their predicates.
Peter Fopma replies:
With the disjoint attribute it is indicated whether data is exclusively in a partition or is duplicated in another partition. A partition could contain data for products starting with the letters A to E. Another partition might contain the data for the product [Product].[Aardvark]. When the data is disjoint the aardvark product is not in the partition with the products from A to E.
Disjoint data can be defined with predicates along another dimension but the dimension might not be specified for the cube and therefore all partitions must be queried and aggregated to get the result.
When partitions are not disjoint they result from partial cubes which overlap. A partition might contain a years data plus the ultimo month from the previous year, therefore data for December would be duplicated in each partition. During aggregation it must be ensured that the data for December is only taken from one partition.
Disjoint data:
<Dimension name="Location"> ... </Dimension> <Partitions disjoint="true"> <Partition> <Table name="sales_large_stores"> <Predicate> <Op name="="> <Column table="sales_large_stores" name="the_year"/> <Value>1998</Value> </Op> </Predicate> </Partition> <Partition> <Table name="sales_small_stores"> <Predicate> <Op name="="> <Column table="sales_large_stores" name="the_year"/> <Value>1998</Value> </Op> </Predicate> </Partition>
Analyzing the sales figures for stores with [Location].[West] requires the data from both partitions.
Not disjoint data:
<Partitions disjoint="false"> <Partition> <Table name="sales_fact_1997"> <Op name="or"> <Op name="="> <Column table="sales_fact_1997" name="the_year"/> <value>1997</value> </Op> <Op name="and"> <Op name="="> <column table="sales_fact_1997" name="the_year"/> <value>1996</value> </Op> <Op name="="> <column table="sales_fact_1997" name="the_month"/> <value>12</value> </Op> </Op> </Partition> <Partition> <Table name="sales_fact_1998"> <Op name="or"> <Op name="="> <Column table="sales_fact_1998" name="the_year"/> <value>1998</value> </Op> <Op name="and"> <Op name="="> <column table="sales_fact_1998" name="the_year"/> <value>1997</value> </Op> <Op name="="> <column table="sales_fact_1998" name="the_month"/> <value>12</value> </Op> </Op> </Partition> </Partitions>
Analyzing the sales figures from [1997].[Q1].[1] to [1998].[Q4].[12] requires to exclude December from one of the partitions.
Partition, PartitionPattern, PartitionExclude
It is possible to define partitions by specifying a regular expression for the table of each partition. As for the aggregate tables, PartitionExlcude is evaluated first to exclude unwanted tables, then the Partitions are evaluated and last the PartitionPattern is evaluated. Attributes of the first match are applied.
The problem here is to find a general compact mapping from the table name to the bounds or range of the dimensions. For automatic mapping each dimension with lower and/or upper bound must be encoded in the table name. Maybe a user-defined function could solve this by providing a mapping from the table name to a list of dimensions with lower and/or upper bounds!?
Representing partition bounds
Julian Hyde writes:
Regarding the bounds of partition tables. I think that the bounds should be expressed in terms of predicates on column values rather than members. How would you say 'this table contains all products whose manufacturer starts with A through E'? Difficult to achieve using members: you'd have to set the bound to something like [Products].[Adidas] and then change it if someone adds Aardvark as a manufacturer.
Also, the predicates can contain and, or, and not, and that flexibility is required in dealing with lists of partition tables.
Peter Fopma replies:
Predicates on column names make life simpler when it comes down to creating SQL. But wouldn't the predicates then be defined on the columns of the dimension table?
<column table="time_by_day" name="the_year">
Julian Hyde replies:
That's where my PhysicalSchema change comes in. When you have defined a physical schema, a column reference such as <Column table="time_by_day" name="the_year"/> has a table usage (in this case "time_by_day") which implicitly has a well-defined path to the fact table.
Aggregate tables
For aggregate tables the are three cases to consider
- the data is disjoint and therefore the aggregate tables contain disjoint data. In this case all aggregate tables must be queried independently from the others
- the data is not disjoint and aggregate tables are not disjoint, too. If a dimension along which partitions are defined is dropped in all aggregate tables which need to queried, the aggregate tables cannot be used. If the dimension is dropped in only one aggregate table it can be used and the other tables need to be restricted to data not in the aggregate table with the dropped dimension.
- independent from the cases above an aggregate table might exist for the union of all partitions. If a query can be fulfilled by this table it is probably a hot candidate.
A concept for handling drill-through (yes, some SQL with UNION ALL) and changes in the partitions list has not matured yet. I expect some other issues to arise during initialization (eg. cardinality, etc...) of mondrian.
Julian Hyde writes:
There are problems defining aggregate tables against a single partition table, because some aggregate tables are at a higher level of granularity: a particular row in an aggregate table may combine rows from more than one partition table. But requiring aggregate tables to be 'global' - contain all data - is problematic. There is an important use case where the aggregate table contains all data except for the current month (or day), and mondrian knows to generate a query that uses the aggregate table for old data, and accesses the fact table for new data.
I think the best solution would be for aggregate tables to have predicates, similar to the ones I described for partition tables. However, I don't think that we need to fix aggregate tables to get partition tables working. If <Partition> can contain the same aggregate elements (<AggName>, <AggExclude> etc.) that can be included in <Table>, that would be fine. There is no need for tables inside a partition to have their own aggregate tables.
Peter Fopma replies:
I agree that aggregate tables should be defined on partitions with predicates. But I would first focus on the partitions.
Implementation
My idea to implement the partitioning concept is to derive a new table class, eg. PartitionedTable which contains the information about all Partitions. A Partition in turn contains the name of the database table of the partition and information about bounds and ranges for the dimensions. When Segments are created in an Aggregation and loaded the Constraints provide the context information which table really should be used. Segments can then be created with Columns which refer to the 'real' Table of the partition. For the mesures the partition tables can then be queried in an extra loop (over the partitions). When the constraints are set (or extended in the case of non-disjoint data) to avoid duplicates the results from the partitions can be combined to yield the result.
Do you agree that this could be a feasible way to implement partitions?
Julian Hyde writes:
I am optimistic that by treating <Partition> as a set of tables UNION-ed together, and simply generating
(select * from "sales_fact_1997" union all select * from "sales_fact_1998" union all select * from "sales_fact_dec_1998")
where we currently generate "sales_fact_1997" we can quickly cover the main SQL generation cases, including drill-through.
Even so, I think we should trim the feature list to a minimum for the first cut. I think we can leave out the 'disjoint' attribute, if you agree that it does not add any performance. I would also defer implementing PartitionPattern and PartitionExclude, because deducing the bounds seems too complex, and if anyone really needs dynamic partitions, they can achieve the same effect using a DynamicSchemaProcessor.
Peter Fopma replies:
I tried the union all approach in an older version of mondrian. In my solution, as you suggest, I replaced the table name in the query by a subquery using union all for all the partitions. The problem with this solution is, that we could not get a consistent performance for the various database systems. We tried ORACLE, DB2 and MS SQL and got results in the range from milliseconds to astronomical time units...
I would rather adapt the segments to load and run simpler queries on each partition and only use the union all construct to access the data for drill through.
Julian Hyde replies:
Agreed, UNION queries can have terrible performance. Better to generate multiple queries, and combine the results using in a big UNION ALL, than to generate a JOIN or GROUP BY query with UNION inside it. The inner UNION may be necessary if the partitioned fact tables are not disjoint or if a grand total is being computed; but we should avoid wherever possible.
Scoping
What is the minimum we could implement in the first version?
- Omit distjointness support?
- Omit cache support?
What is the time scale?
Peter Fopma replies:
Current Status
Julian,
I do have a first version of the partitioning concept and would like to put it up for discussion. In this version I focused on determining the needed partitions and how these can be combined in a SQL-Statement with optimized UNION ALLs. I used Mondrian version 3.0.4.11371 as a base.
With an inner union all I mean an expression of the form
select ... from (select * from T1 union all select * from T2) as fact ...
an outer union all is a statement of the form
select ... from T1 where ... UNION ALL select ... from T2 where ...
I implemented the description in the schema as follows:
The facttable of a cube can be <Partitions> which is a collection of <Partition>s.
A <Partition> contains a <Table> and a <Predicate> to describe which data is contained in the partition. The predicate itself can be a combination of <BooleanPredicateExpression> and <CompPredicateExpression>.
A boolean expression can be an 'and' or 'or' combination of two expressions (boolean or comparison). A comparison expression compares a column of a table with a value. The operator can be one of '=', '!=', '>' or '<'.
The evaluation of which partition is to be used for a Mdx-statement is done in the class AbstractQuerySpec in the function nonDistinctGenerateSql. Query generation in other functions will follow the same schema.
Since each column restricted by a StarColumnPredicate is used in the group by clause of the final SQL-Statement a single line for each value of the StarColumnPredicate will be created in the result. The values can therefore be retrieved from different partitions and combined with UNION ALL without changing the result set (compared to a non-partitioned facttable).
The StarColumnPredicates are grouped according to the dimension which they restrict. For each StarColumnPredicate the predicate of the partition is evaluated. If the predicate evaluates to true (the partition contains data for the StarColumnPredicate) this is marked in a bitvector. For a StarColumnPredicate with multiple values a set of bitvectors (one for each value) is created. The bitvectors for each StarColumnPredicate of a dimension are combined using logical-and to yield a final set of bitvectors for each dimension.
If bits overlap in two bitvectors this indicates that data for a StarColumnPredicate is contained in two partitions. In an outer UNION ALL this creates two result rows for a StarColumnPredicate. Therefore overlapping bitvectors must be combined with logical-or.
Bits in a bitvector indicate which partitions must be combined in an inner UNION ALL statement. Bitvectors which do not overlapp indicate which partitions can be combined in an outer UNION ALL statement.
For partitions on multiple dimensions the bitvectors must be combined across dimensions.
The class SqlQuery is extended to handle partitions in the function addFrom. Here an inner union of the form
select * from T1 union all select * from T2 union all ...
is created. The class is also extended to support union all for multiple (outer) statements.
Evaluating predicates for a StarColumnPredicate
To evaluate a predicate the comparison expressions must be combined using the boolean expressions to form a boolean result. Unless a comparison expression indicates that data for the StarColumnPredicate is not in the partition, the partition contains data for the StarColumnPredicate.
For the comparison exprressions it might be neccessary to match the level of the predicate and the StarColumnPredicate. For a time dimension it could be the case that the StarColumnPredicate restricts to a quarter but the expression is defined for months or vice versa. Each StarColumnPredicate is shifted to the level of the expression. For example the quarter Q1 and an expression on months, Q1 is replaced by (Jan, Feb, Mar).
The comparison expressions are then evaluated as follows.
- Equal:
At least one of the values of the StarColumnPredicate must be equal to the value of the predicate of the partition. - Not Equal:
At least one of the values of the StarColumnPredicate must not be equal to the value of the predicate of the partition. - Greater:
At least one of the values of the StarColumnPredicate must be greater than the value of the predicate of the partition. - Less:
At least one of the values of the StarColumnPredicate must be less than the value of the predicate of the partition.
ToDos:
The <BooleanPredicateExpression> and the <CompPredicateExpression> of the schema are only data containers in MondrianDef. The evaluation of a StarColumnPredicate is done in a separate class since data of a dimension might be needed which is not accessible in MondrianDef. This class could be created as soon as the schema is read. Currently this is done each time a predicate is evaluated.
The adoption of the level of the StarColumnPredicate to the level of the comparison expression should be done using the information about the dimension from the schema. Currently this is done in a class with fixed information. The approach to retrieve this information might be different when using the new physical star schema.
I am currently trying to find my way through the two todo points (maybe you could point me in the right direction!?). The result I got from the tests I made so far appear to be much better than the results from the first approach using only inner union alls.
Peter