PA_CR_PA-3.7.0.0-752_impalad-1.3.0
Preface
This report covers the following products.
- Pentaho Analysis 5.1.0.0-752 ( 3.7.0.0-752 )
- Impala impalad version cdh5-1.3.0 RELEASE
- Pentaho SHIM for CDH 5.0, as shipped with the Pentaho Platform
Feature | Status | Notes |
---|---|---|
Degenerate Schemas | | |
Star Schemas | | |
Snowflake Schemas | | Implicit crossjoins are not supported. No functional impact on Analyzer, but can cause trouble with complex hand-crafted MDX. |
Filters & data types | | The JDBC driver fails to recognize the TIMESTAMP keyword. |
Top Count | | |
Aggregation Tables | | The JDBC driver doesn't return the proper metadata when providing a list of the tables present in a database. |
Null Values & Keys | | |
Inline Tables | | |
Distinct Count | | Not all forms of distinct counts are supported, although the support offered is sufficient for Mondrian. |
Grouping Sets | | Grouping sets are not supported. |
Failures
Data types and Native filters
Symptom
Not all data types are supported. The dialect for Impala (and Hive) doesn't represent TIME and TIMESTAMP values correctly, resulting in a SQL error.
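To make the symptom concrete, the following sketch renders a timestamp literal in two ways: with the ANSI-style TIMESTAMP keyword, which is the form that appears in the failing queries of this section, and as a plain quoted string. The class and method names are hypothetical and are not Mondrian's dialect API. That Impala accepts the bare literal in every context is an assumption, based only on this report's observation that the keyword seems superfluous.

```java
import java.sql.Timestamp;

/** Hypothetical helper, not part of Mondrian: renders timestamp literals for generated SQL. */
final class TimestampLiteralSketch {

    /**
     * Renders a timestamp literal for a SQL predicate.
     *
     * @param value      text in the form yyyy-mm-dd hh:mm:ss
     * @param useKeyword true for the ANSI form TIMESTAMP '...', false for the bare quoted string
     */
    static String quote(String value, boolean useKeyword) {
        // Validate the text first; Timestamp.valueOf throws IllegalArgumentException
        // when the value is not in JDBC timestamp escape format.
        Timestamp.valueOf(value);
        String quoted = "'" + value + "'";
        return useKeyword ? "TIMESTAMP " + quoted : quoted;
    }
}
```

Calling quote("1991-03-13 00:00:00", true) reproduces the literal seen in the failing queries, while passing false yields only the quoted string.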
Failed tests
Test | Result |
---|---|
org.pentaho.mondrian.tck.NativeFilterTest.testCompoundPredicateNoJoinsDateLiteralSyntax | Columns of type TIMESTAMP are not represented correctly by Mondrian's dialect. The keyword 'TIMESTAMP' seems superfluous and is not required by Impala or Hive. java.lang.Exception: Query failed to run successfully: select sum(store.store_sqft) as m0 from store store where ( store.store_country = 'USA' and store.first_opened_date = '1981-01-03' and store.last_remodel_date = TIMESTAMP '1991-03-13 00:00:00' ) or ( store.store_city = 'San Diego' and store.store_state = 'CA' ) or ( store.store_state = 'WA' and store.store_sqft > 30000 ) or ( store.store_sqft is null) at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:266) at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:73) at org.pentaho.mondrian.tck.NativeFilterTest.testCompoundPredicateNoJoinsDateLiteralSyntax(NativeFilterTest.java:230) |
org.pentaho.mondrian.tck.NativeFilterTest.testCompoundPredicate | Columns of type TIMESTAMP are not represented correctly by Mondrian's dialect. The keyword 'TIMESTAMP' seems superfluous and is not required by Impala or Hive. java.lang.Exception: Query failed to run successfully: select sum(sales_fact_1997.unit_sales) as m0 from store store , product product , sales_fact_1997 sales_fact_1997 where sales_fact_1997.store_id = store.store_id and sales_fact_1997.product_id = product.product_id and (( store.store_country = 'USA' and store.first_opened_date = '1981-01-03' and store.last_remodel_date = TIMESTAMP '1991-03-13 00:00:00' ) or ( store.store_city = 'San Diego' and store.store_state = 'CA' ) or ( store.store_state = 'WA' and store.store_sqft > 50000 and product.gross_weight = 17.1 ) or ( store.store_sqft is null ) ) at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:266) at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:73) at org.pentaho.mondrian.tck.NativeFilterTest.testCompoundPredicate(NativeFilterTest.java:224) |
Automatic recognition of aggregation tables
Symptom
The JDBC driver does not return properly formatted data when Mondrian asks for a list of the available tables. As a result, Mondrian cannot automatically discover any aggregation tables that might be present. Aggregation tables that are declared explicitly in the schema are not affected.
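The check that fails here can be sketched without a database connection. The JDBC specification defines the result of DatabaseMetaData.getTables() so that its first four columns are TABLE_CAT, TABLE_SCHEM, TABLE_NAME and TABLE_TYPE. The hypothetical helper below (not part of Mondrian or the TCK) takes the column labels a driver actually returned, for example as collected through ResultSetMetaData.getColumnLabel(), and reports which of those four are missing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: compares getTables() column labels against the JDBC specification. */
final class GetTablesColumnsSketch {

    // First four columns that the JDBC specification defines for DatabaseMetaData.getTables().
    static final List<String> EXPECTED =
        Arrays.asList("TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "TABLE_TYPE");

    /** Returns the expected labels that are absent from the labels actually returned. */
    static List<String> missingColumns(List<String> actualLabels) {
        // Compare case-insensitively, since drivers differ in the case they report.
        List<String> upper = new ArrayList<>();
        for (String label : actualLabels) {
            upper.add(label.toUpperCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String expected : EXPECTED) {
            if (!upper.contains(expected)) {
                missing.add(expected);
            }
        }
        return missing;
    }
}
```

For the single-column result reported in this section, whose only label is name, all four expected columns come back as missing.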
Failed tests
Test | Result |
---|---|
org.pentaho.mondrian.tck.AggregationTablesRecognitionTest.testAggregationRecognition | The method to obtain a list of tables isn't implemented properly in the Pentaho shim. It returns only one column with the table names, whereas the JDBC API specifies at least four columns, the third (TABLE_NAME) being the name. Caused by: java.sql.SQLException: Invalid columnIndex: 3 at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:491) at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:629) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$ResultSetInvocationHandler.invoke(DriverProxyInvocationChain.java:682) at com.sun.proxy.$Proxy8.getString(Unknown Source) at mondrian.rolap.aggmatcher.JdbcSchema.addTable(JdbcSchema.java:1282) at mondrian.rolap.aggmatcher.JdbcSchema.loadTablesOfType(JdbcSchema.java:1265) at mondrian.rolap.aggmatcher.JdbcSchema.loadTables(JdbcSchema.java:1231) at mondrian.rolap.aggmatcher.JdbcSchema.load(JdbcSchema.java:1100) at mondrian.rolap.aggmatcher.AggTableManager.loadRolapStarAggregates(AggTableManager.java:178) at mondrian.rolap.aggmatcher.AggTableManager.initialize(AggTableManager.java:91) |
org.pentaho.mondrian.tck.AggregationTablesRecognitionTest.testGetTablesJdbc | The method to obtain a list of tables isn't implemented properly in the Pentaho shim. It returns only one column with the table names, whereas the JDBC API specifies at least four columns, the third (TABLE_NAME) being the name. java.lang.AssertionError: Column 'table_cat' doesn't exist in the columns result set '[name]' at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.pentaho.mondrian.tck.SqlExpectation.validateColumns(SqlExpectation.java:75) at org.pentaho.mondrian.tck.SqlExpectation.verify(SqlExpectation.java:62) at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:75) at org.pentaho.mondrian.tck.AggregationTablesRecognitionTest.testGetTablesJdbc(AggregationTablesRecognitionTest.java:53) |
Warnings
Support for database joins
Symptom
Not all forms of joins are supported by the database, which can prevent some types of schemas from being supported on the database evaluated. In the case at hand, however, only a single type of join failed, and it is not a type that can be reproduced or exercised by the MDX that Analyzer generates. Mondrian can use all three general forms of schemas: degenerate, star, and snowflake.
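The error message that Impala returns for the failing query recommends an explicit CROSS JOIN for Cartesian products. As an illustration for hand-crafted SQL, the sketch below builds a FROM clause in that form from a list of table names. The helper is hypothetical and does not describe how Mondrian generates SQL.

```java
import java.util.List;

/** Hypothetical helper: builds a FROM clause that uses explicit CROSS JOINs. */
final class CrossJoinSketch {

    /** Joins the given tables with CROSS JOIN instead of a comma-separated list. */
    static String fromClause(List<String> tables) {
        if (tables.isEmpty()) {
            throw new IllegalArgumentException("at least one table is required");
        }
        // "from a, b" is the implicit form rejected in this report;
        // "from a cross join b" states the Cartesian product explicitly.
        return "from " + String.join(" cross join ", tables);
    }
}
```

With the tables of the failing test, warehouse and warehouse_class, the result is "from warehouse cross join warehouse_class".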
Failed tests
Test | Result |
---|---|
org.pentaho.mondrian.tck.JoinTest.testImplicitJoin | Implicit joins are not supported. If Mondrian tries to evaluate a crossjoin of the members of two levels in a context allowing empty cells, the fact table is omitted from the SQL query and both tables are joined by what is called an 'implicit' join. java.lang.Exception: Query failed to run successfully: select warehouse.warehouse_id, warehouse_class.description from warehouse, warehouse_class at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:266) at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:73) at org.pentaho.mondrian.tck.JoinTest.testImplicitJoin(JoinTest.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Caused by: java.sql.SQLException: NotImplementedException: Join with 'warehouse_class' requires at least one conjunctive equality predicate. To perform a Cartesian product between two tables, use a CROSS JOIN. at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:167) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:155) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:210) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$CaptureResultSetInvocationHandler.invoke(DriverProxyInvocationChain.java:513) at com.sun.proxy.$Proxy7.execute(Unknown Source) at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:264) ... 25 more |
Grouping sets
Symptom
Queries that use grouping sets are not supported. Grouping sets are an optimization feature supported by some more advanced databases. They allow Mondrian to batch cell requests and improve overall performance.
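To illustrate what batching with grouping sets saves, the sketch below expands a list of grouping sets into one plain GROUP BY query per set, which is roughly the work a database without grouping sets has to be asked to do separately. The helper and its parameters are hypothetical and do not reproduce Mondrian's SQL generation. Unlike a real grouping sets query, which returns a single result with NULLs in the rolled-up columns, each generated query has its own column list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands grouping sets into separate GROUP BY queries. */
final class GroupingSetExpansionSketch {

    /**
     * @param aggregate    aggregate expression, e.g. "sum(sales_fact_1997.store_cost) as sum_cost"
     * @param fromWhere    shared "from ... where ..." text of the query
     * @param groupingSets one list of columns per grouping set; an empty list is the grand total
     * @return one SQL statement per grouping set, in the order given
     */
    static List<String> expand(String aggregate, String fromWhere, List<List<String>> groupingSets) {
        List<String> queries = new ArrayList<>();
        for (List<String> set : groupingSets) {
            StringBuilder sql = new StringBuilder("select ");
            for (String column : set) {
                sql.append(column).append(", ");
            }
            sql.append(aggregate).append(' ').append(fromWhere);
            // The empty grouping set aggregates over all rows, so it needs no GROUP BY.
            if (!set.isEmpty()) {
                sql.append(" group by ").append(String.join(", ", set));
            }
            queries.add(sql.toString());
        }
        return queries;
    }
}
```

For the grouping sets ((customer.gender), ()), the helper returns two statements: one grouped by customer.gender and one ungrouped total.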
Failed tests
Test | Result |
---|---|
org.pentaho.mondrian.tck.GroupingSetTest.testEmptyEntry | Grouping set queries are not supported. select customer.gender as gender, sum(sales_fact_1997.store_cost) as sum_cost from time_by_day, sales_fact_1997, customer where (sales_fact_1997.time_id = time_by_day.time_id and time_by_day.the_year = 1997 and sales_fact_1997.customer_id = customer.customer_id) group by grouping sets ((customer.gender),()) |
org.pentaho.mondrian.tck.GroupingSetTest.testPlainEntry | Grouping set queries are not supported. select customer.gender as gender, sum(sales_fact_1997.store_cost) as sum_cost from time_by_day, sales_fact_1997, customer where (sales_fact_1997.time_id = time_by_day.time_id and time_by_day.the_year = 1997 and sales_fact_1997.customer_id = customer.customer_id) group by grouping sets ((customer.gender)) |
org.pentaho.mondrian.tck.GroupingSetTest.testComplexEntry | Grouping set queries are not supported. select time_by_day.the_year as the_year, customer.gender as gender, sum(sales_fact_1997.store_cost) as sum_cost from time_by_day, sales_fact_1997, customer where (sales_fact_1997.time_id = time_by_day.time_id and time_by_day.the_year = 1997 and sales_fact_1997.customer_id = customer.customer_id) group by grouping sets ((time_by_day.the_year, customer.gender)) |
org.pentaho.mondrian.tck.GroupingSetTest.testMultipleEntries | Grouping set queries are not supported. select time_by_day.the_year as the_year, customer.gender as gender, sum(sales_fact_1997.store_cost) as sum_cost from time_by_day, sales_fact_1997, customer where (sales_fact_1997.time_id = time_by_day.time_id and time_by_day.the_year = 1997 and sales_fact_1997.customer_id = customer.customer_id) group by grouping sets ((time_by_day.the_year, customer.gender), (time_by_day.the_year),()) |
Distinct Count
Symptom
Not all forms of distinct count queries are supported. One form of distinct count over multiple columns is supported, however, so Mondrian can batch the queries as needed. The integration tests have also shown that the dialect issues the distinct count queries correctly.
Additionally, the JDBC driver doesn't provide Mondrian with metadata about the cardinality of the columns. This forces Mondrian to issue queries like "select count " which are costly to run.
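The form that fails in the tests for this section puts several count(distinct ...) expressions over different columns into one statement. This report does not show the batched form that Impala accepts, so the sketch below only illustrates the simplest fallback for hand-written SQL: one statement per column. The helper is hypothetical and is not taken from Mondrian.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: issues one distinct-count statement per column. */
final class DistinctCountSketch {

    /** Returns one "select count(distinct column) from table" statement per column. */
    static List<String> perColumnQueries(String table, List<String> columns) {
        List<String> queries = new ArrayList<>();
        for (String column : columns) {
            // Each statement contains a single DISTINCT aggregate, so the restriction
            // on mixing DISTINCT aggregates over different columns does not apply.
            queries.add("select count(distinct " + column + ") from " + table);
        }
        return queries;
    }
}
```

For sales_fact_1997 with the columns customer_id and product_id, this yields two statements, at the cost of scanning the table once per column.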
Failed tests
Test | Result |
---|---|
org.pentaho.mondrian.tck.DistinctCountTest.testMultipleColumnSQL | Cannot batch multiple distinct count columns with the following syntax. java.lang.Exception: Query failed to run successfully: select count(distinct(customer_id)), count(distinct(product_id)) from sales_fact_1997 at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:265) at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:73) at org.pentaho.mondrian.tck.DistinctCountTest.testMultipleColumnSQL(DistinctCountTest.java:49) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Caused by: java.sql.SQLException: AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT (customer_id)); deviating function: count(DISTINCT (product_id)) at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:167) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:155) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:210) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$CaptureResultSetInvocationHandler.invoke(DriverProxyInvocationChain.java:513) at com.sun.proxy.$Proxy6.execute(Unknown Source) at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:263) ... 25 more |
org.pentaho.mondrian.tck.DistinctCountTest.testJDBCIndexes | The call to obtain a list of indexes isn't implemented in the JDBC driver. java.sql.SQLException: Method not supported at org.apache.hive.jdbc.HiveDatabaseMetaData.getIndexInfo(HiveDatabaseMetaData.java:386) at org.pentaho.mondrian.tck.DistinctCountTest$1.getData(DistinctCountTest.java:77) at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:73) at org.pentaho.mondrian.tck.DistinctCountTest.testJDBCIndexes(DistinctCountTest.java:92) |