Pentaho Data Integration Case Studies

At the Traffic Centre of the Flemish governement, they have an application called the Traffic Control Centre that they use to monitor the state of the road network in Flanders. Every minute data is entering the system from more then 570 locations on the highways. However, because of the high volume of data this represents a problem towards the analyses of traffic situations. For that reason, a decision was made to go for the creation of a data warehouse. This data warehouse puts data in a multi-dimensional data model to allow the combination of different types of data against common dimensions. For example, it becomes possible to look the speeds measurements on a certain point on a road together with the weather conditions in that region. Long term storage is also a part of the objective. Because a data warehouse is by definition a historical database, this part would be easy if not for the large data volumes involved in this case. The largest fact table of the Traffic Centre's data warehouse contains above 1 bilion rows for 13 months worth of historical data. Kettle is used in this project to handle the data acquisition on the Traffic Control Centre application, as wel as the update of the dimensions and fact tables in the data warehouse. Furthermore, detailed logging is used to check for errors in the jobs that are launched every 15 minutes. (96 times a day!) So far the project is running for almost a year without a problem.



 Financial institutions have to deal more and more with increasing regulations and obligations. One of the new accounting rule-sets that are coming their way is called "Basel II". One interesting part of "Basel II" says that banks need to keep a certain standard reserve for bad loans. However, if the bank in question can prove that they have a better then average loans portofolio, they can lower the percentage of money that they have to keep in reserve. This is called the Internal Rating Based (IRB) calculation. So, instead of letting money sit idle in the reserve, this money can be used to generate new income from other banking operations. However, how do yo prove that your customers are better than average? Well, you can use a data warehouse to do this. The data warehouse needs to acquire information from different parts of the lending process, from the request for a loan, over the acceptance to the processing of the monthly payments. A lot of information is also uploaded in the warehouse about the customers info and the different kind of products he has. For example if a customer has a lot of savings, this would lower the risk of a bad loan. A data warehouse can then excell in the creation of reports that combine all this info, providing a relative easy solution to this complex problem. The use of a data warehouse is also encouraged in this case because the IRB system has to have historical data for the past 6 years in order to be valid.

Direct Marketing

This is one area where a business intelligence system can really make a difference. That is because a direct marketing manager can make a lot of use of a data warehouse to report on his customers. For example if he has a budget to send a mailing to his 1000 best customers, he needs to select these 1000 customers. To determine what his best customers are, the manager wants to use Recency, Frequency and Monetary value (RFM) as parameters. These parameters are determined in the data warehouse by first gathering all the sales data in a sales fact table. (sales per customer per product). Every month, the data warehouse looks at all the customers and looks in the sales fact table and determines

  • how long it's been since the customer ordered? (Recency)
  • how many times he ordered in the last year? (Frequency)
  • the total amount of products ordered in that year? (Monetary value)

Based on this new fact table containing the RFM data, the manager can then make a better choice of customers based on the marketing principles of RFM. In short RFM means: a customer that orders frequently and has ordered not long ago and for a lot of money is more likely to do so then others in the future.As a consequence, if you select the 1000 customers based on RFM, you are likely to get a better return on investment then by selecting 1000 random customers. Departing from the sales and RFM fact tables, you can segment your customers into types like 'new customer', 'very good customer', '...' that eases the selection even more.