Pentaho Software Architecture

Introduction

This document provides a detailed view of the software components that together make up the Pentaho open source software suite as it exists today.

At a high level, the software components can be grouped in two ways.  By organization, the detailed list below covers third party libraries and components that Pentaho has needed to fork and maintain, common libraries and projects that are used in general ways, pillars that are core business analytics or data integration elements, tools that allow access to pillars, and plugins across the pillars that provide additional functionality.  By architectural purpose, the same components fall into four general areas: information delivery, data management / integration, analytics / reporting, and platform services.  Each project below is categorized in both ways to give a multi-faceted view of the overall architecture of Pentaho.
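The two-way categorization described above can be sketched as a small data model. This is only an illustration, not part of any Pentaho codebase: the class `ComponentCatalog`, the enums `Kind` and `Area`, and the helper `namesIn` are hypothetical names chosen for this example. The three sample entries and their architectural areas are taken from the Third Party Maintained Forks listing later in this document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two categorizations used in this document.
public class ComponentCatalog {

    // Organizational grouping, as named in the introduction.
    public enum Kind { THIRD_PARTY_FORK, COMMON_COMPONENT, PILLAR, TOOL, PLUGIN }

    // The four architectural areas named in the introduction.
    public enum Area {
        INFORMATION_DELIVERY,
        DATA_MANAGEMENT_INTEGRATION,
        ANALYTICS_REPORTING,
        PLATFORM_SERVICES
    }

    // One listed project, tagged along both axes.
    public static final class Component {
        public final String name;
        public final Kind kind;
        public final Area area;

        public Component(String name, Kind kind, Area area) {
            this.name = name;
            this.kind = kind;
            this.area = area;
        }
    }

    // Sample entries copied from the fork listing in this document.
    public static List<Component> sample() {
        List<Component> components = new ArrayList<>();
        components.add(new Component("kettle-vfs", Kind.THIRD_PARTY_FORK, Area.DATA_MANAGEMENT_INTEGRATION));
        components.add(new Component("hive", Kind.THIRD_PARTY_FORK, Area.DATA_MANAGEMENT_INTEGRATION));
        components.add(new Component("pentaho-ofc4j", Kind.THIRD_PARTY_FORK, Area.INFORMATION_DELIVERY));
        return components;
    }

    // Returns the names of all components in the given architectural area,
    // preserving the order of the input list.
    public static List<String> namesIn(List<Component> components, Area area) {
        List<String> names = new ArrayList<>();
        for (Component c : components) {
            if (c.area == area) {
                names.add(c.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints: [kettle-vfs, hive]
        System.out.println(namesIn(sample(), Area.DATA_MANAGEMENT_INTEGRATION));
    }
}
```

Keeping the organizational kind and the architectural area as separate fields lets the same list be sliced either way, which mirrors how each project entry below carries both labels.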

Cross Cutting Architectures, Best Practices and Use Cases

This section discusses high level cross cutting software architectures and use cases.

Configuration Management

At this time, Pentaho uses a combination of SVN and Git to manage its source code.  Here are some related articles:

http://wiki.pentaho.com/display/PEOpen/Advanced+Git+Topics

Metadata Definitions

As we continue to build a community of projects, it's important that they share terminology and common metadata.  Here is a first pass at capturing shared metadata to be used across all Pentaho projects:

http://wiki.pentaho.com/display/COM/Standard+MetaStore+Element+types

JavaScript Development Guidelines

Pentaho's core technology is developed on the Java Platform, but rich browser-based applications are becoming increasingly critical.  Pentaho has a number of components that are browser only.  It's important that we share a common approach across these projects.  Here is the beginning of our JavaScript Development Guidelines:

http://wiki.pentaho.com/display/ServerDoc2x/Javascript+Development+Guidelines

Pentaho Prompting API

This is a generally useful library used in a variety of contexts, and is part of the Common UI plugin to the Pentaho Platform.

http://wiki.pentaho.com/display/ServerDoc2x/Pentaho+Prompting+API

Pentaho Coding Standards

Cross cutting coding standards for all modules of the Pentaho suite can be found in our GitHub project, which includes configurations for the most popular IDEs.

https://github.com/pentaho/pentaho-coding-standards

Additional content needed around:

Visualizations

Logging

Plugin Architectures

Platform / BA Server Related

Scheduling and Background Execution in Pentaho User Console: Plugin Scheduling and Background Execution 

Intro for creating a REST service for the BA Server: How to create and register a new REST service from a plugin

Developing Plugins

Kettle Related

Extending Kettle (Infocenter SDK)

UI Technologies

Datasources

Detailed Software Listing

This detailed software listing is organized in the general order in which software components are dependent on one another, although it should not be used as the official build order of Pentaho.

Third Party Maintained Forks
Common Components
Pillars
Tools
Plugins

Third Party Maintained Forks

It is Pentaho's intention to avoid having to fork and maintain third party open source software, but on a few occasions it has been necessary.  The following are the third party maintained forks that Pentaho currently includes in its products.

kettle-vfs

Kettle VFS is a maintained fork of Apache Commons VFS.

Source Path: svn://source.pentaho.org/svnkettleroot/kettle-vfs

Architectural Owner: Matt Casters

Architectural Area: Data Management / Integration

hive

Due to the dynamic nature of Hadoop, Pentaho currently maintains its own Hive JDBC Driver implementation.

Source Path: https://github.com/pentaho/hive

Architectural Owner: Will Gorman

Architectural Area: Data Management / Integration

pentaho-ofc4j

Pentaho ChartBeans Flash components, which are still used by Pentaho Dashboards and Action Sequences, are based on Open Flash Chart.  OFC4J is a Java to JSON converter that is used on the server to generate the correct metadata for the charts.  The project is no longer maintained by its creator.

Source Path: https://github.com/pentaho/pentaho-ofc4j

Architectural Owner: Will Gorman

Architectural Area: Information Delivery

Common Components

This is a list of all the common libraries that Pentaho maintains and includes as part of the Pentaho Suite of technologies.  Each common component has a specific purpose, and may be used by one or more pillars.

subfloor

Subfloor is Pentaho's common build system, based on Apache Ant and used by all projects for compilation, assembly, unit testing, and code coverage.

Source Path: https://code.google.com/p/subfloor/  (Note that this location is out of date and should be transitioned to GitHub)

Architectural Owner: Will Gorman

Architectural Area: Build

pentaho-commons-database

This commons project is a GWT thin client of the shared database dialog.  The submodule pentaho-database-model was an attempt at a thin Kettle DatabaseMeta implementation, which includes a dialect and JDBC Metadata architecture.

Source Path: https://github.com/pentaho/pentaho-commons-database