Module Creation Policy
Purpose
The purpose of this wiki page, and of the team behind it, is to get the proliferation of SVN folders, CI projects, and release projects under control.
Drivers
- The ramp-up cost for new platform developers is too high, in part because of the large number of Eclipse projects. The setup feels unwieldy and complex.
- Regarding platform plugins, we do not want every new feature to end up in a plugin when it really belongs in the core platform.
- The rich dependency structure and sheer number of projects in the CI environment make that environment difficult to maintain.
- Releasing the platform requires orchestrating the release builds of many projects and modules. We do not want to increase this complexity arbitrarily by adding new projects where they need not exist.
Team Members
Marc Batchelor - Chief Engineer, Pentaho
James Dixon - CTO, Pentaho
Doug Moran - VP Community, Pentaho
Will Gorman - Lead Engineer, Pentaho
Aaron Phillips - Engineer, Pentaho
Thomas Morgner - Chief Architect (Reporting), Pentaho
Definitions
- Module
A module is a folder under source code control that is built to create one or more artifacts.
- Project
A project is a collection of modules that live under a single "trunk" folder in source control. What distinguishes a project from a module is that a project's modules are delivered and versioned together.
Guidelines
We need to ensure that our new policy does not leave us in a worse place. We also want to keep in mind the reasons why we have chosen in the past to either consolidate or modularize projects. There are valid reasons for both, and we should document them.
Guidelines for the new policy:
- supports the declarative nature of dependency management
- reduces community development cost (can we be more specific?)
- reduces Pentaho development cost (can we be more specific?)
- continues to deter dependency creep (a motivator for modularization)
The decision on how far to modularize has a technical impact on:
- Release environment - The cost of adding projects or modules grows linearly with the number added. The release environment also offers a lot of flexibility in how a release job is configured; for example, one job could build several modules.
- CI environment - An increase in the number of modules typically has a non-linear cost in the CI environment. Unlike the release environment, we do not typically group modules in a single job, but assign a unique job to the smallest unit of source that produces a jar (i.e. .
- Development environment - Eclipse project configuration
Options
- An approach to managing modules in Eclipse: Consider that an Eclipse project (little p) maps neither to a Project (big P) nor to a Module. An Eclipse project then represents a view of one or more projects and/or modules. Given that assertion, we could present a much cleaner view for developing the platform than we have in the past. For example, a developer might see only 5 Eclipse projects instead of 15, where each of the 5 Eclipse projects manages 3 modules. We could make this work by having platform SVN "master" folders, as we already do with plugin-actions. Plugin-actions would consist of a number of self-contained modules, each managing its own dependencies through its own ivy.xml file. The master project, plugin-actions, could then have a generated ivy.xml that combines the individual module ivy.xml files, giving Eclipse IvyDE developers a usable project.
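If the master ivy.xml is generated by a script, the merge step could look roughly like the Python sketch below. It is a hypothetical helper, not an existing Pentaho or Ivy tool: the name `merge_module_ivy`, the module names, and the policies for sibling dependencies and conflicting revisions are all assumptions. Only the ivy.xml elements it reads and writes (`ivy-module`, `info`, `dependencies`, and `dependency` with `org`, `name`, `rev`) come from the standard Ivy file format.

```python
import xml.etree.ElementTree as ET


def merge_module_ivy(master_org, master_name, module_ivy_texts):
    """Generate a master ivy.xml (as a string) from its modules' ivy.xml contents.

    Assumptions for this sketch: dependencies are de-duplicated on (org, name),
    dependencies on sibling modules are dropped because those modules are
    already part of the master Eclipse project, and two modules asking for
    different revisions of the same library is treated as an error.
    """
    roots = [ET.fromstring(text) for text in module_ivy_texts]
    siblings = {
        (r.find("info").get("organisation"), r.find("info").get("module"))
        for r in roots
    }
    merged = {}  # (org, name) -> rev
    for root in roots:
        # Assume a dependency without an org attribute belongs to the
        # module's own organisation (Ivy's default for the org attribute).
        default_org = root.find("info").get("organisation")
        for dep in root.iter("dependency"):
            key = (dep.get("org", default_org), dep.get("name"))
            rev = dep.get("rev")
            if key in siblings:
                continue  # sibling module: already inside the master project
            if key in merged and merged[key] != rev:
                raise ValueError(f"conflicting revisions for {key}: {merged[key]} vs {rev}")
            merged[key] = rev

    master = ET.Element("ivy-module", version="2.0")
    ET.SubElement(master, "info", organisation=master_org, module=master_name)
    deps = ET.SubElement(master, "dependencies")
    for (org, name), rev in sorted(merged.items()):
        attrs = {"org": org, "name": name}
        if rev is not None:
            attrs["rev"] = rev
        ET.SubElement(deps, "dependency", attrs)
    return ET.tostring(master, encoding="unicode")
```

Rejecting conflicting revisions is a deliberate choice in this sketch: a single Eclipse classpath can effectively use only one version of a library, so surfacing the conflict when the master file is generated also works as a check against dependency creep.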
Modularization: Pros & Cons
Pro:
- Minimize dependency creep
- Smaller chunks to work with
- Enforces clean interfaces between modules
- Dependencies can be tracked easier
Con:
- More complex to set up and
- More expensive to maintain the build
- Hudson does not handle modularized projects very well
- Eclipse does not handle modularized projects well
Comments
- In addition to the module builds that we currently have, I would propose a project-level build that resolves and compiles each module under it (in order to identify circular dependencies) and produces one jar file for the project. For example, bi-platform-v2 (open) would have 20-25 modules, each capable of producing its own artifacts, but the bi-platform-v2 project would have a build that produces one jar file containing the compiled code of every module.
- It may already help to have an easy way to set up the development environment. A master script could check out the project into a well-defined directory structure with all IDE files set up, so that developers can start immediately.
- Question: Are we just having a semantic discussion here about modularization? The fact is that we need independent source trees to work together to comprise a "project". I'm not sure we are asking the right question. "How far should we go with modularization?" doesn't seem like it, because I think modularization is inevitable. Isn't the real question how this modularization will manifest (e.g., as many Eclipse projects)? I realize I'm going a little backwards here, but I need to understand what the problem actually is, and the problem is not that we are too modular or too consolidated. I started the Drivers section above to help us get a handle on the pain points and the problems we need to solve. -AP
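The project-level build proposed in the first comment needs to compile modules in dependency order and stop when it finds a cycle. A minimal sketch of that ordering step, using Kahn's topological sort, follows. `build_order` and the module names are hypothetical, and in practice the dependency map would presumably be read from each module's ivy.xml.

```python
from collections import deque


def build_order(deps):
    """Order modules so that each one comes after every module it depends on.

    deps maps a module name to the set of sibling modules it depends on.
    Raises ValueError naming the modules that could not be ordered, i.e. the
    modules in a cycle plus any modules that depend on one.
    """
    modules = set(deps) | {d for ds in deps.values() for d in ds}
    # Unbuilt dependencies still blocking each module.
    waiting_on = {m: set(deps.get(m, ())) for m in modules}
    # Reverse edges: which modules may become unblocked once a module is built.
    dependents = {m: [] for m in modules}
    for m, ds in waiting_on.items():
        for d in ds:
            dependents[d].append(m)

    ready = deque(sorted(m for m in modules if not waiting_on[m]))
    order = []
    while ready:
        m = ready.popleft()
        order.append(m)
        for dependent in sorted(dependents[m]):
            waiting_on[dependent].discard(m)
            if not waiting_on[dependent]:
                ready.append(dependent)

    if len(order) != len(modules):
        stuck = sorted(modules - set(order))
        raise ValueError("modules in or behind a dependency cycle: " + ", ".join(stuck))
    return order
```

For example, `build_order({"engine": {"api"}, "web": {"engine", "api"}, "api": set()})` yields `["api", "engine", "web"]`; the project build would then compile the modules in that order and package the combined output into the single project jar.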