What does BI Lifecycle Management stand for, and what does Pentaho address?
The areas Pentaho addresses in version 4.2 are shown in bold:
- Requirements, Feature management
- Modeling, Design (with the Spoon module, including Agile-BI)
- BI-Project Management
- Configuration Management (integration with 3rd party tools, see below)
- Software Management
- Build, Release management
- Testing, Validations
- Deployment (see details below)
- Issue management, Workflow
- [and more, depending heavily on your project, company size and environment]
PDI can be integrated into existing Lifecycle Management tools, e.g. Subversion or Git.
Deployment, Configuration Management (Team Collaboration, Code-Versioning) and the Metadata Store
PDI options for storing metadata:
- File based (Repository): Stored in XML format in the file system as .ktr files (transformations) or .kjb files (jobs)
- Database Repository: Stored in relational SQL tables (ER-Diagram is available)
- Enterprise Edition (EE) Repository: Content Management System with versioning, team functions, security and much more, included in the Data Integration Server
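Because the file based store is plain XML on disk, ordinary scripts can inspect it, for example to check what will be committed or deployed. The following is a minimal sketch, assuming only what is stated above (.ktr files are transformations, .kjb files are jobs, both XML) plus the assumption that the root element is `<transformation>` or `<job>` respectively. The function name `inventory` is hypothetical and not part of PDI.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed root element for each PDI file type.
ROOT_BY_SUFFIX = {".ktr": "transformation", ".kjb": "job"}


def inventory(base_dir):
    """Return sorted (relative path, kind) pairs for all .ktr/.kjb files below base_dir.

    kind is "transformation" or "job" when the XML root element matches the
    file extension, otherwise "unexpected:<root tag>".
    """
    base = Path(base_dir)
    result = []
    for path in sorted(base.rglob("*")):
        expected = ROOT_BY_SUFFIX.get(path.suffix.lower())
        if expected is None or not path.is_file():
            continue  # skip directories and non-PDI files
        root_tag = ET.parse(path).getroot().tag
        kind = expected if root_tag == expected else f"unexpected:{root_tag}"
        result.append((path.relative_to(base).as_posix(), kind))
    return result
```

Calling `inventory("/path/to/project")` (a placeholder path) lists every job and transformation with its path relative to the project directory, which is the same layout a version control system would see.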
Notes on the File Based Repository (since PDI 4.0):
The file based repository stores jobs and transformations as .kjb and .ktr files in XML format below a given directory and can be used like the normal file based store. The main difference is that its jobs and transformations can be referenced in the same way as those stored in a database or Enterprise repository, while the files themselves remain in the file system.
Deployment options:
Manual tasks
- File based: Copy, CVS check in/out, commit etc.
- Repository: Export / Import, (EE only: Lock, Revert)
Automated tasks
- File based: automate the manual tasks, especially copy, CVS checkout, build processes, ...
- Repository: Command line options for repository Export / Import
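Automating the manual copy task for the file based store can be as simple as a script that mirrors all jobs and transformations from one directory tree into another. This is a minimal sketch, assuming a hypothetical layout where development files live in one directory and the deployment target is another; it does not cover CVS/Subversion checkouts or the repository export/import command line tools, and `deploy_by_copy` is a hypothetical name.

```python
import shutil
from pathlib import Path

PDI_SUFFIXES = {".ktr", ".kjb"}  # transformations and jobs


def deploy_by_copy(source_dir, target_dir):
    """Copy every .ktr/.kjb file from source_dir to target_dir.

    The relative directory layout is kept, other files are ignored.
    Returns the copied paths relative to source_dir, in sorted order.
    """
    source, target = Path(source_dir), Path(target_dir)
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.suffix.lower() in PDI_SUFFIXES:
            rel = path.relative_to(source)
            dest = target / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)  # copy2 also preserves file timestamps
            copied.append(rel.as_posix())
    return copied
```

Keeping the relative layout matters because jobs that call sub-jobs or transformations by file name expect to find them at the same relative location on the target system.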
Existing Enterprise Infrastructure Integration
Integration into your existing environment is straightforward when you use the file based store and/or the command line options, for example with:
- Team collaboration, code versioning systems (CVS, Subversion, Rational, Git, ...)
- Often in combination with other areas like build, release management, testing...
Concepts for the Enterprise Edition (EE) Repository
Content Management - Features
- Repository based on JCR (Content Repository API for Java)
- Repository Browser
- Enterprise Security
- Configurable Authentication including support for LDAP and MSAD
- Task Permissions defining what actions a user/role can perform such as read/execute content, create content and administer security
- Granular permissions on individual files and folders
- Full revision history on content allowing you to compare and restore previous revisions of a job or transformation
- Ability to lock transformations/jobs for editing
- 'Recycle bin' concept for working with deleted files
Content Management – Team Projects
- Have one DI Server that holds the development repository, security and scheduling
- Deploy the DI Clients to the team members
- Depending on your environment, the options are:
- Have additional DI Servers for test and production
- Have dev/test/prod directories below your team project and change the directory by named variables (see also below)
- Private and public directories (proposed approach):
- Use your private directory for your own work and tests
- Use the public directory for your team projects
- Deployment scenario (depending on team size and other factors):
- When you want to work on a transformation or job:
- Lock it with your name or
- Move it to your private directory
- When you finished the work:
- Unlock it or
- Move it back to the public project directory
Note: A moved object can still be found when the links to it are specified with "Specify by reference" (see below).
- If team members still need the part you are working on and the links to it were not specified by reference, you need to copy it:
- Currently, the only way to copy it is "Save as"
- The drawback of "Save as" is that you lose the version history
- Therefore, moving the object to another location, referencing it, or locking it is the best approach whenever possible
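The trade-off between locking, moving and "Save as" can be illustrated with a toy model. This is a minimal sketch, assuming that a link specified by reference points at an object identity that survives a move, and that "Save as" creates a new object whose version history starts over, as described above. `ToyRepository`, `RepoObject` and all method names are hypothetical; this is not the Pentaho EE repository API, and object content is not modelled.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional

_next_id = itertools.count(1)  # hypothetical object IDs


@dataclass
class RepoObject:
    path: str
    history: List[str] = field(default_factory=lambda: ["v1"])
    locked_by: Optional[str] = None
    object_id: int = field(default_factory=lambda: next(_next_id))


class ToyRepository:
    """Hypothetical in-memory model of the team workflow; not a Pentaho API."""

    def __init__(self) -> None:
        self.objects: Dict[int, RepoObject] = {}

    def create(self, path: str) -> RepoObject:
        obj = RepoObject(path)
        self.objects[obj.object_id] = obj
        return obj

    def lock(self, obj: RepoObject, user: str) -> None:
        # Only one user may hold the lock at a time.
        if obj.locked_by not in (None, user):
            raise PermissionError(f"{obj.path} is locked by {obj.locked_by}")
        obj.locked_by = user

    def unlock(self, obj: RepoObject, user: str) -> None:
        if obj.locked_by == user:
            obj.locked_by = None

    def move(self, obj: RepoObject, new_path: str) -> RepoObject:
        # Same object: its ID and version history survive the move.
        obj.path = new_path
        return obj

    def save_as(self, obj: RepoObject, new_path: str) -> RepoObject:
        # A copy is a new object: new ID, history starts again at "v1".
        return self.create(new_path)

    def by_reference(self, object_id: int) -> RepoObject:
        return self.objects[object_id]

    def by_name(self, path: str) -> Optional[RepoObject]:
        return next((o for o in self.objects.values() if o.path == path), None)
```

In this model, a link that stores `object_id` still resolves after `move`, a link that stores the old path returns `None`, and the object returned by `save_as` has a fresh history, which is why moving, referencing or locking is preferred over copying.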
Content Management – Backup and Deployment
- The current backup strategy is to back up the whole folder /data-integration-server/pentaho-solutions/, which includes the repository, security and scheduling, e.g. pentaho-solutions/system/jackrabbit/ and pentaho-solutions/quartz/.
- Remember to stop the DI Server before the backup and to start it again afterwards.
- More details can be found in the PDI Admin Guide in the Knowledge Base.
Backing up one server and restoring the backup on another server can be an option for deployment, but not when the test and production servers have different security and scheduling settings. In that case you may omit the pentaho-solutions/quartz/ folder.
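The folder backup can be scripted with the standard library. This is a minimal sketch, assuming the DI Server has already been stopped and that a compressed tar archive is an acceptable backup format; `backup_solutions` and its `include_quartz` switch are hypothetical names. Setting `include_quartz=False` skips the scheduling folder, matching the case above where the target server keeps its own schedules.

```python
import tarfile
from pathlib import Path


def backup_solutions(solutions_dir, archive_path, include_quartz=True):
    """Write the pentaho-solutions folder to a .tar.gz archive.

    Assumes the DI Server is stopped so the repository files are consistent.
    With include_quartz=False the quartz/ (scheduling) folder is left out.
    Returns archive_path.
    """
    base = Path(solutions_dir)
    skip = None if include_quartz else base.name + "/quartz"

    def _filter(info):
        # Returning None tells tarfile to exclude this member (and, for a
        # directory, everything below it).
        if skip and (info.name == skip or info.name.startswith(skip + "/")):
            return None
        return info

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(base, arcname=base.name, filter=_filter)
    return archive_path
```

Write the archive outside the folder being backed up, otherwise the archive would try to include itself.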
Another deployment method is to export and import the whole repository or parts of it (see the deployment options above and the related documentation below).
You may choose to have dev/test/prod directories below your team project and select the directory through named variables:
You can link to a resource (like a sub-job, a transformation, or a sub-transformation) by name or reference.
- The advantage of specifying by name and directory is:
- You can use variables for the name and directory
- The advantages of specifying the reference are:
- You can move the referenced object to another location
- You can rename the object
- You can rename parts of the directory
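The advantage of linking by name and directory is that variables are expanded at run time, so one job can point to /dev, /test or /prod simply by changing a variable. This is a minimal sketch of that substitution, assuming PDI's `${VAR}` notation; it uses Python's `string.Template` as a stand-in and does not model PDI's alternative `%%VAR%%` notation, variable scoping, or internal variables with dots in their names. `resolve_link` is a hypothetical helper.

```python
from string import Template


def resolve_link(directory, name, variables):
    """Expand ${VAR} placeholders in a by-name link's directory and name.

    Raises KeyError if a referenced variable is not defined.
    """
    full = f"{directory.rstrip('/')}/{name}"
    return Template(full).substitute(variables)
```

For example, `resolve_link("/public/sales/${ENV}", "load_customers", {"ENV": "test"})` returns `"/public/sales/test/load_customers"`. A link specified by reference needs no such expansion, but in exchange it cannot switch between the dev/test/prod directories through a variable.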
Related Documentation
- Database Repository
- Enterprise Repository (Pentaho Knowledgebase)
- Automated Repository Export: See the job entry "Export repository to XML file" (this can be combined with command line options)
- Automated Repository Import: The Import tool provides a command line interface for the import of transformations and jobs into a repository
- Exporting resources
- Example of an automated process: M2P - Move To Production