Documenting Pentaho Data Integration (Kettle) Projects
Introduction
Kettle transformation (.ktr) and job (.kjb) files are saved as XML files.
Because of this, XSLT stylesheets can be used to generate documentation for Pentaho ETL projects directly from those files.
Organizing folders in projects
Organize your folders starting with a project folder.
| Folder | Content |
|---|---|
| /ETL | Main folder for all ETL projects |
| /ETL/extract_prices | Project folder for the "extract_prices" project |
| /ETL/extract_prices/logs | Log output for all jobs and transformations |
| /ETL/extract_prices/docs | Documentation folder |
| /ETL/extract_prices/docs/xslt | XSLT stylesheets for documentation |
| /ETL/extract_prices/sandbox | Sandbox for testing transformations and jobs |
| /ETL/extract_prices/data | Data files used in ETL |
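The layout in the table can be created by hand, but a small helper keeps new projects consistent. The following is a minimal sketch, not part of Kettle: the class name ProjectLayout and its create method are hypothetical, and the default root "ETL" and project name "extract_prices" are simply the example values from the table.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: creates the project folder layout described in the table.
final class ProjectLayout {
    // Subfolders of the project folder, taken from the table.
    static final String[] SUBFOLDERS = {"logs", "docs", "docs/xslt", "sandbox", "data"};

    /** Creates etlRoot/projectName and its subfolders; returns every path created or found. */
    static List<Path> create(Path etlRoot, String projectName) throws IOException {
        Path project = etlRoot.resolve(projectName);
        List<Path> folders = new ArrayList<>();
        // createDirectories is a no-op for folders that already exist.
        folders.add(Files.createDirectories(project));
        for (String sub : SUBFOLDERS) {
            folders.add(Files.createDirectories(project.resolve(sub)));
        }
        return folders;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "ETL");
        String name = args.length > 1 ? args[1] : "extract_prices";
        for (Path folder : create(root, name)) {
            System.out.println(folder);
        }
    }
}
```

Running it twice is harmless, since existing folders are left untouched.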
Using a Xalan script to generate XHTML files in batch
Step 1 - Download Xalan
Download the Xalan-J distribution from http://xml.apache.org/xalan-j/
Copy the following 4 files into the PROJECT/docs/xslt/xalan/ folder:
- serializer.jar
- xalan.jar
- xercesImpl.jar
- xml-apis.jar
Step 2 - write a batch file
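Assuming a Windows installation, write a batch file (named xalan.bat in the example further down) and place it in the xalan folder. The four Xalan jars must be on the classpath:
set CLASSPATH=.;serializer.jar;xalan.jar;xercesImpl.jar;xml-apis.jar
java org.apache.xalan.xslt.Process -IN %1 -XSL %2 -OUT %3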
Alternatively, look at the kettledoc.bat file attached.
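If a batch file is not convenient, the same input/stylesheet/output transformation can be done through the standard JAXP API (javax.xml.transform). The following is a minimal sketch under that assumption; the class name KettleDoc is hypothetical and is not the attached kettledoc.bat. TransformerFactory.newInstance() uses Xalan when its jars are on the classpath and otherwise falls back to the XSLT processor bundled with the JDK, so the output may differ slightly between the two.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies an XSLT stylesheet to one ktr/kjb file.
final class KettleDoc {
    /** Transforms the XML file 'in' with stylesheet 'xsl' and writes the result to 'out'. */
    static void transform(File in, File xsl, File out) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet, then run it against the input document.
        Transformer transformer = factory.newTransformer(new StreamSource(xsl));
        transformer.transform(new StreamSource(in), new StreamResult(out));
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 3) {
            System.err.println("usage: java KettleDoc <in.ktr|in.kjb> <stylesheet.xsl> <out.html>");
            System.exit(1);
        }
        transform(new File(args[0]), new File(args[1]), new File(args[2]));
    }
}
```

The three arguments play the same roles as -IN, -XSL and -OUT in the batch file.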
Step 3 - Copy kettle.xsl to PROJECT/docs/xslt
Copy the attached file kettle.xsl to PROJECT/docs/xslt
Step 4 - Copy pentaho.css to PROJECT/docs/xslt
Copy the attached file pentaho.css to PROJECT/docs/xslt
Step 5 - Copy ui/images
Copy the /ui/images folder from the Kettle installation folder into PROJECT/docs/xslt/ui/images
Step 6 - Transform ktr file to html
To convert a ktr/kjb file to HTML in the Pentaho color scheme and style, run the batch file from the xalan directory:
xalan.bat ../../../KETTLE.ktr ../kettle.xsl ../../KETTLE.ktr.html
Then open the HTML file in any browser.
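The command above converts one file at a time. For a project with many transformations and jobs, the conversion can be looped over a folder. The following is a minimal sketch using the standard JAXP API; the class name KettleDocBatch and the method convertAll are hypothetical, and the "<name>.html" output naming simply mirrors the KETTLE.ktr.html example. It only looks at files directly inside the source folder, not subfolders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: converts every ktr/kjb file in one folder to HTML.
final class KettleDocBatch {
    static boolean isKettleFile(Path p) {
        String name = p.getFileName().toString().toLowerCase();
        return name.endsWith(".ktr") || name.endsWith(".kjb");
    }

    /** Writes outDir/<file name>.html for each ktr/kjb file in sourceDir; returns the outputs. */
    static List<Path> convertAll(Path sourceDir, Path xsl, Path outDir)
            throws IOException, TransformerException {
        // Compile the stylesheet once and reuse it for every input file.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(xsl.toFile()));
        Files.createDirectories(outDir);

        List<Path> inputs;
        try (Stream<Path> files = Files.list(sourceDir)) {
            inputs = files.filter(Files::isRegularFile)
                    .filter(KettleDocBatch::isKettleFile)
                    .sorted()
                    .collect(Collectors.toList());
        }

        List<Path> outputs = new ArrayList<>();
        for (Path in : inputs) {
            Path out = outDir.resolve(in.getFileName().toString() + ".html");
            templates.newTransformer()
                    .transform(new StreamSource(in.toFile()), new StreamResult(out.toFile()));
            outputs.add(out);
        }
        return outputs;
    }
}
```

Compiling the stylesheet into a Templates object once avoids re-parsing kettle.xsl for every file.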
Using any web browser to dynamically view the transform without a batch
- Copy your ktr/kjb files to the PROJECT/docs folder
- Copy the attached XSLT files into the docs folder
- Use an editor to insert one of these lines right after the XML declaration (the <?xml ... ?> line, line 1)
  for kjb files: <?xml-stylesheet type="text/xsl" href="kettle_job_xslt.xml"?>
  for ktr files: <?xml-stylesheet type="text/xsl" href="kettle_trans_xslt.xml"?>
- Open each kjb/ktr file directly in any browser.
Note: Attached is kettle.xsl that combines both jobs and transformations.
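Editing every file by hand gets tedious, so the stylesheet line can also be inserted programmatically. The following is a minimal sketch; the class name StylesheetLinker and its methods are hypothetical, and it assumes the copied files are UTF-8 encoded and start with the XML declaration (no byte order mark). It works on the copies in the docs folder, not on the originals.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: adds an xml-stylesheet line to copied ktr/kjb files.
final class StylesheetLinker {
    /** Returns xml with the stylesheet line placed right after the XML declaration. */
    static String addStylesheet(String xml, String href) {
        if (xml.contains("<?xml-stylesheet")) {
            return xml; // already linked, leave unchanged
        }
        String pi = "<?xml-stylesheet type=\"text/xsl\" href=\"" + href + "\"?>";
        int declEnd = xml.startsWith("<?xml") ? xml.indexOf("?>") : -1;
        if (declEnd < 0) {
            return pi + "\n" + xml; // no declaration: put the line first
        }
        int cut = declEnd + 2; // position just after "?>"
        return xml.substring(0, cut) + "\n" + pi + xml.substring(cut);
    }

    /** Picks the stylesheet by extension: jobs (.kjb) versus transformations (.ktr). */
    static String hrefFor(Path file) {
        boolean isJob = file.getFileName().toString().toLowerCase().endsWith(".kjb");
        return isJob ? "kettle_job_xslt.xml" : "kettle_trans_xslt.xml";
    }

    /** Rewrites one file in place (assumes UTF-8 content). */
    static void linkFile(Path file) throws IOException {
        String xml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String linked = addStylesheet(xml, hrefFor(file));
        Files.write(file, linked.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            linkFile(Paths.get(arg));
        }
    }
}
```

Calling it again on an already linked file changes nothing, so it is safe to rerun after copying new files into the docs folder.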
Modification to spoon to enable dynamic documentation
Modify org/pentaho/di/core/xml/XMLHandler.java.
Look for the following code segment...
/**
 * The header string to specify encoding in an XML file
 * @param encoding The desired encoding to use in the XML file
 * @return The XML header.
 */
public static final String getXMLHeader(String encoding)
{
    return "<?xml version=\"1.0\" encoding=\""+encoding+"\"?>"+Const.CR;
}
Insert this string to the XML header
<?xml-stylesheet type="text/xsl" href="kettle_job_xslt.xml"?>
Recompile.
Now, whenever Spoon writes a ktr/kjb file, the file includes the xml-stylesheet line in its header.
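For orientation, the modified method might look roughly like this. This is a standalone sketch, not the actual Kettle source: the class name XmlHeaderSketch and the constant STYLESHEET_HREF are hypothetical, and CR stands in for Kettle's Const.CR, which is assumed here to be the platform line separator. Note that the same header is written for jobs and transformations, so a hard-coded job stylesheet would also end up in ktr files; pointing the href at a combined stylesheet such as the attached kettle.xsl is one way around that.

```java
// Standalone sketch of the patched header method (not the real XMLHandler class).
final class XmlHeaderSketch {
    // Assumption: stands in for Kettle's Const.CR.
    static final String CR = System.lineSeparator();

    // Hypothetical constant; the href value is the one given in the text.
    static final String STYLESHEET_HREF = "kettle_job_xslt.xml";

    /**
     * The header string to specify encoding in an XML file,
     * followed by the xml-stylesheet line.
     */
    static String getXMLHeader(String encoding) {
        return "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>" + CR
             + "<?xml-stylesheet type=\"text/xsl\" href=\"" + STYLESHEET_HREF + "\"?>" + CR;
    }

    public static void main(String[] args) {
        System.out.print(getXMLHeader("UTF-8"));
    }
}
```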
Kettle cookbook
Also look at the Kettle Cookbook project on Google Code.
It generates an image of each transformation/job.