Teradata Fastload Bulk Loader
Preface
The purpose of this paper is to provide a technical overview of the PDI Terafast Bulkloader Plugin.
Note: We recommend that you use the Teradata TPT Insert Upsert Bulk Loader instead of this step. TPT combines the FastLoad, MultiLoad, FastExport, and TPump Teradata bulk loads. Teradata no longer updates the other bulk loaders.
Definitions
PDI | Pentaho Data Integration.
Fastload | Teradata Fastload command line utility.
Terafast | Teradata Fastload PDI Plugin.
Prerequisites
In order to use the Pentaho Terafast Bulkloader Plugin you need the following software installed:
- Pentaho Data Integration - http://kettle.pentaho.org/
- A Teradata DB.
- Teradata Client Tools (Fastload) on the client machine to access the Teradata DB. They are installed together with the DB. If you want to run PDI and Terafast on a separate machine, you need to install the Client Tools on that machine as well.
- Teradata JDBC Drivers are shipped with the Database. You can download them separately from: http://www.teradata-emea.com/DownloadCenter/Group48.aspx
Installation
Extract the ZIP archive Terafast-PDI-Plugin.zip and move the following files/folders to the correct directory:
- Terafast Step Plugin
  Move folder TeraFast (can be found in: Terafast-Plugin/plugins/steps/) to KETTLE_HOME/plugins/steps/TeraFast (e.g. C:\pdi\plugins\steps)
- ASC Commons
  Move files asc-commons-io-1.0.0.jar, asc-commons-lang-1.0.0.jar, asc-commons-pentaho-1.0.0.jar (can be found in: Terafast-Plugin/libext/) to KETTLE_HOME/libext/ (e.g. C:\pdi\libext)
- Commons
  Move files commons-io-1.4.jar and commons-lang-2.4.jar (can be found in: Terafast-Plugin/libext/commons) to KETTLE_HOME/libext/commons (e.g. C:\pdi\libext\commons)
- JDBC Driver
  Move the JDBC Driver files (file names can differ slightly) tdgssconfig.jar and terajdbc4.jar to KETTLE_HOME/libext/JDBC/ (e.g. C:\pdi\libext\JDBC)
The TeraFast Plugin is now installed. It can be found within PDI Transformations in the step category "Bulk Loading".
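The copy steps above can be scripted. The following is a minimal sketch, not an official installer: the function names install_terafast and copy_jars are hypothetical, the directory layout is taken from the steps above, and the files are copied rather than moved so that the extracted archive stays intact. The JDBC driver location (jdbc_dir) is an assumption, since those files come from the Teradata installation and not from the plugin archive.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def copy_jars(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Copy every .jar located directly inside src_dir into dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(src_dir.glob("*.jar")):
        copied.append(Path(shutil.copy2(jar, dst_dir / jar.name)))
    return copied


def install_terafast(extracted: Path, kettle_home: Path,
                     jdbc_dir: Optional[Path] = None) -> List[Path]:
    """Copy the plugin folder and libraries into a PDI installation.

    extracted   -- root of the unpacked plugin archive (hypothetical layout)
    kettle_home -- PDI installation directory, e.g. C:\\pdi
    jdbc_dir    -- optional folder holding tdgssconfig.jar and terajdbc4.jar
    """
    installed: List[Path] = []

    # Terafast Step Plugin folder
    target = kettle_home / "plugins" / "steps" / "TeraFast"
    shutil.copytree(extracted / "plugins" / "steps" / "TeraFast", target,
                    dirs_exist_ok=True)  # requires Python 3.8+
    installed.append(target)

    # ASC Commons and Commons libraries
    installed += copy_jars(extracted / "libext", kettle_home / "libext")
    installed += copy_jars(extracted / "libext" / "commons",
                           kettle_home / "libext" / "commons")

    # JDBC Driver files, if a source folder was given
    if jdbc_dir is not None:
        installed += copy_jars(jdbc_dir, kettle_home / "libext" / "JDBC")
    return installed
```

Restart PDI after copying so that the new step is picked up.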
Functional Scope
The Pentaho TeraFast PDI Plugin supports fastloading data into a Teradata database. This functionality is achieved by using the Fastload command line tool from Teradata.
The plugin supports two operating modes:
- Fastload Control File
- Interactive
Fastload Control File
- Runs as a step within a transformation, completely independent of other steps (no input/output; it just runs the command line application with the given Fastload control file as parameter). A reference for the Fastload tool, including the control file syntax, can be found here: http://www.teradataforum.com/teradata_pdf/b035-2411-062a.pdf
- Supports replacement of PDI Variables in the Fastload control file (e.g. Target table, Run_ID).
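To illustrate what control file mode amounts to, here is a minimal sketch in Python (the plugin itself is a PDI step, so this is not its actual code). It assumes that variables are written in the ${NAME} form and that the fastload utility reads its script from standard input, which is the usual way to invoke it. The names substitute_variables and run_fastload are hypothetical, and leaving unknown variables untouched is an assumption about the desired behaviour.

```python
import re
import subprocess
from typing import Mapping

# Matches placeholders of the form ${NAME}
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def substitute_variables(script: str, variables: Mapping[str, str]) -> str:
    """Replace ${NAME} placeholders; unknown names are left unchanged."""
    return _VARIABLE.sub(
        lambda match: variables.get(match.group(1), match.group(0)), script
    )


def run_fastload(fastload_path: str, control_file: str,
                 variables: Mapping[str, str]) -> int:
    """Feed the substituted control file to fastload and return its exit code.

    Assumption: fastload reads the script from standard input.
    """
    with open(control_file, encoding="utf-8") as handle:
        script = substitute_variables(handle.read(), variables)
    result = subprocess.run([fastload_path], input=script, text=True)
    return result.returncode
```

For example, with {"TARGET_TABLE": "sales"} the line "BEGIN LOADING ${TARGET_TABLE}" in the control file becomes "BEGIN LOADING sales" before the script reaches fastload.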
Interactive
- The plugin can also be seamlessly integrated into a transformation chain, as shown in the picture below. If no control file is specified, the plugin expects rows from previous steps and pipes them into the Fastload tool. The necessary control file for Fastload is built on demand according to the user options of the TeraFast plugin.
- IMPORTANT: This mode is still slow; performance improvements are planned for future releases.
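As an illustration of a control file built on demand, the sketch below assembles a FastLoad script from a few options. It is an assumption about what such a script might look like and not the plugin's actual output: the LoadOptions class, the build_control_file function, the error table names, the VARTEXT delimiter, and the use of DELETE for the truncate option are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LoadOptions:
    logon: str                            # placeholder, e.g. "tdpid/user,password"
    target_table: str
    data_file: str
    field_mapping: List[Tuple[str, str]]  # (PDI field, DB column)
    sessions: int = 2
    error_limit: int = 25
    truncate: bool = False
    delimiter: str = "|"                  # assumed delimiter of the data file
    varchar_length: int = 255             # assumed width for all text fields


def build_control_file(opts: LoadOptions) -> str:
    """Return a FastLoad script for a delimited text data file."""
    table = opts.target_table
    sources = [src for src, _ in opts.field_mapping]
    columns = ", ".join(col for _, col in opts.field_mapping)
    values = ", ".join(":" + src for src in sources)
    defines = ",\n".join(
        f"  {src} (VARCHAR({opts.varchar_length}))" for src in sources
    )

    lines = [
        f"SESSIONS {opts.sessions};",
        f"ERRLIMIT {opts.error_limit};",
        f"LOGON {opts.logon};",
    ]
    if opts.truncate:
        # FastLoad needs an empty target table; DELETE is one way to get there.
        lines.append(f"DELETE FROM {table};")
    lines += [
        f'SET RECORD VARTEXT "{opts.delimiter}";',
        f"DEFINE\n{defines}\nFILE={opts.data_file};",
        f"BEGIN LOADING {table} ERRORFILES {table}_err1, {table}_err2;",
        f"INSERT INTO {table} ({columns}) VALUES ({values});",
        "END LOADING;",
        "LOGOFF;",
    ]
    return "\n".join(lines) + "\n"
```

The options correspond loosely to the interactive-mode settings of the step (target table, truncate, data file, sessions, error limit, field mapping).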
GUI Usage
- Step name ... The PDI name for the Teradata Fastload Step.
- Use control file ... Work in control file mode. (see chapter Functional Scope).
- Control file ... The path to the control file to be used.
- Variable Substitution in control file ... Make use of PDI Variables (e.g. ${RUN_ID}) in the control file.
- Path to fastload ... The path to the fastload command line utility.
- Error log ... An optional Error log to be created by Fastload.
- Connection ... A connection to the Teradata DB.
- Target table ... The table to be loaded.
- Truncate table ... Truncate the target table before loading.
- Data file ... The name of the temporary data file.
- Sessions ... Number of sessions to be used by Fastload.
- Error limit ... The error limit for Fastload.
- Field mapping ... Definition of PDI <-> DB Field mapping.
Note: Options 7 - 13 (Connection through Field mapping) are only available in interactive mode.
Known Problems and Planned Features
The streaming mode is currently slow. This is mainly caused by known problems in the file writing process. In addition, on Linux (*nix) the Plugin could take advantage of a FIFO pipe, which would allow Fastload to load the file while it is still being written.
These improvements to the Plugin will be made later, when time permits.
If these improvements are urgent for you, or if you need any other PDI Plugin (or other Pentaho extensions), don't hesitate to contact us at pentaho@aschauer-edv.at
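The FIFO idea mentioned above can be sketched as follows. This is only an illustration of the technique on Linux (*nix) and not planned plugin code: the function name stream_rows_through_fifo is hypothetical, and the consumer is a stand-in command that receives the pipe path as its last argument. With Fastload, the pipe path would instead be named as the data file in the control file.

```python
import os
import subprocess
import tempfile
from typing import Iterable, List, Sequence


def stream_rows_through_fifo(rows: Iterable[Sequence[str]],
                             consumer_cmd: List[str],
                             delimiter: str = "|") -> int:
    """Write rows into a named pipe while consumer_cmd reads from it.

    Returns the consumer's exit code. Requires os.mkfifo (not on Windows).
    """
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "load.fifo")
    os.mkfifo(fifo_path)
    try:
        # Start the reader first: opening a FIFO for writing blocks
        # until another process opens it for reading.
        consumer = subprocess.Popen(consumer_cmd + [fifo_path])
        with open(fifo_path, "w", encoding="utf-8") as pipe:
            for row in rows:
                pipe.write(delimiter.join(row) + "\n")
        # Closing the pipe signals end-of-file to the reader.
        return consumer.wait()
    finally:
        os.remove(fifo_path)
        os.rmdir(workdir)
```

Because no complete temporary file is written to disk first, loading can start as soon as the first rows arrive. Note that this sketch blocks forever if the consumer exits without opening the pipe; a real implementation would need to guard against that.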
Get in Contact
For more information on technical considerations or future planning of the Plugin, or if you want to contribute to developing further releases, please contact Aschauer EDV via the following e-mail address: pentaho@aschauer-edv.at
Attached is the documentation as provided by Aschauer EDV
Link to attachment: Terafast_Technical_Overview.pdf