Extraction of data from source systems
How complex an extraction batch needs to be depends heavily on the source environment:
- Is the source system mission critical?
- Can the source system sustain a long-running query?
- Is the source system on the local LAN or in the cloud?
- Is the source system accessed continuously?
There are two main scenarios:
- Push - the Kettle batch runs on the source system and pushes data to the ETL staging area
- Pull - the Kettle batch runs on the ETL server and pulls data into the ETL staging area
Pattern 1: Full extract with output truncate
The Kettle script consists of an input step and an output step.
The output step is set to truncate the target table before loading.
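The logic of this pattern can be sketched outside Kettle. The Python below is only an illustration under stated assumptions: it uses in-memory SQLite databases in place of the real source and staging systems, the table names customers and stg_customers are hypothetical, and a DELETE without a WHERE clause stands in for TRUNCATE, which SQLite does not have.

```python
import sqlite3


def full_extract_truncate(source, staging):
    """Copy every row from the source table into an emptied staging table."""
    # "Input step": read the whole source table.
    rows = source.execute("SELECT id, name FROM customers").fetchall()
    # "Output step" with truncate: empty the target, then reload it.
    # Both statements run in one transaction, so a failed load rolls back.
    with staging:
        staging.execute("DELETE FROM stg_customers")  # stand-in for TRUNCATE
        staging.executemany(
            "INSERT INTO stg_customers (id, name) VALUES (?, ?)", rows
        )
    return len(rows)


# Hypothetical source system with two rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
source.commit()

# Hypothetical staging area holding a stale row from an earlier run.
staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE stg_customers (id INTEGER, name TEXT)")
staging.execute("INSERT INTO stg_customers VALUES (99, 'stale')")
staging.commit()

loaded = full_extract_truncate(source, staging)
result = staging.execute("SELECT id, name FROM stg_customers ORDER BY id").fetchall()
print(loaded, result)
```

After the run the stale row is gone and the staging table holds exactly the current source rows, which is what makes this pattern safe to rerun.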
Pattern 2: Full extract with SQL script
The Kettle script consists of an input step and an output step, plus an SQL script step that is not connected to either of them.
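The note above does not say what the SQL script does. A common use, assumed here, is to prepare the target table before any rows arrive; an unconnected SQL script step in Kettle is generally understood to run when the transformation initializes, ahead of the row flow. The Python sketch below mimics that ordering with in-memory SQLite databases. The table names orders and stg_orders and the contents of PREPARE_SQL are hypothetical.

```python
import sqlite3

# Hypothetical preparation script: rebuild the staging table from scratch.
PREPARE_SQL = """
DROP TABLE IF EXISTS stg_orders;
CREATE TABLE stg_orders (id INTEGER PRIMARY KEY, amount REAL);
"""


def full_extract_with_script(source, staging):
    """Run the preparation script first, then copy all source rows."""
    # Stand-alone SQL script: runs once, independent of the row flow.
    staging.executescript(PREPARE_SQL)
    # "Input step": read the whole source table.
    rows = source.execute("SELECT id, amount FROM orders").fetchall()
    # "Output step": plain insert, no truncate option needed.
    with staging:
        staging.executemany(
            "INSERT INTO stg_orders (id, amount) VALUES (?, ?)", rows
        )
    return len(rows)


# Hypothetical source system with two rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
source.commit()

staging = sqlite3.connect(":memory:")

# Running the batch twice must not duplicate rows or hit key conflicts.
first = full_extract_with_script(source, staging)
second = full_extract_with_script(source, staging)
count = staging.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
print(first, second, count)
```

Compared with Pattern 1, the script can do more than empty the table, for example recreate it or drop indexes, at the cost of keeping that SQL in sync with the target schema.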
Pattern 3: Full extract