Using Spoon for Greenplum Parallel Data Loading with GPFDIST
With Greenplum's external tables and its parallel file server, gpfdist, data can be loaded efficiently. Spoon provides a convenient way to run such a load, illustrated here with a simple two-step job.
The "Create External Table" step, as shown below, creates an external table, external_samples_customer2. The data is provided by two locations on the same ETL server, etl1. Two instances of gpfdist are running on this server, one on port 9080, the other on port 9081.
The customers-100.txt files served by the two gpfdist instances share an identical layout but not identical data: they are partitioned. One instance provides customer data where the customer id is in the range 1 to 49; the other provides customers 50 through 100.
Errors such as data being too long for the target column are logged in err_customer. This table is created in the same schema as the target table if it does not already exist.
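For orientation, the external table described above might correspond to DDL along the following lines. This is only a sketch: the column list, the ';' delimiter, and the reject limit of 10 rows are assumptions made for illustration and are not taken from the actual job. The LOG ERRORS INTO clause follows the older Greenplum syntax that names an error table, which matches the err_customer behavior described here; later Greenplum releases are understood to use LOG ERRORS without a table name.

```sql
-- Sketch only: columns, delimiter, and reject limit are hypothetical.
CREATE EXTERNAL TABLE external_samples_customer2 (
    id    integer,       -- assumed customer id column
    name  varchar(50)    -- assumed column; the real layout may differ
)
LOCATION (
    -- one URL per gpfdist instance running on etl1
    'gpfdist://etl1:9080/customers-100.txt',
    'gpfdist://etl1:9081/customers-100.txt'
)
FORMAT 'TEXT' (DELIMITER ';')
-- rejected rows (for example, values too long for a column) go to err_customer
LOG ERRORS INTO err_customer
SEGMENT REJECT LIMIT 10 ROWS;
```

Listing both gpfdist URLs in LOCATION is what lets the Greenplum segments pull from the two file servers in parallel.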
The "Load from External Table" step performs the data load using SQL: