Using Spoon for Greenplum Parallel Data Loading with GPFDIST

With Greenplum's external tables and its parallel file server, gpfdist, data can be loaded efficiently in parallel.  Spoon provides a convenient way to run such a load, as this simple two-step job shows.

The "Create External Table" step, as shown below, creates an external table, external_samples_customer2.  The data is provided by two locations on the same ETL server, etl1.  Two instances of gpfdist are running on this server, one on port 9080, the other on port 9081.


The customers-100.txt files served by the two gpfdist instances share an identical layout but not the same data: they are partitioned.  The server on port 9081 provides customer data where the customer id is in the range 1 to 49.  The server on port 9082 provides customers 50 through 100.
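
The partitioning described above can be sketched in Python as follows. This is only an illustration: the article does not say how the two files were produced, and the ';' delimiter, the position of the id column, the sample rows and the helper name partition_customers are all assumptions.

```python
import csv
import io


def partition_customers(lines, split_at, id_column=0, delimiter=";"):
    """Split rows into (low, high) by customer id.

    Rows with an id below split_at go to low, all other rows go to high.
    The delimiter and id column position are assumptions, not taken
    from the real customers-100.txt file.
    """
    low, high = [], []
    for row in csv.reader(lines, delimiter=delimiter):
        target = low if int(row[id_column]) < split_at else high
        target.append(row)
    return low, high


# Synthetic rows standing in for the real customer file.
sample = io.StringIO("1;Alice\n49;Bob\n50;Carol\n100;Dave\n")
low, high = partition_customers(sample, split_at=50)

print("ids 1 to 49:", [row[0] for row in low])
print("ids 50 to 100:", [row[0] for row in high])
```

Each list would then be written to its own customers-100.txt in the directory served by the matching gpfdist instance.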

Rows rejected for errors, such as data being too long for the target column, are logged in the error table err_customer.  This table is created in the same schema as the target table if it does not already exist.
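
To give a rough idea of the kind of statement the external table described above corresponds to, the following Python sketch assembles a comparable CREATE EXTERNAL TABLE statement. The column list, the ';' delimiter, the reject limit of 10 rows and the helper name build_external_table_ddl are assumptions made for illustration and are not taken from the job. The ports used here follow the partitioning description (9081 and 9082). The LOG ERRORS INTO clause is the form accepted by Greenplum releases that support named error tables.

```python
def build_external_table_ddl(table, columns, locations, error_table, reject_limit):
    """Return a Greenplum CREATE EXTERNAL TABLE statement as a string."""
    column_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    location_sql = ",\n    ".join(f"'{url}'" for url in locations)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n    {column_sql}\n)\n"
        f"LOCATION (\n    {location_sql}\n)\n"
        # Delimiter is an assumption about the file layout.
        f"FORMAT 'TEXT' (DELIMITER ';')\n"
        f"LOG ERRORS INTO {error_table}\n"
        f"SEGMENT REJECT LIMIT {reject_limit} ROWS;"
    )


# Hypothetical columns; the real layout of customers-100.txt is not shown here.
columns = [("id", "integer"), ("name", "varchar(30)"), ("city", "varchar(30)")]

# One gpfdist URL per running instance on etl1.
locations = [
    "gpfdist://etl1:9081/customers-100.txt",
    "gpfdist://etl1:9082/customers-100.txt",
]

ddl = build_external_table_ddl(
    "external_samples_customer2", columns, locations, "err_customer", 10
)
print(ddl)
```

Listing both gpfdist URLs in a single LOCATION clause is what lets the Greenplum segments read from the two instances in parallel.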

The "Load from External Table" step performs the data load using SQL: