Using Spoon for Greenplum Parallel Data Loading with GPFDIST

Greenplum's external tables and its parallel file server, gpfdist, make efficient data loads possible.  Spoon provides a convenient way to drive such a load, as this simple two-step job shows.

The "Create External Table" step, shown below, creates an external table, external_samples_customer2.  The data is served from two locations on the same ETL server, etl1.  Two instances of gpfdist are running on this server, one on port 9080 and the other on port 9081.

The customers-100.txt files served by the two gpfdist instances share an identical layout but hold different data; the data is partitioned.  The server at port 9081 provides customer data where the customer id is in the range 1 to 49.  The 9082 server provides customers 50 through 100.

Errors such as data being too long for the target column are logged in err_customer.  This table is created in the same schema as the target table if it does not already exist.
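
As a rough sketch of what the "Create External Table" step might execute, the statement below defines the external table over both gpfdist locations and routes rejected rows to err_customer.  The table name, host, file name and error table come from the description above.  The column list, the delimiter and the reject limit are assumptions made for illustration only, and the port numbers are placeholders that should be replaced with the ports your gpfdist instances actually listen on.  The LOG ERRORS INTO clause is the Greenplum 4.x form; later releases changed this syntax.

```sql
-- Sketch only: columns, delimiter and reject limit are assumed, not taken from the job.
CREATE EXTERNAL TABLE external_samples_customer2 (
    id   integer,        -- hypothetical column: customer id
    name varchar(100)    -- hypothetical column: customer name
)
LOCATION (
    -- one URL per gpfdist instance on etl1; ports are placeholders
    'gpfdist://etl1:9080/customers-100.txt',
    'gpfdist://etl1:9081/customers-100.txt'
)
FORMAT 'TEXT' (DELIMITER ',')          -- assumed delimiter
LOG ERRORS INTO err_customer           -- rejected rows are recorded here
SEGMENT REJECT LIMIT 10 ROWS;          -- assumed limit; the load fails once a segment exceeds it
```

Listing both URLs in a single LOCATION clause lets the Greenplum segments read from both gpfdist instances in parallel.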

The "Load from External Table" step performs the data load using SQL: