S3 File Output

PLEASE NOTE: This documentation applies to an earlier version. For the most recent documentation, visit the Pentaho Enterprise Edition documentation site.

Description

This step exports data to a text file on an Amazon Simple Storage Service (S3) account.

Options

File Tab

The File tab defines basic file properties for this step's output.

Option

Description

Step name

The name of this step in the transformation workspace.

Access Key

S3 Access Key (optional, see Filename)

Secret Key

S3 Secret Key (optional, see Filename)

Filename

The name of the output text file. A file name in S3 follows the schema: s3://(access_key):(secret_key)@s3/(s3_bucket_name)/(absolute_path_to_file)
If the Access Key and Secret Key fields are specified, they are used to connect to S3. This is more secure than embedding the keys in the file name, where they would appear in plain text and in logs.

Do not create file at start

When this check box is selected, the file will be created at the end of processing. When cleared, the file will be created at the start of processing.

Accept file name from field?

When checked, enables you to specify file names in a field in the input stream.

File name field

When the Accept file name from field option is checked, specify the field that will contain the filenames.

Extension

The three-letter file extension to append to the file name.

Include stepnr in filename

If you run the step in multiple copies (launching several copies of a step), the copy number is included in the file name before the extension (_0).

Include partition nr in file name?

Includes the data partition number in the file name.

Include date in file name

Includes the system date in the filename (_20101231).

Include time in file name

Includes the system time (24-hour format) in the filename (_235959).

Specify Date time format

When checked, enables you to specify the Date time format.

Date time format

The date and time format to use for the value that is added to the filename.

Show filename(s)

Displays a list of the files that will be generated. This is a simulation and depends on the number of rows that will go into each file.

Add filenames to result

When this check box is selected, the names of the generated files are added to the result file names of the transformation, so they can be processed later.
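The naming options above can be illustrated with a short Python sketch. It is not the step's implementation. The helper names `s3_filename` and `output_name`, the order in which the suffixes are appended, the percent-encoding of the keys, and the key-less `s3://s3/...` form are all assumptions made for illustration; only the URI schema and the `_0`, `_20101231`, and `_235959` suffix formats come from the option descriptions.

```python
from datetime import datetime
from urllib.parse import quote


def s3_filename(bucket, path, access_key=None, secret_key=None):
    """Build a name of the form s3://(access_key):(secret_key)@s3/(bucket)/(path)."""
    location = f"s3/{bucket}/{path.lstrip('/')}"
    if access_key and secret_key:
        # Assumption: keys are percent-encoded so that characters such as
        # "/" or "+" in a secret key do not break the URI.
        creds = f"{quote(access_key, safe='')}:{quote(secret_key, safe='')}"
        return f"s3://{creds}@{location}"
    # Assumption: when the keys are supplied through the Access Key and
    # Secret Key fields, the credentials part is simply left out.
    return f"s3://{location}"


def output_name(base, extension, copy_nr=None, partition_nr=None,
                include_date=False, include_time=False, now=None):
    """Append the optional suffixes to a base name, then the extension."""
    now = now or datetime.now()
    name = base
    # Assumption: suffix order is copy number, partition, date, time.
    if copy_nr is not None:
        name += f"_{copy_nr}"
    if partition_nr is not None:
        name += f"_{partition_nr}"
    if include_date:
        name += now.strftime("_%Y%m%d")   # e.g. _20101231
    if include_time:
        name += now.strftime("_%H%M%S")   # 24-hour format, e.g. _235959
    return f"{name}.{extension}"
```

Keeping the keys out of the file name, as in the second form returned by `s3_filename`, corresponds to the more secure option described under Filename.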

Content Tab

The Content tab contains options for describing the file's content.

Option

Description

Append

When checked, appends lines to the end of the file.

Separator

Specifies the character that separates the fields in a single line of text; typically a semicolon or a tab.

Enclosure

Optionally specifies the character that encloses a block of text so that the block may contain separator characters without being split; typically a single or double quote.

Force the enclosure around fields?

Forces all field names to be enclosed with the character specified in the Enclosure property above.

Header

Enable this option if you want the text file to have a header row (first line in the file).

Footer

Enable this option if you want the text file to have a footer row (last line in the file).

Format

Specifies either DOS or UNIX file format. UNIX files have lines that are separated by line feeds; DOS files have lines that are separated by carriage returns and line feeds.

Compression

Specifies the type of compression to use on the output file, either zip or gzip. Only one file is placed in a single archive.

Encoding

Specifies the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon searches your system for available encodings.

Right pad fields

When checked, fields will be right-padded to their defined width.

Fast data dump (no formatting)

Improves the performance when dumping large amounts of data to a text file by not including any formatting information.

Split every ... rows

If the number N is larger than zero, splits the resulting text file into multiple parts of N rows.

Add Ending line of file

Enables you to specify an alternate ending row for the output file.
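As a rough illustration of how the Separator, Enclosure, Force the enclosure around fields?, Header, Format, and Split every ... rows options interact, consider the following Python sketch. The function names `render_rows` and `split_rows` are hypothetical, and the rule that a value is enclosed only when it contains the separator (unless enclosure is forced) is an assumption, not a statement of the step's exact behavior.

```python
def render_rows(field_names, rows, separator=";", enclosure='"',
                force_enclosure=False, header=True, file_format="UNIX"):
    """Render rows as delimited text using the Content tab options."""
    # DOS lines end with carriage return + line feed, UNIX lines with line feed.
    newline = "\r\n" if file_format == "DOS" else "\n"

    def render_value(value):
        text = "" if value is None else str(value)
        # Assumption: enclose when forced, or when the value contains the
        # separator and would otherwise be split on reading.
        if enclosure and (force_enclosure or separator in text):
            return f"{enclosure}{text}{enclosure}"
        return text

    lines = []
    if header:
        lines.append(separator.join(render_value(n) for n in field_names))
    for row in rows:
        lines.append(separator.join(render_value(v) for v in row))
    return "".join(line + newline for line in lines)


def split_rows(rows, every):
    """Split rows into parts of `every` rows; zero or less means no split."""
    rows = list(rows)
    if every <= 0:
        return [rows]
    return [rows[i:i + every] for i in range(0, len(rows), every)]
```

For example, `render_rows(["id", "name"], [(1, "a;b")])` encloses only the second value, because it contains the separator.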

Fields Tab

The Fields tab defines properties for the exported fields.

Option

Description

Name

The name of the field.

Type

The field's data type: String, Date, or Number.

Format

The format mask (number type).

Length

The length option depends on the field type. Number: total number of significant figures in a number; String: total length of a string; Date: determines how much of the date string is printed or recorded.

Precision

The precision option depends on the field type, but only Number is supported; it specifies the number of floating-point digits.

Currency

Symbol used to represent currencies.

Decimal

The decimal point symbol; this is either a dot or a comma.

Group

The symbol that separates thousands in numbers of four digits or more. This is either a dot or a comma.

Trim type

Trims the field (left, right, or both) before processing. Useful for fields that have no static length.

Null

Inserts the specified string into the text file if the field value is null.
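The effect of the Precision, Decimal, Group, Currency, Null, and Trim type options can be sketched in Python as follows. This is an illustration under stated assumptions, not the step's formatting engine: `format_number` and `trim_field` are hypothetical helpers, the currency symbol is assumed to be a prefix, and the Format mask is ignored entirely.

```python
def format_number(value, precision=2, decimal=".", group=",",
                  currency="", null=""):
    """Format a number using the chosen decimal and grouping symbols."""
    if value is None:
        # Null option: write the replacement string for null values.
        return null
    # Format with "," grouping and "." decimal first, then swap in the
    # configured symbols via a placeholder so the two do not collide.
    text = f"{value:,.{precision}f}"
    text = text.replace(",", "\x00").replace(".", decimal).replace("\x00", group)
    # Assumption: the currency symbol is placed before the number.
    return f"{currency}{text}"


def trim_field(text, trim_type="none"):
    """Trim whitespace on the left, right, or both sides of a string."""
    if trim_type == "left":
        return text.lstrip()
    if trim_type == "right":
        return text.rstrip()
    if trim_type == "both":
        return text.strip()
    return text
```

With a comma as Decimal and a dot as Group, `format_number(1234567.891, decimal=",", group=".")` yields `1.234.567,89`.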

Buttons

Get

Retrieves a list of fields from the input stream.

Minimal width

Minimizes field width by removing unnecessary characters (such as superfluous zeros and spaces). If set, string fields will no longer be padded to their specified length.

Metadata Injection Support

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.