Text File Output (Deprecated)

(warning) PLEASE NOTE: This documentation applies to an earlier version. For the most recent documentation, visit the Pentaho Enterprise Edition documentation site.

Description

The Text file output step is used to export data to text file format. This is commonly used to generate Comma Separated Values (CSV files) that can be read by spreadsheet applications.  It is also possible to generate fixed width files by setting lengths on the fields in the fields tab.

Note: It is not possible to execute this step on parallel to write to the same file. In this case you need to set the option "Include stepnr in filename" and merge the files later on.

Options

File Tab

The File tab is where you define basic properties about the file being created, such as:

Option

Description

Step name

Name of the step. Note: This name has to be unique in a single transformation.

Filename

This field specifies the filename and location of the output text file.
Note: Don't add the extension in this field, when the date and time should optionally be appended and afterwards the extension option (see Extension option field below). When the Extension option is left blank, add it to this field.

Run this as a command instead?

Enable to "pipe" the results into the command or script you specify. It can also be used for some database bulk loaders that can process the input from stdin. In this case set the filename to the script or binary to execute.

Pass output to servlet

Enable this option to return the data via a web service instead writing into a file (see PDI data over web service)

Create parent folder

Enable to create the parent folder

Do not create file at start

Enable to avoid empty files when no rows are getting processed.

Accept file name from field?

Enable to specify the file name(s) in a field in the input stream

File name field

When the previous option is enabled, you can specify the field that will contain the filename(s) at runtime.

Extension

Adds a point and the extension to the end of the filename. (.txt)

Include stepnr in filename

If you run the step in multiple copies (Launching several copies of a step), the copy number is included in the filename, before the extension. (_0).

Include partition nr in filename?

Includes the data partition number in the filename.

Include date in filename

Includes the system date in the filename. (default _20041231).

Include time in filename

Includes the system time in the filename. (default _235959).

Specify Date time format

Enable to specify the date time format

Date time format

Chose the date time format to append to the filename

Add file name to rest

This adds all processed filenames to the internal result filename set to allow for further processing.

Show filename(s)

This option shows a list of the files that will be generated.
Note: This is a simulation and among others depends on the number of rows that will go into each file.

Content Tab

The content tab contains the following options for describing the content being read:

Option

Description

Append

Check this to append lines to the end of the specified file.

Separator

Specify the character that separates the fields in a single line of text. Typically this is ; or a tab.

Enclosure

A pair of strings can enclose some fields. This allows separator or enclosure characters in fields. The enclosure string is optional.

Force the enclosure around fields?

This option forces all fields of an incoming string type (independent of the eventually changed field type within the Text File Output field definition) to be enclosed with the character specified in the Enclosure property above.

Disable the enclosure fix?

This is for backward compatibility reasons (since version 4.1) related to enclosures and separators. The logic since version 4.1 is: When a string field contains an enclosure it gets enclosed and the enclose itself gets escaped. When a string field contains a separator, it gets enclosed. Check this option, if this logic is not wanted. It has also an extra performance burden since the strings are scanned for enclosures and separators. So when you are sure there is no such logic needed since your strings don't have these characters in there and you want to improve performance, un-check this option.

Header

Enable this option if you want the text file to have a header row. (First line in the file).

Footer

Enable this option if you want the text file to have a footer row. (Last line in the file). Note: Be careful to enable this option when in Append mode since it is not possible to strip footers from the file contents before appending new rows. There are use cases where this option is wanted, e.g. to have a footer after each run of a transformation to separate sections within the file.

Format

This can be either DOS or UNIX. UNIX files have lines are separated by linefeeds. DOS files have lines separated by carriage returns and line feeds.
Since 4.2 the options are: CR+LF terminated (Windows, DOS) / LF terminated (Unix) / CR terminated / No new-line terminator

Encoding

Specify the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode specify UTF-8 or UTF-16. On first use, Spoon will search your system for available encodings.

Compression

Allows you to specify the type of compression, .zip or .gzip to use when compressing the output. Note: Only one file is placed in a single archive.

Right pad fields

Add spaces to the end of the fields (or remove characters at the end) until they have the specified length.

Fast data dump (no formatting)

Improves the performance when dumping large amounts of data to a text file by not including any formatting information.

Split every ... rows

If this number N is larger than zero, split the resulting text-file into multiple parts of N rows.

Add Ending line of file

Allows you to specify an alternate ending row to the output file.

Fields Tab

The fields tab is where you define properties for the fields being exported. The table below describes each of the options for configuring the field properties:

Option

Description

Name

The name of the field.

Type

Type of the field can be either String, Date or Number.

Format

The format mask to convert with. See Number Formats for a complete description of format symbols.

Length

The length option depends on the field type follows:

  • Number - Total number of significant figures in a number
  • String - total length of string
  • Date - length of printed output of the string (e.g. 4 only gives back year)

Precision

The precision option depends on the field type as follows:

  • Number - Number of floating point digits
  • String - unused
  • Date - unused

Currency

Symbol used to represent currencies like $10,000.00 or E5.000,00

Decimal

A decimal point can be a "." (10,000.00) or "," (5.000,00)

Group

A grouping can be a "," (10,000.00) or "." (5.000,00)

Trim type

The trimming method to apply on the string. Note: Trimming only works when there is no field length given.

Null

If the value of the field is null, insert this string into the textfile

Get

Click to retrieve the list of fields from the input fields stream(s)

Minimal width

Alter the options in the fields tab in such a way that the resulting width of lines in the text file is minimal. So instead of save 0000001, we write 1, etc. String fields will no longer be padded to their specified length.

Metadata Injection Support

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.