...

Option

Description

File type

Can be either CSV or Fixed length. Based on this selection, Spoon will launch a different helper GUI when you press the "get fields" button in the last "fields" tab.

Separator

One or more characters that separate the fields in a single line of text. Typically this is ; or a tab. Special characters (e.g. CHAR ASCII HEX01) can be set with the format $[value], e.g. $[01] or $[6F,FF,00,1F].
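As an illustration of the $[value] notation, the following minimal sketch expands such a group into the characters it names. The helper name `expand_special` is hypothetical, and the sketch assumes each comma-separated value is a two-digit hexadecimal code that maps to one character; the actual step may interpret the values differently (for example as raw bytes).

```python
import re

# Hypothetical helper, not part of any Pentaho API.
# Matches $[hh] or $[hh,hh,...] where each hh is a two-digit hex code.
_SPECIAL = re.compile(r"\$\[([0-9A-Fa-f]{2}(?:,[0-9A-Fa-f]{2})*)\]")


def expand_special(text):
    """Replace each $[hh,hh,...] group with the characters for those hex codes."""
    def _decode(match):
        codes = match.group(1).split(",")
        return "".join(chr(int(code, 16)) for code in codes)
    return _SPECIAL.sub(_decode, text)


# Use the expanded value as a field separator (assumed: one char per code).
separator = expand_special("$[01]")
fields = "a\x01b\x01c".split(separator)
```

Text without a $[...] group, such as a plain ; separator, passes through unchanged.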

Enclosure

Some fields can be enclosed by a pair of strings to allow separator characters in fields. The enclosure string is optional. To include the enclosure string itself inside an enclosed field, repeat it, as in the text line 'Not the nine o''clock news.'. With ' as the enclosure string, this gets parsed as Not the nine o'clock news. Special characters (e.g. CHAR ASCII HEX01) can be set with the format $[value], e.g. $[01] or $[6F,FF,00,1F].
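The repeated-enclosure convention is the same one Python's standard `csv` module applies when `doublequote=True`, so it can serve as a quick way to check how a line would be split. This is only a sketch of the convention, assuming ; as the separator and ' as the enclosure; it is not the step's own parser.

```python
import csv

line = "'Not the nine o''clock news.';42"

# ';' is the separator and "'" the enclosure; doublequote=True makes a
# repeated enclosure inside an enclosed field stand for one literal "'".
row = next(csv.reader([line], delimiter=";", quotechar="'", doublequote=True))

# A separator inside an enclosed field does not split the field.
row_with_separator = next(csv.reader(["'a;b';c"], delimiter=";", quotechar="'"))
```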

Allow breaks in enclosed fields?

Not implemented yet. If you need this feature, please file a JIRA feature request or, if you are a subscription customer, contact Pentaho Customer Support. Also see PDI-388338.

Escape

Specify an escape character (or characters) if enclosure or separator characters occur inside your field data. If you have \ as an escape character, the text 'Not the nine o\'clock news' (with ' as the enclosure) gets parsed as Not the nine o'clock news. Special characters (e.g. CHAR HEX01) can be set with the format $[value], e.g. $[01] or $[6F,FF,00,1F].
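The escape example above can be reproduced with Python's standard `csv` module, which offers a comparable `escapechar` setting. The sketch assumes ; as the separator; it illustrates the convention and is not the step's implementation.

```python
import csv

line = r"'Not the nine o\'clock news';42"

# "\" is the escape character: the enclosure that follows it is kept as a
# literal character instead of closing the field. doublequote=False turns
# off the repeated-enclosure convention so that only escaping is in effect.
row = next(
    csv.reader(
        [line],
        delimiter=";",
        quotechar="'",
        escapechar="\\",
        doublequote=False,
    )
)
```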

Header & number of header lines

Enable if your text file has a header (the first lines in the file); you can specify the number of header lines.

Footer & number of footer lines

Enable if your text file has a footer (the last lines in the file); you can specify the number of footer lines.
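The effect of the header and footer settings on the data lines can be pictured with a simple slice. The function name and parameters below are hypothetical, and the sketch assumes the whole file is already available as a list of lines.

```python
def strip_header_footer(lines, header_lines=0, footer_lines=0):
    """Return only the data lines, dropping leading header and trailing footer lines."""
    end = len(lines) - footer_lines
    return lines[header_lines:end]


lines = ["id;name", "1;alpha", "2;beta", "rows;2"]
data_lines = strip_header_footer(lines, header_lines=1, footer_lines=1)
```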

Wrapped lines and number of wraps

Use if you deal with data lines that have wrapped beyond a specific page limit; note that headers and footers are never considered wrapped.

Paged layout and page size and doc header

Use these options as a last resort when dealing with text meant for printing on a line printer; use the number of document header lines to skip introductory text and the number of lines per page to position the data lines.

Compression

Enable if your text file is stored in a Zip or GZip archive. Note: At the moment, only the first file in the archive is read.
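To check what would be read from an archive, the first entry can be inspected with Python's standard `zipfile` and `gzip` modules. This is a minimal sketch with hypothetical function names and in-memory demo data; it assumes UTF-8 text and takes "first file" to mean the first entry in archive order.

```python
import gzip
import io
import zipfile


def read_first_zip_entry(data):
    """Return the text of the first entry in a Zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        first_name = archive.namelist()[0]  # entries are listed in archive order
        return archive.read(first_name).decode("utf-8")


def read_gzip(data):
    """Return the text of a GZip-compressed file given as bytes."""
    return gzip.decompress(data).decode("utf-8")


# Build a two-entry Zip archive in memory; only the first entry is returned.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("first.txt", "a;b\n")
    archive.writestr("second.txt", "c;d\n")

zip_text = read_first_zip_entry(buffer.getvalue())
gzip_text = read_gzip(gzip.compress(b"a;b\n"))
```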

No empty rows

Do not send empty rows to the next steps.

Include filename in output

Enable if you want the filename to be part of the output.

Filename field name

Name of the output field that contains the filename.

Rownum in output?

Enable if you want the row number to be part of the output.

Row number field name

Name of the output field that contains the row number.

Rownum by file?

Enable to restart the row numbering for each file.
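The difference between continuous row numbers and row numbers that restart per file can be sketched as follows. The function and its parameters are hypothetical, and the sketch assumes numbering starts at 1, which the text above does not state.

```python
def number_rows(files, reset_per_file=False):
    """Attach the filename and a row number to every row.

    files maps a filename to its list of rows. Numbering is assumed to
    start at 1 and restarts for each file only when reset_per_file is True.
    """
    numbered = []
    counter = 0
    for filename, rows in files.items():
        if reset_per_file:
            counter = 0
        for row in rows:
            counter += 1
            numbered.append((filename, counter, row))
    return numbered


files = {"a.txt": ["x", "y"], "b.txt": ["z"]}
continuous = number_rows(files)
per_file = number_rows(files, reset_per_file=True)
```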

Format

Can be either DOS, UNIX or mixed. UNIX files have lines that are terminated by line feeds. DOS files have lines separated by carriage returns and line feeds. If you specify mixed, no verification is done.
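The following sketch shows why the format setting matters: splitting a DOS file on line feeds alone leaves a stray carriage return at the end of each line. The function is hypothetical, and treating "mixed" as "accept either terminator" is an assumption made for illustration, not a description of what the step does internally.

```python
import re


def split_lines(data, file_format):
    """Split text into lines according to the declared file format."""
    if file_format == "DOS":
        lines = data.split("\r\n")
    elif file_format == "UNIX":
        lines = data.split("\n")
    elif file_format == "mixed":
        # Assumption: accept either terminator without checking consistency.
        lines = re.split(r"\r\n|\n", data)
    else:
        raise ValueError("unknown format: " + file_format)
    # A trailing terminator produces one empty string at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


dos_lines = split_lines("a\r\nb\r\n", "DOS")
wrong_format = split_lines("a\r\nb\r\n", "UNIX")  # carriage returns remain
mixed_lines = split_lines("a\r\nb\nc", "mixed")
```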

Encoding

Specify the text file encoding to use; leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon searches your system for available encodings.

Limit

Sets the number of lines that are read from the file; 0 means read all lines.

Be lenient when parsing dates?

Disable if you want strict parsing of date fields; if lenient parsing is enabled, out-of-range dates roll over, so a date like Jan 32nd becomes Feb 1st.
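The roll-over from Jan 32nd to Feb 1st can be emulated in a few lines. Python's `datetime` has no lenient mode of its own, so the sketch below adds the day offset to the first of the month; the function name is hypothetical and only the day component is treated leniently.

```python
from datetime import date, timedelta


def make_date(year, month, day, lenient=True):
    """Build a date, optionally letting an out-of-range day roll over."""
    if not lenient:
        # Strict: date() raises ValueError for a day such as January 32.
        return date(year, month, day)
    # Lenient: count days forward from the first of the month.
    return date(year, month, 1) + timedelta(days=day - 1)


rolled = make_date(2006, 1, 32)  # date(2006, 2, 1)

try:
    make_date(2006, 1, 32, lenient=False)
    strict_rejected = False
except ValueError:
    strict_rejected = True
```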

The date format Locale

This locale is used to parse dates that have been written in full, such as "February 2nd, 2006"; parsing this date on a system running in the French (fr_FR) locale would not work because February is called Février in that locale.

Add filenames to result

Adds the filenames to the internal filename result set. This internal result set can be used later on, e.g. to process all read files.

...