RSS Input
Description
This step imports data from an RSS or Atom feed. RSS versions 0.91, 0.92, 1.0, 2.0, and Atom versions 0.3 and 1.0 are supported.
General Tab
The General tab defines which RSS/Atom URLs you want to use, and optionally which fields contain the URLs.
Option |
Description |
---|---|
Step name |
The name of this step in the transformation workspace. |
URL is defined in a field |
If checked, you must specify which field to retrieve the URL from. |
URL Field |
If the previous option is checked, this is where you specify the URL field. |
URL list |
A list of RSS/Atom URLs you want to pull article data from. |
Content tab
The content tab contains options for limiting input and changing output.
Option |
Description |
---|---|
Read articles from |
Specifies a date in yyyy-MM-dd HH:mm:ss format. Only articles published after this date will be read. |
Max number of articles |
Specifies a static number of articles to retrieve, starting at the oldest. |
Include URL in output? |
If checked, specify a field name to pass the URL to. |
Include rownum in output? |
If checked, specify a field name to pass the row number to. |
Fields tab
The Fields tab defines properties for the exported fields.
Option |
Description |
---|---|
Name |
The name of the field. |
Column |
The RSS feed column that references the field. |
Type |
The field's data type; String, Date or Number. |
Format |
The format mask (number type). |
Length |
The length option depends on the field type. Number: total number of significant figures in a number; String: total length of a string; Date: determines how much of the date string is printed or recorded. |
Precision |
The precision option depends on the field type, but only Number is supported; it returns the number of floating point digits. |
Currency |
Symbol used to represent currencies. |
Decimal |
A decimal point; this is either a dot or a comma. |
Grouping |
A method of separating units of thousands in numbers of four digits or larger. This is either a dot or a comma. |
Trim type |
Truncates the field (left, right, both) before processing. Useful for fields that have no static length. |
Repeat |
If set to Y, will repeat this value if the next field is empty. |
Notes on Error Handling
When error handling is turned on for the transformation that includes this step, the full exception message, the field number on which the error occurred, and one or more of the following codes will be sent in an error row to the error stream:
- UnknownError: an unexpected error. Check the "Error description" field for more details.
- XMLError: typically this means that the specified file is not XML.
- FileNotFound: an HTTP 404 error.
- UnknownHost: means that the domain name cannot be resolved; may be caused by network outage.
- TransferError: any non-404 HTTP server error code (401, 403, 500, 502, etc.) can cause this.
- BadURL: means that the URL cannot be understood. It may be missing a protocol or use an unrecognized protocol.
- BadRSSFormat: typically means that the file is valid XML, but is not a supported RSS or Atom doc type.
Note: To see the full stack trace from a handled error, turn on detailed logging.