Streaming XML Input
This step is deprecated. Please use the Get Data From XML or XML Input Stream (StAX) steps.
Description
The purpose of this step is to provide value parsing: fields can be identified by the value of an attribute (for example, the "name" attribute of "property" elements) rather than only by their position. The step is based on a SAX parser, which gives better performance with large files. It is very similar to the XML Input step; the only differences are in the Content and Fields tabs. The following sections describe in detail the properties and settings available for the Streaming XML Input step.
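The Python sketch below shows why a SAX parser suits large files. It uses the standard-library `xml.sax` module, not the step's own code. The handler counts a repeating element while the document is read as a stream of start and end events. No tree of the whole document is built, so the handler's memory use is not tied to the file size. The element names `cars` and `car`, and the function `count_elements`, are hypothetical.

```python
import xml.sax


class RepeatingElementCounter(xml.sax.ContentHandler):
    """Counts one repeating element as SAX start-element events arrive.

    The handler never sees a document tree, only a stream of events, so its
    state stays small no matter how large the input file is.
    """

    def __init__(self, element_name):
        super().__init__()
        self.element_name = element_name
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.element_name:
            self.count += 1


def count_elements(source, element_name):
    """`source` is a file name or a binary file object."""
    handler = RepeatingElementCounter(element_name)
    xml.sax.parse(source, handler)  # input is read and parsed incrementally
    return handler.count
```

For example, `count_elements(io.BytesIO(b"<cars><car/><car/></cars>"), "car")` returns 2.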
File Tab
Option | Description |
---|---|
Step name | Name of the step. This name has to be unique in a single transformation. |
File or directory | The location and/or name of the input XML file, or of the directory that contains the input files. |
Regular expression | The regular expression used to select files in the directory specified in the previous option. |
Selected Files | A list of selected files (or wildcard selections), each with a property specifying whether the file is required. If a required file is not found, an error is generated; otherwise, the missing file is simply skipped. |
Show filename(s)... | Displays a list of all files that will be loaded based on the currently selected file definitions. |
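The File tab combines a directory, a regular expression, and a required flag. The Python sketch below approximates that selection logic under three assumptions:

- The regular expression must match the whole file name.
- Results are sorted by name.
- A required entry that matches no file raises an error, while an optional one yields an empty list.

The function name `select_files` is hypothetical.

```python
import os
import re


def select_files(directory, pattern, required=False):
    """Return paths of files in `directory` whose names fully match `pattern`.

    Assumptions of this sketch: the expression must match the whole file
    name, and results are sorted by name.
    """
    regex = re.compile(pattern)
    names = sorted(os.listdir(directory)) if os.path.isdir(directory) else []
    matches = [
        os.path.join(directory, name)
        for name in names
        if regex.fullmatch(name) and os.path.isfile(os.path.join(directory, name))
    ]
    if required and not matches:
        # A required entry with no matching file is an error...
        raise FileNotFoundError(f"no file in {directory!r} matches {pattern!r}")
    # ...while an optional one is simply skipped (empty result).
    return matches
```

Listing the returned paths roughly corresponds to what the "Show filename(s)..." button displays.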
Content
Option | Description |
---|---|
Include filename in output & fieldname | Check this option to include in the output stream the name of the XML file that each row was read from. Specify the name of the field that will hold the filename. |
Rownum in output & fieldname | Check this option to add a row number (starting at 1) to the output stream. Specify the name of the field that will hold the integer. |
Limit | The maximum number of rows to read. |
Location | The path, by way of elements, to the repeating part of the XML file. The element column is used to specify the element and position as follows: |
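The Content tab options can be pictured with the Python `xml.sax` sketch below. This is not the step's implementation. While streaming, the handler does the following:

- It keeps a stack of the open element names.
- It emits a row each time that stack equals the path to the repeating element.
- It adds the filename and a row number starting at 1.
- It stops once the row limit is reached.

Several details are assumptions of this sketch:

- The path is written as a list of element names.
- The field names `filename` and `rownum` are illustrative.
- Each row keeps only the repeating element's attributes.
- A limit of 0 means "no limit".
- The class and function names are hypothetical.

```python
import xml.sax


class _LimitReached(Exception):
    """Raised inside the handler to stop parsing once the limit is hit."""


class RepeatingPathHandler(xml.sax.ContentHandler):
    """Emits one row per element found at `path` (a list of element names)."""

    def __init__(self, path, filename, limit=0):
        super().__init__()
        self.path = list(path)   # e.g. ["cars", "car"] (hypothetical)
        self.filename = filename
        self.limit = limit       # 0 means "no limit" in this sketch
        self.stack = []          # names of the currently open elements
        self.rows = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack == self.path:
            row = dict(attrs.items())           # this sketch keeps only attributes
            row["filename"] = self.filename     # include filename in output
            row["rownum"] = len(self.rows) + 1  # row numbers start at 1
            self.rows.append(row)
            if self.limit and len(self.rows) >= self.limit:
                raise _LimitReached()

    def endElement(self, name):
        self.stack.pop()


def read_rows(source, path, filename, limit=0):
    handler = RepeatingPathHandler(path, filename, limit)
    try:
        xml.sax.parse(source, handler)
    except _LimitReached:
        pass  # stop early without processing the remaining input
    return handler.rows
```

Raising an exception from a handler is a common way to end SAX parsing early. The path comparison means elements with the same name at a different depth are ignored.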
Fields
Option | Description |
---|---|
Name | Name of the field. |
Type | Type of the field: String, Date, or Number. |
Format | See Number Formats for a complete description of format symbols. |
Length | For Number: the total number of significant figures in the number. |
Precision | For Number: the number of digits after the decimal point. |
Currency | The currency symbol, used to interpret numbers such as $10,000.00 or €5.000,00. |
Decimal | The decimal point symbol: "." (10,000.00) or "," (5.000,00). |
Group | The grouping symbol: "," (10,000.00) or "." (5.000,00). |
Trim type | The type of trimming (left, right, or both) to apply to this field before processing. |
Null if | Treat this value as NULL. |
Repeat | Y/N: if the value in this row is empty, repeat the last non-empty value. |
Position | The position of the XML element or attribute. Use the following syntax to specify the position of an element: |
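The Currency, Decimal, Group, Trim type, Null if, and Repeat settings can be approximated with the Python sketch below. It is an illustration, not the step's own conversion code, which applies the format described in Number Formats. It rests on these assumptions:

- Trimming happens before the Null if comparison.
- The currency symbol is a prefix.
- "Empty" for Repeat means None or an empty string.

The functions `parse_number` and `repeat_empty` are hypothetical.

```python
from decimal import Decimal


def parse_number(text, decimal=".", group=",", currency="", trim="both", null_if=None):
    """Convert a raw XML value to a Decimal using explicit format symbols."""
    # Trim type: "left", "right", "both", or anything else for no trimming.
    if trim in ("left", "both"):
        text = text.lstrip()
    if trim in ("right", "both"):
        text = text.rstrip()
    # Null if: a matching value becomes NULL (None).
    if null_if is not None and text == null_if:
        return None
    # Currency: strip a leading currency symbol such as "$" or "€".
    if currency and text.startswith(currency):
        text = text[len(currency):]
    # Group: drop grouping symbols first, so "." used as a group symbol is
    # not mistaken for a decimal point; then normalize the decimal symbol.
    if group:
        text = text.replace(group, "")
    if decimal != ".":
        text = text.replace(decimal, ".")
    return Decimal(text)


def repeat_empty(values):
    """Repeat = Y: replace empty values with the last non-empty one."""
    result, last = [], None
    for value in values:
        if value is None or value == "":
            value = last
        else:
            last = value
        result.append(value)
    return result
```

Removing the group symbol before translating the decimal symbol matters for inputs like 5.000,00, where "." is the group symbol rather than the decimal point.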
Streaming XML Example
Consider the following XML:
Suppose that we are interested in cars. We must then specify the location of the repeating element like this:
Now let's look at the fields. The "property" elements are distinguished by their "name" attribute, and we want one field for each "name" value: "brand", "type", and "power".
To do this, we must specify the association between "property" and "name" in the first grid.
Click Get Fields to retrieve the right fields, including the properties.
Let us now try leaving the new grid empty.
You can see that in this case the step works like the original XML Input step and retrieves fields by their position. Value parsing is the better choice here, because you get the right field names and missing elements do not corrupt the results (for example, a <property name="power"> </property> element that is missing from some rows).
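The difference between value parsing and positional parsing can be reproduced with a small Python `xml.sax` handler. The XML it expects is hypothetical: one `car` element per row, each holding `property` elements keyed by `name`. The handler builds each row twice. Reading by value stores each property under its "name" attribute, so a missing property simply leaves that field empty. Reading by position assigns property texts to "brand", "type", and "power" in document order. When a property in the middle is missing, later values shift into the wrong fields. The names `CarHandler` and `parse_cars` are illustrative only.

```python
import xml.sax

FIELDS = ("brand", "type", "power")  # field names from the example


class CarHandler(xml.sax.ContentHandler):
    """Builds one row per <car> element in two ways, for comparison.

    by_value:    each <property> text is stored under its "name" attribute
    by_position: <property> texts are assigned to FIELDS in document order
    """

    def __init__(self):
        super().__init__()
        self.by_value, self.by_position = [], []
        self._named, self._ordered = {}, []
        self._in_property, self._prop_name, self._text = False, None, []

    def startElement(self, name, attrs):
        if name == "car":  # hypothetical repeating element
            self._named, self._ordered = {}, []
        elif name == "property":
            self._in_property = True
            self._prop_name = attrs.get("name")
            self._text = []

    def characters(self, content):
        if self._in_property:
            self._text.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "property":
            value = "".join(self._text).strip()
            if self._prop_name is not None:
                self._named[self._prop_name] = value
            self._ordered.append(value)
            self._in_property = False
        elif name == "car":
            self.by_value.append({f: self._named.get(f) for f in FIELDS})
            padded = self._ordered + [None] * len(FIELDS)  # missing -> None
            self.by_position.append(dict(zip(FIELDS, padded)))


def parse_cars(xml_bytes):
    handler = CarHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.by_value, handler.by_position
```

Take a car that omits `<property name="type">`. The by-value row has "type" set to None. The positional row puts that car's power value under "type" and leaves "power" empty.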