XML Input
This step is deprecated. Please use the Get Data From XML or XML Input Stream (StAX) steps.
Description
This step allows you to read information stored in XML files. The following sections describe the interface for defining the filenames you want to read from, the repeating part of the XML data, and the fields to retrieve.
Note: You specify the fields by the path to the Element or Attribute and by entering conversion masks, data types and other meta-data.
File Tab
The File tab is where you define the location of the XML files from which you want to read. The table below contains the available options:
Option | Description |
---|---|
Step name | Name of the step; the name must be unique within a single transformation. |
File or directory | Specifies the location and/or name of the input XML file. Note: Click Add to add the file/directory/wildcard combination to the list of selected files (grid) below. |
Regular expression | Specifies the regular expression you want to use to select the files in the directory specified in the previous option. |
Selected files | This table contains a list of selected files (or wildcard selections) and a property specifying whether each file is required. If a required file is not found, an error is generated; otherwise, the filename is skipped. |
Show Filename(s) | Shows a list of the files that will be read. Note: This is a simulation and sometimes depends on the number of rows in each file, for example. |
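The interplay between the directory, the regular expression, and the required flag can be illustrated outside PDI. The following is a minimal Python sketch, not the step's implementation: the function name `select_files` is hypothetical, and it assumes the expression is matched against the whole file name, so a pattern such as `.*\.xml` is needed where a shell wildcard would be `*.xml`.

```python
import re
from pathlib import Path


def select_files(directory, pattern, required=False):
    """Return the files in `directory` whose names fully match `pattern`.

    `required` mimics the Required property of the Selected files grid:
    when nothing matches, an error is raised instead of the entry being
    skipped. (Hypothetical helper, for illustration only.)
    """
    regex = re.compile(pattern)
    base = Path(directory)
    if base.is_dir():
        matches = sorted(
            p for p in base.iterdir()
            if p.is_file() and regex.fullmatch(p.name)
        )
    else:
        matches = []
    if required and not matches:
        raise FileNotFoundError(f"no file in {directory!r} matches {pattern!r}")
    return matches
```

Calling `select_files("/data/in", r".*\.xml")` (the path is only an example) would list what a Show Filename(s) style preview might display for that directory.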
Content
The Content tab contains the following options for describing the content being read:
Option | Description |
---|---|
Include filename in output & fieldname | Enable to include the name of the XML file each row came from in the output stream. You can specify the name of the field that will hold the file name. |
Rownum in output & fieldname | Enable to include a row number (starting at 1) in the output stream. You can specify the name of the field that will hold the integer. |
Limit | Specifies the maximum number of rows to read (optional). |
Nr of header rows to skip | Specifies the number of rows to skip, from the start of an XML document, before starting to process. |
Location | Specifies the path, by way of elements, to the repeating part of the XML file. For example, if you are reading rows from this XML file: <Rows> <Row> <Field1>...</Field1> ... </Row> ... </Rows> then you set the location to Rows, Row. |
Note: You can also set the root (Rows) as a repeating element location. The output will then contain 1 (one) row.
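To make the Location, Limit, and row number options concrete, here is a small Python sketch that mimics them on the example document. It is an illustration under assumptions, not how the step is implemented: `read_rows`, its parameters, and the sample data are hypothetical, and a limit of 0 is taken to mean "no limit".

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Rows>
  <Row><Field1>a</Field1><Field2>1</Field2></Row>
  <Row><Field1>b</Field1><Field2>2</Field2></Row>
  <Row><Field1>c</Field1><Field2>3</Field2></Row>
</Rows>"""


def read_rows(xml_text, location, limit=0, rownum_field=None):
    """Yield one dict per repeating element found at `location`.

    `location` lists element names from the root down, e.g. ["Rows", "Row"].
    """
    root = ET.fromstring(xml_text)
    if not location or root.tag != location[0]:
        return  # the path does not start at this document's root
    if len(location) == 1:
        elements = [root]  # the root itself is the repeating element
    else:
        elements = root.findall("/".join(location[1:]))
    for number, element in enumerate(elements, start=1):
        if limit and number > limit:
            break
        row = {child.tag: child.text for child in element}
        if rownum_field:
            row[rownum_field] = number  # row numbers start at 1
        yield row
```

With `location=["Rows", "Row"]` the sketch yields one row per `Row` element; with `location=["Rows"]` it yields a single row, matching the note about using the root as the repeating element.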
Fields
The Fields tab allows you to define properties for the location and format of the fields being read from the XML document. The table below describes each of the options for configuring the field properties:
Option | Description |
---|---|
Name | The name of the field. |
Type | The type of the field: String, Date, or Number. |
Format | The format mask to convert with. See Number Formats for a complete description of format specifiers. |
Length | The length option depends on the field type. |
Precision | The precision option depends on the field type. |
Currency | Symbol used to represent currencies, as in $10,000.00 or €5.000,00. |
Decimal | The decimal symbol: either "." (10,000.00) or "," (5.000,00). |
Group | The grouping symbol: either "," (10,000.00) or "." (5.000,00). |
Trim type | The trimming method to apply to the string found in the XML. |
Repeat | Enable to repeat empty values with the corresponding value from the previous row. |
Position | The position of the XML element or attribute, given in the step's position syntax. |
Note: Click Get Fields to auto-generate all the possible positions in the XML file.
Note: Pentaho has added support for XML documents where all the information is stored in the Repeating (or Root) element. The special R= locator was added to allow you to grab this information. Click Get Fields to find this information if it is available.
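The effect of the Currency, Decimal, Group, and Repeat options can be sketched in a few lines of Python. This is only an approximation for illustration: `parse_number` and `repeat_empty` are hypothetical helpers, and the step's real conversion is driven by the format mask rather than by plain string replacement.

```python
from decimal import Decimal


def parse_number(text, decimal=".", group=",", currency=""):
    """Convert a formatted string such as "$10,000.00" to a Decimal."""
    cleaned = text.strip()
    if currency:
        cleaned = cleaned.replace(currency, "")
    # Drop grouping symbols first, then normalise the decimal symbol to "."
    cleaned = cleaned.replace(group, "").replace(decimal, ".").strip()
    return Decimal(cleaned)


def repeat_empty(values):
    """Replace None or empty strings with the last non-empty value seen."""
    previous = None
    filled = []
    for value in values:
        if value is None or value == "":
            value = previous  # stays None if no earlier value exists
        else:
            previous = value
        filled.append(value)
    return filled
```

For instance, `parse_number("5.000,00", decimal=",", group=".")` and `parse_number("5,000.00")` produce the same value, which is why the Decimal and Group symbols have to be set consistently for the data being read.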
FAQ
Can you change the XML input step to process my file?
Q: I have to process an XML file that currently can't be processed by Kettle; for example, there is an optional field that depends on the value of an element and should also be included as a field in a row. Can you build this functionality into the XML Input step?
A: First of all, it would depend on what functionality you need. If the functionality is generally useful, it can be built in. If it would only be useful for you, it wouldn't make sense to build it in.
As alternative solutions, consider processing the XML file via a JavaScript step or, if what is required is very complex, consider writing your own PDI step that you maintain yourself (outside of the PDI distribution).
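As a rough idea of the custom logic such a script or step would contain, the sketch below reads an optional field only when another element has a particular value. It is written in Python purely for illustration (a JavaScript step would use different APIs), and the document layout, element names, and `extract_orders` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: Message is only meaningful when Type is "gift"
SAMPLE = """<Orders>
  <Order><Type>gift</Type><Id>1</Id><Message>Happy birthday</Message></Order>
  <Order><Type>standard</Type><Id>2</Id></Order>
</Orders>"""


def extract_orders(xml_text):
    """Return one dict per Order, with `message` filled only for gifts."""
    rows = []
    for order in ET.fromstring(xml_text).findall("Order"):
        row = {"id": order.findtext("Id"), "type": order.findtext("Type")}
        # The optional field depends on the value of the Type element
        if row["type"] == "gift":
            row["message"] = order.findtext("Message")
        else:
            row["message"] = None
        rows.append(row)
    return rows
```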