PLEASE NOTE: This documentation applies to Pentaho 8.1 and earlier. For Pentaho 8.2 and later, see JSON Input on the Pentaho Enterprise Edition documentation site.
Description
The JSON Input step extracts selected portions of JSON structures, read from files or from incoming fields, and outputs them as rows.
Options
File Tab
The File tab is where you enter basic connection information for accessing a resource.
Option | Definition |
---|---|
Step name | Name of this step as it appears in the transformation workspace |
Source is from a previous step | Retrieves the source from a previously defined field |
Select field | The field name to use as a source from a previous step |
Use field as file names | Indicates that the source field contains a filename rather than JSON content |
Read source as URL | Indicates a source should be accessed as a URL |
Do not pass field downstream | The source field will be removed from the output stream. This improves performance and memory utilization with large JSON fields. |
File or directory | Indicates the location of the source if the source is not defined in a field |
Regular expression | All filenames that match this regular expression are selected if a directory is specified |
Exclude regular expression | All filenames that match this regular expression are excluded if a directory is specified |
Show filename | Displays the file names of the connected source |
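
The interaction between the Regular expression and Exclude regular expression options can be sketched in Python. This is an illustrative sketch, not the step's actual implementation: the function name `select_files` is hypothetical, and it assumes each pattern must match the whole filename.

```python
import re


def select_files(names, include=None, exclude=None):
    """Return the filenames that match `include` and do not match `exclude`.

    Assumption: a pattern must match the entire filename (full match),
    and a missing pattern means "no filter".
    """
    inc = re.compile(include) if include else None
    exc = re.compile(exclude) if exclude else None
    selected = []
    for name in names:
        if inc and not inc.fullmatch(name):
            continue  # not selected by the include pattern
        if exc and exc.fullmatch(name):
            continue  # explicitly excluded
        selected.append(name)
    return selected


# Hypothetical directory listing
names = ["orders.json", "orders_old.json", "readme.txt"]
result = select_files(names, include=r".*\.json", exclude=r".*_old\.json")
print(result)
```

Here the include pattern selects both JSON files, and the exclude pattern then drops `orders_old.json`.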
Content Tab
The Content tab enables you to configure which data to collect.
Option | Definition |
---|---|
Ignore empty file | When checked, skips empty files. When unchecked, an empty file causes the process to fail and stop. |
Do not raise an error if no files | When checked, avoids failure when there is no file to process. When unchecked, the transformation fails when there is no file to process. |
Ignore missing path | When checked, avoids failure when the JSON path is missing. When unchecked, the transformation fails when the JSON path is missing. |
Limit | Limits the number of records the step generates when set to a value greater than zero |
Include filename in output | Adds a string field with the filename in the result |
Rownum in output | Adds an integer field with the row number in the result |
Add files to result filesname | If checked, adds processed files to the result file list |
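
As a rough illustration of how Limit, Include filename in output, and Rownum in output shape the output rows, consider the following Python sketch. The function `emit_rows`, the output field names `filename` and `rownum`, and the assumption that row numbers start at 1 are all hypothetical and chosen for the example only.

```python
def emit_rows(records, filename, limit=0, include_filename=False, include_rownum=False):
    """Build output rows from parsed records, mimicking the Content tab options.

    Assumptions: a limit of 0 means "no limit", and row numbers start at 1.
    """
    rows = []
    for rownum, record in enumerate(records, start=1):
        if limit > 0 and rownum > limit:
            break  # stop once the limit is reached
        row = dict(record)  # copy so the input record is not modified
        if include_filename:
            row["filename"] = filename
        if include_rownum:
            row["rownum"] = rownum
        rows.append(row)
    return rows


records = [{"id": 1}, {"id": 2}, {"id": 3}]
rows = emit_rows(records, "orders.json", limit=2,
                 include_filename=True, include_rownum=True)
print(rows)
```

With a limit of 2, only the first two records are emitted, each carrying the extra filename and row number fields.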
Fields Tab
The Fields tab displays field definitions to extract values from the JSON structure. This step uses JSONPath to extract fields from JSON structures.
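
To show what a JSONPath expression selects, the sketch below evaluates a small subset of JSONPath in plain Python: the root `$`, child names such as `.store`, numeric indexes such as `[0]`, and the wildcard `[*]`. It is not the library the step uses, the helper `json_path` and the sample document are hypothetical, and pairing the results of two paths into rows assumes both paths return the same number of matches.

```python
import json
import re

# One path segment: ".name", "[*]", or "[n]"
TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\*|\d+)\]")


def json_path(doc, path):
    """Evaluate a minimal JSONPath subset and return the list of matched values."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    nodes = [doc]
    pos = 1
    while pos < len(path):
        m = TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at position {pos}")
        key, index = m.group(1), m.group(2)
        next_nodes = []
        for node in nodes:
            if key is not None:
                # Child lookup by name; missing keys simply yield no match
                if isinstance(node, dict) and key in node:
                    next_nodes.append(node[key])
            elif index == "*":
                # Wildcard: every element of a list or every value of an object
                if isinstance(node, list):
                    next_nodes.extend(node)
                elif isinstance(node, dict):
                    next_nodes.extend(node.values())
            else:
                i = int(index)
                if isinstance(node, list) and i < len(node):
                    next_nodes.append(node[i])
        nodes = next_nodes
        pos = m.end()
    return nodes


source = json.loads(
    '{"store": {"book": ['
    '{"title": "A", "price": 8.95}, '
    '{"title": "B", "price": 12.99}]}}'
)
titles = json_path(source, "$.store.book[*].title")
prices = json_path(source, "$.store.book[*].price")
rows = list(zip(titles, prices))  # one row per book
print(rows)
```

Each field definition corresponds to one path, and the matched values line up by position to form the output rows.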
Additional Output Fields Tab
The Additional output fields tab enables you to provide additional information about the file to process.
Examples
Pentaho Data Integration ships with sample transformations that you can run to see how a step works. To open a sample transformation in Spoon, go to the File menu and select Open. Browse to pentaho\design-tools\data-integration\samples\transformations, then select the sample transformation you want to run. The following sample transformations in this directory demonstrate this step:
JsonInput - read a dynamic file.ktr
JsonInput - read a file.ktr
JsonInput - read incoming stream.ktr
Metadata Injection Support (7.x and later)
All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.