Description
The Avro Input step decodes binary or JSON Avro data and extracts fields from the structure it defines, either from flat files or incoming fields.
Options
Main Configuration
Option | Definition | Example |
---|---|---|
Step name | The name of this step as it appears in the transformation workspace. | |
Location | The type of file system from which the Avro input data is read. | Local, Hadoop Cluster, S3, HDFS, MapRFS |
File name | The fully qualified URL from which the Avro input data is read. The URL format depends on the file system type (Location field). | |
Fields Tab
The Fields tab defines the fields that this step extracts from the Avro data. You can define these fields manually, or click the "Get Fields" button to populate them from the incoming PDI stream.
Option | Definition | Editable | Example(s) |
---|---|---|---|
Avro path | The name of the field as it appears in the Avro schema and the Avro file. | No | |
Name | The name of the PDI field. | Yes | The user can change the name of the field for further processing. |
Type | The data type of the field. | Yes | String, Integer, etc. The user can change a type if it was not determined correctly. |
Get Fields (button) | Populates the fields table with fields from the incoming PDI stream. | | |
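The relationship between the three columns can be illustrated with a short Python sketch. This is not PDI code. It only mimics the semantics described in the table, assuming flat (non-nested) Avro paths. The names `FIELDS`, `CONVERTERS`, and `extract_row` are hypothetical and exist only for this example.

```python
# Hypothetical field definitions, one entry per row of the Fields table.
FIELDS = [
    {"avro_path": "id", "name": "customer_id", "type": "Integer"},
    {"avro_path": "full_name", "name": "name", "type": "String"},
]

# Assumed mapping from the type labels in the table to Python converters.
CONVERTERS = {"String": str, "Integer": int}


def extract_row(record, fields):
    """Pick each Avro path out of a decoded record, rename it, and convert it."""
    row = {}
    for field in fields:
        raw = record.get(field["avro_path"])
        convert = CONVERTERS[field["type"]]
        # A value missing from the record stays None instead of being converted.
        row[field["name"]] = None if raw is None else convert(raw)
    return row


decoded_record = {"id": "42", "full_name": "Ada"}
row = extract_row(decoded_record, FIELDS)
print(row)
```

Here the Avro path stays fixed because it identifies the field in the data, while the output name and type are the parts a user would edit.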
Note
The default format mask for the date type is yyyy-MM-dd. The default format mask for the timestamp type is yyyy-MM-dd HH:mm:ss.SSS. If the data was stored as a string data type in any other format, the column data cannot be retrieved, and null is returned for that column.
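The behavior in this note can be sketched in Python as follows. The step itself uses Java-style format masks, so the `strptime` patterns below are only approximate counterparts, and `parse_or_none` is a hypothetical helper, not part of PDI.

```python
from datetime import datetime

# Approximate Python counterparts of the default masks (an assumption of this sketch).
DATE_MASK = "%Y-%m-%d"                    # yyyy-MM-dd
TIMESTAMP_MASK = "%Y-%m-%d %H:%M:%S.%f"   # yyyy-MM-dd HH:mm:ss.SSS


def parse_or_none(text, mask):
    """Parse a string with the given mask, returning None when it does not match."""
    try:
        return datetime.strptime(text, mask)
    except ValueError:
        return None


ok_date = parse_or_none("2017-03-15", DATE_MASK)
ok_timestamp = parse_or_none("2017-03-15 10:20:30.123", TIMESTAMP_MASK)
bad_date = parse_or_none("15/03/2017", DATE_MASK)  # different format, so None
print(ok_date, ok_timestamp, bad_date)
```

A string such as `15/03/2017` does not match the default date mask, so the sketch yields `None`, mirroring the null value the note describes.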
Schema Tab
The Schema tab defines the location of the Avro schema file to use. Reference: https://avro.apache.org/docs/1.8.1/spec.html
Option | Definition | Example(s) |
---|---|---|
File Name | The fully qualified URL of the Avro schema. The URL format depends on the file system type (Location field). The schema file name is optional. If it is not provided, the fields are retrieved from the schema embedded in the Avro data file. | |
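To show what an embedded schema means, the following Python sketch reads the schema out of an Avro object container file header using only the standard library. It follows the header layout in the Avro specification (magic bytes, a metadata map with an `avro.schema` entry, and a 16-byte sync marker). It is not how PDI implements the step. The sample schema and the helper names (`write_long`, `read_long`, `build_header`, `read_embedded_schema`) are hypothetical, and the header is built in memory rather than read from a real file.

```python
import io
import json

MAGIC = b"Obj\x01"  # 4-byte marker that starts an Avro object container file


def write_long(n):
    """Encode an int as an Avro long (zigzag, then variable-length bytes)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf):
    """Decode one Avro long from a binary stream."""
    shift, result = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("unexpected end of header")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return (result >> 1) ^ -(result & 1)
        shift += 7


def read_bytes(buf):
    """Read a length-prefixed byte string (used for map keys and values)."""
    return buf.read(read_long(buf))


def build_header(schema):
    """Build a minimal header whose metadata holds only the avro.schema entry."""
    key = b"avro.schema"
    value = json.dumps(schema).encode("utf-8")
    meta = write_long(1) + write_long(len(key)) + key + write_long(len(value)) + value
    return MAGIC + meta + write_long(0) + bytes(16)  # zero-filled sync marker


def read_embedded_schema(buf):
    """Parse the header metadata map and return the embedded schema as a dict."""
    if buf.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(buf)
        if count == 0:          # a zero count ends the map
            break
        if count < 0:           # negative count: a block byte size follows
            count = -count
            read_long(buf)
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
    return json.loads(meta["avro.schema"])


SAMPLE_SCHEMA = {
    "type": "record",
    "name": "Customer",
    "fields": [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}],
}
schema = read_embedded_schema(io.BytesIO(build_header(SAMPLE_SCHEMA)))
field_names = [field["name"] for field in schema["fields"]]
print(field_names)
```

Because the schema travels inside the data file, the field list can be recovered without a separate schema file, which is why the File Name option can be left empty.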
Metadata Injection Support (7.x and later)
All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.