Pentaho Data Integration Steps

Introduction

This page contains the index for the documentation on all the standard steps in Pentaho Data Integration.
We invite everyone to add more details, tips and samples to the step pages.

NOTE

You may not be viewing the most up-to-date documentation for these steps. Consult the current official Pentaho documentation for the latest version.

Name

Category

ID

Description

Metadata Java class
opdts = org.pentaho.di.trans.steps

Abort

Flow

Abort

Abort a transformation

opdts.abort.AbortMeta

Add a checksum

Transform

CheckSum

Add a checksum column for each input row

opdts.checksum.CheckSumMeta

Add constants

Transform

Constant

Add one or more constants to the input rows

opdts.constant.ConstantMeta

Add sequence

Transform

Sequence

Get the next value from a sequence

opdts.addsequence.AddSequenceMeta

Add value fields changing sequence

Transform

FieldsChangeSequence

Add a sequence that depends on field value changes. Each time the value of at least one of the specified fields changes, PDI resets the sequence.

opdts.fieldschangesequence.FieldsChangeSequenceMeta

Add XML

Transform

AddXML

Encode several fields into an XML fragment

opdts.addxml.AddXMLMeta

Aggregate Rows

Deprecated

 

 

 

Analytic Query

Statistics

AnalyticQuery

Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST)

opdts.analyticquery.AnalyticQueryMeta

Append streams

Flow

Append

Append 2 streams in an ordered way

opdts.append.AppendMeta

ARFF Output

Data Mining

Arff Output

Writes data in ARFF format to a file

opdts.append.arff.ArffOutputMeta

Automatic Documentation Output

Output

AutoDoc

This step automatically generates documentation based on input in the form of a list of transformations and jobs

opdts.autodoc.AutoDocMeta

Avro Input (Deprecated)

Deprecated (pre-v8.0); Input (v8.0 and after)

AvroInput

Decode binary or JSON Avro data from a file or a field

opdts.avroinput.AvroInputMeta

Avro Output

Output

AvroOutput

Encode binary or JSON Avro data to a file

opdts.avrooutput.AvroOutputMeta

 

 

Block this step until steps finish

Flow

BlockUntilStepsFinish

Block this step until selected steps finish.

opdts.blockuntilstepsfinish.BlockUntilStepsFinishMeta

Blocking Step

Flow

BlockingStep

This step blocks until all incoming rows have been processed. Subsequent steps only receive the last input row to this step.

opdts.blockingstep.BlockingStepMeta

Calculator

Transform

Calculator

Create new fields by performing simple calculations

opdts.calculator.CalculatorMeta

Call DB Procedure

Lookup

DBProc

Get back information by calling a database procedure.

opdts.dbproc.DBProcMeta

Call Endpoint

BA Server

CallEndpointStep

Calls API endpoints from the BA server within a PDI transformation.

org.pentaho.di.baserver.utils.CallEndpointMeta

Change file encoding

Utility

ChangeFileEncoding

Change file encoding and create a new file

opdts.changefileencoding.ChangeFileEncodingMeta

Cassandra input

Big Data

CassandraInput

Read from a Cassandra column family

opdts.cassandrainput.CassandraInputMeta

Cassandra output

Big Data

CassandraOutput

Write to a Cassandra column family

opdts.cassandraoutput.CassandraOutputMeta

Check if a column exists

Lookup

ColumnExists

Check if a column exists in a table on a specified connection.

opdts.columnexists.ColumnExistsMeta

Check if file is locked

Lookup

FileLocked

Check if a file is locked by another process

opdts.filelocked.FileLockedMeta

Check if webservice is available

Lookup

WebServiceAvailable

Check if a webservice is available

opdts.webserviceavailable.WebServiceAvailableMeta

Clone row

Utility

CloneRow

Clone a row as many times as needed

opdts.clonerow.CloneRowMeta

Closure Generator

Transform

ClosureGenerator

This step allows you to generate a closure table using parent-child relationships.

opdts.closure.ClosureGeneratorMeta

Combination lookup/update

Data Warehouse

CombinationLookup

Update a junk dimension in a data warehouse. Alternatively, look up information in this dimension. The primary key of a junk dimension consists of all the fields.

opdts.combinationlookup.CombinationLookupMeta

Concat Fields

Transform

ConcatFields

The Concat Fields step is used to concatenate multiple fields into one target field. The fields can be separated by a separator and the enclosure logic is completely compatible with the Text File Output step.

opdts.concatfields.ConcatFieldsMeta

Copy rows to result

Job

RowsToResult

Use this step to write rows to the executing job. The information will then be passed to the next entry in this job.

opdts.rowstoresult.RowsToResultMeta

CouchDB Input

Big Data

CouchDbInput

Retrieves all documents from a given view in a given design document from a given database

opdts.couchdbinput.CouchDbInputMeta

Credit card validator

Validation

CreditCardValidator

The Credit card validator step tells you (1) whether a credit card number is valid, using the LUHN10 (MOD-10) algorithm, and (2) which credit card vendor handles that number (VISA, MasterCard, Diners Club, EnRoute, American Express (AMEX), ...).

opdts.creditcardvalidator.CreditCardValidatorMeta

CSV file input

Input

CsvInput

Simple CSV file input

opdts.csvinput.CsvInputMeta

Data Grid

Input

DataGrid

Enter rows of static data in a grid, usually for testing, reference or demo purposes

opdts.datagrid.DataGridMeta

Data Validator

Validation

Validator

Validates passing data based on a set of rules

opdts.validator.ValidatorMeta

Database join

Lookup

DBJoin

Execute a database query using stream values as parameters

opdts.databasejoin.DatabaseJoinMeta

Database lookup

Lookup

DBLookup

Look up values in a database using field values

opdts.databaselookup.DatabaseLookupMeta

De-serialize from file

Input

CubeInput

Read rows of data from a data cube.

opdts.cubeinput.CubeInputMeta

Delay row

Utility

Delay

Output each input row after a delay

opdts.delay.DelayMeta

Delete

Output

Delete

Delete data in a database table based upon keys

opdts.delete.DeleteMeta

Detect empty stream

Flow

DetectEmptyStream

This step outputs one empty row if the input stream is empty (i.e., when the input stream does not contain any rows)

opdts.detectemptystream.DetectEmptyStreamMeta

Dimension lookup/update

Data Warehouse

DimensionLookup

Update a slowly changing dimension in a data warehouse. Alternatively, look up information in this dimension.

opdts.dimensionlookup.DimensionLookupMeta

Dummy (do nothing)

Flow

Dummy

This step type doesn't do anything. It's useful however when testing things or in certain situations where you want to split streams.

opdts.dummytrans.DummyTransMeta

Dynamic SQL row

Lookup

DynamicSQLRow

Execute a dynamic SQL statement built in a previous field

opdts.dynamicsqlrow.DynamicSQLRowMeta

Edi to XML

Utility

TypeExitEdi2XmlStep

Converts an Edifact message to XML to simplify data extraction (Available in PDI 4.4, already present in CI trunk builds)

opdts.edi2xml.Edi2XmlMeta

ElasticSearch Bulk Insert

Bulk loading

ElasticSearchBulk

Performs bulk inserts into ElasticSearch

opdts.elasticsearchbulk.ElasticSearchBulkMeta

Email messages input

Input

MailInput

Read POP3/IMAP server and retrieve messages

opdts.mailinput.MailInputMeta

ESRI Shapefile Reader

Input

ShapeFileReader

Reads shape file data from an ESRI shape file and linked DBF file

org.pentaho.di.shapefilereader.ShapeFileReaderMeta

ETL Metadata Injection

Flow

MetaInject

This step allows you to inject metadata into an existing transformation prior to execution. This allows for the creation of dynamic and highly flexible data integration solutions.

opdts.metainject.MetaInjectMeta

Example Step (Deprecated)

Deprecated

 

 

 

Execute a process

Utility

ExecProcess

Execute a process and return the result

opdts.execprocess.ExecProcessMeta

Execute row SQL script

Scripting

ExecSQLRow

Execute SQL script extracted from a field created in a previous step.

opdts.execsqlrow.ExecSQLRowMeta

Execute SQL script

Scripting

ExecSQL

Execute an SQL script, optionally parameterized using input rows

opdts.sql.ExecSQLMeta

File exists

Lookup

FileExists

Check if a file exists

opdts.fileexists.FileExistsMeta

Filter Rows

Flow

FilterRows

Filter rows using simple equations

opdts.filterrows.FilterRowsMeta

Fixed file input

Input

FixedInput

Fixed file input

opdts.fixedinput.FixedInputMeta

Formula

Scripting

Formula

Calculate a formula using Pentaho's libformula

opdts.formula.FormulaMeta

Fuzzy match

Lookup

FuzzyMatch

Find approximate matches to a string using matching algorithms. Reads a field from the main stream and outputs the approximate value from the lookup stream.

opdts.fuzzymatch.FuzzyMatchMeta

Generate random credit card numbers

Input

RandomCCNumberGenerator

Generate random valid (Luhn check) credit card numbers

opdts.randomccnumber.RandomCCNumberGeneratorMeta

Generate random value

Input

RandomValue

Generate random value

opdts.randomvalue.RandomValueMeta

Generate Rows

Input

RowGenerator

Generate a number of empty or equal rows.

opdts.rowgenerator.RowGeneratorMeta
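
Two entries above, Credit card validator and Generate random credit card numbers, refer to the Luhn (MOD-10) check. As a rough illustration of what that check involves, here is a minimal standalone Java sketch. It is not the code used by either step: the class name `LuhnCheck` and the methods `isValid` and `checkDigit` are hypothetical names chosen for this example, and the sketch assumes the input contains digits only (no spaces or dashes).

```java
public class LuhnCheck {

    /** Returns true if the digit string passes the Luhn (MOD-10) check. */
    public static boolean isValid(String number) {
        if (number == null || number.isEmpty()) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false; // the rightmost digit (check digit) is not doubled
        for (int i = number.length() - 1; i >= 0; i--) {
            char c = number.charAt(i);
            if (c < '0' || c > '9') {
                return false; // assumption: only plain digits are accepted
            }
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of the product
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    /** Computes the check digit that makes payload + digit pass the check. */
    public static int checkDigit(String payload) {
        int sum = 0;
        boolean doubleIt = true; // once the check digit is appended, the last payload digit is doubled
        for (int i = payload.length() - 1; i >= 0; i--) {
            int d = payload.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return (10 - sum % 10) % 10;
    }

    public static void main(String[] args) {
        System.out.println(isValid("4111111111111111"));    // true
        System.out.println(isValid("4111111111111112"));    // false
        System.out.println(checkDigit("411111111111111"));  // 1
    }
}
```

A generator in the spirit of the second step could pick random payload digits and append the result of `checkDigit`, while a validator would call `isValid` on each incoming value. Identifying the card vendor is a separate prefix and length lookup and is not covered by this sketch.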