Pentaho Data Integration Steps
Introduction
This page contains the index for the documentation on all the standard steps in Pentaho Data Integration.
We invite everyone to add more details, tips and samples to the step pages.
NOTE
You may not be viewing the most up-to-date documentation for these steps. View the most recent Pentaho documentation here.
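The Metadata Java class column in the table below abbreviates most package names with the prefix `opdts`. The following is a minimal sketch of how such an entry could be expanded into a fully qualified class name. It assumes that `opdts` stands for `org.pentaho.di.trans.steps`, which this page does not state, and the class `StepClassNames` and its `expand` method are hypothetical helpers written for illustration, not part of PDI.

```java
public class StepClassNames {
    // Assumption: "opdts" abbreviates this package; the page itself does not say so.
    private static final String ABBREVIATION = "opdts.";
    private static final String FULL_PREFIX = "org.pentaho.di.trans.steps.";

    // Expands an abbreviated table entry; other names are returned unchanged.
    static String expand(String tableEntry) {
        if (tableEntry.startsWith(ABBREVIATION)) {
            return FULL_PREFIX + tableEntry.substring(ABBREVIATION.length());
        }
        return tableEntry;
    }

    public static void main(String[] args) {
        // Abbreviated entry taken from the table.
        System.out.println(expand("opdts.abort.AbortMeta"));
        // Entry that is already fully qualified in the table.
        System.out.println(expand("org.pentaho.di.baserver.utils.CallEndpointMeta"));
    }
}
```

Entries that already carry a full package name, such as the BA Server steps, pass through unchanged, so the helper can be applied to every row of the column.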
Name | Category | ID | Description | Metadata Java class | ||
---|---|---|---|---|---|---|
Flow | Abort | Abort a transformation | opdts.abort.AbortMeta | |||
Transform | CheckSum | Add a checksum column for each input row | opdts.checksum.CheckSumMeta | |||
Transform | Constant | Add one or more constants to the input rows | opdts.constant.ConstantMeta | |||
Transform | Sequence | Get the next value from a sequence | opdts.addsequence.AddSequenceMeta | |||
Transform | FieldsChangeSequence | Add a sequence that depends on field value changes. Each time the value of at least one field changes, PDI resets the sequence. | opdts.fieldschangesequence.FieldsChangeSequenceMeta | |||
Transform | AddXML | Encode several fields into an XML fragment | opdts.addxml.AddXMLMeta | |||
Deprecated | ||||||
Statistics | AnalyticQuery | Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST) | opdts.analyticquery.AnalyticQueryMeta | |||
Flow | Append | Append 2 streams in an ordered way | opdts.append.AppendMeta | |||
Data Mining | Arff Output | Writes data in ARFF format to a file | opdts.append.arff.ArffOutputMeta | |||
Output | AutoDoc | This step automatically generates documentation based on input in the form of a list of transformations and jobs | opdts.autodoc.AutoDocMeta | |||
Deprecated (pre- v.8.0) Input (v.8.0 and after) | AvroInput | Decode binary or JSON Avro data from a file or a field | opdts.avroinput.AvroInputMeta | |||
Output | AvroOutput | Encode binary or JSON Avro data to a file | opdts.avrooutput.AvroOutputMeta | |||
Flow | BlockUntilStepsFinish | Block this step until selected steps finish. | opdts.blockuntilstepsfinish.BlockUntilStepsFinishMeta | |||
Flow | BlockingStep | This step blocks until all incoming rows have been processed. Subsequent steps only receive the last input row to this step. | opdts.blockingstep.BlockingStepMeta | |||
Transform | Calculator | Create new fields by performing simple calculations | opdts.calculator.CalculatorMeta | |||
Lookup | DBProc | Get back information by calling a database procedure. | opdts.dbproc.DBProcMeta | |||
BA Server | CallEndpointStep | Calls API endpoints from the BA server within a PDI transformation. | org.pentaho.di.baserver.utils.CallEndpointMeta | |||
Utility | ChangeFileEncoding | Change file encoding and create a new file | opdts.changefileencoding.ChangeFileEncodingMeta | |||
Big Data | CassandraInput | Read from a Cassandra column family | opdts.cassandrainput.CassandraInputMeta | |||
Big Data | CassandraOutput | Write to a Cassandra column family | opdts.cassandraoutput.CassandraOutputMeta | |||
Lookup | ColumnExists | Check if a column exists in a table on a specified connection. | opdts.columnexists.ColumnExistsMeta | |||
Lookup | FileLocked | Check if a file is locked by another process | opdts.filelocked.FileLockedMeta | |||
Lookup | WebServiceAvailable | Check if a webservice is available | opdts.webserviceavailable.WebServiceAvailableMeta | |||
Utility | CloneRow | Clone a row as many times as needed | opdts.clonerow.CloneRowMeta | |||
Transform | ClosureGenerator | This step allows you to generate a closure table using parent-child relationships. | opdts.closure.ClosureGeneratorMeta | |||
Data Warehouse | CombinationLookup | Update a junk dimension in a data warehouse. Alternatively, look up information in this dimension. The primary key of a junk dimension consists of all its fields. | opdts.combinationlookup.CombinationLookupMeta | |||
Transform | ConcatFields | The Concat Fields step is used to concatenate multiple fields into one target field. The fields can be separated by a separator and the enclosure logic is completely compatible with the Text File Output step. | opdts.concatfields.ConcatFieldsMeta | |||
Job | RowsToResult | Use this step to write rows to the executing job. The information will then be passed to the next entry in this job. | opdts.rowstoresult.RowsToResultMeta | |||
Big Data | CouchDbInput | Retrieves all documents from a given view in a given design document from a given database | opdts.couchdbinput.CouchDbInputMeta | |||
Validation | CreditCardValidator | The Credit card validator step will help you tell: (1) if a credit card number is valid (uses LUHN10 (MOD-10) algorithm) (2) which credit card vendor handles that number (VISA, MasterCard, Diners Club, EnRoute, American Express (AMEX),...) | opdts.creditcardvalidator.CreditCardValidatorMeta | |||
Input | CsvInput | Simple CSV file input | opdts.csvinput.CsvInputMeta | |||
Input | DataGrid | Enter rows of static data in a grid, usually for testing, reference or demo purposes | opdts.datagrid.DataGridMeta | |||
Validation | Validator | Validates passing data based on a set of rules | opdts.validator.ValidatorMeta | |||
Lookup | DBJoin | Execute a database query using stream values as parameters | opdts.databasejoin.DatabaseJoinMeta | |||
Lookup | DBLookup | Look up values in a database using field values | opdts.databaselookup.DatabaseLookupMeta | |||
Input | CubeInput | Read rows of data from a data cube. | opdts.cubeinput.CubeInputMeta | |||
Utility | Delay | Output each input row after a delay | opdts.delay.DelayMeta | |||
Output | Delete | Delete data in a database table based upon keys | opdts.delete.DeleteMeta | |||
Flow | DetectEmptyStream | This step outputs one empty row if the input stream is empty (i.e. when the input stream does not contain any rows) | opdts.detectemptystream.DetectEmptyStreamMeta | |||
Data Warehouse | DimensionLookup | Update a slowly changing dimension in a data warehouse. Alternatively, look up information in this dimension. | opdts.dimensionlookup.DimensionLookupMeta | |||
Flow | Dummy | This step type doesn't do anything. It's useful however when testing things or in certain situations where you want to split streams. | opdts.dummytrans.DummyTransMeta | |||
Lookup | DynamicSQLRow | Execute a dynamic SQL statement built in a previous field | opdts.dynamicsqlrow.DynamicSQLRowMeta | |||
Utility | TypeExitEdi2XmlStep | Converts an Edifact message to XML to simplify data extraction (Available in PDI 4.4, already present in CI trunk builds) | opdts.edi2xml.Edi2XmlMeta | |||
Bulk loading | ElasticSearchBulk | Performs bulk inserts into ElasticSearch | opdts.elasticsearchbulk.ElasticSearchBulkMeta | |||
Input | MailInput | Read from a POP3/IMAP server and retrieve messages | opdts.mailinput.MailInputMeta | |||
Input | ShapeFileReader | Reads shape file data from an ESRI shape file and linked DBF file | org.pentaho.di.shapefilereader.ShapeFileReaderMeta | |||
Flow | MetaInject | This step allows you to inject metadata into an existing transformation prior to execution. This allows for the creation of dynamic and highly flexible data integration solutions. | opdts.metainject.MetaInjectMeta | |||
Deprecated | ||||||
Utility | ExecProcess | Execute a process and return the result | opdts.execprocess.ExecProcessMeta | |||
Scripting | ExecSQLRow | Execute SQL script extracted from a field created in a previous step. | opdts.execsqlrow.ExecSQLRowMeta | |||
Scripting | ExecSQL | Execute an SQL script, optionally parameterized using input rows | opdts.sql.ExecSQLMeta | |||
Lookup | FileExists | Check if a file exists | opdts.fileexists.FileExistsMeta | |||
Flow | FilterRows | Filter rows using simple equations | opdts.filterrows.FilterRowsMeta | |||
Input | FixedInput | Fixed file input | opdts.fixedinput.FixedInputMeta | |||
Scripting | Formula | Calculate a formula using Pentaho's libformula | opdts.formula.FormulaMeta | |||
Lookup | FuzzyMatch | Find approximate matches to a string using matching algorithms. Reads a field from the main stream and outputs the approximate value from the lookup stream. | opdts.fuzzymatch.FuzzyMatchMeta | |||
Input | RandomCCNumberGenerator | Generate random valid (Luhn check) credit card numbers | opdts.randomccnumber.RandomCCNumberGeneratorMeta | |||
Input | RandomValue | Generate random value | opdts.randomvalue.RandomValueMeta | |||
Input | RowGenerator | Generate a number of empty or equal rows. | opdts.rowgenerator.RowGeneratorMeta | |||
Input | getXMLData | Get data from XML file by using XPath. This step also allows you to parse XML defined in a previous field. | opdts.getxmldata.GetXMLDataMeta | |||
Input | GetFileNames | Get file names from the operating system and send them to the next step. | opdts.getfilenames.GetFileNamesMeta | |||
Job | FilesFromResult | This step allows you to read filenames used or generated in a previous entry in a job. | opdts.filesfromresult.FilesFromResultMeta | |||
Input | GetFilesRowsCount | Get the row count of files | opdts.getfilesrowscount.GetFilesRowsCountMeta | |||
Transform | GetSlaveSequence | Retrieves unique IDs in blocks from a slave server. The referenced sequence needs to be configured on the slave server in the XML configuration file. | opdts.getslavesequence.GetSlaveSequenceMeta | |||
Deprecated | ||||||
Input | GetRepositoryNames | Lists detailed information about transformations and/or jobs in a repository | opdts.getrepositorynames.GetRepositoryNamesMeta | |||
Job | RowsFromResult | This allows you to read rows from a previous entry in a job | opdts.rowsfromresult.RowsFromResultMeta | |||
BA Server | GetSessionVariableStep | Retrieves the value of a session variable | org.pentaho.di.baserver.utils.GetSessionVariableMeta | |||
Input | GetSubFolders | Read a parent folder and return all subfolders | opdts.getsubfolders.GetSubFoldersMeta | |||
Input | SystemInfo | Get information from the system like system date, arguments, etc. | opdts.systemdata.SystemDataMeta | |||
Input | GetTableNames | Get table names from database connection and send them to the next step | opdts.gettablenames.GetTableNamesMeta | |||
Job | GetVariable | Determine the values of certain (environment or Kettle) variables and put them in field values. | opdts.getvariable.GetVariableMeta | |||
Input | TypeExitGoogleAnalyticsInputStep | Fetches data from a Google Analytics account | opdts.googleanalytics.GaInputStepMeta | |||
Input | ||||||
Deprecated | GPBulkLoader | Greenplum Bulk Loader | opdts.gpbulkloader.GPBulkLoaderMeta | |||
Bulk loading | GPLoad | Greenplum Load | ||||
Statistics | GroupBy | Builds aggregates in a group by fashion. This works only on a sorted input. If the input is not sorted, only consecutive rows with identical group keys are aggregated together. | opdts.groupby.GroupByMeta | |||
Input | ParallelGzipCsvInput | Parallel GZIP CSV file input reader | opdts.parallelgzipcsv.ParGzipCsvInputMeta | |||
Big Data | HadoopFileInputPlugin | Read data from a variety of different text-file types stored on a Hadoop cluster | opdts.hadoopfileinput.HadoopFileInputMeta | |||
Big Data | HadoopFileOutputPlugin | Write data to a variety of different text-file types stored on a Hadoop cluster | opdts.hadoopfileoutput.HadoopFileOutputMeta | |||
Big Data | HbaseInput | Read from an HBase column family | opdts.hbaseinput.HBaseInputMeta | |||
Big Data | HbaseOutput | Write to an HBase column family | opdts.hbaseoutput.HBaseOutputMeta | |||
Big Data | HBaseRowDecoder | Decodes an incoming key and HBase result object according to a mapping | opdts.hbaserowdecoder.HBaseRowDecoderMeta | |||
Input | HL7Input | Read data from HL7 data streams. | opdt.hl7.plugins.hl7input | |||
Lookup | HTTP | Call a web service over HTTP by supplying a base URL and allowing parameters to be set dynamically | opdts.http.HTTPMeta | |||
Lookup | HTTPPOST | Send a web service request over HTTP by supplying a base URL and allowing parameters to be set dynamically | opdts.httppost.HTTPPOSTMeta | |||
Deprecated | MQInput | Receive messages from any IBM WebSphere MQ Server | ||||
Deprecated | MQOutput | Send messages to any IBM WebSphere MQ Server | ||||
Flow | DetectLastRow | Marks the last row of the stream | opdts.detectlastrow.DetectLastRowMeta | |||
Utility | IfNull | Sets a field value to a constant if it is null. | opdts.ifnull.IfNullMeta | |||
Bulk loading | InfobrightOutput | Load data to an Infobright database table | opdts.infobrightoutput.InfobrightLoaderMeta | |||
Bulk loading | VectorWiseBulkLoader | This step interfaces with the Ingres VectorWise Bulk Loader "COPY TABLE" command. | opdts.ivwloader.IngresVectorwiseLoaderMeta | |||