Pentaho Data Integration Steps
Introduction
This page contains the index for the documentation on all the standard steps in Pentaho Data Integration.
We invite everyone to add more details, tips and samples to the step pages.
NOTE
You may not be viewing the most up-to-date documentation for these steps. View the most recent Pentaho documentation here.
Category | Name | Description | Metadata Java class
---|---|---|---
Flow | Abort | Abort a transformation | opdts.abort.AbortMeta | |||
Transform | CheckSum | Add a checksum column for each input row | opdts.checksum.CheckSumMeta | |||
Transform | Constant | Add one or more constants to the input rows | opdts.constant.ConstantMeta | |||
Transform | Sequence | Get the next value from a sequence | opdts.addsequence.AddSequenceMeta | |||
Transform | FieldsChangeSequence | Add a sequence that depends on field value changes. Each time the value of at least one field changes, PDI resets the sequence. | opdts.fieldschangesequence.FieldsChangeSequenceMeta | |||
Transform | AddXML | Encode several fields into an XML fragment | opdts.addxml.AddXMLMeta | |||
Statistics | AnalyticQuery | Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST) | opdts.analyticquery.AnalyticQueryMeta | |||
Flow | Append | Append 2 streams in an ordered way | opdts.append.AppendMeta | |||
Data Mining | Arff Output | Writes data in ARFF format to a file | opdts.append.arff.ArffOutputMeta | |||
Output | AutoDoc | This step automatically generates documentation based on input in the form of a list of transformations and jobs | opdts.autodoc.AutoDocMeta | |||
Deprecated (pre-v.8.0) Input (v.8.0 and after) | AvroInput | Decode binary or JSON Avro data from a file or a field | opdts.avroinput.AvroInputMeta | |||
Output | AvroOutput | Encode binary or JSON Avro data to a file | opdts.avrooutput.AvroOutputMeta | |||
Flow | BlockUntilStepsFinish | Block this step until selected steps finish. | opdts.blockuntilstepsfinish.BlockUntilStepsFinishMeta | |||
Flow | BlockingStep | This step blocks until all incoming rows have been processed. Subsequent steps only receive the last input row to this step. | opdts.blockingstep.BlockingStepMeta | |||
Transform | Calculator | Create new fields by performing simple calculations | opdts.calculator.CalculatorMeta | |||
Lookup | DBProc | Get back information by calling a database procedure. | opdts.dbproc.DBProcMeta | |||
BA Server | CallEndpointStep | Calls API endpoints from the BA server within a PDI transformation. | org.pentaho.di.baserver.utils.CallEndpointMeta | |||
Utility | ChangeFileEncoding | Change file encoding and create a new file | opdts.changefileencoding.ChangeFileEncodingMeta | |||
Big Data | CassandraInput | Read from a Cassandra column family | opdts.cassandrainput.CassandraInputMeta | |||
Big Data | CassandraOutput | Write to a Cassandra column family | opdts.cassandraoutput.CassandraOutputMeta | |||
Lookup | ColumnExists | Check if a column exists in a table on a specified connection. | opdts.columnexists.ColumnExistsMeta | |||
Lookup | FileLocked | Check if a file is locked by another process | opdts.filelocked.FileLockedMeta | |||
Lookup | WebServiceAvailable | Check if a webservice is available | opdts.webserviceavailable.WebServiceAvailableMeta | |||
Utility | CloneRow | Clone a row as many times as needed | opdts.clonerow.CloneRowMeta | |||
Transform | ClosureGenerator | This step allows you to generate a closure table using parent-child relationships. | opdts.closure.ClosureGeneratorMeta | |||
Data Warehouse | CombinationLookup | Update a junk dimension in a data warehouse. Alternatively, look up information in this dimension. In a junk dimension, all the fields together form the primary key. | opdts.combinationlookup.CombinationLookupMeta | |||
Transform | ConcatFields | The Concat Fields step is used to concatenate multiple fields into one target field. The fields can be separated by a separator and the enclosure logic is completely compatible with the Text File Output step. | opdts.concatfields.ConcatFieldsMeta | |||
Job | RowsToResult | Use this step to write rows to the executing job. The information will then be passed to the next entry in this job. | opdts.rowstoresult.RowsToResultMeta | |||
Big Data | CouchDbInput | Retrieves all documents from a given view in a given design document from a given database | opdts.couchdbinput.CouchDbInputMeta | |||
Validation | CreditCardValidator | The Credit Card Validator step helps you determine: (1) whether a credit card number is valid (using the LUHN10 (MOD-10) algorithm), and (2) which credit card vendor handles that number (VISA, MasterCard, Diners Club, EnRoute, American Express (AMEX),...) | opdts.creditcardvalidator.CreditCardValidatorMeta | |||
Input | CsvInput | Simple CSV file input | opdts.csvinput.CsvInputMeta | |||
Input | DataGrid | Enter rows of static data in a grid, usually for testing, reference or demo purpose | opdts.datagrid.DataGridMeta | |||
Validation | Validator | Validates passing data based on a set of rules | opdts.validator.ValidatorMeta | |||
Lookup | DBJoin | Execute a database query using stream values as parameters | opdts.databasejoin.DatabaseJoinMeta | |||
Lookup | DBLookup | Look up values in a database using field values | opdts.databaselookup.DatabaseLookupMeta | |||
Input | CubeInput | Read rows of data from a data cube. | opdts.cubeinput.CubeInputMeta | |||
Utility | Delay | Output each input row after a delay | opdts.delay.DelayMeta | |||
Output | Delete | Delete data in a database table based upon keys | opdts.delete.DeleteMeta | |||
Flow | DetectEmptyStream | This step outputs one empty row if the input stream is empty (i.e. when the input stream does not contain any rows) | opdts.detectemptystream.DetectEmptyStreamMeta | |||
Data Warehouse | DimensionLookup | Update a slowly changing dimension in a data warehouse. Alternatively, look up information in this dimension. | opdts.dimensionlookup.DimensionLookupMeta | |||
Flow | Dummy | This step type doesn't do anything. It is useful, however, when testing things or in certain situations where you want to split streams. | opdts.dummytrans.DummyTransMeta | |||
Lookup | DynamicSQLRow | Execute a dynamic SQL statement built in a previous field | opdts.dynamicsqlrow.DynamicSQLRowMeta | |||
Utility | TypeExitEdi2XmlStep | Converts an Edifact message to XML to simplify data extraction (Available in PDI 4.4, already present in CI trunk builds) | opdts.edi2xml.Edi2XmlMeta | |||
Bulk loading | ElasticSearchBulk | Performs bulk inserts into ElasticSearch | opdts.elasticsearchbulk.ElasticSearchBulkMeta | |||
Input | MailInput | Read POP3/IMAP server and retrieve messages | opdts.mailinput.MailInputMeta | |||
Input | ShapeFileReader | Reads shape file data from an ESRI shape file and linked DBF file | org.pentaho.di.shapefilereader.ShapeFileReaderMeta | |||
Flow | MetaInject | This step allows you to inject metadata into an existing transformation prior to execution. This allows for the creation of dynamic and highly flexible data integration solutions. | opdts.metainject.MetaInjectMeta | |||
Utility | ExecProcess | Execute a process and return the result | opdts.execprocess.ExecProcessMeta | |||
Scripting | ExecSQLRow | Execute SQL script extracted from a field created in a previous step. | opdts.execsqlrow.ExecSQLRowMeta | |||
Scripting | ExecSQL | Execute an SQL script, optionally parameterized using input rows | opdts.sql.ExecSQLMeta | |||
Lookup | FileExists | Check if a file exists | opdts.fileexists.FileExistsMeta | |||
Flow | FilterRows | Filter rows using simple equations | opdts.filterrows.FilterRowsMeta | |||
Input | FixedInput | Fixed file input | opdts.fixedinput.FixedInputMeta | |||
Scripting | Formula | Calculate a formula using Pentaho's libformula | opdts.formula.FormulaMeta | |||
Lookup | FuzzyMatch | Find approximate matches to a string using matching algorithms. Reads a field from the main stream and outputs the approximate matching value from the lookup stream. | opdts.fuzzymatch.FuzzyMatchMeta | |||
Input | RandomCCNumberGenerator | Generate random valid (Luhn check) credit card numbers | opdts.randomccnumber.RandomCCNumberGeneratorMeta | |||
Input | RandomValue | Generate random value | opdts.randomvalue.RandomValueMeta | |||
Input | RowGenerator | Generate a number of empty or equal rows. | opdts.rowgenerator.RowGeneratorMeta | |||
Input | getXMLData | Get data from XML file by using XPath. This step also allows you to parse XML defined in a previous field. | opdts.getxmldata.GetXMLDataMeta | |||
Input | GetFileNames | Get file names from the operating system and send them to the next step. | opdts.getfilenames.GetFileNamesMeta | |||
Job | FilesFromResult | This step allows you to read filenames used or generated in a previous entry in a job. | opdts.filesfromresult.FilesFromResultMeta | |||
Input | GetFilesRowsCount | Get Files Rows Count | opdts.getfilesrowscount.GetFilesRowsCountMeta | |||
Transform | GetSlaveSequence | Retrieves unique IDs in blocks from a slave server. The referenced sequence needs to be configured on the slave server in the XML configuration file. | opdts.getslavesequence.GetSlaveSequenceMeta | |||
Input | GetRepositoryNames | Lists detailed information about transformations and/or jobs in a repository | opdts.getrepositorynames.GetRepositoryNamesMeta | |||
Job | RowsFromResult | This allows you to read rows from a previous entry in a job | opdts.rowsfromresult.RowsFromResultMeta | |||
BA Server | GetSessionVariableStep | Retrieves the value of a session variable | org.pentaho.di.baserver.utils.GetSessionVariableMeta | |||
Input | GetSubFolders | Read a parent folder and return all subfolders | opdts.getsubfolders.GetSubFoldersMeta | |||
Input | SystemInfo | Get information from the system like system date, arguments, etc. | opdts.systemdata.SystemDataMeta | |||
Input | GetTableNames | Get table names from database connection and send them to the next step | opdts.gettablenames.GetTableNamesMeta | |||
Job | GetVariable | Determine the values of certain (environment or Kettle) variables and put them in field values. | opdts.getvariable.GetVariableMeta | |||
Input | TypeExitGoogleAnalyticsInputStep | Fetches data from a Google Analytics account | opdts.googleanalytics.GaInputStepMeta | |||
Deprecated | GPBulkLoader | Greenplum Bulk Loader | opdts.gpbulkloader.GPBulkLoaderMeta | |||
Bulk loading | GPLoad | Greenplum Load | ||||
Statistics | GroupBy | Builds aggregates in a group by fashion. This works only on a sorted input. If the input is not sorted, only consecutive duplicate rows are handled correctly. | opdts.groupby.GroupByMeta | |||
Input | ParallelGzipCsvInput | Parallel GZIP CSV file input reader | opdts.parallelgzipcsv.ParGzipCsvInputMeta | |||
Big Data | HadoopFileInputPlugin | Read data from a variety of different text-file types stored on a Hadoop cluster | opdts.hadoopfileinput.HadoopFileInputMeta | |||
Big Data | HadoopFileOutputPlugin | Write data to a variety of different text-file types stored on a Hadoop cluster | opdts.hadoopfileoutput.HadoopFileOutputMeta | |||
Big Data | HbaseInput | Read from an HBase column family | opdts.hbaseinput.HBaseInputMeta | |||
Big Data | HbaseOutput | Write to an HBase column family | opdts.hbaseoutput.HBaseOutputMeta | |||
Big Data | HBaseRowDecoder | Decodes an incoming key and HBase result object according to a mapping | opdts.hbaserowdecoder.HBaseRowDecoderMeta | |||
Input | HL7Input | Read data from HL7 data streams. | opdt.hl7.plugins.hl7input | |||
Lookup | HTTP | Call a web service over HTTP by supplying a base URL and allowing parameters to be set dynamically | opdts.http.HTTPMeta | |||
Lookup | HTTPPOST | Call a web service request over HTTP POST by supplying a base URL and allowing parameters to be set dynamically | opdts.httppost.HTTPPOSTMeta | |||
Deprecated | MQInput | Receive messages from any IBM WebSphere MQ Server | ||||
Deprecated | MQOutput | Send messages to any IBM WebSphere MQ Server | ||||
Flow | DetectLastRow | Last row will be marked | opdts.detectlastrow.DetectLastRowMeta | |||
Utility | IfNull | Sets a field value to a constant if it is null. | opdts.ifnull.IfNullMeta | |||
Bulk loading | InfobrightOutput | Load data to an Infobright database table | opdts.infobrightoutput.InfobrightLoaderMeta | |||
Bulk loading | VectorWiseBulkLoader | This step interfaces with the Ingres VectorWise Bulk Loader "COPY TABLE" command. | opdts.ivwloader.IngresVectorwiseLoaderMeta | |||
Inline | Injector | Allows you to inject rows into the transformation through the Java API | opdts.injector.InjectorMeta | |||
Output | InsertUpdate | Update or insert rows in a database based upon keys. | opdts.insertupdate.InsertUpdateMeta | |||
Flow | JavaFilter | Filter rows using java code | opdts.javafilter.JavaFilterMeta | |||
Deprecated (pre- v.8.1) Input (v.8.1 and after) | JmsInput | Receive messages from a JMS server | ||||
Deprecated (pre- v.8.1) Output (v.8.1 and after) | JmsOutput | Send messages to a JMS server | ||||
Flow | JobExecutor | This step executes a Pentaho Data Integration Job, passes parameters and rows. | opdts.jobexecutor.JobExecutorMeta | |||
Joins | JoinRows | The output of this step is the cartesian product of the input streams. The number of rows is the multiplication of the number of rows in the input streams. | opdts.joinrows.JoinRowsMeta | |||
Input | JsonInput | Extract relevant portions out of JSON structures (file or incoming field) and output rows | opdts.jsoninput.JsonInputMeta | |||
Output | JsonOutput | Create a JSON block and output it in a field or a file. | opdts.jsonoutput.JsonOutputMeta | |||
Data Mining | KF | Executes a Knowledge Flow data mining process | org.pentaho.di.kf.KFMeta | |||
Input | LDAPInput | Read data from LDAP host | opdts.ldapinput.LDAPInputMeta | |||
Output | LDAPOutput | Perform Insert, upsert, update, add or delete operations on records based on their DN (Distinguished Name). | opdts.ldapoutput.LDAPOutputMeta | |||
Input | LDIFInput | Read data from LDIF files | opdts.ldifinput.LDIFInputMeta | |||
Input | LoadFileInput | Load file content in memory | opdts.loadfileinput.LoadFileInputMeta | |||
Deprecated | LucidDBStreamingLoader | Load data into LucidDB by using Remote Rows UDX. | opdts.luciddbstreamingloader.LucidDBStreamingLoaderMeta | |||
Utility | Mail | Send email. | opdts.mail.MailMeta | |||
Validation | MailValidator | Check if an email address is valid. | opdts.mailvalidator.MailValidatorMeta | |||
Mapping | Mapping | Run a mapping (sub-transformation), use MappingInput and MappingOutput to specify the fields interface | opdts.mapping.MappingMeta | |||
Mapping | MappingInput | Specify the input interface of a mapping | opdts.mappinginput.MappingInputMeta | |||
Mapping | MappingOutput | Specify the output interface of a mapping | opdts.mappingoutput.MappingOutputMeta | |||
Big Data | HadoopEnterPlugin | Key Value pairs enter here from Hadoop MapReduce | opdts.hadoopenter.HadoopEnterMeta | |||
Big Data | HadoopExitPlugin | Key Value pairs exit here and are pushed into Hadoop MapReduce | opdts.hadoopexit.HadoopExitMeta | |||
Lookup | MaxMindGeoIPLookup | Lookup an IPv4 address in a MaxMind database and add fields such as geography, ISP, or organization. | com.maxmind.geoip.MaxMindGeoIPLookupMeta | |||
Statistics | MemoryGroupBy | Builds aggregates in a group by fashion. This step doesn't require sorted input. | opdts.memgroupby.MemoryGroupByMeta | |||
Joins | MergeJoin | Joins two streams on a given key and outputs a joined set. The input streams must be sorted on the join key | opdts.mergejoin.MergeJoinMeta | |||
Joins | MergeRows | Merge two streams of rows, sorted on a certain key. The two streams are compared and the equals, changed, deleted and new rows are flagged. | opdts.mergerows.MergeRowsMeta | |||
Utility | StepMetastructure | This is a step to read the metadata of the incoming stream. | opdts.stepmeta.StepMetastructureMeta | |||
Input | AccessInput | Read data from a Microsoft Access file | opdts.accessinput.AccessInputMeta | |||
Output | AccessOutput | Stores records into an MS-Access database table. | opdts.accessoutput.AccessOutputMeta | |||
Input | ExcelInput | Read data from Excel and OpenOffice Workbooks (XLS, XLSX, ODS). | opdts.excelinput.ExcelInputMeta | |||
Output | ExcelOutput | Stores records into an Excel (XLS) document with formatting information. | opdts.exceloutput.ExcelOutputMeta | |||
Output | TypeExitExcelWriterStep | Writes or appends data to an Excel file | opdts.excelwriter.ExcelWriterStepMeta | |||
Scripting | ScriptValueMod | This steps allows the execution of JavaScript programs (and much more) | opdts.scriptvalues_mod.ScriptValuesMetaMod | |||
Input | MondrianInput | Execute and retrieve data using an MDX query against a Pentaho Analysis OLAP server (Mondrian) | opdts.mondrianinput.MondrianInputMeta | |||
Bulk loading | MonetDBBulkLoader | Load data into MonetDB by using their bulk load command in streaming mode. | opdts.monetdbbulkloader.MonetDBBulkLoaderMeta | |||
Big Data | MongoDbInput | Reads all entries from a MongoDB collection in the specified database. | opdts.mongodbinput.MongoDbInputMeta | |||
Big Data | MongoDbOutput | Write to a MongoDB collection. | opdts.mongodboutput.MongoDbOutputMeta | |||
Joins | MultiwayMergeJoin | Multiway Merge Join | opdts.multimerge.MultiMergeJoinMeta | |||
Bulk loading | MySQLBulkLoader | MySQL bulk loader step, loading data over a named pipe (not available on MS Windows) | opdts.mysqlbulkloader.MySQLBulkLoaderMeta | |||
Utility | NullIf | Sets a field value to null if it is equal to a constant value | opdts.nullif.NullIfMeta | |||
Transform | NumberRange | Create ranges based on numeric field | opdts.numberrange.NumberRangeMeta | |||
Input | OlapInput | Execute and retrieve data using an MDX query against any XML/A OLAP datasource using olap4j | opdts.olapinput.OlapInputMeta | |||
Deprecated | OpenERPObjectDelete | Deletes data from the OpenERP server using the XMLRPC interface with the 'unlink' function. | opdts.openerp.objectdelete.OpenERPObjectDeleteMeta | |||
Deprecated | OpenERPObjectInput | Retrieves data from the OpenERP server using the XMLRPC interface with the 'read' function. | opdts.openerp.objectinput.OpenERPObjectInputMeta | |||
Deprecated | OpenERPObjectOutputImport | Updates data on the OpenERP server using the XMLRPC interface and the 'import' function | opdts.openerp.objectoutput.OpenERPObjectOutputMeta | |||
Bulk loading | OraBulkLoader | Use Oracle Bulk Loader to load data | opdts.orabulkloader.OraBulkLoaderMeta | |||
Statistics | StepsMetrics | Return metrics for one or several steps | opdts.stepsmetrics.StepsMetricsMeta | |||
Deprecated | PaloCellInput | Retrieves all cell data from a Palo cube | opdts.palo.cellinput | |||
Deprecated | PaloCellOutput | Updates cell data in a Palo cube | opdts.palo.celloutput | |||
Deprecated | PaloDimInput | Returns elements from a dimension in a Palo database | opdts.palo.diminput | |||
Deprecated | PaloDimOutput | Creates/updates dimension elements and element consolidations in a Palo database | opdts.palo.dimoutput | |||
Output | PentahoReportingOutput | Executes an existing report (PRPT) | opdts.pentahoreporting.PentahoReportingOutputMeta | |||
Bulk loading | PGBulkLoader | PostgreSQL Bulk Loader | opdts.pgbulkloader.PGBulkLoaderMeta | |||
Flow | PrioritizeStreams | Prioritize streams in an ordered way. | opdts.prioritizestreams.PrioritizeStreamsMeta | |||
Utility | ProcessFiles | Process one file per row (copy, move, or delete). This step only accepts a filename as input. | opdts.processfiles.ProcessFilesMeta | |||
Output | PropertyOutput | Write data to properties file | opdts.propertyoutput.PropertyOutputMeta | |||
Input | PropertyInput | Read data (key, value) from properties files. | opdts.propertyinput.PropertyInputMeta | |||
Statistics | RScriptExecutor | Executes an R script within a PDI transformation | ||||
Scripting | RegexEval | Regular expression Evaluation. This step uses a regular expression to evaluate a field. It can also extract new fields out of an existing field with capturing groups. | opdts.regexeval.RegexEvalMeta | |||
Transform | ReplaceString | Replace all occurrences of a word in a string with another word. | opdts.replacestring.ReplaceStringMeta | |||
Statistics | ReservoirSampling | Samples a fixed number of rows from the incoming stream | opdts.reservoirsampling.ReservoirSamplingMeta | |||
Lookup | Rest | Consume RESTful services. REpresentational State Transfer (REST) is a key design idiom that embraces a stateless client-server architecture in which the web services are viewed as resources and can be identified by their URLs | opdts.rest.RestMeta | |||
Transform | Denormaliser | Denormalises rows by looking up key-value pairs and by assigning them to new fields in the output rows. This method aggregates and needs the input rows to be sorted on the grouping fields | opdts.denormaliser.DenormaliserMeta | |||
Transform | Flattener | Flattens consecutive rows based on the order in which they appear in the input stream | opdts.flattener.FlattenerMeta | |||
Transform | Normaliser | De-normalised information can be normalised using this step type. | opdts.normaliser.NormaliserMeta | |||
Input | RssInput | Read RSS feeds | opdts.rssinput.RssInputMeta | |||
Output | RssOutput | Read RSS stream. | opdts.rssoutput.RssOutputMeta | |||
Scripting | RuleExecutor | Execute a rule against each row (using Drools) | opdts.rules.RulesExecutorMeta | |||
Scripting | RuleAccumulator | Execute a rule against a set of rows (using Drools) | opdts.rules.RulesAccumulatorMeta | |||
Utility | SSH | Run SSH commands and return the result. | opdts.ssh.SSHMeta | |||
Input | S3CSVINPUT | S3 CSV Input | opdts.s3csvinput.S3CsvInputMeta | |||
Output | S3FileOutputPlugin | Exports data to a text file on an Amazon Simple Storage Service (S3) | com.pentaho.amazon.s3.S3FileOutputMeta | |||
Bulk loading | HanaBulkLoader | Bulk load data into SAP HANA | org.pentaho.di.trans.steps.hanabulkloader.HanaBulkLoaderMeta | |||
Output | SalesforceDelete | Delete records in Salesforce module. | opdts.salesforcedelete.SalesforceDeleteMeta | |||
Input | SalesforceInput | Reads information from SalesForce | opdts.salesforceinput.SalesforceInputMeta | |||
Output | SalesforceInsert | Insert records in Salesforce module. | opdts.salesforceinsert.SalesforceInsertMeta | |||
Output | SalesforceUpdate | Update records in Salesforce module. | opdts.salesforceupdate.SalesforceUpdateMeta | |||
Output | SalesforceUpsert | Insert or update records in Salesforce module. | opdts.salesforceupsert.SalesforceUpsertMeta | |||
Statistics | SampleRows | Filter rows based on the line number. | opdts.samplerows.SampleRowsMeta | |||
Deprecated | SapInput | Read data from SAP ERP, optionally with parameters | opdts.sapinput.SapInputMeta | |||
Input | SASInput | This step reads files in sas7bdat (SAS) native format | opdts.sasinput.SasInputMeta | |||
Cryptography | SecretKeyGenerator | Generate secret keys for algorithms such as DES, AES, TripleDES. | opdts.symmetriccrypto.secretkeygenerator.SecretKeyGeneratorMeta | |||
Transform | SelectValues | Select or remove fields in a row. Optionally, set the field meta-data: type, length and precision. | opdts.selectvalues.SelectValuesMeta | |||
Utility | SyslogMessage | Send message to Syslog server | opdts.syslog.SyslogMessageMeta | |||
Output | CubeOutput | Write rows of data to a data cube | opdts.cubeoutput.CubeOutputMeta | |||
Transform | SetValueField | Replace value of a field with another value field | opdts.setvaluefield.SetValueFieldMeta | |||
Transform | SetValueConstant | Replace value of a field to a constant | opdts.setvalueconstant.SetValueConstantMeta | |||
Job | FilesToResult | This step allows you to set filenames in the result of this transformation. Subsequent job entries can then use this information. | opdts.filestoresult.FilesToResultMeta | |||
BA Server | SetSessionVariableStep | Allows you to set the value of session variable | org.pentaho.di.baserver.utils.SetSessionVariableMeta | |||
Job | SetVariable | Set environment variables based on a single input row. | opdts.setvariable.SetVariableMeta | |||
Mapping | SimpleMapping | Turn a repetitive, re-usable part of a transformation (a sequence of steps) into a mapping (sub-transformation). | opdts.simplemapping.SimpleMapping | |||
Flow | SingleThreader | Executes a transformation snippet in a single thread. You need a standard mapping or a transformation with an Injector step where data from the parent transformation will arrive in blocks. | opdts.singlethreader.SingleThreaderMeta | |||
Inline | SocketReader | Socket reader. A socket client that connects to a server (Socket Writer step). | opdts.socketreader.SocketReaderMeta | |||
Inline | SocketWriter | Socket writer. A socket server that can send rows of data to a socket reader. | opdts.socketwriter.SocketWriterMeta | |||
Transform | SortRows | Sort rows based upon field values (ascending or descending) | opdts.sort.SortRowsMeta | |||
Joins | SortedMerge | Sorted Merge | opdts.sortedmerge.SortedMergeMeta | |||
Transform | SplitFieldToRows3 | Splits a single string field by delimiter and creates a new row for each split term | opdts.splitfieldtorows.SplitFieldToRowsMeta | |||
Transform | FieldSplitter | When you want to split a single field into more than one, use this step type. | opdts.fieldsplitter.FieldSplitterMeta | |||
Transform | SplunkInput | Reads data from Splunk. | opdts.splunk.SplunkInputMeta | |||
Transform | SplunkOutput | Writes data to Splunk. | opdts.splunk.SplunkOutputMeta | |||
Output | SQLFileOutput | Output SQL INSERT statements to file | opdts.sqlfileoutput.SQLFileOutputMeta | |||
Lookup | StreamLookup | Look up values coming from another stream in the transformation. | opdts.streamlookup.StreamLookupMeta | |||
Big Data | SSTableOutput | Writes to a filesystem directory as a Cassandra SSTable | opdts.cassandrasstableoutput.SSTableOutputMeta | |||
Transform | StringOperations | Apply certain operations like trimming, padding and others to string values. | opdts.stringoperations.StringOperationsMeta | |||
Transform | StringCut | Strings cut (substring). | opdts.stringcut.StringCutMeta | |||
Flow | SwitchCase | Switch a row to a certain target step based on the case value in a field. | opdts.switchcase.SwitchCaseMeta | |||
Cryptography | SymmetricCryptoTrans | Encrypt or decrypt a string using symmetric encryption. Available algorithms are DES, AES, TripleDES. | opdts.symmetriccrypto.symmetriccryptotrans.SymmetricCryptoTransMeta | |||
Output | SynchronizeAfterMerge | This step performs insert/update/delete in one go based on the value of a field. | opdts.synchronizeaftermerge.SynchronizeAfterMergeMeta | |||
Utility | TableCompare | This step compares the data from two tables (provided they have the same layout). It finds differences between the data in the two tables and logs them. | opdts.tablecompare.TableCompareMeta | |||
Lookup | TableExists | Check if a table exists on a specified connection | opdts.tableexists.TableExistsMeta | |||
Input | TableInput | Read information from a database table. | opdts.tableinput.TableInputMeta | |||
Output | TableOutput | Write information to a database table | opdts.tableoutput.TableOutputMeta | |||
Bulk loading | TeraFast | The Teradata Fastload Bulk loader | opdts.terafast.TeraFastMeta | |||
Bulk loading | TeraDataBulkLoader | Bulk loading via TPT using the tbuild command. | ||||
Input | TextFileInput | Read data from a text file in several formats. This data can then be passed on to the next step(s)... | opdts.textfileinput.TextFileInputMeta | |||
Deprecated | TextFileOutput | Write rows to a text file. | opdts.textfileoutput.TextFileOutputMeta | |||
Flow | TransExecutor | This step executes a Pentaho Data Integration transformation, sets parameters, and passes rows. | opdts.transexecutor.TransExecutorMeta | |||
Transform | Unique | Remove double rows and leave only unique occurrences. This works only on a sorted input. If the input is not sorted, only consecutive duplicate rows are handled correctly. | opdts.uniquerows.UniqueRowsMeta | |||
Transform | UniqueRowsByHashSet | Remove double rows and leave only unique occurrences by using a HashSet. | opdts.uniquerowsbyhashset.UniqueRowsByHashSetMeta | |||
Statistics | UnivariateStats | This step computes some simple stats based on a single input field | opdts.univariatestats.UnivariateStatsMeta | |||
Output | Update | Update data in a database table based upon keys | opdts.update.UpdateMeta | |||
Scripting | UserDefinedJavaClass | This step allows you to program a step using Java code | opdts.userdefinedjavaclass.UserDefinedJavaClassMeta | |||
Scripting | Janino | Calculate the result of a Java Expression using Janino | opdts.janino.JaninoMeta | |||
Transform | ValueMapper | Maps values of a certain field from one value to another | opdts.valuemapper.ValueMapperMeta | |||
Bulk loading | VerticaBulkLoader | Bulk loads data into a Vertica table using their high performance COPY feature | opdts.verticabulkload.VerticaBulkLoaderMeta | |||
Lookup | WebServiceLookup | Look up information using web services (WSDL) | opdts.webservices.WebServiceMeta | |||
Utility | WriteToLog | Write data to log | opdts.writetolog.WriteToLogMeta | |||
Input | XBaseInput | Reads records from an XBase type of database file (DBF) | opdts.xbaseinput.XBaseInputMeta | |||
Input | XMLInputStream | This step is capable of processing very large and complex XML files very fast. | opdts.xmlinputstream.XMLInputStreamMeta | |||
Joins | XMLJoin | Joins a stream of XML-Tags into a target XML string | opdts.xmljoin.XMLJoinMeta | |||
Output | XMLOutput | Write data to an XML file | opdts.xmloutput.XMLOutputMeta | |||
Validation | XSDValidator | Validate XML source (files or streams) against XML Schema Definition. | opdts.xsdvalidator.XsdValidatorMeta | |||
Transform | XSLT | Transform XML stream using XSL (eXtensible Stylesheet Language). | opdts.xslt.XsltMeta | |||
Input | YamlInput | Read YAML sources (files or streams), parse them, convert them to rows, and write these to one or more outputs. | opdts.yamlinput.YamlInputMeta | |||
Utility | ZipFile | Creates a standard ZIP archive from the data stream fields | opdts.zipfile.ZipFileMeta |
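Two steps in the table above, CreditCardValidator and RandomCCNumberGenerator, refer to the LUHN10 (MOD-10) checksum. For reference, here is a minimal standalone Java sketch of that check; the class and method names are illustrative and are not part of the PDI API.

```java
// Luhn (MOD-10) checksum, the algorithm named in the CreditCardValidator
// and RandomCCNumberGenerator step descriptions. Standalone sketch only.
public class LuhnCheck {

    /** Returns true if the digit string passes the MOD-10 check. */
    public static boolean isValid(String number) {
        int sum = 0;
        boolean doubleIt = false; // double every second digit from the right
        for (int i = number.length() - 1; i >= 0; i--) {
            char c = number.charAt(i);
            if (c < '0' || c > '9') {
                return false; // reject non-digit characters
            }
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as summing the two digits of d
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return number.length() > 0 && sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValid("4532015112830366")); // true  (checksum-valid test number)
        System.out.println(isValid("4532015112830367")); // false (last digit altered)
    }
}
```

Note that a passing Luhn check only means the number is well-formed; identifying the vendor (VISA, MasterCard, etc.) is a separate prefix/length test that the step performs in addition.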