Pentaho Data Mining Community Documentation
Quick Start and Overview
Pentaho Data Mining, based on the Weka project, is a comprehensive set of tools for machine learning and data mining. Its broad suite of classification, regression, association rule, and clustering algorithms can help you understand your business better and improve future performance through predictive analytics.
There are two versions of Weka:
Weka 3.8 - current stable version. This branch receives bug fixes to core Weka; new features are released through packages that can be installed via the built-in package manager.
Weka 3.9 - development branch. This is a continuation of the 3.8 code line that receives both bug fixes and new features/improvements to core Weka. It also takes advantage of new features released in packages.
Pentaho Data Mining Home Page (News, Downloads, Forums, Bug tracking etc.)
An introductory article on data mining with Weka, by Michael Abernethy at IBM developerWorks
Documentation
Pentaho Data Mining (Weka)
English documentation for Weka 3.6.14 (stable version accompanying the 3rd edition of the book)
English documentation for Weka 3.8.0 (latest stable version)
Weka is accompanied by the book Data Mining: Practical Machine Learning Tools and Techniques (Fourth Edition).
Plugins for Pentaho Data Integration (Kettle)
Using the Reservoir Sampling Plugin (included as a first class step in recent Kettle distributions)
Using the Univariate Statistics Plugin (included as a first class step in recent Kettle distributions)
Using the Knowledge Flow Plugin (enterprise edition)
Time Series Analysis and Forecasting with Weka (available as a PDI Spoon perspective as well as a Weka plugin)
Weka time series forecasting plugin for PDI 4 (enterprise edition)
Developing with Weka
Awards and Publications
Under Development/Roadmap
Complete Knowledge Flow rewrite - new engine, refactored UI etc.
Distributed Weka for Hadoop and for Spark
Incremental dictionary creation and vectorisation (StringToWordVector filter) for text documents