SimpleKMeans
Package
weka.clusterers
Synopsis
Cluster data using the k-means algorithm. Can use either the Euclidean distance (default) or the Manhattan distance. If the Manhattan distance is used, centroids are computed as the component-wise median rather than the mean. For more information see:
D. Arthur, S. Vassilvitskii: k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 1027-1035, 2007.
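The centroid rule described in the synopsis can be illustrated with a small standalone sketch. It does not use or reproduce Weka's code: the class `CentroidSketch` and its methods are hypothetical names, points are plain `double[]` arrays, and for an even number of points the median is assumed to be the average of the two middle values, which may differ from what SimpleKMeans does internally.

```java
import java.util.Arrays;

class CentroidSketch {

    // Component-wise mean: the centroid used with the Euclidean distance.
    static double[] meanCentroid(double[][] points) {
        int dims = points[0].length;
        double[] centroid = new double[dims];
        for (double[] p : points) {
            for (int d = 0; d < dims; d++) {
                centroid[d] += p[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            centroid[d] /= points.length;
        }
        return centroid;
    }

    // Component-wise median: the centroid used with the Manhattan distance.
    // Assumption: with an even count, average the two middle values.
    static double[] medianCentroid(double[][] points) {
        int n = points.length;
        int dims = points[0].length;
        double[] centroid = new double[dims];
        double[] column = new double[n];
        for (int d = 0; d < dims; d++) {
            for (int i = 0; i < n; i++) {
                column[i] = points[i][d];
            }
            Arrays.sort(column);
            centroid[d] = (n % 2 == 1)
                    ? column[n / 2]
                    : (column[n / 2 - 1] + column[n / 2]) / 2.0;
        }
        return centroid;
    }

    public static void main(String[] args) {
        double[][] cluster = { {1.0, 2.0}, {2.0, 4.0}, {9.0, 6.0} };
        System.out.println(Arrays.toString(meanCentroid(cluster)));   // [4.0, 4.0]
        System.out.println(Arrays.toString(medianCentroid(cluster))); // [2.0, 4.0]
    }
}
```

In this example the point (9.0, 6.0) pulls the mean in the first dimension up to 4.0, while the median stays at 2.0, which is why the median centroid is less sensitive to outlying values.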
Options
The table below describes the options available for SimpleKMeans.
Option | Description |
---|---|
displayStdDevs | Display standard deviations of numeric attributes and counts of nominal attributes. |
distanceFunction | The distance function to use for comparing instances (default: weka.core.EuclideanDistance). |
dontReplaceMissingValues | If set, missing values are not replaced globally with the mean/mode. |
fastDistanceCalc | Uses cut-off values to speed up distance calculation, but also suppresses the calculation and output of the within-cluster sum of squared errors/sum of distances. |
initializeUsingKMeansPlusPlusMethod | Initialize cluster centers using the probabilistic farthest-first method of the k-means++ algorithm. |
maxIterations | Set the maximum number of iterations. |
numClusters | Set the number of clusters. |
preserveInstancesOrder | Preserve the order of instances. |
seed | The random number seed to be used. |
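The seeding that initializeUsingKMeansPlusPlusMethod refers to can be sketched as follows. This is a minimal standalone illustration of the procedure from the Arthur and Vassilvitskii paper, not Weka's implementation: the class `KMeansPlusPlusSketch` and its methods are hypothetical names, points are plain `double[]` arrays, and squared Euclidean distance is assumed. The `numClusters` and `seed` parameters play the roles of the options of the same name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class KMeansPlusPlusSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the indices of the points chosen as initial cluster centers.
    // May return fewer than numClusters indices if all points coincide
    // with already chosen centers.
    static List<Integer> chooseCenters(double[][] points, int numClusters, long seed) {
        Random random = new Random(seed);
        List<Integer> centers = new ArrayList<>();

        // First center: chosen uniformly at random.
        centers.add(random.nextInt(points.length));

        // nearest[i] = squared distance from point i to its closest chosen center.
        double[] nearest = new double[points.length];
        Arrays.fill(nearest, Double.POSITIVE_INFINITY);

        while (centers.size() < numClusters) {
            // Update the distances with the most recently added center.
            double[] latest = points[centers.get(centers.size() - 1)];
            double total = 0.0;
            for (int i = 0; i < points.length; i++) {
                nearest[i] = Math.min(nearest[i], squaredDistance(points[i], latest));
                total += nearest[i];
            }
            if (total == 0.0) {
                break; // no point lies away from the chosen centers
            }

            // Sample the next center with probability proportional to nearest[i].
            double threshold = random.nextDouble() * total;
            double cumulative = 0.0;
            for (int i = 0; i < points.length; i++) {
                cumulative += nearest[i];
                if (cumulative > threshold) {
                    centers.add(i);
                    break;
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = {
            {0, 0}, {0, 1}, {10, 10}, {10, 11}, {20, 0}, {20, 1}
        };
        // The same seed always yields the same initial centers.
        System.out.println(chooseCenters(points, 3, 10L));
    }
}
```

Points far from every chosen center are more likely to be picked next, which tends to spread the initial centers across the data. Fixing the seed makes the choice reproducible between runs.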
Capabilities
The table below describes the capabilities of SimpleKMeans.
Capability | Supported |
---|---|
Class | No class |
Attributes | Nominal attributes, Numeric attributes, Missing values, Binary attributes, Empty nominal attributes, Unary attributes |
Min # of instances | 1 |