SpreadSubsample
Package
weka.filters.supervised.instance
Synopsis
Produces a random subsample of a dataset. The original dataset must fit entirely in memory. This filter allows you to specify the maximum "spread" between the rarest and most common class. For example, you may specify that there be at most a 2:1 difference in class frequencies. When used in batch mode, subsequent batches are NOT resampled.
Options
The table below describes the options available for SpreadSubsample.
Option | Description |
---|---|
adjustWeights | Whether instance weights will be adjusted to maintain the total weight per class. |
distributionSpread | The maximum class distribution spread (0 = no maximum spread, 1 = uniform distribution, 10 = allow at most a 10:1 ratio between the classes). |
maxCount | The maximum count for any class value (0 = unlimited). |
randomSeed | Sets the random number seed for subsampling. |
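
The interaction of distributionSpread, maxCount, and adjustWeights can be illustrated with a small self-contained sketch. This is not Weka's implementation: the class `SpreadSketch` and its methods `cappedCounts` and `weightFactor` are hypothetical names introduced here, and the sketch assumes that the spread is either 0 or at least 1, that empty classes are ignored when finding the rarest class, and that all instances in a class start with equal weight.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Weka and not its implementation.
final class SpreadSketch {

    // Per-class number of instances to keep under the two count-related options.
    static int[] cappedCounts(int[] counts, double distributionSpread, int maxCount) {
        if (distributionSpread != 0.0 && distributionSpread < 1.0) {
            // Assumption of this sketch: only 0 (no limit) or values >= 1 are handled.
            throw new IllegalArgumentException("spread must be 0 or >= 1 in this sketch");
        }
        // Size of the rarest non-empty class; empty classes are ignored.
        int min = Integer.MAX_VALUE;
        for (int c : counts) {
            if (c > 0 && c < min) {
                min = c;
            }
        }
        int[] kept = counts.clone();
        for (int i = 0; i < kept.length; i++) {
            if (distributionSpread >= 1.0 && min != Integer.MAX_VALUE) {
                // No class may exceed spread times the size of the rarest class.
                long cap = (long) Math.floor(min * distributionSpread);
                kept[i] = (int) Math.min(kept[i], cap);
            }
            if (maxCount > 0) {
                // maxCount applies on top of the spread limit; 0 means unlimited.
                kept[i] = Math.min(kept[i], maxCount);
            }
        }
        return kept;
    }

    // Factor by which each kept instance's weight would be scaled so that the
    // class keeps its total weight (assumes equal weights within the class).
    static double weightFactor(int originalCount, int keptCount) {
        return keptCount > 0 ? (double) originalCount / keptCount : 1.0;
    }

    public static void main(String[] args) {
        int[] counts = {900, 100, 50};
        System.out.println(Arrays.toString(cappedCounts(counts, 2.0, 0)));  // [100, 100, 50]
        System.out.println(Arrays.toString(cappedCounts(counts, 0.0, 80))); // [80, 80, 50]
        System.out.println(weightFactor(900, 100));                          // 9.0
    }
}
```

With class counts of 900, 100, and 50 and a spread of 2, the rarest class (50) sets a ceiling of 100 instances per class. With adjustWeights enabled, the 100 instances kept from the largest class would each carry nine times their original weight under the sketch's equal-weight assumption, so the class total is unchanged.
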
Capabilities
The table below describes the capabilities of SpreadSubsample.
Capability | Supported |
---|---|
Class | Nominal class, Binary class |
Attributes | Binary attributes, Missing values, Nominal attributes, Numeric attributes, Unary attributes, String attributes, Empty nominal attributes, Relational attributes, Date attributes |
Min # of instances | 0 |