SGDText
Package
weka.classifiers.functions
Synopsis
Implements stochastic gradient descent for learning a linear binary class SVM or binary class logistic regression on text data. Operates directly on String attributes. From Weka 3.7.5.
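The idea in the synopsis can be illustrated with a minimal sketch of stochastic gradient descent on a sparse bag-of-words representation. This is not Weka's implementation: the class `SgdTextSketch`, its methods, and the simple regex tokenizer are hypothetical names introduced for illustration, and the update rule is a plain L2-regularized SGD step assumed for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; names and update rule are assumptions, not Weka's code. */
final class SgdTextSketch {
    enum Loss { HINGE, LOG }

    final Map<String, Double> weights = new HashMap<>();
    double bias = 0.0;
    final double lambda;        // regularization constant
    final double learningRate;
    final Loss loss;

    SgdTextSketch(double lambda, double learningRate, Loss loss) {
        this.lambda = lambda;
        this.learningRate = learningRate;
        this.loss = loss;
    }

    /** Binary bag of words: every distinct lowercase token gets the value 1. */
    static Map<String, Double> bagOfWords(String text) {
        Map<String, Double> bag = new HashMap<>();
        for (String tok : text.toLowerCase().split("\\W+")) {
            if (!tok.isEmpty()) bag.put(tok, 1.0);
        }
        return bag;
    }

    /** Linear score: bias plus the weighted sum of the document's terms. */
    double score(Map<String, Double> doc) {
        double s = bias;
        for (Map.Entry<String, Double> e : doc.entrySet()) {
            s += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return s;
    }

    /** One SGD step on a single document; y must be +1 or -1. */
    void update(Map<String, Double> doc, int y) {
        double z = y * score(doc);
        // Negative derivative of the loss with respect to the margin z.
        double dloss = (loss == Loss.HINGE)
                ? (z < 1.0 ? 1.0 : 0.0)          // hinge: max(0, 1 - z)
                : 1.0 / (1.0 + Math.exp(z));     // log loss: log(1 + exp(-z))
        // L2 regularization shrinks every weight slightly.
        final double shrink = 1.0 - learningRate * lambda;
        weights.replaceAll((term, w) -> w * shrink);
        if (dloss != 0.0) {
            for (Map.Entry<String, Double> e : doc.entrySet()) {
                weights.merge(e.getKey(), learningRate * dloss * y * e.getValue(), Double::sum);
            }
            bias += learningRate * dloss * y;
        }
    }

    /** Visits every training document once per epoch. */
    void train(String[] texts, int[] labels, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < texts.length; i++) {
                update(bagOfWords(texts[i]), labels[i]);
            }
        }
    }

    int predict(String text) {
        return score(bagOfWords(text)) >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        SgdTextSketch model = new SgdTextSketch(0.0001, 0.1, Loss.HINGE);
        String[] texts = {"good great film", "awful bad film", "great fun", "bad boring"};
        int[] labels = {1, -1, 1, -1};
        model.train(texts, labels, 20);
        System.out.println(model.predict("great fun film"));
    }
}
```

Switching `Loss.HINGE` to `Loss.LOG` changes only the derivative used in `update`, which is the sense in which one SGD loop can learn either an SVM or a logistic regression model.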
Options
The table below describes the options available for SGDText.
Option | Description |
---|---|
LNorm | The LNorm to use for document length normalization. |
debug | If set to true, the classifier may output additional info to the console. |
epochs | The number of epochs to perform (batch learning). The total number of iterations is epochs * number of instances. |
lambda | The regularization constant (default = 0.0001). |
learningRate | The learning rate. |
lossFunction | The loss function to use: hinge loss (SVM) or log loss (logistic regression). |
lowercaseTokens | Whether to convert all tokens to lowercase. |
minWordFrequency | Ignore any words that occur fewer than this many times in the training data. If periodic pruning is turned on, the dictionary is pruned according to this value. |
norm | The norm of the instances after normalization. |
periodicPruning | How often (in number of instances) to prune low-frequency terms from the dictionary. 0 means no pruning; a positive integer n means prune after every n instances. |
seed | The random number seed to be used. |
stemmer | The stemming algorithm to use on the words. |
stopwords | The file containing the stopwords (if this is a directory, the default stopwords are used). |
tokenizer | The tokenizing algorithm to use on the strings. |
useStopList | If true, ignores all words that are on the stoplist. |
useWordFrequencies | Use word frequencies rather than a binary bag-of-words representation. |
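The interplay of LNorm, norm, minWordFrequency and periodicPruning can be sketched as follows. The class `TextOptionsSketch` and its methods are hypothetical and are not part of Weka. The sketch assumes that length normalization rescales a document's term weights so that their L_p norm (with p = LNorm) equals norm, and it shows only the periodic pruning pass, not any final filtering of the dictionary.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; the formulas are assumptions, not Weka's code. */
final class TextOptionsSketch {

    /** Rescales term weights in place so that their L_p norm (p = lNorm) equals targetNorm. */
    static void normalize(Map<String, Double> doc, double lNorm, double targetNorm) {
        double sum = 0.0;
        for (double v : doc.values()) {
            sum += Math.pow(Math.abs(v), lNorm);
        }
        double length = Math.pow(sum, 1.0 / lNorm);
        if (length == 0.0) return;               // empty document: nothing to scale
        final double scale = targetNorm / length;
        doc.replaceAll((term, v) -> v * scale);
    }

    /** Drops every term that has been seen fewer than minWordFrequency times. */
    static void prune(Map<String, Integer> dictionary, int minWordFrequency) {
        dictionary.values().removeIf(count -> count < minWordFrequency);
    }

    /** Counts tokens, pruning after every periodicPruning documents (0 disables pruning). */
    static Map<String, Integer> buildDictionary(String[] docs, int minWordFrequency,
                                                int periodicPruning) {
        Map<String, Integer> dictionary = new HashMap<>();
        int seen = 0;
        for (String doc : docs) {
            for (String tok : doc.toLowerCase().split("\\W+")) {
                if (!tok.isEmpty()) dictionary.merge(tok, 1, Integer::sum);
            }
            seen++;
            if (periodicPruning > 0 && seen % periodicPruning == 0) {
                prune(dictionary, minWordFrequency);
            }
        }
        return dictionary;
    }

    public static void main(String[] args) {
        Map<String, Double> doc = new HashMap<>();
        doc.put("spam", 3.0);
        doc.put("ham", 4.0);
        normalize(doc, 2.0, 1.0);                // L2 length becomes 1
        System.out.println(doc);

        String[] docs = {"spam spam ham", "spam eggs"};
        System.out.println(buildDictionary(docs, 2, 1));
    }
}
```

A small periodicPruning value keeps the dictionary compact, but a term that is rare early on loses its count each time it is pruned, so the setting trades memory for the chance of discarding terms that would later become frequent.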
Capabilities
The table below describes the capabilities of SGDText.
Capability | Supported |
---|---|
Class | Binary class, Missing class values |
Attributes | String attributes, Missing values |
Min # of instances | 0 |