...
The probability of class j, for every class except the last, is

$$
P_j(X_i) = \frac{\exp(X_i B_j)}{\sum_{l=1}^{k-1} \exp(X_i B_l) + 1}
$$
...
The last class has probability

$$
1 - \sum_{j=1}^{k-1} P_j(X_i) = \frac{1}{\sum_{j=1}^{k-1} \exp(X_i B_j) + 1}
$$
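The two probability formulas above can be sketched in a few lines of Java. This is a minimal illustration, not Weka's own code: the class name `ClassProbabilities`, the method `probabilities`, and the layout of `b` (one row per attribute, one column per non-reference class) are assumptions made for the example.

```java
import java.util.Arrays;

public class ClassProbabilities {

    /**
     * Returns all k class probabilities for one instance.
     * x holds the m attribute values of X_i; b is an m x (k-1) matrix
     * whose column j holds B_j (layout assumed for this sketch).
     */
    static double[] probabilities(double[] x, double[][] b) {
        int m = x.length;
        int kMinus1 = b[0].length;
        double[] p = new double[kMinus1 + 1];
        double denom = 1.0; // the "+1" term in the denominator
        for (int j = 0; j < kMinus1; j++) {
            double dot = 0.0;
            for (int a = 0; a < m; a++) {
                dot += x[a] * b[a][j]; // X_i * B_j
            }
            p[j] = Math.exp(dot);
            denom += p[j];
        }
        for (int j = 0; j < kMinus1; j++) {
            p[j] /= denom; // classes 1..k-1
        }
        p[kMinus1] = 1.0 / denom; // last class
        return p;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[][] b = {{0.5, -0.25}, {0.1, 0.3}};
        System.out.println(Arrays.toString(probabilities(x, b)));
    }
}
```

The sketch exponentiates the raw scores directly, so very large values of X_i * B_j would overflow; a production implementation would normally guard against that.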
...
The (negative) multinomial log-likelihood is thus:

$$
L = -\sum_{i=1}^{n} \left\{ \sum_{j=1}^{k-1} Y_{ij} \ln P_j(X_i) + \left(1 - \sum_{j=1}^{k-1} Y_{ij}\right) \ln\left(1 - \sum_{j=1}^{k-1} P_j(X_i)\right) \right\} + \mathrm{ridge} \cdot (B^2)
$$
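As a rough sketch of how L could be evaluated, the Java below assumes that Y_ij is a 0/1 indicator (1 when instance i belongs to class j), so the inner sums reduce to the log-probability of the true class, and that B^2 stands for the sum of the squared entries of B. Both readings are assumptions, as are the names `RidgeNegLogLikelihood`, `negLogLikelihood`, and the `labels` array of class indices.

```java
public class RidgeNegLogLikelihood {

    /** Class probabilities for one instance; column j of b holds B_j (assumed layout). */
    static double[] probabilities(double[] x, double[][] b) {
        int kMinus1 = b[0].length;
        double[] p = new double[kMinus1 + 1];
        double denom = 1.0;
        for (int j = 0; j < kMinus1; j++) {
            double dot = 0.0;
            for (int a = 0; a < x.length; a++) {
                dot += x[a] * b[a][j];
            }
            p[j] = Math.exp(dot);
            denom += p[j];
        }
        for (int j = 0; j < kMinus1; j++) {
            p[j] /= denom;
        }
        p[kMinus1] = 1.0 / denom;
        return p;
    }

    /**
     * Negative log-likelihood plus ridge penalty.
     * labels[i] is the class index (0..k-1) of instance i; with 0/1
     * indicators Y_ij, the braces in L reduce to ln P_label(X_i).
     */
    static double negLogLikelihood(double[][] xs, int[] labels,
                                   double[][] b, double ridge) {
        double nll = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double[] p = probabilities(xs[i], b);
            nll -= Math.log(p[labels[i]]);
        }
        double sumSquares = 0.0; // assumed meaning of B^2
        for (double[] row : b) {
            for (double v : row) {
                sumSquares += v * v;
            }
        }
        return nll + ridge * sumSquares;
    }

    public static void main(String[] args) {
        double[][] xs = {{1.0, 2.0}, {0.5, -1.0}};
        int[] labels = {0, 2};
        double[][] b = {{0.5, -0.25}, {0.1, 0.3}};
        System.out.println(negLogLikelihood(xs, labels, b, 1e-8));
    }
}
```

A larger `ridge` value penalises large coefficients more heavily, which pulls the minimiser of L towards smaller entries of B.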
To find the matrix B that minimizes L, a quasi-Newton method searches for the optimal values of the m*(k-1) variables. Before the optimization procedure is run, the matrix B is 'squeezed' (flattened) into a vector of m*(k-1) elements. For details of the optimization procedure, see the weka.core.Optimization class.
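The 'squeeze' step can be illustrated as follows. The text does not state the element order, so this sketch assumes the coefficients of each class are stored contiguously (column by column); the names `FlattenCoefficients`, `flatten`, and `unflatten` are hypothetical.

```java
import java.util.Arrays;

public class FlattenCoefficients {

    /** Flattens an m x (k-1) matrix into a vector of length m*(k-1), column by column. */
    static double[] flatten(double[][] b) {
        int m = b.length;
        int kMinus1 = b[0].length;
        double[] v = new double[m * kMinus1];
        for (int j = 0; j < kMinus1; j++) {
            for (int a = 0; a < m; a++) {
                v[j * m + a] = b[a][j];
            }
        }
        return v;
    }

    /** Inverse of flatten: rebuilds the m x (k-1) matrix from the vector. */
    static double[][] unflatten(double[] v, int m, int kMinus1) {
        double[][] b = new double[m][kMinus1];
        for (int j = 0; j < kMinus1; j++) {
            for (int a = 0; a < m; a++) {
                b[a][j] = v[j * m + a];
            }
        }
        return b;
    }

    public static void main(String[] args) {
        double[][] b = {{1, 2}, {3, 4}, {5, 6}}; // m = 3, k-1 = 2
        double[] v = flatten(b);
        System.out.println(Arrays.toString(v));
        System.out.println(Arrays.deepToString(unflatten(v, 3, 2)));
    }
}
```

Working on a flat array lets a general-purpose optimizer treat all m*(k-1) coefficients as a single list of variables, with the matrix shape restored only when probabilities need to be computed.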
...