How are decision values calculated in LIBSVM (MATLAB)?

Considering only the linear kernel, how are decision values calculated in LIBSVM?
Generally, for a two-class problem, predictions are made based on sign(w*z + b), but in LIBSVM predictions are made based on sign(decision value).
I calculated w*z + b, and it comes out different from the decision value.
Is there any relation between w*z + b and the decision value?

The decision function is exactly <w, x> + b; the only thing that might be misleading is that in MATLAB's model structure rho is actually -b (notice the change of sign), so the decision function is <w, x> - rho.
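For a concrete check, here is a small numpy sketch (Python rather than MATLAB); the field names SVs, sv_coef and rho mirror the libsvm model struct, so treat them as assumptions about your particular interface:

import numpy as np

def linear_decision_values(X, SVs, sv_coef, rho):
    # X: (n, d) test points; SVs: (m, d) support vectors;
    # sv_coef: (m,) dual coefficients alpha_i * y_i; rho: scalar from the model
    w = SVs.T @ sv_coef            # w = sum_i alpha_i * y_i * x_i (linear kernel only)
    return X @ w - rho             # decision value = <w, x> - rho = <w, x> + b

The sign of these values should reproduce the predicted labels, up to the label ordering stored in the model (libsvm treats the first label seen during training as the positive class).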

How to specify a nonlinear regression model in Python

I am taking an Econometrics course, and have been trying to use Python rather than the proprietary STATA and EVIEWS the assignments are set in.
In one of the questions, I have consumption data over time. I am asked to compute it in two ways.
The first way is to fit a model of the form consumption = A exp(B t), and the second way is to take logs of both sides and run OLS on log(consumption) = alpha + B t.
I know how to do the second way. However, when I try the first way it goes wrong. Using statsmodels, I can exponentiate the time data (after normalising), but this fits a model of the form consumption = A exp(t) + B, which is not what I want (I want to specify where the parameters go). In sklearn I could find polynomial regression, but not exponential.
Then I found scipy.optimize.curve_fit.
However this seems to have two problems:
(1) It seems to rely on initial guesses for the parameters, which means my output will end up being different from the proprietary software (whereas output for things like OLS is the same). [I assume initial guesses mean some iterative solution is used, which is helpful for very weird and wonderful functions, but I assume fairly standard results hold for exponential regression.]
(2) Every time I try to implement it, it just returns the guess parameters.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

consumption_data = pd.read_csv(r"......\consumption.csv")  # path truncated in the original post

def func(x, a, b):
    return a * np.exp(b * x)

xdata = consumption_data.YEAR
ydata = consumption_data.CONSUMPTION
ydata = (ydata - 1948) / 100   # normalisation mentioned above

popt, pcov = curve_fit(func, xdata, ydata, (1, 1))
print(popt)
plt.plot(xdata, func(xdata, *popt), 'g--')
The scipy.optimize code is basically just copy-pasted from their tutorial
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
Short answer: use statsmodels GLM.
statsmodels does not have nonlinear least squares. The best Python library for that is lmfit: https://pypi.org/project/lmfit/
curve_fit, lmfit, and nonlinear least squares algorithms in general find an iterative solution to the optimization problem. Even though we have to provide starting values, the solution is in many cases the same across packages up to convergence tolerance, e.g. 1e-5 or 1e-6.
Many standard models in statistics and econometrics have a single global maximum with well behaved data. However, in other cases like mixture models, there might be many local optima and the estimation might converge to one of them.
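As a minimal illustration of that point (synthetic data, made-up numbers), curve_fit converges to essentially the same parameters from two different starting values:

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)                      # e.g. rescaled time such as (year - 1948) / 100
y = 2.0 * np.exp(1.5 * t) * (1 + 0.01 * rng.standard_normal(t.size))

def func(x, a, b):
    return a * np.exp(b * x)

p1, _ = curve_fit(func, t, y, p0=(1.0, 1.0))
p2, _ = curve_fit(func, t, y, p0=(5.0, 0.1))
print(p1, p2)                                  # both close to (2.0, 1.5) up to tolerance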
To the specific case:
consumption = A exp(B t)
can be rewritten as
consumption = exp(a + B t)
So this is just a single index model or a generalized linear model with an exponential mean function.
The general version has the expectation of the dependent variable as a nonlinear function of a linear combination of the explanatory variables:
E(y | x) = g(x b)
This can be estimated in statsmodels using GLM with the Gaussian family and the log link.
Aside: In econometrics, there is a literature on using Poisson quasi-likelihood as an estimator for exponential-mean models instead of taking the log of the dependent variable.
Poisson usually uses the log link, as above.
However, using GLM allows us to use the log link, i.e. an exponential mean function, with any of the supported distribution families. The main difference is in the underlying variance assumption: Gaussian assumes constant variance, Poisson assumes the variance is proportional to the mean, and Gamma assumes the variance is quadratic in the mean.
If we use a robust sandwich covariance estimator for parameter inference, then standard errors and inference are correct even if the variance function is misspecified.
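A minimal sketch of what this looks like in statsmodels (the data and variable names here are made up; the family/link class names are as in recent statsmodels versions):

import numpy as np
import statsmodels.api as sm

t = np.linspace(0, 1, 50)                      # hypothetical rescaled time
y = 2.0 * np.exp(1.5 * t)                      # hypothetical consumption series
X = sm.add_constant(t)                         # exp(a + B t): constant column gives a, slope gives B

# Gaussian family with log link = exponential mean function E(y | x) = exp(x b)
res_gauss = sm.GLM(y, X, family=sm.families.Gaussian(link=sm.families.links.Log())).fit()

# Poisson quasi-likelihood alternative (log link is the Poisson default),
# with a robust sandwich covariance so inference survives a misspecified variance
res_pois = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")

print(res_gauss.params)                        # [a, B]; the original A is exp(a)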

How can we model independent noise for every output dimension of a multi-output GP in GPflow?

Say I have a problem with D outputs and isotopic data (all outputs observed at the same inputs). I would like to use independent noise for each output dimension of a multi-output GP model (Intrinsic Coregionalisation Model) in GPflow, which is the most general case: a diagonal noise covariance Σ = diag(σ₁², …, σ_D²) with a separate variance per output.
I have seen some examples of using multi-output GPs in GPflow, like this notebook and this question.
However, it seems that for the GPR model class in GPflow, the likelihood variance (Σ) is still a single number rather than D numbers, even if a product kernel (i.e. Kernel * Coregionalization) is specified.
Is there any way to achieve that?
Just like you can augment X with a column that designates, for each data point (row), which output it relates to (the column is specified by the active_dims keyword argument to the Coregion kernel; note that it uses zero-based indexing), you can augment Y with a column to specify different likelihoods (the SwitchedLikelihood is hard-coded to require the index to be in the last column of Y). There is an example (Demo 2) in the varying-noise notebook in the GPflow tutorials. You just have to combine the two: use a Coregion kernel and a SwitchedLikelihood, and augment both X and Y with the same column indicating the output!
However, as plain GPR only works with a Gaussian likelihood, the GPR model class is hard-coded for a single Gaussian likelihood. It would certainly be possible to write a version that can deal with different Gaussian likelihoods for the different outputs, but you would have to do it all manually in the _build_likelihood method of a new model (incorporating the stitching code from the SwitchedLikelihood).
It would be much easier to simply use a VGP model that can handle any likelihood - for Gaussian likelihoods the optimisation problem is very simple and should be easy to optimise using ScipyOptimizer.
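A rough sketch of how the pieces fit together, written against the GPflow 1.x API that this answer refers to (exact signatures may differ across GPflow versions, so treat the call names as assumptions):

import numpy as np
import gpflow

# Toy data for two outputs observed at (possibly different) inputs
X1, Y1 = np.random.rand(20, 1), np.random.rand(20, 1)
X2, Y2 = np.random.rand(30, 1), np.random.rand(30, 1)

# Augment X and Y with the same output-index column (index in the last column)
X_aug = np.vstack([np.hstack([X1, np.zeros_like(X1)]),
                   np.hstack([X2, np.ones_like(X2)])])
Y_aug = np.vstack([np.hstack([Y1, np.zeros_like(Y1)]),
                   np.hstack([Y2, np.ones_like(Y2)])])

# Data kernel acts on column 0, Coregion kernel on the index column 1
kern = (gpflow.kernels.Matern32(1, active_dims=[0])
        * gpflow.kernels.Coregion(1, output_dim=2, rank=1, active_dims=[1]))

# One Gaussian likelihood per output -> an independent noise variance per output
lik = gpflow.likelihoods.SwitchedLikelihood(
    [gpflow.likelihoods.Gaussian(), gpflow.likelihoods.Gaussian()])

# num_latent=1 so the index column of Y is not treated as a second output
m = gpflow.models.VGP(X_aug, Y_aug, kern=kern, likelihood=lik, num_latent=1)
gpflow.train.ScipyOptimizer().minimize(m)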

How to interpret coefficients and p-values in multiple linear regression with two categorical variables and interaction

I am new to linear regression so I hope you can help me with interpreting the output of a multiple linear regression with two categorical predictor variables and an interaction term.
I did the following linear regression:
lm(H1A1c ~ Vowel * Speaker, data=data)
Vowel and Speaker are both categorical variables. Vowel can be "breathy", "modal" or "creaky" and there are four different speakers (F01, F02, M01, M02). I want to see if a combination of those two categories can predict the values for H1A1c.
My output is this:
Output of lm
Please correct me if I am wrong, but I think we can see from this output that the relationship between most of my variables can't be characterised as linear. What I don't really understand is how to interpret the first p-value. From what I found when googling, all the other p-values refer to the comparison between the respective coefficient and the reference level, e.g. the p-value in the third line refers to the relationship of the coefficient in the third line to the first one, i.e. 23.1182 - 9.6557.
What about the p-value of the first coefficient, though? There can't be a linear relationship if there is no relationship - what does this p-value refer to?
Thanks in advance for your answers!
The first p-value (Intercept) tests the null hypothesis that the y-intercept of your fitted model is zero (i.e. that the fit passes through the origin). Since the p-value in your result is far below 0.05, you can conclude that the intercept is significantly different from zero.
The other p-values are interpreted differently. Your interpretation is correct: they indicate whether the coefficients of the terms they correspond to are significantly different from zero.
the p-value in the third line refers to the relationship of the coefficient of the third line to the first one, i.e. 23.1182-9.6557
(-9.6557) means that, on average, the predicted value of H1A1c will be 9.6557 units lower when GlottalContext = creaky (i.e. GlottalContextcreaky = 1) than when GlottalContext = breathy (since breathy is your reference category here), keeping all other predictors unchanged. This of course only holds when the corresponding p-value is less than 0.05, which, I see, is the case for GlottalContextcreaky.
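For example, plugging in the numbers quoted in the question (and assuming breathy and the first speaker are the reference levels): the predicted H1A1c for the reference speaker is the intercept, 23.1182, for breathy, and 23.1182 - 9.6557 = 13.4625 for creaky; for any other speaker you also add that speaker's main effect and the relevant interaction term.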
(Additionally, if I were to assume that H1A1c is a continuous variable, I am not sure that linear regression is the best way to predict H1A1c, since both of your predictors are categorical. You might want to explore other approaches, e.g. transform your dependent variable to categorical and run a binary/multinomial logistic regression, or fit a decision tree.)

Interpretation of coefficients in multinomial LogisticRegressionWithLBFGS model output in scala

I am attempting to do some post processing of the outputs of a multinomial LogisticRegressionWithLBFGS model. The model matrix is created in R and then exported to scala spark for model fitting.
The documentation states that there is "standard feature scaling and L2 regularization". The output of the multinom() function in R's {nnet} package is clearly interpretable as log-odds between a given outcome and a base outcome. However, the documentation does not give enough detail about how the weights of LogisticRegressionWithLBFGS can be transformed to obtain a standard set of coefficients.
The term "standard feature scaling" means different things to different people. It could mean that the model matrix is scaled as (x - mean(x))/sd(x), or as (x - min(x))/(max(x) - min(x)), or a number of other possibilities. In addition, the weights output is a flat vector whose length is a multiple of the number of features, and it could be folded in different ways to obtain a coefficient matrix - for example by row, by column, or in some other arbitrary way.
How do I process the output of LogisticRegressionWithLBFGS().weights to obtain a standard set of coefficients that I can use for post-processing, basic inference and predictions with the original model matrix?
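To make the folding ambiguity concrete, here is a purely illustrative sketch of two ways a flat weights vector could be reshaped into a coefficient matrix; the shapes are assumptions chosen for illustration, not Spark's documented layout:

import numpy as np

num_classes, num_features = 3, 4                                      # hypothetical sizes
weights = np.arange((num_classes - 1) * num_features, dtype=float)    # stand-in for model.weights

by_row = weights.reshape(num_classes - 1, num_features)        # one contiguous block per non-base class
by_col = weights.reshape(num_features, num_classes - 1).T      # folded column-wise instead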

Simple binary logistic regression using MATLAB

I'm working on doing a logistic regression using MATLAB for a simple classification problem. My covariate is one continuous variable ranging between 0 and 1, while my categorical response is a binary variable of 0 (incorrect) or 1 (correct).
I'm looking to run a logistic regression to establish a predictor that would output the probability of some input observation (e.g. the continuous variable as described above) being correct or incorrect. Although this is a fairly simple scenario, I'm having some trouble running this in MATLAB.
My approach is as follows: I have one column vector X that contains the values of the continuous variable, and another equally-sized column vector Y that contains the known classification of each value of X (e.g. 0 or 1). I'm using the following code:
[b,dev,stats] = glmfit(X,Y,'binomial','link','logit');
However, this gives me nonsensical results with a p = 1.000, coefficients (b) that are extremely high (-650.5, 1320.1), and associated standard error values on the order of 1e6.
I then tried using an additional parameter to specify the size of my binomial sample:
glm = GeneralizedLinearModel.fit(X,Y,'distr','binomial','BinomialSize',size(Y,1));
This gave me results that were more in line with what I expected. I extracted the coefficients, used glmval to create estimates (Y_fit = glmval(b,[0:0.01:1],'logit');), and created an array for the fitting (X_fit = linspace(0,1)). When I overlaid the plots of the original data and the model using figure, plot(X,Y,'o',X_fit,Y_fit,'-'), the resulting plot of the model essentially looked like the lower quarter of the 'S'-shaped curve that is typical of logistic regression plots.
My questions are as follows:
1) Why did my use of glmfit give strange results?
2) How should I go about addressing my initial question: given some input value, what's the probability that its classification is correct?
3) How do I get confidence intervals for my model parameters? glmval should be able to input the stats output from glmfit, but my use of glmfit is not giving correct results.
Any comments and input would be very useful, thanks!
UPDATE (3/18/14)
I found that mnrval seems to give reasonable results. I can use [b_fit,dev,stats] = mnrfit(X,Y+1); where Y+1 simply makes my binary classifier into a nominal one.
I can loop through [pihat,lower,upper] = mnrval(b_fit,loopVal(ii),stats); to get various pihat probability values, where loopVal = linspace(0,1) or some appropriate input range and ii = 1:length(loopVal).
The stats parameter has a great correlation coefficient (0.9973), but the p-values for b_fit are 0.0847 and 0.0845, which I'm not quite sure how to interpret. Any thoughts? Also, why would mnrfit work over glmfit in my example? I should note that the p-values for the coefficients when using GeneralizedLinearModel.fit were both p << 0.001, and the coefficient estimates were quite different as well.
Finally, how does one interpret the dev output from the mnrfit function? The MATLAB document states that it is "the deviance of the fit at the solution vector. The deviance is a generalization of the residual sum of squares." Is this useful as a stand-alone value, or is this only compared to dev values from other models?
It sounds like your data may be linearly separable. In short, since your input data is one-dimensional, that means there is some value xDiv such that all values of x < xDiv belong to one class (say y = 0) and all values of x > xDiv belong to the other class (y = 1).
If your data were two-dimensional, this would mean you could draw a line through your two-dimensional space X such that all instances of a particular class are on one side of the line.
This is bad news for logistic regression (LR) as LR isn't really meant to deal with problems where the data are linearly separable.
Logistic regression is trying to fit a function of the following form:

y = 1 / (1 + exp(-(b0 + b1*x)))

This will only return values of y = 0 or y = 1 when the expression inside the exponential in the denominator is at negative infinity or infinity.
Now, because your data is linearly separable, and Matlab's LR function attempts to find a maximum likelihood fit for the data, you will get extreme weight values.
This isn't necessarily a solution, but try flipping the labels on just one of your data points (so for some index t where y(t) == 0 set y(t) = 1). This will cause your data to no longer be linearly separable and the learned weight values will be dragged dramatically closer to zero.
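A small scikit-learn sketch (Python rather than MATLAB, with made-up data) of both points - the blow-up under separability and the effect of flipping one label:

import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.linspace(0, 1, 40).reshape(-1, 1)
y = (x.ravel() > 0.5).astype(int)              # perfectly separable at x = 0.5

# Very weak L2 regularization approximates the unregularized maximum-likelihood
# fit, so the slope coefficient becomes very large for separable data
sep = LogisticRegression(C=1e10, max_iter=10_000).fit(x, y)

# Flip one label: the classes now overlap and the coefficient shrinks dramatically
y_flip = y.copy()
y_flip[0] = 1
overlap = LogisticRegression(C=1e10, max_iter=10_000).fit(x, y_flip)

print(sep.coef_, overlap.coef_)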