How can we model independent noise for every output dimension of a multi-output GP in GPflow?

Say I have a problem with D outputs and isotopic data. I would like to use independent noise for each output dimension of a multi-output GP model (Intrinsic Coregionalisation Model) in GPflow, which is the most general case.
I have seen some examples of using multi-output GPs in GPflow, like this notebook and this question.
However, it seems that for the GPR model class in GPflow, the likelihood variance ($\Sigma$) is still a single number rather than D numbers, even if a product kernel (i.e. Kernel * Coregionalization) is specified.
Is there any way to achieve that?

Just as you can augment X with a column that designates, for each data point (row), which output it relates to (the column is selected by the active_dims keyword argument to the Coregion kernel; note that indexing is zero-based), you can augment Y with a column that selects a different likelihood for each output. The SwitchedLikelihood is hard-coded to require this index in the last column of Y; there is an example (Demo 2) in the varying noise notebook in the GPflow tutorials. You just have to combine the two: use a Coregion kernel and a SwitchedLikelihood, and augment both X and Y with the same column indicating the outputs!
However, as plain GPR only works with a Gaussian likelihood, the GPR model has been hard-coded for a Gaussian likelihood. It would certainly be possible to write a version of it that can deal with different Gaussian likelihoods for the different outputs, but you would have to do it all manually in the _build_likelihood method of a new model (incorporating the stitching code from the SwitchedLikelihood).
It would be much easier to simply use a VGP model that can handle any likelihood - for Gaussian likelihoods the optimisation problem is very simple and should be easy to optimise using ScipyOptimizer.
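As a rough illustration of the augmentation step only, here is a minimal NumPy sketch. The sizes, the random data and the variable names (`X_aug`, `Y_aug`, `output_index`) are assumptions made for the example. The GPflow model construction is deliberately left out because the exact calls depend on the GPflow version.

```python
import numpy as np

# Hypothetical isotopic data: the same N inputs are observed for all D outputs.
N, D = 4, 3
rng = np.random.default_rng(0)
X = rng.uniform(size=(N, 1))   # shared inputs, shape (N, 1)
Y = rng.normal(size=(N, D))    # one column per output, shape (N, D)

# Output index for every stacked row: 0,...,0, 1,...,1, ..., D-1,...,D-1
output_index = np.repeat(np.arange(D), N).reshape(-1, 1).astype(float)

# Stack D copies of X and append the output index as an extra column.
X_aug = np.hstack([np.tile(X, (D, 1)), output_index])   # shape (N*D, 2)

# Flatten Y output by output and append the same index as the LAST column,
# which is where SwitchedLikelihood expects to find it.
Y_aug = np.hstack([Y.T.reshape(-1, 1), output_index])   # shape (N*D, 2)
```

With this layout the base kernel would act on column 0 of `X_aug` and the Coregion kernel on column 1 (zero-based `active_dims`), while the last column of `Y_aug` picks one of D Gaussian likelihoods, each with its own variance.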


Obtaining the SHAP values for a prediction made with kNN

If I want to obtain the SHAP values with kernel SHAP for a kNN classifier with n variables, do I have to recalculate the prediction 2^n times?
(I'm not using python, but MATLAB, so I need to know the inside of the algorithm)
For those who use Python, the following script gets SHAP values from a kNN model. For step by step modeling follow this link:
import shap
import sklearn.neighbors

# Initialize model
knn = sklearn.neighbors.KNeighborsClassifier()
# Fit the model
knn.fit(X_train, Y_train)
# Get the model explainer object
explainer = shap.KernelExplainer(knn.predict_proba, X_train)
# Get shap values for the test data observation whose index is 0, i.e. first observation in the test set
shap_values = explainer.shap_values(X_test.iloc[0,:])
# Generate a force plot for this first observation using the derived shap values
shap.force_plot(explainer.expected_value[0], shap_values[0], X_test.iloc[0,:])
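Yes, for exact values I believe so, according to the paper entitled 'A Unified Approach to Interpreting Model Predictions'.
You will need to iterate through all feature coalitions, hence the 2^n (where n is the number of features). In practice, Kernel SHAP implementations usually sample a subset of the coalitions when n is large instead of enumerating all of them.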
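To make the 2^n count concrete for a port to another language, here is a minimal sketch of exact Shapley values by brute-force enumeration. It is not the regression-based Kernel SHAP estimator from the paper. The function name `exact_shapley`, the toy linear model and the background data are assumptions for illustration, and "absent" features are filled in from a background set and averaged.

```python
import itertools
import math
import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values for one instance x by enumerating all coalitions.

    predict    : function mapping an (m, n) array to an (m,) array of outputs
    x          : (n,) instance to explain
    background : (m, n) reference data used to fill in "absent" features
    """
    n = x.shape[0]

    def value(coalition):
        # Features in the coalition take their values from x; the others keep
        # their background values. The model output is averaged over rows.
        data = background.copy()
        idx = list(coalition)
        if idx:
            data[:, idx] = x[idx]
        return predict(data).mean()

    # One batch of model evaluations per coalition: 2^n coalitions in total.
    values = {
        S: value(S)
        for r in range(n + 1)
        for S in itertools.combinations(range(n), r)
    }

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for S in itertools.combinations(others, r):
                S_with_i = tuple(sorted(S + (i,)))
                phi[i] += weight * (values[S_with_i] - values[S])
    return phi, values[()]

# Toy usage with a hypothetical linear model standing in for the classifier.
rng = np.random.default_rng(0)
background = rng.normal(size=(20, 3))
w = np.array([1.0, -2.0, 0.5])
predict = lambda Z: Z @ w
x = np.array([1.0, 1.0, 1.0])
phi, base = exact_shapley(predict, x, background)
```

For a kNN classifier, `predict` would return the predicted probability of one class, so every coalition costs one pass of the classifier over the perturbed background rows.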

How to specify non linear regression model in python

I am taking an Econometrics course, and have been trying to use Python rather than the proprietary STATA and EVIEWS they set the assignments in.
In one of the questions, I have consumption data over time. I am asked to compute it in two ways.
The first way is calculating a model of the form consumption = Aexp(Bt), and the second way is to log both sides and do ordinary OLS on log(consumption) = alpha + Bt
I know how to do the second way. However, when I try to do the first way it goes wrong. Using statsmodels, I can exponentiate the time data (after normalising), but this calculates a regression of the form consumption = Aexp(t) + B, which is not what I want. (I want to specify where the parameters go.) In sklearn I could find a polynomial regression, but not an exponential one.
Then I found scipy.optimize.curve_fit
However this seems to have two problems:
(1) It seems to rely on initial guesses for parameters, which means my output will end up being different from proprietary software (whereas output for things like OLS is the same) [as I assume initial guesses mean some iterative solution is used, which is helpful for very weird and wonderful functions, but I assume fairly standard results hold for exponential regression]
(2) every time I try to implement it, it just returns the guess parameters.
Here is my code
consumption_data = pd.read_csv("......\consumption.csv")
def func(x,a,b):
    return a * np.exp(b*x)
xdata = consumption_data.YEAR
ydata = consumption_data.CONSUMPTION
ydata = (ydata - 1948)/100
popt, pcov = curve_fit(func, xdata, ydata, (1,1))
plt.plot(xdata, func(xdata, *popt), 'g--',)
The scipy.optimize code is basically just copy-pasted from their tutorial
Short answer: use statsmodels GLM.
statsmodels does not have nonlinear least squares. The best Python library for that is lmfit.
curve_fit, lmfit and nonlinear least squares algorithms in general find an iterative solution to the optimization problem. Even when we have to provide starting values, the solution is in many cases the same across packages up to convergence tolerance, e.g. 1e-5 or 1e-6.
Many standard models in statistics and econometrics have a single global maximum with well behaved data. However, in other cases like mixture models, there might be many local optima and the estimation might converge to one of them.
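A small sketch of the starting-value point, using synthetic data as a stand-in for the consumption series (the true parameters 2.0 and 1.5, the noise level and the rescaled time axis are assumptions for illustration). It fits the same exponential model from two different starting points, one of them taken from the log-linear OLS fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(t, A, B):
    # consumption = A * exp(B * t)
    return A * np.exp(B * t)

# Synthetic stand-in data on a rescaled time axis (assumed values).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
y = exp_model(t, 2.0, 1.5) + rng.normal(scale=0.05, size=t.size)

# Starting values from the log-linear OLS fit: log(y) = alpha + B * t
B0, alpha0 = np.polyfit(t, np.log(y), 1)
p_start = (np.exp(alpha0), B0)

# Same model, two different starting points.
popt_a, _ = curve_fit(exp_model, t, y, p0=p_start)
popt_b, _ = curve_fit(exp_model, t, y, p0=(1.0, 1.0))
```

On well-behaved data like this, both fits should agree up to the convergence tolerance. Rescaling time to a small range matters, because exp(B * t) with raw calendar years overflows or produces a flat objective, and the optimizer may then fail to move away from the initial guess.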
To the specific case:
consumption = A exp(B t)
can be rewritten as
consumption = exp(a + B t)
So this is just a single index model or a generalized linear model with an exponential mean function.
The general version has the expectation of the dependent variable as a nonlinear function of a linear combination of the explanatory variables:
E(y | x) = g(x b)
This can be estimated with statsmodels with GLM with family Gaussian and the log-link.
Aside: In econometrics, there is a literature on using Poisson quasi-likelihood as an estimator for exp models instead of taking the log of the dependent variable.
Poisson usually uses the log-link function as in the above.
However, using GLM allows us to use log-link, i.e. exponential mean function, with any of the supported distribution families. The main difference is in the underlying variance assumption. Gaussian assumes constant variance, Poisson assumes that the variance is proportional to the mean and Gamma assumes that the variance is quadratic in the mean.
If we use a robust sandwich covariance estimator for parameter inference, then standard errors and inference are correct even if the variance function is misspecified.

Interpretation of coefficients in multinomial LogisticRegressionWithLBFGS model output in scala

I am attempting to do some post processing of the outputs of a multinomial LogisticRegressionWithLBFGS model. The model matrix is created in R and then exported to scala spark for model fitting.
The documentation states that there is "standard feature scaling and L2 regularization". The outputs of the multinomial model from the multinom() function in R's {nnet} package are clear: they are log-odds between a given outcome and a base outcome. However, the documentation does not give sufficiently detailed information about how the weights of the LogisticRegressionWithLBFGS can be transformed to obtain a standard set of coefficients.
The term "standard feature scaling" means different things to different people. It could mean that the model matrix is scaled as (x - mean(x))/sd(x) or (x - min(x))/(max(x) - min(x)) or a set of other possibilities. In addition the weights output is a string of numbers that is a multiple of the features that could be folded in different ways to obtain a coefficients matrix - for example by row, by column, or some other arbitrary way.
How do I process the outputs from the LogisticRegressionWithLBFGS().weights to obtain a standard set of coefficients that I can do some post processing, basic inference and predictions with the original model matrix?

How decision values are calculated in libsvm

Considering only the linear kernel, how are decision values calculated in LIBSVM?
Generally, for a two-class problem, predictions are based on sign(w*z+b), but in LIBSVM predictions are based on sign(decision value).
I calculated w*z+b, which comes out different from the decision value.
Is there any relation between w*z+b and the decision value?
The decision function is exactly <w, x> + b. The only thing that might be misleading is that in MATLAB's model structure rho is actually -b (notice the change of sign), so the decision function is <w, x> - rho.
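A minimal NumPy sketch of that relation for a linear kernel. The arrays below are made-up numbers standing in for the support vectors, their coefficients and rho of a trained model. They are not the output of a real training run.

```python
import numpy as np

# Hypothetical stand-ins for a trained linear-kernel model:
# support vectors, their coefficients (alpha_i * y_i) and rho.
SVs = np.array([[1.0, 2.0], [2.0, 0.5], [0.0, 1.0]])
sv_coef = np.array([0.7, -1.0, 0.3])
rho = 0.25

w = SVs.T @ sv_coef          # w = sum_i sv_coef_i * SV_i
b = -rho                     # note the sign change

z = np.array([1.5, 1.0])     # test point
decision_value = w @ z + b   # same as sum_i sv_coef_i * <SV_i, z> - rho
```

If a hand-computed value matches the reported decision value only up to sign, one thing worth checking is the internal label order, since LIBSVM is commonly reported to order the classes by their first occurrence in the training data.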

leave-one-out regression using lasso in Matlab

I have 300 data samples with around 4000 dimension feature each. Each input has a 5 dim. output which is in the range of -2 to 2. I am trying to fit a lasso model to it. I went through a few posts which talk about cross validation strategies like this one: Leave one out cross validation algorithm in matlab
But I saw that lasso does not support leaveout in Matlab!
How can I train a model using leave one out cross validation and fit a model using lasso on my dataset? I am trying to do this in matlab. I would like to get a set of weights which I will be able to use for future predictions on other data.
I tried using glmnet, but I couldn't compile it on my machine because I lack a proper mex compiler.
Any solutions to my problem? Thanks :)
I am also trying to use lasso function in-built with MATLAB. It has an option to perform cross validation. It outputs B and Fit Statistics, where B is Fitted coefficients, a p-by-L matrix, where p is the number of predictors (columns) in X, and L is the number of Lambda values.
Now given a new test sample, how can I calculate the output using this model?
You can use a leave-one-out approach regardless of your training method. As explained here, you can use crossvalind to split the data into training and test sets (with M = 1 for leave-one-out).
[Train, Test] = crossvalind('LeaveMOut', N, M)
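The loop itself is language independent, so here is a sketch in Python rather than MATLAB. The helper names `lasso_cd` and `loo_mse`, the synthetic data and the lambda grid are all assumptions for illustration. `lasso_cd` is a plain coordinate-descent lasso written for this example, not MATLAB's lasso function.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimises (1 / (2 n)) * ||y - b0 - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring handles the intercept
    b = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    resid = yc.copy()                        # residual for the current b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += Xc[:, j] * b[j]         # partial residual without feature j
            rho = Xc[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= Xc[:, j] * b[j]         # restore the full residual
    b0 = y_mean - x_mean @ b
    return b, b0

def loo_mse(X, y, lam):
    """Leave-one-out mean squared prediction error for one lambda."""
    n = X.shape[0]
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i            # hold out sample i
        b, b0 = lasso_cd(X[train], y[train], lam)
        errors[i] = (y[i] - (X[i] @ b + b0)) ** 2
    return errors.mean()

# Small synthetic problem (assumed data) with a sparse true coefficient vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
true_b = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
y = X @ true_b + 0.5 + rng.normal(scale=0.1, size=30)

lambdas = [0.01, 0.1, 1.0]
scores = [loo_mse(X, y, lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(scores))]

# Refit on all data with the selected lambda, then predict a new sample.
b, b0 = lasso_cd(X, y, best_lam)
x_new = rng.normal(size=5)
y_new = x_new @ b + b0
```

With a 5-dimensional output, the same procedure would be run once per output column. In MATLAB the prediction step has the same form: for a chosen lambda index k, the output for a new row x is x*B(:,k) + FitInfo.Intercept(k).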