Covariance of predictions made from mgcv::gam - confidence-interval

Suppose I use mgcv's gam to predict with some newdata, producing two outputs (say g1, g2). How can I ask predict.gam to return the covariance of the two predictions?
I would like to compute a confidence interval for f(g1, g2) for some f. Assuming the two predictions follow a bivariate normal distribution, I could use the delta method to compute the confidence interval.
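For reference, the delta-method approximation I have in mind is roughly
$$\operatorname{Var}\bigl(f(g_1, g_2)\bigr) \approx \nabla f(g_1, g_2)^{\top}\, \Sigma\, \nabla f(g_1, g_2),$$
where $\Sigma$ is the 2x2 covariance matrix of $(g_1, g_2)$, which is why I need the covariance term and not just the two standard errors.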
Or is there an alternative method to compute this?

I am not really sure what you are asking, but if you need to calculate the covariance of any two data sets g1 and g2 (independently of whether they are outputs of predict.gam), you can simply use the function cov() after calling predict.gam, i.e. cov(g1, g2).

Related

MATLAB Multinomial Logistic Regression Inputs

This is my first time attempting to use multinomial logistic regression, and I'm having a hard time getting started. I currently have a dataset of 203 observations with 22 independent variables and 1 dependent variable, all of which are numerical and continuous. My goal is to use MATLAB's mnrfit function to predict the probabilities of future observations having a dependent variable falling into one of three intervals (y < 0, 0 < y < 5, and 5 < y).
How would I input my data into the mnrfit function to get these results? I believe that I would have to use this function to get the coefficients and then use the mnrval function to determine the probabilities for future observations. Thanks for the help!
Given http://se.mathworks.com/help/stats/mnrfit.html
It seems all you have to do is turn your Y variable into an integer array, something like
Yord = (Y > 0) + (Y > 5) + 1
then call B = mnrfit(X, Yord)
where X is the matrix of predictors/features.
Reshape B in the way suggested in the example on the link above, and finally call
mnrval(B, X) to get the probabilities of being less than zero, between zero and five, or above five.
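A minimal sketch of what those calls could look like (the data below is fabricated just so the lines run; whether B needs reshaping depends on the model variant used in the linked example):
% Hypothetical sketch: bin Y into three categories, then fit and predict.
X = randn(203, 22);           % placeholder: 203 observations, 22 predictors
Y = 4*randn(203, 1);          % placeholder continuous response
Yord = (Y > 0) + (Y > 5) + 1; % 1: y <= 0, 2: 0 < y <= 5, 3: y > 5
B = mnrfit(X, Yord);          % coefficient matrix of the multinomial fit
pihat = mnrval(B, X);         % 203-by-3 matrix of class probabilities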

Matlab: Determinant of Variance-Covariance matrix

When solving the log likelihood expression for autoregressive models, I came across the variance-covariance matrix Tau given under slide 9 of the parameter estimation of time series tutorial. Now, in order to use
fminsearch
to maximize the likelihood function expression, I need to express the likelihood function where the variance-covariance matrix arises. Can somebody please show with an example how I can implement (determinant of Gamma)^(-1/2)? Any other example apart from an autoregressive model will also do.
How about sqrt(det(Gamma)) for the square-root determinant and inv(Gamma) for the inverse?
But if you do not want to implement it yourself, you can look at the Yule-Walker AR estimator.
Update: for estimation of the autocovariance matrix, use xcov.
Also, this topic is explained in a bit more detail here.
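A minimal sketch of how those pieces could enter a negative log-likelihood handed to fminsearch (Gamma and x are placeholder names, not the tutorial's notation; the model here is simply a zero-mean Gaussian vector with covariance Gamma):
% Hypothetical sketch of the Gaussian negative log-likelihood terms.
n = 4;
Gamma = toeplitz([2 1 0.5 0.25]);   % example positive-definite covariance
x = randn(n, 1);                    % example data vector
negLL = 0.5*n*log(2*pi) + 0.5*log(det(Gamma)) + 0.5*(x' * (Gamma \ x));
% log(det(Gamma)) plays the role of the (det Gamma)^(-1/2) factor, and
% Gamma \ x avoids forming inv(Gamma) explicitly.
In a real fit, Gamma would be rebuilt from the AR parameters inside a function handle that fminsearch minimizes.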

How To Fit Multivariate Normal Distribution To Data In MATLAB?

I'm trying to fit a multivariate normal distribution to data that I collected, in order to take samples from it.
I know how to fit a (univariate) normal distribution, using the fitdist function (with the 'Normal' option).
How can I do something similar for a multivariate normal distribution?
Doesn't using fitdist on every dimension separately assume the variables are uncorrelated?
There isn't any need for a specialized fitting function; the maximum likelihood estimates for the mean and covariance of the distribution are just the sample mean and sample covariance. I.e., compute the sample mean and sample covariance and you're done.
Estimate the mean with mean and the variance-covariance matrix with cov.
Then you can generate random numbers with mvnrnd.
It is also possible to use fitgmdist (which fits a Gaussian mixture model), but for just a multivariate normal distribution, mean and cov are enough.
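A minimal sketch of that approach (the data matrix X below is made up purely for illustration; rows are observations, columns are dimensions):
% Hypothetical sketch: fit a multivariate normal by its sample moments.
X = randn(500, 3) * [1 0 0; 0.5 1 0; 0 0.3 1];   % placeholder correlated data
mu_hat    = mean(X);                             % 1-by-3 sample mean
Sigma_hat = cov(X);                              % 3-by-3 sample covariance
Xnew = mvnrnd(mu_hat, Sigma_hat, 1000);          % 1000 samples from the fit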
Yes, using fitdist on every dimension separately assumes the variables are uncorrelated and it's not what you want.
You can use the function [sigma, mu] = robustcov(X), where X is your multivariate data, i.e. X = [x1 x2 ... xn] and each xi is a column vector of data.
Then you can use Y = mvnpdf(X,mu,sigma) to get the values of the estimated normal probability density function.
https://www.mathworks.com/help/stats/normfit.html
https://www.mathworks.com/help/stats/mvnpdf.html
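A minimal sketch of that variant (again with made-up data; robustcov and mvnpdf are in the Statistics and Machine Learning Toolbox):
% Hypothetical sketch: robust mean/covariance estimate, then density values.
X = mvnrnd([0 0], [1 0.8; 0.8 2], 200);   % placeholder 200-by-2 data
[sigma, mu] = robustcov(X);               % robust covariance and location
p = mvnpdf(X, mu, sigma);                 % density at each observation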

weighted correlation for case of matrix

I have a question about how to calculate weighted correlations for matrices. From Wikipedia I have created the three following pieces of code.
1. Weighted mean calculation
function y = weighted_mean(x, w)
% Weighted mean of vector x with weight vector w.
% Assume that the weight vector and the input vector have the same length.
n = length(x);
total = 0.0;
sum_weight = 0.0;
for i = 1:n
    total = total + x(i)*w(i);
    sum_weight = sum_weight + w(i);
end
y = total/sum_weight;
end
2. Weighted covariance
function result = cov_weighted(x, y, w)
% Weighted covariance of vectors x and y with weight vector w.
n = length(x);
mx = weighted_mean(x, w);   % compute the weighted means once, outside the loop
my = weighted_mean(y, w);
sum_covar = 0.0;
sum_weight = 0.0;
for i = 1:n
    sum_covar = sum_covar + w(i)*(x(i) - mx)*(y(i) - my);
    sum_weight = sum_weight + w(i);
end
result = sum_covar/sum_weight;
end
3. And finally, the weighted correlation
function corr_weight = weighted_correlation(x, y, w)
% Weighted correlation: weighted covariance of x and y normalized by
% the weighted standard deviations of x and y.
corr_weight = cov_weighted(x, y, w)/sqrt(cov_weighted(x, x, w)*cov_weighted(y, y, w));
end
Now I want to apply the weighted correlation method to matrices, related to this link:
http://www.mathworks.com/matlabcentral/fileexchange/20846-weighted-correlation-matrix/content/weightedcorrs.m
I did not understand how to apply it, which is why I created my own functions, but I need them for the case where the inputs are matrices. Thanks very much.
@dato-datuashvili Maybe I am providing too much information...
1) I would like to stress that the evaluation of weighted correlation matrices is very uncommon. This happens because you have to provide the weights beforehand. Unless you have a clear reason to choose the weights, there is no clear way to provide them.
How can you tell that a measurement of your sample is more or less important than another measurement?
Having said that, the weights are up to you! You have to choose them!
So, people usually consider just the correlation matrix (no weights, or all weights the same, e.g. w_i = 1).
If you have a clear way to choose good weights, just do not consider this part.
2) I understand that you want to test your code. So, in order to do that, you have to have correlated random variables. How do you generate them?
Multivariate normal distributions are the simplest case. See the Wikipedia page about them: Multivariate Normal Distribution (see the item "Drawing values from the distribution"; Wikipedia shows you how to generate the random numbers from this distribution using the Cholesky decomposition). The 2-variate case is much simpler. See for instance Generate Correlated Normal Random Variables.
The good news is that if you are using Matlab there is a function for you. See Matlab: Random numbers from the multivariate normal distribution.
In order to use this function you have to provide the desired means and covariances. [Note that you are playing the role of nature here. You are generating the data! In real life, you are going to apply your function to the real data. What I am trying to say is that this step is only useful for tests. Furthermore, pay attention to the fact that in the Matlab function you are providing the variances and evaluating the correlations (covariances normalized by standard errors). In the 2-dimensional case (which is the case of your function) it is possible to provide the correlation directly. See the Math.StackExchange page I pointed you to above.]
3) Finally, you can apply them to your function. Generate X and Y from a multivariate normal distribution, provide the vector of weights w to your function weighted_correlation, and you are done!
I hope this provides what you need!
Daniel
Update:
% From the Matlab documentation example
mu = [2 3];
SIGMA = [1 1.5; 1.5 3];
n = 100;
XY = mvnrnd(mu, SIGMA, n);   % mvnrnd returns a single n-by-2 matrix of samples
x = XY(:,1);
y = XY(:,2);
% Using your code
w = ones(n,1);
corr_weight = weighted_correlation(x, y, w);
% Remember that SIGMA is a covariance and corr_weight is a correlation.
% To calculate the same quantity, use result = cov_weighted(x, y, w) instead.
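If the input is an n-by-p data matrix rather than two vectors, one possible way (a sketch reusing the weighted_correlation function defined above) is to loop over column pairs:
% Hypothetical sketch: weighted correlation matrix for an n-by-p matrix X.
X = mvnrnd([2 3 1], [1 1.5 0; 1.5 3 0.2; 0 0.2 2], 100);  % placeholder data
w = ones(size(X,1), 1);           % one weight per observation (row)
p = size(X, 2);
R = eye(p);                       % weighted correlation matrix
for i = 1:p
    for j = i+1:p
        R(i,j) = weighted_correlation(X(:,i), X(:,j), w);
        R(j,i) = R(i,j);
    end
end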

How to vectorize the intersection kernel function in MATLAB?

I need to pre-compute the histogram intersection kernel matrices for using LIBSVM in MATLAB.
Assume x, y are two vectors. The kernel function is K(x, y) = sum(min(x, y)). In order to be efficient, the best practice in most cases is to vectorize the operations.
What I want to do is calculate the kernel matrix in the same way one calculates the Euclidean distance between two matrices, like pdist2(A, B, 'euclidean'). After defining a function 'intKernel', I could calculate the intersection kernel by calling pdist2(A, B, intKernel).
I know the function 'pdist2' may be an option, but I have no idea how to write the self-defined distance function, and I do not know how to code the intersection kernel between a vector (1-by-M) and a matrix (M-by-N) in one condensed expression.
'repmat' may not be feasible, because the matrix is really large, let us say, 20000-by-360000.
Any help would be appreciated.
Regards,
Peiyun
I think pdist2 is a good option, so I will help you define your distance function.
According to the doc, the self-defined distance function must have 2 inputs: first one is a 1-by-N vector; second one is a M-by-N matrix (be careful of the order!).
To avoid the use of repmat, which is indeed memory-consuming, you can use bsxfun to apply basic operations on data with expansion over singleton dimensions. In your case, you can do the following thing:
distance_kernel = @(x,Y) sum(bsxfun(@min,x,Y),2);
Summation is done over the columns to get a column vector as output.
Then just call pdist2 and you are done.
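For example, a minimal usage sketch (A and B are small random matrices just for illustration; with real histograms the rows would be your feature vectors):
% Hypothetical sketch: histogram-intersection kernel matrix via pdist2.
A = rand(5, 10);                                        % 5 samples, 10 bins
B = rand(7, 10);                                        % 7 samples, 10 bins
intersection_fun = @(x, Y) sum(bsxfun(@min, x, Y), 2);  % x: 1-by-N, Y: M-by-N
K = pdist2(A, B, intersection_fun);                     % 5-by-7 kernel matrix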