How to find the standard deviations of the simple linear regression coefficients Alpha and Beta in Matlab? - matlab

I have data and I need to do a linear regression on the data to obtain
y=Alpha*x+Beta
Alpha and Beta are the estimates given by the regression. polyfit gives me those with no problem, but this is a physical science report and I need to give error estimates for those values.
I know from statistics that standard deviations exist for simple linear regression coefficients.
How can I calculate them in Matlab?
Thank you

The second output of the regress command will give you 95% confidence intervals for the regression coefficients. Here's an example:
>> x = [ones(100,1), (1:100)'];
>> alpha = 3; beta = 2;
>> y = x*[alpha; beta]+randn(100,1);
>> [coeffs, coeffints] = regress(y,x);
>> coeffs
coeffs =
2.9712
1.9998
>> coeffints
coeffints =
2.5851 3.3573
1.9932 2.0064
Here alpha has been estimated as 2.9712, with a 95% confidence interval between 2.5851 and 3.3573, and beta has been estimated as 1.9998, with a 95% confidence interval between 1.9932 and 2.0064.

The easiest solution is to use LSCOV:
%# create some data
x = 1:10;
y = 3*x+randn(size(x))*0.1;
%# create the design matrix
A = [x(:),ones(length(x),1)];
[u,std_u] = lscov(A,y(:));
u =
3.0241
-0.070209
std_u =
0.018827
0.11682

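Or just use the fact that the length of the 95% confidence interval is 2*t standard errors, where t = tinv(0.975, n-p) is the critical value of the t distribution with n-p degrees of freedom (n observations, p coefficients). In the regress example above, n = 100 and p = 2, so t is about 1.9845 and the standard errors are given by:
st_errors = (coeffints(:,2)-coeffints(:,1))/(2*tinv(0.975,98));
The t critical value appears because regress assumes normally distributed errors with an unknown variance that is estimated from the residuals.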
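The standard errors can also be computed directly from the textbook formula, without regress or its confidence intervals. The following is a minimal sketch, assuming synthetic data and illustrative variable names (x, y, X, b, se are not from the question); lscov is used only as a cross-check.

```matlab
% Minimal sketch: standard errors of the coefficients of y = Alpha*x + Beta
x = (1:100)';
y = 2*x + 3 + randn(100,1);          % synthetic data (assumed for illustration)
X = [x, ones(size(x))];              % design matrix: slope column, then intercept
b = X \ y;                           % least-squares estimates [Alpha; Beta]
res = y - X*b;                       % residuals
dof = numel(y) - size(X,2);          % degrees of freedom, n - p
s2 = sum(res.^2) / dof;              % unbiased estimate of the noise variance
se = sqrt(s2 * diag(inv(X'*X)));     % standard errors of Alpha and Beta
[b_chk, se_chk] = lscov(X, y);       % cross-check against lscov
disp([se, se_chk]);                  % the two columns should agree
```

The two columns printed at the end should match to rounding error, since lscov scales its covariance estimate by the same residual variance.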

Related

Matlab : Convolution and deconvolution results weird

Data x is input to an autoregressive (AR) model. The output of the AR model is corrupted with Additive White Gaussian Noise at SNR = 30 dB. The observations are denoted by noisy_y.
Let there be close estimates h_hat of the AR model (these are obtained from Least Squares estimation). I want to see how close the input obtained from deconvolution with h_hat and the measurements is to the known x.
My confusion is which variable to use for deconvolution -- clean y or noisy y?
Upon deconvolution, I should get x_hat. I am not sure if the correct way to perform deconvolution is using the noisy_y or using the y before adding noise. I have used the following code.
Can somebody please tell me the correct way to plot x and x_hat?
Below is the plot of x vs x_hat. As can be seen, these do not match. Where is my understanding wrong? Please help.
The code is:
clear all
N = 200; %number of data points
a1=0.1650;
b1=-0.850;
h = [1 a1 b1]; %true coefficients
x = rand(1,N);
%%AR model
y = filter(1,h,x); %transmitted signal through AR channel
noisy_y = awgn(y,30,'measured');
hat_h= [1 0.133 0.653];
x_hat = filter(hat_h,1,noisy_y); %deconvolution
plot(1:50,x(1:50),'b');
hold on;
plot(1:50,x_hat(1:50),'-.rd');
A first issue is that the coefficients h of your AR model correspond to an unstable system since one of its poles is located outside the unit circle:
>> abs(roots(h))
ans =
1.00814
0.84314
Parameter estimation techniques are then quite likely to fail to converge given a diverging input sequence. Indeed, looking at the stated hat_h = [1 0.133 0.653] it is pretty clear that the parameter estimation did not converge anywhere near the actual coefficients. In your specific case you did not provide the code illustrating how you obtained hat_h (other than specifying that it was "obtained from Least Squares estimation"), so it isn't possible to further comment on what went wrong with your estimation.
That said, the standard formulation of Least Mean Squares (LMS) filters is given for an MA model. A common method for AR parameter estimation is to solve the Yule-Walker equations:
hat_h = aryule(noisy_y - mean(noisy_y), length(h)-1);
If we were to use this estimation method with the stable system defined by:
h = [1 -a1 -b1];
x = rand(1,N);
%%AR model
y = filter(1,h,x); %transmitted signal through AR channel
noisy_y = awgn(y,30,'measured');
hat_h = aryule(noisy_y - mean(noisy_y), length(h)-1);
x_hat = filter(hat_h,1,noisy_y); %deconvolution
The plot of x and x_hat would look like:
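Independently of the estimation step, the deconvolution itself can be sanity-checked without noise: filtering the AR output with the true coefficients as an FIR filter should return the input. This is a minimal sketch using only the base filter and roots functions; the names x_rec and max_err are illustrative.

```matlab
% Noise-free round trip: AR filtering followed by its inverse FIR filter
N = 200;
h = [1 -0.1650 0.850];              % the stable coefficients [1 -a1 -b1]
assert(all(abs(roots(h)) < 1));     % all poles lie inside the unit circle
x = rand(1,N);
y = filter(1,h,x);                  % AR model output, no noise added
x_rec = filter(h,1,y);              % deconvolution with the true h
max_err = max(abs(x_rec - x));      % should be at rounding-error level
disp(max_err);
```

Any mismatch seen with noisy_y and an estimated hat_h then comes from the noise and the estimation error, not from the deconvolution step.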

Calculate bias and variance in ridge regression MATLAB

I can't get my mind around the concept of how to calculate bias and variance from a random set.
I have created the code to generate a random normal set of numbers.
% Generate random w, x, and noise from standard Gaussian
w = randn(10,1);
x = randn(600,10);
noise = randn(600,1);
and then extract the y values
y = x*w + noise;
After that I split my data into a training (100) and test (500) set
% Split data set into a training (100) and a test set (500)
x_train = x([ 1:100],:);
x_test = x([101:600],:);
y_train = y([ 1:100],:);
y_test = y([101:600],:);
train_l = length(y_train);
test_l = length(y_test);
Then I calculated the w for a specific value of lambda (1.2)
lambda = 1.2;
% Calculate the optimal w
A = x_train'*x_train+lambda*train_l*eye(10,10);
B = x_train'*y_train;
w_train = A\B;
Finally, I am computing the square error:
% Compute the mean squared error on both the training and the
% test set
sum_train = sum((x_train*w_train - y_train).^2);
MSE_train = sum_train/train_l;
sum_test = sum((x_test*w_train - y_test).^2);
MSE_test = sum_test/test_l;
I know that if I create a vector of lambda (I have already done that) over some iterations I can plot the average MSE_train and MSE_test as a function of lambda, where then I will be able to verify that large differences between MSE_test and MSE_train indicate high variance, thus overfit.
But, what I want to do extra, is to calculate the variance and the bias^2.
Taken from Ridge Regression Notes at page 7, it guides us how to calculate the bias and the variance.
My question is: should I follow its steps on the whole random dataset (600) or on the training set? I think the bias^2 and the variance should be calculated on the training set. Also, in Theorem 2 (page 7 again) the bias is calculated as the negative product of lambda, W, and beta. The beta is my original w (w = randn(10,1)), am I right?
Sorry for the long post, but I really want to understand how the concept works in practice.
UPDATE 1:
Ok, so following the previous paper didn't generate any good results. So, I took the standard form of Ridge Regression Bias-Variance which is:
Based on that, I created (I used the test set):
% Bias and Variance
sum_bias=sum((y_test - mean(x_test*w_train)).^2);
Bias = sum_bias/test_l;
sum_var=sum((mean(x_test*w_train)- x_test*w_train).^2);
Variance = sum_var/test_l;
But, after 200 iterations and for 10 different lambdas this is what I get, which is not what I expected.
Where in fact, I was hoping for something like this:
sum_bias=sum((y_test - mean(x_test*w_train)).^2); Bias = sum_bias/test_l
Why have you squared the difference between y_test and y_predicted = x_test*w_train?
I don't believe your formula for bias is correct. In your question, the 'bias term' above in blue is the bias^2; however, your formula is neither the bias nor the bias^2, since you have squared the individual residuals rather than the entire bias.
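One common way to estimate bias^2 and variance empirically is to repeat the fit over many independently drawn training sets while keeping the true w and the test inputs fixed. The sketch below assumes that setup; it is not the method from the notes cited in the question, and names such as preds, bias2 and n_trials are illustrative.

```matlab
% Monte Carlo estimate of bias^2 and variance for ridge regression
d = 10; n_train = 100; n_test = 500; n_trials = 200;
lambda = 1.2;
w = randn(d,1);                          % true weights, fixed across trials
x_test = randn(n_test,d);                % fixed test inputs
f_test = x_test*w;                       % noise-free targets at the test inputs
preds = zeros(n_test,n_trials);
for t = 1:n_trials
    x_train = randn(n_train,d);          % fresh training set for each trial
    y_train = x_train*w + randn(n_train,1);
    A = x_train'*x_train + lambda*n_train*eye(d);
    w_hat = A \ (x_train'*y_train);      % ridge solution, as in the question
    preds(:,t) = x_test*w_hat;
end
mean_pred = mean(preds,2);               % average prediction per test point
bias2 = mean((mean_pred - f_test).^2);   % squared bias, averaged over test points
variance = mean(var(preds,1,2));         % variance over trials (normalized by N)
mse = mean(mean((preds - repmat(f_test,1,n_trials)).^2));
disp([bias2, variance, mse]);
```

Here the whole bias (mean prediction minus true value) is squared, and mse equals bias2 + variance up to rounding. An error measured against noisy y_test would additionally contain the noise variance.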

Calculating Empirical Risk using LIBSVM and MATLAB

I am trying to understand how to calculate empirical risk using MATLAB and LIBSVM's MATLAB bindings. I have Y outcomes (1,100) labeled as either -1 or +1, and 10D observations given by X (100,10). I then call svmtrain to get my model. The empirical risk is given by the following equation:
Based on the values I receive from svmtrain, how do I get f(xi, alpha)?
Here is what I have so far:
params = sprintf('-s 0 -t 0 -c %d', C);
%X1 and Y1 are values I generate
m1 = svmtrain(Y1, X1, params);
y = diag(Y1(m1.sv_indices));
x = X1(m1.sv_indices, :);
alpha = m1.sv_coef;
w = alpha'*y*x;
The empirical risk is just the ratio of misclassified training data points. yi is the actual label and f(xi, alpha) is the label predicted for xi by the trained SVM, whose parameters are denoted alpha. The formula assumes that labels are either +1 or -1.
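As a small illustration, assuming the formula in question is the usual R_emp = (1/(2l)) * sum(|yi - f(xi, alpha)|) for labels in {-1, +1}, the two ways of computing the risk agree. The label vectors below are made up; in practice pred could come from LIBSVM's svmpredict(Y1, X1, m1), with Y1 as a column vector.

```matlab
% Empirical risk as the misclassification ratio (hypothetical labels)
Y1   = [ 1; -1;  1;  1; -1; -1;  1; -1];   % actual labels
pred = [ 1; -1; -1;  1; -1;  1;  1; -1];   % predicted labels, e.g. from svmpredict
risk_ratio   = mean(pred ~= Y1);                     % fraction misclassified
risk_formula = sum(abs(Y1 - pred)) / (2*numel(Y1));  % |yi - f| is 0 or 2
disp([risk_ratio, risk_formula]);
```

Each misclassified point contributes |yi - f(xi, alpha)| = 2, which is why the factor 1/2 turns the sum into a plain error count.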

MATLAB - How to calculate 2D least squares regression based on both x and y. (regression surface)

I have a set of data with independent variable x and y. Now I'm trying to build a two dimensional regression model that has a regression surface cutting through my data points. However, I couldn't find a way to achieve this. Can anyone give me some assistance?
You could use my favorite, polyfitn for linear or polynomial models. If you would like a different model, please edit your question or add a comment. HTH!
EDIT
Also, take a look here under Multiple Regression, likely can help you as well.
EDIT AGAIN
Sorry, I'm having too much fun with this, here's an example of multivariate regression using least squares with stock Matlab:
t = (1:10)';
x = t;
y = exp(-t);
A = [ y x ];
z = 10*y + 0.5*x;
A\z
ans =
10.0000
0.5000
If you are performing linear regression, the best tool is the regress function. Note that if you are fitting a model of the form y(x1,x2) = b1*f(x1) + b2*g(x2) + b3, this is still a linear regression, as long as you know the functions f and g.
Nsamp = 100; %number of samples
X1 = randn(Nsamp,1); %regressor 1 (could also be some computed f(x1) )
X2 = randn(Nsamp,1); %regressor 2 (could also be some computed g(x2) )
Y = X1 + X2 + randn(Nsamp,1); %generate some data to be regressed
%now run the regression
[b,bint,r,rint,stats] = regress(Y,[X1 X2 ones(Nsamp,1)]);
% 'b' contains the coefficients b1, b2, b3 of the fit (can be used to plot the regression surface)
% 'r' contains residuals of the fit
% 'stats' contains the overall regression R^2, F stat, p-value and error variance
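To turn the fitted coefficients into a regression surface, the plane can be evaluated on a grid. This is a minimal sketch using only base MATLAB (backslash instead of regress); the data and the names xg, yg, zg are assumptions for illustration.

```matlab
% Fit z = b1*x + b2*y + b3 and evaluate the fitted plane on a grid
n = 100;
x = randn(n,1);
y = randn(n,1);
z = 1.5*x - 0.7*y + 2 + 0.1*randn(n,1);   % synthetic data (assumed)
A = [x, y, ones(n,1)];                    % design matrix
b = A \ z;                                % least-squares coefficients
[xg, yg] = meshgrid(linspace(min(x),max(x),20), linspace(min(y),max(y),20));
zg = b(1)*xg + b(2)*yg + b(3);            % regression surface on the grid
% To plot: surf(xg, yg, zg); hold on; plot3(x, y, z, 'k.');
disp(b');
```

For a nonlinear but known f(x) and g(y), the first two columns of A would be replaced by f(x) and g(y), and zg evaluated the same way.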

PCA using princomp in MATLAB (for face recognition)

I'm trying to do dimensionality reduction using MATLAB's princomp, but I'm not sure I'm doing it right.
Here is my code, just for testing; I'm not sure that I'm doing the projection right:
A = rand(4,3)
AMean = mean(A)
[n m] = size(A)
Ac = (A - repmat(AMean,[n 1]))
pc = princomp(A)
k = 2; %Number of first principal components
A_pca = Ac * pc(1:k,:)' %Not sure I'm doing projection right
reconstructedA = A_pca * pc(1:k,:)
error = reconstructedA- Ac
And my code for face recognition using ORL dataset:
%load orl_data 400x768 double matrix (400 images 768 features)
%make labels
orl_label = [];
for i = 1:40
orl_label = [orl_label;ones(10,1)*i];
end
n = size(orl_data,1);
k = randperm(n);
s = round(0.25*n); %Take 25% for train
%Raw pixels
%Split on test and train sets
data_tr = orl_data(k(1:s),:);
label_tr = orl_label(k(1:s),:);
data_te = orl_data(k(s+1:end),:);
label_te = orl_label(k(s+1:end),:);
tic
[nn_ind, estimated_label] = EuclDistClassifier(data_tr,label_tr,data_te);
toc
rate = sum(estimated_label == label_te)/size(label_te,1)
%Using PCA
tic
pc = princomp(data_tr);
toc
mean_face = mean(data_tr);
pc_n = 100;
f_pc = pc(1:pc_n,:)';
data_pca_tr = (data_tr - repmat(mean_face, [s,1])) * f_pc;
data_pca_te = (data_te - repmat(mean_face, [n-s,1])) * f_pc;
tic
[nn_ind, estimated_label] = EuclDistClassifier(data_pca_tr,label_tr,data_pca_te);
toc
rate = sum(estimated_label == label_te)/size(label_te,1)
If I choose enough principal components it gives me equal recognition rates. If I use a small number of principal components (PCA) then the rate using PCA is poorer.
Here are some questions:
Is the princomp function the best way to calculate the first k principal components using MATLAB?
Do PCA-projected features give no extra accuracy over raw features, only a smaller feature vector (which is faster to compare)?
How can I automatically choose the minimum k (number of principal components) that gives the same accuracy as the raw feature vector?
What if I have a very big set of samples: can I use only a subset of them with comparable accuracy? Or can I compute PCA on some set and later "add" some other set (I don't want to recompute PCA for set1+set2, but somehow iteratively add information from set2 to the existing PCA from set1)?
I also tried the GPU version simply using gpuArray:
%Test using GPU
tic
A_cpu = rand(30000,32*24);
A = gpuArray(A_cpu);
AMean = mean(A);
[n m] = size(A)
pc = princomp(A);
k = 100;
A_pca = (A - repmat(AMean,[n 1])) * pc(1:k,:)';
A_pca_cpu = gather(A_pca);
toc
clear;
tic
A = rand(30000,32*24);
AMean = mean(A);
[n m] = size(A)
pc = princomp(A);
k = 100;
A_pca = (A - repmat(AMean,[n 1])) * pc(1:k,:)';
toc
clear;
It is working faster, but it's not suitable for big matrices. Maybe I'm wrong?
If I use a big matrix, it gives me:
Error using gpuArray Out of memory on device.
"Is princomp function the best way to calculate first k principal components using MATLAB?"
It's computing a full SVD, so it will be slow on large datasets. You can speed this up significantly by specifying the number of dimensions you need at the start and computing a partial SVD. The MATLAB function for a partial SVD is svds.
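A minimal sketch of this with svds, assuming rows are observations and using random data for illustration: the principal directions are the columns of V, so the projection multiplies the centered data by V. (princomp likewise returns its components as columns, so the first k of them are pc(:,1:k).)

```matlab
% First k principal components via a partial SVD of the centered data
A  = rand(200,30);                          % rows are observations (assumed data)
k  = 5;
Ac = A - repmat(mean(A,1), size(A,1), 1);   % center each column
[U,S,V] = svds(Ac, k);                      % k largest singular triplets
A_pca = Ac * V;                             % n-by-k projected data (equals U*S)
A_rec = A_pca * V';                         % rank-k reconstruction of Ac
rel_err = norm(Ac - A_rec, 'fro') / norm(Ac, 'fro');
disp(rel_err);
```

New samples are projected the same way: subtract the training mean, then multiply by V.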
If svds is not fast enough for you, there's a more modern implementation here:
http://cims.nyu.edu/~tygert/software.html (matlab version: http://code.google.com/p/framelet-mri/source/browse/pca.m )
(cf the paper describing the algorithm http://cims.nyu.edu/~tygert/blanczos.pdf )
You can control the error of your approximation by increasing the number of singular vectors computed; there are precise bounds on that in the linked paper. Here's an example:
>> A = rand(40,30); %random rank-30 matrix
>> [U,S,V] = pca(A,2); %compute a rank-2 approximation to A
>> norm(A-U*S*V',2)/norm(A,2) %relative error
ans =
0.1636
>> [U,S,V] = pca(A,25); %compute a rank-25 approximation to A
>> norm(A-U*S*V',2)/norm(A,2) %relative error
ans =
0.0410
When you have large data and a sparse matrix, computing a full SVD is often impossible, since the factors will never be sparse. In this case you must compute a partial SVD to fit within memory. Example:
>> A = sprandn(5000,5000,10000);
>> tic;[U,S,V]=pca(A,2);toc;
no pivots
Elapsed time is 124.282113 seconds.
>> tic;[U,S,V]=svd(A);toc;
??? Error using ==> svd
Use svds for sparse singular values and vectors.
>> tic;[U,S,V]=princomp(A);toc;
??? Error using ==> svd
Use svds for sparse singular values and vectors.
Error in ==> princomp at 86
[U,sigma,coeff] = svd(x0,econFlag); % put in 1/sqrt(n-1) later
>> tic;pc=princomp(A);toc;
??? Error using ==> eig
Use eigs for sparse eigenvalues and vectors.
Error in ==> princomp at 69
[coeff,~] = eig(x0'*x0);