I have this vector: [10000000000 10000000001 10000000002]
and I try to calculate its variance using this formula.
I calculate it, but the answer that I get is 3.33333333466667e+19,
which is wrong, because the correct answer is 1.
What am I doing wrong?
The MATLAB code is:
total=0;
m1=data(1);
m2=(data(2)-m1)/2;
q1=0;
q2=q1+(((2-1)/2)*((data(2)-m1)^2));
q3=q2+(((3-1)/3)*((data(3)-m2)^2));
variance=q3/(3-1)
Thanks
M is the running mean; it is supposed to be
M(k) = ((k-1)*M(k-1) + x(k)) / k
thus:
m1=data(1);
m2=(data(2)+m1)/2;
q1=0;
q2=q1+(((2-1)/2)*((data(2)-m1)^2));
q3=q2+(((3-1)/3)*((data(3)-m2)^2));
variance=q3/(3-1)
variance =
1
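Tracing the recursion by hand on the original vector confirms it (a quick check, not part of the original post):
data = [10000000000 10000000001 10000000002];
m1 = data(1);                       % 1.0e10
m2 = (data(2) + m1)/2;              % 1.0e10 + 0.5
q2 = 0 + (1/2)*(data(2) - m1)^2;    % 0.5
q3 = q2 + (2/3)*(data(3) - m2)^2;   % 0.5 + (2/3)*1.5^2 = 2
variance = q3/(3-1)                 % 1, matching var(data)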
What the heck, I'm feeling generous; here is the complete code for a generic data size:
sizle = size(data,2);
M = zeros(1, sizle);
Q = M;
Variance = Q;
M(1)=data(1);
for i = 2:sizle
    M(i) = ((i-1)*M(i-1) + data(i))/i;
    Q(i) = Q(i-1) + (i-1)*((data(i) - M(i-1))^2)/i;
    Variance(i) = Q(i)/(i-1);
end
Variance(end)
var(data)
I'm trying to verify if my implementation of Logistic Regression in Matlab is good. I'm doing so by comparing the results I get via my implementation with the results given by the built-in function mnrfit.
The dataset D,Y that I have is such that each row of D is an observation in R^2 and the labels in Y are either 0 or 1. Thus, D is a matrix of size (n,2), and Y is a vector of size (n,1).
Here's how I do my implementation:
I first normalize my data and augment it to include the offset:
n = size(D,1);  % number of observations
d = 2;          % dimension of data
M = mean(D);
centered = D - repmat(M,n,1);
devs = sqrt(sum(centered.^2));
normalized = centered ./ repmat(devs,n,1);
X = [normalized, ones(n,1)];
I will be doing my calculations on X.
Second, I define the gradient and hessian of the likelihood of Y|X:
function grad = gradient(w)
    grad = zeros(1,d+1);
    for i = 1:n
        grad = grad + (Y(i) - sigma(w'*X(i,:)'))*X(i,:);
    end
end

function hess = hessian(w)
    hess = zeros(d+1,d+1);
    for i = 1:n
        hess = hess - sigma(w'*X(i,:)')*sigma(-w'*X(i,:)')*X(i,:)'*X(i,:);
    end
end
with sigma being a MATLAB function encoding the sigmoid function z --> 1/(1+exp(-z)).
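(For completeness, since sigma is not shown in the post, one way such a function could look, assuming it is just the elementwise logistic function:)
function s = sigma(z)
    % logistic function 1/(1+exp(-z)), applied elementwise
    s = 1 ./ (1 + exp(-z));
end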
Third, I run Newton's method to find a root of the gradient of the likelihood. I implemented it myself, and it behaves as expected: the norm of the difference between iterates goes to 0. I wrote it based on this script.
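(The Newton script itself is not shown either; a minimal sketch of such an iteration, assuming the gradient and hessian functions above plus a starting point w0 and a tolerance tol, could look like this:)
w = w0;                                   % w0 and tol are assumed inputs, not from the post
step = inf;
while norm(step) > tol
    step = -(hessian(w) \ gradient(w)');  % Newton step: solve H*step = -g
    w = w + step;
end
wOPT = w;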
I verified that the gradient at the wOPT returned by my Newton implementation is essentially zero:
gradient(wOPT)
ans =
1.0e-15 *
0.0139 -0.0021 0.2290
and that the hessian has strictly negative eigenvalues
eig(hessian(wOPT))
ans =
-7.5459
-0.0027
-0.0194
Here's the wOPT I get with my implementation:
wOPT =
-110.8873
28.9114
1.3706
the offset being the last element. In order to plot the decision line, I need to convert the slope wOPT(1:2) back using M and devs. So I set:
my_offset = wOPT(end);
my_slope = wOPT(1:d)'.*devs + M ;
and I get:
my_slope =
1.0e+03 *
-7.2109 0.8166
my_offset =
1.3706
Now, when I run B=mnrfit(D,Y+1), I get
B =
-1.3496
1.7052
-1.0238
The offset is stored in B(1).
I get very different values, and I would like to know what I am doing wrong. I have some doubts about the normalization and 'un-normalization' process, but I'm not sure; maybe I'm doing something else wrong.
Additional Info
When I type:
B=mnrfit(normalized,Y+1)
I get
-1.3706
110.8873
-28.9114
which is a rearranged version of the negation of my wOPT; it contains exactly the same elements.
It seems likely that my scaling back of the learnt parameters is wrong; otherwise, it would have given the same result as B=mnrfit(D,Y+1).
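For what it's worth, expanding the change of variables by hand (a sketch in the question's notation, not code from the post): the model fitted on the normalized data is wOPT(1:d)'*((x - M)./devs)' + wOPT(end) for a row observation x, which rewrites as sum((wOPT(1:d)'./devs).*x) + (wOPT(end) - sum(wOPT(1:d)'.*M./devs)). In other words, going back to the original coordinates would involve dividing the slope by devs rather than multiplying:
% sketch of the corresponding un-normalization, same variable names as above
orig_slope  = wOPT(1:d)' ./ devs;
orig_offset = wOPT(end) - sum(wOPT(1:d)' .* M ./ devs);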
I asked this question on Math StackExchange, but it seems it didn't get enough attention there, so I am asking it here: https://math.stackexchange.com/questions/1729946/why-do-we-say-svd-can-handle-singular-matrx-when-doing-least-square-comparison?noredirect=1#comment3530971_1729946
I learned from some tutorials that SVD should be more stable than QR decomposition when solving a least-squares problem, and that it is able to handle singular matrices. But the following example I wrote in MATLAB seems to support the opposite conclusion. I don't have a deep understanding of SVD, so if you could look at my questions in the old post on Math StackExchange and explain it to me, I would appreciate it a lot.
I use a matrix that has a large condition number (e+13). The result shows that SVD gets a much larger error (0.8) than QR (e-27).
% we do a linear regression between Y and X
data= [
47.667483331 -122.1070832;
47.667483331001 -122.1070832
];
X = data(:,1);
Y = data(:,2);
X_1 = [ones(length(X),1),X];
%%
%SVD method
[U,D,V] = svd(X_1,'econ');
beta_svd = V*diag(1./diag(D))*U'*Y;
%% QR method (one can also use the "\" operator, which gives the same result, as I tested; I just wrote out backward substitution to educate myself)
[Q,R] = qr(X_1);
% now do backward substitution
[nr, nc] = size(R);
beta_qr = zeros(nc,1);
Y_1 = Q'*Y;
for i = nc:-1:1
    s = Y_1(i);
    for j = nc:-1:i+1
        s = s - R(i,j)*beta_qr(j);
    end
    beta_qr(i) = s/R(i,i);
end
svd_error = 0;
qr_error = 0;
for i = 1:length(X)
    svd_error = svd_error + (Y(i) - beta_svd(1) - beta_svd(2)*X(i))^2;
    qr_error  = qr_error  + (Y(i) - beta_qr(1)  - beta_qr(2)*X(i))^2;
end
Your SVD-based approach is basically the same as the pinv function in MATLAB (see Pseudo-inverse and SVD). What you are missing though (for numerical reasons) is using a tolerance value such that any singular values less than this tolerance are treated as zero.
If you refer to edit pinv.m, you can see something like the following (I won't post the exact code here because the file is copyrighted by MathWorks):
[U,S,V] = svd(A,'econ');
s = diag(S);
tol = max(size(A)) * eps(norm(s,inf));
% .. use above tolerance to truncate singular values
invS = diag(1./s);
out = V*invS*U';
In fact, pinv has a second syntax, pinv(A,tol), where you can explicitly specify the tolerance value if the default one is not suitable...
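A minimal sketch of applying such a tolerance yourself (not MathWorks' actual code) when solving A*x ≈ b in the least-squares sense:
[U,S,V] = svd(A,'econ');
s = diag(S);
tol = max(size(A)) * eps(norm(s,inf));
r = sum(s > tol);                            % numerical rank: keep only significant singular values
x = V(:,1:r) * ((U(:,1:r)'*b) ./ s(1:r));    % truncated-SVD (minimum-norm) least-squares solution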
So when solving a least-squares problem of the form minimize norm(A*x-b), you should understand that the pinv and mldivide solutions have different properties:
x = pinv(A)*b is characterized by the fact that norm(x) is smaller than the norm of any other solution.
x = A\b has the fewest possible nonzero components (i.e., it is sparse).
Using your example (note that rcond(A) is very small, near machine epsilon):
data = [
47.667483331 -122.1070832;
47.667483331001 -122.1070832
];
A = [ones(size(data,1),1), data(:,1)];
b = data(:,2);
Let's compare the two solutions:
x1 = A\b;
x2 = pinv(A)*b;
First you can see how mldivide returns a solution x1 with one zero component (this is a valid solution: with the slope set to zero, each equation reduces to intercept = b(i), and both entries of b are equal here):
>> sol = [x1 x2]
sol =
-122.1071 -0.0537
0 -2.5605
Next you see how pinv returns a solution x2 with a smaller norm:
>> nrm = [norm(x1) norm(x2)]
nrm =
122.1071 2.5611
Here is the residual error of both solutions, which is acceptably small:
>> err = [norm(A*x1-b) norm(A*x2-b)]
err =
1.0e-11 *
0 0.1819
Note that using mldivide, linsolve, or qr will give pretty much the same results:
>> x3 = linsolve(A,b)
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 2.159326e-16.
x3 =
-122.1071
0
>> [Q,R] = qr(A); x4 = R\(Q'*b)
x4 =
-122.1071
0
SVD can handle rank deficiency. The diagonal matrix D has a near-zero element in your code, and you need to use the pseudoinverse with SVD, i.e. set the 2nd element of 1./diag(D) to 0 instead of the huge value (~10^14). You should find that SVD and QR have equally good accuracy in your example. For more information, see this document: http://www.cs.princeton.edu/courses/archive/fall11/cos323/notes/cos323_f11_lecture09_svd.pdf
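Applied to the X_1 and Y from the question, that fix could look like this (a sketch, using the same kind of tolerance as pinv):
[U,D,V] = svd(X_1,'econ');
sv = diag(D);
dinv = 1./sv;
dinv(sv < max(size(X_1))*eps(max(sv))) = 0;  % drop reciprocals of near-zero singular values
beta_svd = V*diag(dinv)*U'*Y;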
Try this SVD version, called block SVD. You just set the number of iterations according to the accuracy you want; usually 1 is enough. If you want all the factors (this has a default number selected for factor reduction), then change the k = ... line to the size of the matrix, if I recall my MATLAB correctly.
A= randn(100,5000);
A=corr(A);
% A is your correlation matrix
tic
k = 1000; % number of factors to extract
bsize = k +50;
block = randn(size(A,2),bsize);
iter = 2; % could set via tolerance
[block,R] = qr(A*block,0);
for i = 1:iter
    [block,R] = qr(A*(A'*block),0);
end
M = block'*A;
% Economy size dense SVD.
[U,S] = svd(M,0);
U = block*U(:,1:k);
S = S(1:k,1:k);
% Note SVD of a symmetric matrix is:
% A = U*S*U' since V=U in this case, S=eigenvalues, U=eigenvectors
V=real(U*sqrt(S)); %scaling matrix for simulation
toc
% reduced randomized matrix for simulation
sims = 2000;
randnums = randn(k,sims);
corrrandnums = V*randnums;
est_corr_matrix = corr(corrrandnums');
total_corrmatrix_difference =sum(sum(est_corr_matrix-A))
I am trying to run ridge regression in MATLAB (ridge): L is some matrix, x is some random vector, and y = L*x + α*n is another vector. I expect ridge(y,L,α) to return the same result as
(L*L' + α^2*I)^(-1) * L'*y,
but the latter is significantly better. I can't understand the problem, as I think this is exactly what ridge() does. I even tried with α = 1.
For example:
n = randn(N^2,1);
n2 = randn(N^2,1);
L = (some N^2*N^2 matrix);
y = L*n + n2;
x_ridge = ridge(y,L,1);
x_ls = (L*L' + eye(N^2))^-1*L'*y;
and x_ls and x_ridge are significantly different.
Appreciate any help!
I'm trying to code a Taylor summation for a function in MATLAB. I actually evaluate the Maclaurin series by setting the expansion point to 0 (the symbolic variable is named a in this code), following this notation:
This is the code I've tried so far:
a = -100;
b = 100;
n = 20;
vectorx = linspace(a,b,50);
vectory = [];
sumterms = [];
syms x y a;
y = sin(a);
for i = 1:n
    t = (diff(y,i-1) / factorial(i-1)) * (x-0)^(i-1);
    sumterms = [sumterms; t];
end
sumterms
for j = 1:length(vectorx)
    x_i = vectorx(j);
    aux = 0;
    for k = 1:length(sumterms)
        sumterm = sumterms(k);
        aux = aux + subs(sumterm, [a,x], [0,x_i]);
    end
    vectory = [vectory; aux];
end
length(vectory)
length(vectorx)
plot(vectorx, vectory)
But I'm not getting correct results.
I've stepped through each statement and I can't really see what's wrong with it.
This is my plot result for sin(x):
And this is the plot for exp(x):
The sumterms results for each appear in the image capture, and they seem to be OK; however, I think the error is in the evaluation.
Your code works correctly and plots the correct Taylor polynomial. Your mistake is that you are expecting a 20th-degree Taylor polynomial to approximate the sine function on the interval [-100,100]. This is much too optimistic: it gives a decent approximation on [-5,5] and that's about it. Here is the output of your code with a = -9, b = 9, where the problems already appear:
Indeed, x^n/n! is about (e*x/n)^n, using a rough version of Stirling's formula. So you need |x| < n/e to avoid a catastrophically large error term, and an even smaller interval (like |x| < n/3) to have a good approximation. With n = 20, n/3 is 6.66...
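To see this numerically (a quick check, not the OP's code), evaluate the nonzero terms of the degree-19 Maclaurin polynomial of sin at a few points and compare with sin itself; the approximation is fine up to about |x| = 5, already off by order 1 around x = 9, and astronomically wrong by x = 20:
p = @(x) sum(((-1).^(0:9)) .* x.^(2*(0:9)+1) ./ factorial(2*(0:9)+1));  % degree-19 Maclaurin polynomial of sin
for xv = [2 5 9 20]
    fprintf('x = %2g   polynomial = %12.5g   sin = %8.5f\n', xv, p(xv), sin(xv));
end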
I have a matrix containing angles and I need to compute the mean and the variance.
For the mean I proceed in this way: for each angle I compute sin and cos and sum all the sines and all the cosines; the mean is then given by atan2(sum of sines, sum of cosines), and it works.
My question is: how do I compute the variance of the angles, knowing the mean?
Thank you for the answers.
I attach my MATLAB code:
x = 0;
y = 0;
for i = 1:size(im2,1)
    for j = 1:size(im2,2)
        y = y + sin(hue(i,j));
        x = x + cos(hue(i,j));
    end
end
mean = atan2(y, x);
if mean < 0
    mean = mean + 2*pi;
end
In order to compute the variance of angles you cannot use the standard variance. This is the formulation to compute the variance of angles:
R = 1 - sqrt((sum(sin(angle)))^2 + (sum(cos(angle)))^2)/n;
There is another similar formulation as well:
var(angle) = var(sin(angle)) + var(cos(angle));
Ref:
http://www.ebi.ac.uk/thornton-srv/software/PROCHECK/nmr_manual/man_cv.html
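In MATLAB, applied to the hue matrix from the question (a vectorized sketch, assuming the values are angles in radians), the first formulation becomes:
ang  = hue(:);                                       % all angles as one column vector
n    = numel(ang);
Rbar = sqrt(sum(sin(ang))^2 + sum(cos(ang))^2) / n;  % mean resultant length, in [0,1]
circ_var = 1 - Rbar                                  % circular variance: 0 = concentrated, 1 = spread out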
I am not 100% sure what you are doing, but maybe this would achieve the same thing with the built-in MATLAB functions mean and var.
>> [file path] = uigetfile;
>> someImage = imread([path file]);
>> hsv = rgb2hsv(someImage);
>> hue = hsv(:,:,1);
>> m = mean(hue(:))
m =
0.5249
>> v = var(hue(:))
v =
0.2074
EDIT: I am assuming you have an image because of your variable name hue. But it would be the same for any matrix.
EDIT 2: Maybe that is what you are looking for:
>> sumsin = sum(sin(hue(:)));
>> sumcos = sum(cos(hue(:)));
>> meanvalue = atan2(sumsin,sumcos)
meanvalue =
0.5276
>> sumsin = sum(sin((hue(:)-meanvalue).^2));
>> sumcos = sum(cos((hue(:)-meanvalue).^2));
>> variance = atan2(sumsin,sumcos)
variance =
0.2074
The variance of circular data cannot be treated like the variance of unbounded data on the real line. (For very small variances they are effectively equivalent, but for large variances the equivalence breaks down; it should be clear to you why this is.) I recommend Statistical Analysis of Circular Data by N.I. Fisher. This book contains a widely-used definition of circular variance that is computed from the mean resultant length of the unit vectors that correspond to the angles.
>> sumsin = sum(sin((hue(:)-meanvalue).^2));
>> sumcos = sum(cos((hue(:)-meanvalue).^2));
is wrong; you can't subtract angles like that.
By the way, this question really has nothing to do with MATLAB. You can probably get more/better answers by posting it on the statistics Stack Exchange.
We had the same problem, and in Python we could fix it by using scipy.stats.circvar, which computes the circular variance for samples assumed to be in a given range. For example:
import numpy as np
from scipy.stats import circvar

circvar([0, 2*np.pi/3, 5*np.pi/3])
# 2.19722457734
circvar([0, 2*np.pi])
# -0.0
The problem with the proposed MATLAB code is that the variance of [0, 6.28] should be zero. Looking at the implementation of scipy.stats.circvar, it is essentially:
# Recast samples as radians that range between 0 and 2 pi and calculate the sine and cosine
samples, sin_samp, cos_samp, nmask = _circfuncs_common(samples, high, low)
sin_mean = sin_samp.mean()
cos_mean = cos_samp.mean()
R = np.minimum(1, hypot(sin_mean, cos_mean))
circular_variance = ((high - low)/2.0/pi)**2 * -2 * np.log(R)
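A MATLAB translation of that definition (a sketch, assuming the data are already angles in radians, so high = 2*pi, low = 0 and the scaling factor is 1):
ang = hue(:);                                       % angles in radians
R = min(1, hypot(mean(sin(ang)), mean(cos(ang))));  % mean resultant length
circ_var = -2*log(R)                                % scipy-style circular variance; 0 for [0, 2*pi]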