Statistical test gives result different than reality? - matlab

Let's define X as :
and objects releated to it:
I want to calculation value of function following, and plot it on the same graph with histogram eigenvalues of Y.
After that I want to perform chi2gof test to judge weather those two distributions converge to each other or not. I want to use parameter expected with is designed to compare distributions of function and histogram.
My work so far
clf;
m=8000;
n=10000;
X=randn(m,n);
Y=X*X'/n;
sd=std(X(:));
l=m/n;
eigs=eig(Y);
lp=sd^2*(1+sqrt(l))^2;
lm=sd^2*(1-sqrt(l))^2;
x=linspace(0.00000001,lp,n);
for i = 1:length(x)
if (and(x(i) <= lp, x(i) >= lm))
dv(i) = sqrt((lp-x(i)).*(x(i)-lm))./(2*pi*sd^2*l.*x(i));
else
dv(i) = 0;
end
end
This code fully calculates my function dv. Now to plot it on histogram, and to add sens to it I normalized histogram to have unit area.
hold on;
[h, centres] = hist(eigs, 50);
% normalise to unit area
norm_h = h / (numel(eigs) * (centres(2)-centres(1)));
bar(centres, norm_h);
plot(x, dv, "r");
hold off;
The result from this code is image following:
As we can see the dv line really nicely fits the histogram. We can be almost sure that chi square test for same distribution should output p value very close to 1 (it means that samples are from the same distribution). However code
[h,p,stats] = chi2gof(dv,'Expected',norm_h);
outputs
p =
0
It means that null hypothesis of the same distributions were rejected. My question is - how ? Am I using something incorrectly, or this pvalue is really 0 ?

Related

Where did i do wrong when i tried to approximatee this data using polynomial?

I am starting to learn numerical analysis using MATLAB in my course. so far we have covered polynomial interpolation (spline, polyfit, constraint spline, etc.) I was doing this practice question and I can not get the correct answer. I have uploaded the code I used and the question, where did I do wrong? thanks in advance!
close all; clear all; clc;
format long e
x = linspace(0,1,8);
xplot = linspace(0,1);
f = #(x) atan(x.*(x+1));
y_val = f(xplot);
c = polyfit(x,f(x),7);
p = polyval(c,0.7);
err = abs(f(0.7)-p)/(f(0.7))
The question I encountered is seen in the picture
After some playing around, it seems to be a matter of computing the absolute error instead of the relative absolute error.
The code below yields the desired answer. And yes, it is pretty unclear from the question which error is intended.
% Definitions
format long e
x = linspace(0,1,8)';
xplot= linspace(0,1);
f = #(x) atan(x.*(x+1));
y_val = f(xplot);
% Degree of polynomial
n = 7;
% Points to evaluate function
point1 = 0.5;
point2 = 0.7;
% Fit
c= polyfit(x,f(x),n);
% Evaluate
approxPoint1 = polyval(c, point1);
approxPoint2 = polyval(c, point2);
% Absolute errors
errPoint1 = abs( f(point1) - approxPoint1)
errPoint2 = abs( f(point2) - approxPoint2)
What you did wrong was :
mixing absolute and relative values when calculating errors to feed resulting variable err.
incorrectly placing abs() parentheses when calculating err: Your abs() only fixes the numerator, but then the denominator. To obtain |f(0.7)| you also need another abs(f(0.7))
for point x=0.7 instead of
err = abs(f(0.7)-p)/(f(0.7))
could well simply be
err = abs(f(.7)-p));
you only calculate err for assessment point 0.5 . In order to choose among the possible candidates of what seems to be a MATLAB Associate test multichoice answer, one needs err on 0.5 and err on 0.7 and then match the pair in the correct order, among all offered possible answers.
Although it's common practice to approach polynomial approximation of N points with an N-1 degree polynomial, it's often possible to approximate below satisfactory error with lower degree polynomials than N-1.
Lower degree polynomials means less calculations, less time spent approximating points. If one obtains a fair enough approximation with for instance a degree 4 polynomial, why waste time calculating an approximation with a higher degree all the way up to N-1?
Since the question does not tell what degree should have the approximating polynomial, you have to find it sweeping all polynomial degrees from 1 to up to a reasonable order.
The last error I found is that you have used linspace without specifying amount of points, MATLAB then takes by default 100 points hoping it's going to be ok.
Well, in your question 100 points is way too low an amount of points as I am going to show after supplying the following lines, as mentioned in previous point, sweeping all possible approximating polynomials, NOT on default 100 points you tacitly chose.
np=8; % numel(x) amount supplied points
N=1e3; % amount points x grid we build to measure f(x)
x= linspace(0,1,np);
xplot = linspace(0,1,N);
f = #(x) atan(x.*(x+1)); % function to approximate
y_val = f(xplot); % f(x)
xm=[.5 .7]; % points where to asses error
% finding poly coeffs
err1=zeros(2,np+2); % 1st row are errors on x=.5, 2nd row errors on x=.7
figure(1);
ax1=gca
hp1=plot(xplot,y_val)
grid on;
hp1.LineWidth=3;
hp1.Color='r';
hold on
for k=1:1:np+2
c = polyfit(x,f(x),k);
p_01 = polyval(c,xm(1));
err1(1,k) = abs(f(xm(1))-p_01);
% err(1,k) = abs((f(0.5)-p_05)/(f(0.5)))
p_02 = polyval(c,xm(2));
err1(2,k) = abs(f(xm(2))-p_02);
% err(2,k) = abs((f(0.7)-p_07)/(f(0.7)))
plot(x,polyval(c,x),'LineWidth',1.5); %'Color','b');
end
err1
.
.
The only pair of errors matching in the correct order are indeed those of polynomial order 7, but the total smallest error corresponds to the approximating polynomial of order 6.
What happens when taking linspace without defining a large enough amount of points? let's have a look:
np=8; % numel(x) amount supplied points
% N=1e3; % amount points x grid we build to measure f(x)
x= linspace(0,1,np);
xplot = linspace(0,1); %,N);
f = #(x) atan(x.*(x+1)); % function to approximate
y_val = f(xplot); % f(x)
xm=[.5 .7]; % points where to asses error
% finding poly coeffs
err1=zeros(2,np+2); % 1st row are errors on x=.5, 2nd row errors on x=.7
figure(1);
ax1=gca
hp1=plot(xplot,y_val)
grid on;
hp1.LineWidth=3;
hp1.Color='r';
hold on
for k=1:1:np+2
c = polyfit(x,f(x),k);
p_01 = polyval(c,xm(1));
err1(1,k) = abs(f(xm(1))-p_01);
% err(1,k) = abs(f(0.5)-p_05)/abs(f(0.5))
p_02 = polyval(c,xm(2));
err1(2,k) = abs(f(xm(2))-p_02);
% err(2,k) = abs(f(0.7)-p_07)/abs(f(0.7))
plot(x,polyval(c,x),'LineWidth',1.5); %'Color','b');
end
err1
With only 100 points all errors come up way too large, not a single error anywhere near 1e-5 or 1e-6.
This is why one couldn't tell which pair to go for, because all obtained values where at least 5 orders of magnitude away from landing zone.
I was about to include a plot with legend, but the visualization of this particular approach is in this case and in my opinion at best misleading, as in both plots for 100 and 1000 points, at 1st glance, both look as if the errors should be similar regardless of the amount of grid points used.
But as shown above 1e2 points cannot approximate the function,
it's like pitch dark, looking for something and pointing torch 180 from where we should be aiming at, not a chance to spot it.
Yet 1e3 grid points produce a pair of errors matching one of the possible answers, this is option D.
I hope it helps, thanks for reading my answer.

Finding values of x for a given y when approaching a limit

I'm trying to find two x values for each y value on a plot that is very similar to a Gaussian fn. The difficulty is that I need to be able to find the values of x for several values of y even when the gaussian fn is very close to zero.
I can't post an image due to being a new user, however think of a gaussian function and then the regions where it is close to zero on either side of the peak. This part where the fn is very close to reaching zero is where I need to find the x values for a given y.
What I've tried:
When the fn is discrete: I have tried interp1, however I get the error that it is not strictly monotonic increasing because of the many values that are close to zero.
When I fit a two-term gaussian:
I use fzero (fzero(function-yvalue)) however I get a lot of NaN's. These might be from me not having a close enough 'guess' value??
Does anyone have any other suggestions for me to try? Or how to improve what I've already attempted?
Thanks everyone
EDIT:
I've added a picture below. The data that I actually have is the blue line, while the fitted eqn is in red. The eqn should be accurate enough.
Again, I'm trying to pick out x values for a given y where y is very small (approaching 0).
I've tried splitting the function into left and right halves for the interpolation and fzero method.
Thanks for your responses anyway, I'll have a look at bisection.
Fitting a Gaussian seems to be uneffective, as its deviation (in the x-coordinate) from the real data is noticeable.
Since your data is already presented as a numeric vector y, the straightforward find(y<y0) seems adequate. Here is a sample code, in which the y-values are produced from a perturbed Gaussian.
x = 0:1:700;
y = 2000*exp(-((x-200)/50).^2 - sin(x/100).^2); % imitated data
plot(x,y)
y0 = 1e-2; % the y-value to look for
i = min(find(y>y0)); % first entry above y0
if i == 1
x1 = x(i);
else
x1 = x(i) - y(i)*(x(i)-x(i-1))/(y(i)-y(i-1)); % linear interpolation
end
i = max(find(y>y0)); % last entry above y0
if i == numel(y)
x2 = x(i);
else
x2 = x(i) - y(i)*(x(i)-x(i+1))/(y(i)-y(i+1)); % linear interpolation
end
fprintf('Roots: %g, %g \n', x1, x2)
Output: Roots: 18.0659, 379.306
The curve looks much like your plot.

MATLAB: 3d reconstruction using eight point algorithm

I am trying to achieve 3d reconstruction from 2 images. Steps I followed are,
1. Found corresponding points between 2 images using SURF.
2. Implemented eight point algo to find "Fundamental matrix"
3. Then, I implemented triangulation.
I have got Fundamental matrix and results of triangulation till now. How do i proceed further to get 3d reconstruction? I'm confused reading all the material available on internet.
Also, This is code. Let me know if this is correct or not.
Ia=imread('1.jpg');
Ib=imread('2.jpg');
Ia=rgb2gray(Ia);
Ib=rgb2gray(Ib);
% My surf addition
% collect Interest Points from Each Image
blobs1 = detectSURFFeatures(Ia);
blobs2 = detectSURFFeatures(Ib);
figure;
imshow(Ia);
hold on;
plot(selectStrongest(blobs1, 36));
figure;
imshow(Ib);
hold on;
plot(selectStrongest(blobs2, 36));
title('Thirty strongest SURF features in I2');
[features1, validBlobs1] = extractFeatures(Ia, blobs1);
[features2, validBlobs2] = extractFeatures(Ib, blobs2);
indexPairs = matchFeatures(features1, features2);
matchedPoints1 = validBlobs1(indexPairs(:,1),:);
matchedPoints2 = validBlobs2(indexPairs(:,2),:);
figure;
showMatchedFeatures(Ia, Ib, matchedPoints1, matchedPoints2);
legend('Putatively matched points in I1', 'Putatively matched points in I2');
for i=1:matchedPoints1.Count
xa(i,:)=matchedPoints1.Location(i);
ya(i,:)=matchedPoints1.Location(i,2);
xb(i,:)=matchedPoints2.Location(i);
yb(i,:)=matchedPoints2.Location(i,2);
end
matchedPoints1.Count
figure(1) ; clf ;
imshow(cat(2, Ia, Ib)) ;
axis image off ;
hold on ;
xbb=xb+size(Ia,2);
set=[1:matchedPoints1.Count];
h = line([xa(set)' ; xbb(set)'], [ya(set)' ; yb(set)']) ;
pts1=[xa,ya];
pts2=[xb,yb];
pts11=pts1;pts11(:,3)=1;
pts11=pts11';
pts22=pts2;pts22(:,3)=1;pts22=pts22';
width=size(Ia,2);
height=size(Ib,1);
F=eightpoint(pts1,pts2,width,height);
[P1new,P2new]=compute2Pmatrix(F);
XP = triangulate(pts11, pts22,P2new);
eightpoint()
function [ F ] = eightpoint( pts1, pts2,width,height)
X = 1:width;
Y = 1:height;
[X, Y] = meshgrid(X, Y);
x0 = [mean(X(:)); mean(Y(:))];
X = X - x0(1);
Y = Y - x0(2);
denom = sqrt(mean(mean(X.^2+Y.^2)));
N = size(pts1, 1);
%Normalized data
T = sqrt(2)/denom*[1 0 -x0(1); 0 1 -x0(2); 0 0 denom/sqrt(2)];
norm_x = T*[pts1(:,1)'; pts1(:,2)'; ones(1, N)];
norm_x_ = T*[pts2(:,1)';pts2(:,2)'; ones(1, N)];
x1 = norm_x(1, :)';
y1= norm_x(2, :)';
x2 = norm_x_(1, :)';
y2 = norm_x_(2, :)';
A = [x1.*x2, y1.*x2, x2, ...
x1.*y2, y1.*y2, y2, ...
x1, y1, ones(N,1)];
% compute the SVD
[~, ~, V] = svd(A);
F = reshape(V(:,9), 3, 3)';
[FU, FS, FV] = svd(F);
FS(3,3) = 0; %rank 2 constrains
F = FU*FS*FV';
% rescale fundamental matrix
F = T' * F * T;
end
triangulate()
function [ XP ] = triangulate( pts1,pts2,P2 )
n=size(pts1,2);
X=zeros(4,n);
for i=1:n
A=[-1,0,pts1(1,i),0;
0,-1,pts1(2,i),0;
pts2(1,i)*P2(3,:)-P2(1,:);
pts2(2,i)*P2(3,:)-P2(2,:)];
[~,~,va] = svd(A);
X(:,i) = va(:,4);
end
XP(:,:,1) = [X(1,:)./X(4,:);X(2,:)./X(4,:);X(3,:)./X(4,:); X(4,:)./X(4,:)];
end
function [ P1,P2 ] = compute2Pmatrix( F )
P1=[1,0,0,0;0,1,0,0;0,0,1,0];
[~, ~, V] = svd(F');
ep = V(:,3)/V(3,3);
P2 = [skew(ep)*F,ep];
end
From a quick look, it looks correct. Some notes are as follows:
You normalized code in eightpoint() is no ideal.
It is best done on the points involved. Each set of points will have its scaling matrix. That is:
[pts1_n, T1] = normalize_pts(pts1);
[pts2_n, T2] = normalize-pts(pts2);
% ... code
% solution
F = T2' * F * T
As a side note (for efficiency) you should do
[~,~,V] = svd(A, 0);
You also want to enforce the constraint that the fundamental matrix has rank-2. After you compute F, you can do:
[U,D,v] = svd(F);
F = U * diag([D(1,1),D(2,2), 0]) * V';
In either case, normalization is not the only key to make the algorithm work. You'll want to wrap the estimation of the fundamental matrix in a robust estimation scheme like RANSAC.
Estimation problems like this are very sensitive to non Gaussian noise and outliers. If you have a small number of wrong correspondence, or points with high error, the algorithm will break.
Finally, In 'triangulate' you want to make sure that the points are not at infinity prior to the homogeneous division.
I'd recommend testing the code with 'synthetic' data. That is, generate your own camera matrices and correspondences. Feed them to the estimate routine with varying levels of noise. With zero noise, you should get an exact solution up to floating point accuracy. As you increase the noise, your estimation error increases.
In its current form, running this on real data will probably not do well unless you 'robustify' the algorithm with RANSAC, or some other robust estimator.
Good luck.
Good luck.
Which version of MATLAB do you have?
There is a function called estimateFundamentalMatrix in the Computer Vision System Toolbox, which will give you the fundamental matrix. It may give you better results than your code, because it is using RANSAC under the hood, which makes it robust to spurious matches. There is also a triangulate function, as of version R2014b.
What you are getting is sparse 3D reconstruction. You can plot the resulting 3D points, and you can map the color of the corresponding pixel to each one. However, for what you want, you would have to fit a surface or a triangular mesh to the points. Unfortunately, I can't help you there.
If what you're asking is how to I proceed from fundamental Matrix + corresponding points to a dense model then you still have a lot of work ahead of you.
relative camera locations (R,T) can be calculated from a fundamental matrix assuming you know the internal camera params (up to scale, rotation, translation). To get a full dense matrix there are a few ways to go. you can try using an existing library (PMVS for example). I'd look into OpenMVG but I'm not sure about matlab interface.
Another way to go, you can compute a dense optical flow (many available for matlab). Look for a epipolar OF (It takes a fundamental matrix and restricts the solution to lie on the epipolar lines). Then you can triangulate every pixel to get a depthmap.
Finally you will have to play with format conversions to get from a depthmap to VRML (You can look at meshlab)
Sorry my answer isn't more Matlab oriented.

empirical mean and variance plot in matlab with the normal distribution

I'am new in matlab programming,I should Write a script to generate a random sequence (x1,..., XN) of size N following the normal distribution N (0, 1) and calculate the empirical mean mN and variance σN^2
Then,I should plot them:
this is my essai:
function f = normal_distribution(n)
x =randn(n);
muem = 1./n .* (sum(x));
muem
%mean(muem)
vaem = 1./n .* (sum((x).^2));
vaem
hold on
plot(x,muem,'-')
grid on
plot(x,vaem,'*')
NB:those are the formules that I have used:
I have obtained,a Figure and I don't know if is it correct or not ,thanks for Help
From your question, it seems what you want to do is calculate the mean and variance from a sample of size N (nor an NxN matrix) drawn from a standard normal distribution. So you may want to use randn(n, 1), instead of randn(n). Also as #ThP pointed out, it does not make sense to plot mean and variance vs. x. What you could do is to calculate means and variances for inceasing sample sizes n1, n2, ..., nm, and then plot sample size vs. mean or variance, to see them converge to 0 and 1. See the code below:
function [] = plotMnV(nIter)
means = zeros(nIter, 1);
vars = zeros(nIter, 1);
for pow = 1:nIter
n = 2^pow;
x =randn(n, 1);
means(pow) = 1./n * sum(x);
vars(pow) = 1./n * sum(x.^2);
end
plot(1:nIter, means, 'o-');
hold on;
plot(1:nIter, vars, '*-');
end
For example, plotMnV(20) gave me the plot below.

simulating a random walk in matlab

i have variable x that undergoes a random walk according to the following rules:
x(t+1)=x(t)-1; probability p=0.3
x(t+1)=x(t)-2; probability q=0.2
x(t+1)=x(t)+1; probability p=0.5
a) i have to create this variable initialized at zero and write a for loop for 100 steps and that runs 10000 times storing each final value in xfinal
b) i have to plot a probability distribution of xfinal (a histogram) choosing a bin size and normalization!!* i have to report the mean and variance of xfinal
c) i have to recreate the distribution by application of the central limit theorem and plot the probability distribution on the same plot!
help would be appreciated in telling me how to choose the bin size and normalize the histogram and how to attempt part c)
your help is much appreciated!!
p=0.3;
q=0.2;
s=0.5;
numberOfSteps = 100;
maxCount = 10000;
for count=1:maxCount
x=0;
for i = 1:numberOfSteps
random = rand(1, 1);
if random <=p
x=x-1;
elseif random<=(p+q)
x=x-2;
else
x=x+1;
end
end
xfinal(count) = x;
end
[f,x]=hist(xfinal,30);
figure(1)
bar(x,f/sum(f));
xlabel('xfinal')
ylabel('frequency')
mean = mean(xfinal)
variance = var(xfinal)
For the first question, check the help for hist on mathworks homepage
[nelements,centers] = hist(data,nbins);
You do not select the bin size, but the number of bins. nelements gives the elements per bin and center is all the bin centers. So to say, it would be the same to call
hist(data,nbins);
as
[nelements,centers] = hist(data,nbins);
plot(centers,nelements);
except that the representation is different (line or pile). To normalize, simply divide nelements with sum(nelements)
For c, here i.i.d. variables it actually is a difference if the variables are real or complex. However for real variables the central limit theorem in short tells you that for a large number of samples the distribution will limit the normal distribution. So if the samples are real, you simply asssumes a normal distribution, calculates the mean and variance and plots this as a normal distribution. If the variables are complex, then each of the variables will be normally distributed which means that you will have a rayleigh distribution instead.
Mathworks is deprecating hist that is being replaced with histogram.
more details in this link
You are not applying the PDF function as expected, the expression Y doesn't work
For instance Y does not have the right X-axis start stop points. And you are using x as input to Y while x already used as pivot inside the double for loop.
When I ran your code Y generates a single value, it is not a vector but just a scalar.
This
bar(x,f/sum(f));
bringing down all input values with sum(f) division? no need.
On attempting to overlap the ideal probability density function, often one has to do additional scaling, to have both real and ideal visually overlapped.
MATLAB can do the scaling for us, and no need to modify input data /sum(f).
With a dual plot using yyaxis
You also mixed variance and standard deviation.
Instead try something like this
y2=1 / sqrt(2*pi*var1)*exp(-(x2-m1).^2 / (2*var1))
ok, the following solves your question(s)
codehere
clear all;
close all;
clc
p=0.3; % thresholds
q=0.2;
s=0.5;
n_step=100;
max_cnt=10000;
n_bin=30; % histogram amount bins
xf=zeros(1,max_cnt);
for cnt=1:max_cnt % runs loop
x=0;
for i = 1:n_step % steps loop
t_rand1 = rand(1, 1);
if t_rand1 <=p
x=x-1;
elseif t_rand1<=(p+q)
x=x-2;
else
x=x+1;
end
end
xf(cnt) = x;
end
% [f,x]=hist(xf,n_bin);
hf1=figure(1)
ax1=gca
yyaxis left
hp1=histogram(xf,n_bin);
% bar(x,f/sum(f));
grid on
xlabel('xf')
ylabel('frequency')
m1 = mean(xf)
var1 = var(xf)
s1=var1^.5 % sigma
%applying central limit theorem %finding the mean
n_x2=1e3 % just enough points
min_x2=min(hp1.BinEdges)
max_x2=max(hp1.BinEdges)
% quite same as
min_x2=hp1.BinLimits(1)
max_x2=hp1.BinLimits(2)
x2=linspace(min_x2,max_x2,n_x2)
y2=1/sqrt(2*pi*var1)*exp(-(x2-m1).^2/(2*var1));
% hold(ax1,'on')
yyaxis right
plot(ax1,x2,y2,'r','LineWidth',2)
.
.
.
note I have not used these lines
% Xp=-1; Xq=-2; Xs=1; mu=Xp.*p+Xq.*q+Xs.*s;
% muN=n_step.*mu;
%
% sigma=(Xp).^2.*p+(Xq).^2.*q+(Xs).^2.s; % variance
% sigmaN=n_step.(sigma-(mu).^2);
People ususally call sigma to variance^.5
This supplied script is a good start point to now take it to wherever you need it to go.