Multivariate Normal Distribution Matlab, probability area - matlab

I have 2 arrays: one with x-coordinates, the other with y-coordinates.
Both are normally distributed, as the result of a Monte Carlo simulation. I know how to find sigma and mu for each array and get a 95% confidence interval:
[mu,sigma]=normfit(x_array);
hist(x_array);
x=norminv([0.025 0.975],mu,sigma)
However, the two arrays are correlated with each other. To plot the probability distribution of the combined arrays, I use the multivariate normal distribution. In MATLAB this gives me:
[MuX,SigmaX]=normfit(x_array);
[MuY,SigmaY]=normfit(y_array);
mu = [MuX MuY];
Sigma=cov(x_array,y_array);
x1 = MuX-4*SigmaX:5:MuX+4*SigmaX; x2 = MuY-4*SigmaY:5:MuY+4*SigmaY;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
surf(x1,x2,F);
caxis([min(F(:))-.5*range(F(:)),max(F(:))]);
set(gca,'Ydir','reverse')
xlabel('x0-as'); ylabel('y0-as'); zlabel('Probability Density');
So far so good. Now I want to calculate the 95% probability area. I am looking for a function such as mndinv, analogous to norminv. However, such a function doesn't exist in MATLAB, which makes sense because there are endless possibilities... Does somebody have a tip on how to get a 95% probability area? Thanks in advance.

For the bivariate case you can add the ellipse that encloses 95% of the probability mass, which is the two-dimensional analogue of NORMINV(95%). This ellipse is uniquely identified; for a proof, see the first source listed below.
% Suppose you know the distribution params, or you got them from normfit()
mu = [3, 7];
sigma = [1, 2.5
2.5 9];
% X/Y values for plotting grid
x = linspace(mu(1)-3*sqrt(sigma(1)), mu(1)+3*sqrt(sigma(1)),100);
y = linspace(mu(2)-3*sqrt(sigma(end)), mu(2)+3*sqrt(sigma(end)),100);
% Z values
[X1,X2] = meshgrid(x,y);
Z = mvnpdf([X1(:) X2(:)],mu,sigma);
Z = reshape(Z,length(y),length(x));
% Plot
h = pcolor(x,y,Z);
set(h,'LineStyle','none')
hold on
% Add level set
alpha = 0.05;
r = sqrt(-2*log(alpha));
rho = sigma(2)/sqrt(sigma(1)*sigma(end));
M = [sqrt(sigma(1)) rho*sqrt(sigma(end))
0 sqrt(sigma(end)-sigma(end)*rho^2)];
theta = 0:0.1:2*pi;
f = bsxfun(@plus, r*[cos(theta)', sin(theta)']*M, mu);
plot(f(:,1), f(:,2),'--r')
Sources
https://upload.wikimedia.org/wikipedia/commons/a/a2/Cumulative_function_n_dimensional_Gaussians_12.2013.pdf
https://en.wikipedia.org/wiki/Multivariate_normal_distribution
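The radius used above, r = sqrt(-2*log(alpha)), can be sanity-checked numerically. The following is a minimal sketch, not part of the original answer: it reuses the example mu and sigma (named Sigma here), draws samples with chol and randn so that no toolbox is needed, and counts how many fall inside the ellipse. The sample size N and the seed are arbitrary choices for illustration. For two dimensions, -2*log(alpha) is the chi-square quantile with 2 degrees of freedom.

```matlab
% Sketch: check the 95% ellipse radius by Monte Carlo (base MATLAB only).
rng(0);                                   % fixed seed, for reproducibility
mu    = [3, 7];
Sigma = [1, 2.5; 2.5, 9];
alpha = 0.05;
r2    = -2*log(alpha);                    % squared ellipse radius, about 5.99

% Draw correlated samples: if z ~ N(0,I) then z*R + mu ~ N(mu,Sigma)
N = 1e5;
R = chol(Sigma);                          % upper triangular, R'*R == Sigma
samples = bsxfun(@plus, randn(N, 2)*R, mu);

% Squared Mahalanobis distance of every sample from the mean
centered = bsxfun(@minus, samples, mu);
d2 = sum((centered/Sigma).*centered, 2);

fracInside  = mean(d2 <= r2);             % should be close to 1 - alpha
ellipseArea = pi*r2*sqrt(det(Sigma));     % area enclosed by the ellipse
fprintf('inside: %.4f, area: %.3f\n', fracInside, ellipseArea);
```

With your own data, replace mu and Sigma by the estimates from mean and cov; the fraction inside should then be judged against the fitted distribution, not the true one.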

To get the numerical value of F above which the top part lies, you can use top5=prctile(F(:),95). This returns the value of F that separates the bottom 95% of the grid values from the top 5%.
Then you can get just the top 5% with
Ftop=zeros(size(F));
Ftop=F>top5;
Ftop=Ftop.*F;
%// optional: Ftop(Ftop==0)=NaN;
surf(x1,x2,Ftop,'LineStyle','none');
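Note that a percentile of the grid values depends on how far the grid extends. A possible alternative, sketched here under my own assumptions (the example mu and Sigma from the other answer, a grid of 400 points per axis spanning 5 standard deviations), is to pick the density level whose super-level set contains 95% of the probability mass: sort the density values in descending order and accumulate value times cell area until 0.95 is reached. For a bivariate normal the exact level is alpha/(2*pi*sqrt(det(Sigma))), which the sketch uses for comparison.

```matlab
% Sketch: density threshold that encloses 95% of the mass, found on a grid.
mu    = [3, 7];
Sigma = [1, 2.5; 2.5, 9];
alpha = 0.05;

x1 = linspace(mu(1) - 5*sqrt(Sigma(1,1)), mu(1) + 5*sqrt(Sigma(1,1)), 400);
x2 = linspace(mu(2) - 5*sqrt(Sigma(2,2)), mu(2) + 5*sqrt(Sigma(2,2)), 400);
[X1, X2] = meshgrid(x1, x2);

% Bivariate normal density written out by hand (same values as mvnpdf)
C = [X1(:) - mu(1), X2(:) - mu(2)];
F = exp(-0.5*sum((C/Sigma).*C, 2)) / (2*pi*sqrt(det(Sigma)));

cellArea = (x1(2) - x1(1))*(x2(2) - x2(1));
Fs   = sort(F, 'descend');                % highest densities first
mass = cumsum(Fs)*cellArea;               % probability captured so far
level = Fs(find(mass >= 1 - alpha, 1));   % density on the region boundary

levelExact = alpha/(2*pi*sqrt(det(Sigma)));
fprintf('grid level %.5f, exact level %.5f\n', level, levelExact);
```

The region F >= level can then be drawn with contour(x1, x2, reshape(F, numel(x2), numel(x1)), [level level]).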

Related

Vectors must be the same length error in Curve Fitting in Matlab

I'm having problems in curve fitting my randomized data for the function
Here is my code
N = 100;
mu = 5; stdev = 2;
x = mu+stdev*randn(N,1);
bin=mu-6*stdev:0.5:mu+6*stdev;
f=hist(x,bin);
plot(bin,f,'bo'); hold on;
x_ = x(1):0.1:x(end);
y_ = (1./sqrt(8.*pi)).*exp(-((x_-mu).^2)./8);
plot(x_,y_,'b-'); hold on;
It seems like I'm having vector size problems since it is giving me the error
Error using plot
Vectors must be the same length.
Note that I simplified y_ since mu and the standard deviation is known.
Well first of all some adjustments to your question:
You are not trying to do curve fitting. What you are trying to do (in my opinion) is to overlay a probability density function on a histogram obtained by taking random points from the same distribution (a normal distribution with parameters (mu,sigma)). These two curves should indeed overlay, as they represent the same thing; one is analytical and the other is obtained numerically.
As noted in the hist documentation, hist is not recommended and you should use histogram instead.
First step: Generating your random data
Knowing the distribution is the Normal distribution, we can use MATLAB's random function to do that :
N = 150;
rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,N,1);
Second step: Plot the histogram
Because we don't just want a count of the elements in each bin, but a feel of the probability density function, we can use the 'Normalization' 'pdf' arguments
Nbins = 25;
f=histogram(r,Nbins,'Normalization','pdf');
hold on
Here I'd rather specify a number of bins than specifying the bins themselves, because you never know in advance how far from the mean your data is going to be.
Last step: overlay the probability density function over the histogram
The histogram being already consistent with a probability density function, it is sufficient to just overlay the density function:
x_ = linspace(min(r),max(r),100);
y_ = (1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'b-');
With N = 150
With N = 1500
With N = 150.000 and Nbins = 50
If for some obscure reason you want to use the old hist() function
The old hist() function can't handle normalization, so you'll have to do it by hand, by normalizing your density function to fit your histogram:
N = 1500;
% rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,1,N);
Nbins = 50;
[~,centers]=hist(r,Nbins);
hist(r,Nbins); hold on
% Width of bins
Widths = diff(centers);
x_ = linspace(min(r),max(r),100);
y_ = N*mean(Widths)*(1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'r-');
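The scaling used in both variants can be checked by hand: a count histogram becomes a density when each count is divided by N times the bin width, and the bar areas then sum to 1. The sketch below is an illustration only; it assumes histcounts is available (R2014b or later) and replaces random('Normal',...) by the base-MATLAB expression mu + sigma*randn.

```matlab
% Sketch: manual 'pdf' normalization of a histogram, base MATLAB only.
rng('default');                            % for reproducibility
N = 1500; mu = 5; sigma = 2;
r = mu + sigma*randn(N, 1);                % normal samples without the toolbox

Nbins = 50;
edges    = linspace(min(r), max(r), Nbins + 1);
counts   = histcounts(r, edges);           % raw counts per bin
binWidth = edges(2) - edges(1);

density   = counts/(N*binWidth);           % same scaling as 'Normalization','pdf'
totalArea = sum(density)*binWidth;         % area under the bars

% Analytical density at the bin centers, for comparison or plotting
centers      = edges(1:end-1) + binWidth/2;
pdfAtCenters = exp(-(centers - mu).^2/(2*sigma^2))/sqrt(2*pi*sigma^2);
fprintf('area under histogram: %.6f\n', totalArea);
```

Multiplying pdfAtCenters by N*binWidth instead gives the curve that matches the raw counts, which is the scaling applied in the hist() version above.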

Constrained linear least squares not fitting data

I am trying to fit a 3D surface polynomial of n-degrees to some data points in 3D space. My system requires the surface to be monotonically increasing in the area of interest, that is the partial derivatives must be non-negative. This can be achieved using Matlab's built in lsqlin function.
So here's what I've done to try and achieve this:
I have a function that takes in four parameters;
x1 and x2 are my explanatory variables and y is my dependent variable. Finally, I can specify order of polynomial fit. First I build the design matrix A using data from x1 and x2 and the degree of fit I want. Next I build the matrix D that is my container for the partial derivatives of my datapoints. NOTE: the matrix D is double the length of matrix A since all datapoints must be differentiated with respect to both x1 and x2. I specify that Dx >= 0 by setting b to be zeroes.
Finally, I call lsqlin. I use "-D" since Matlab defines the function as Dx <= b.
function w_mono = monotone_surface_fit(x1, x2, y, order_fit)
% Initialize design matrix
A = zeros(length(x1), 2*order_fit+2);
% Adjusting for bias term
A(:,1) = ones(length(x1),1);
% Building design matrix
for i = 2:order_fit+1
A(:,(i-1)*2:(i-1)*2+1) = [x1.^(i-1), x2.^(i-1)];
end
% Initialize matrix containing derivative constraint.
% NOTE: Partial derivatives must be non-negative
D = zeros(2*length(y), 2*order_fit+1);
% Filling matrix that holds constraints for partial derivatives
% NOTE: Matrix D will be double length of A since all data points will have a partial derivative constraint in both x1 and x2 directions.
for i = 2:order_fit+1
D(:,(i-1)*2:(i-1)*2+1) = [(i-1)*x1.^(i-2), zeros(length(x2),1); ...
zeros(length(x1),1), (i-1)*x2.^(i-2)];
end
% Limit of derivatives
b = zeros(2*length(y), 1);
% Constrained LSQ fit
options = optimoptions('lsqlin','Algorithm','interior-point');
% Final weights of polynomial
w_mono = lsqlin(A,y,-D,b,[],[],[],[],[], options);
end
So now I get some weights out, but unfortunately they do not at all capture the structure of the data. I've attached an image so you can see just how bad it looks.
I'll give you my plotting script with some dummy data, so you can try it.
%% Plot different order polynomials to data with constraints
x1 = [-5;12;4;9;18;-1;-8;13;0;7;-5;-8;-6;14;-1;1;9;14;12;1;-5;9;-10;-2;9;7;-1;19;-7;12;-6;3;14;0;-8;6;-2;-7;10;4;-5;-7;-4;-6;-1;18;5;-3;3;10];
x2 = [81.25;61;73;61.75;54.5;72.25;80;56.75;78;64.25;85.25;86;80.5;61.5;79.25;76.75;60.75;54.5;62;75.75;80.25;67.75;86.5;81.5;62.75;66.25;78.25;49.25;82.75;56;84.5;71.25;58.5;77;82;70.5;81.5;80.75;64.5;68;78.25;79.75;81;82.5;79.25;49.5;64.75;77.75;70.25;64.5];
y = [-6.52857142857143;-1.04736842105263;-5.18750000000000;-3.33157894736842;-0.117894736842105;-3.58571428571429;-5.61428571428572;0;-4.47142857142857;-1.75438596491228;-7.30555555555556;-8.82222222222222;-5.50000000000000;-2.95438596491228;-5.78571428571429;-5.15714285714286;-1.22631578947368;-0.340350877192983;-0.142105263157895;-2.98571428571429;-4.35714285714286;-0.963157894736842;-9.06666666666667;-4.27142857142857;-3.43684210526316;-3.97894736842105;-6.61428571428572;0;-4.98571428571429;-0.573684210526316;-8.22500000000000;-3.01428571428571;-0.691228070175439;-6.30000000000000;-6.95714285714286;-2.57232142857143;-5.27142857142857;-7.64285714285714;-2.54035087719298;-3.45438596491228;-5.01428571428571;-7.47142857142857;-5.38571428571429;-4.84285714285714;-6.78571428571429;0;-0.973684210526316;-4.72857142857143;-2.84285714285714;-2.54035087719298];
% Used to plot the surface in all points in the grid
X1 = meshgrid(-10:1:20);
X2 = flipud(meshgrid(30:2:90).');
figure;
for i = 1:4
w_mono = monotone_surface_fit(x1, x2, y, i);
y_nr = w_mono(1)*ones(size(X1)) + w_mono(2)*ones(size(X2));
for j = 1:i
y_nr = w_mono(j*2)*X1.^j + w_mono(j*2+1)*X2.^j;
end
subplot(2,2,i);
scatter3(x1, x2, y); hold on;
axis tight;
mesh(X1, X2, y_nr);
set(gca, 'ZDir','reverse');
xlabel('x1'); ylabel('x2');
zlabel('y');
% zlim([-10 0])
end
I think it may have something to do with the fact that I haven't specified anything about the region of interest, but really I don't know. Thanks in advance for any help.
Alright I figured it out.
The main problem was simply an error in the plotting script. The value of y_nr should be updated and not overwritten in the loop.
Also I figured out that the surface should be monotonically decreasing in the second variable, x2, so its partial derivative with respect to x2 must be non-positive. Here's the updated code if anybody is interested.
%% Plot different order polynomials to data with constraints
x1 = [-5;12;4;9;18;-1;-8;13;0;7;-5;-8;-6;14;-1;1;9;14;12;1;-5;9;-10;-2;9;7;-1;19;-7;12;-6;3;14;0;-8;6;-2;-7;10;4;-5;-7;-4;-6;-1;18;5;-3;3;10];
x2 = [81.25;61;73;61.75;54.5;72.25;80;56.75;78;64.25;85.25;86;80.5;61.5;79.25;76.75;60.75;54.5;62;75.75;80.25;67.75;86.5;81.5;62.75;66.25;78.25;49.25;82.75;56;84.5;71.25;58.5;77;82;70.5;81.5;80.75;64.5;68;78.25;79.75;81;82.5;79.25;49.5;64.75;77.75;70.25;64.5];
y = [-6.52857142857143;-1.04736842105263;-5.18750000000000;-3.33157894736842;-0.117894736842105;-3.58571428571429;-5.61428571428572;0;-4.47142857142857;-1.75438596491228;-7.30555555555556;-8.82222222222222;-5.50000000000000;-2.95438596491228;-5.78571428571429;-5.15714285714286;-1.22631578947368;-0.340350877192983;-0.142105263157895;-2.98571428571429;-4.35714285714286;-0.963157894736842;-9.06666666666667;-4.27142857142857;-3.43684210526316;-3.97894736842105;-6.61428571428572;0;-4.98571428571429;-0.573684210526316;-8.22500000000000;-3.01428571428571;-0.691228070175439;-6.30000000000000;-6.95714285714286;-2.57232142857143;-5.27142857142857;-7.64285714285714;-2.54035087719298;-3.45438596491228;-5.01428571428571;-7.47142857142857;-5.38571428571429;-4.84285714285714;-6.78571428571429;0;-0.973684210526316;-4.72857142857143;-2.84285714285714;-2.54035087719298];
% Used to plot the surface in all points in the grid
X1 = meshgrid(-10:1:20);
X2 = flipud(meshgrid(30:2:90).');
figure;
for i = 1:4
w_mono = monotone_surface_fit(x1, x2, y, i);
% NOTE: Should only have 1 bias term
y_nr = w_mono(1)*ones(size(X1));
for j = 1:i
y_nr = y_nr + w_mono(j*2)*X1.^j + w_mono(j*2+1)*X2.^j;
end
subplot(2,2,i);
scatter3(x1, x2, y); hold on;
axis tight;
mesh(X1, X2, y_nr);
set(gca, 'ZDir','reverse');
xlabel('x1'); ylabel('x2');
zlabel('y');
% zlim([-10 0])
end
And here's the updated function
function [w_mono, w] = monotone_surface_fit(x1, x2, y, order_fit)
% Initialize design matrix
A = zeros(length(x1), 2*order_fit+1);
% Adjusting for bias term
A(:,1) = ones(length(x1),1);
% Building design matrix
for i = 2:order_fit+1
A(:,(i-1)*2:(i-1)*2+1) = [x1.^(i-1), x2.^(i-1)];
end
% Initialize matrix containing derivative constraint.
% NOTE: df/dx1 must be non-negative; the sign flip on the x2 block below makes df/dx2 non-positive
D = zeros(2*length(y), 2*order_fit+1);
for i = 2:order_fit+1
D(:,(i-1)*2:(i-1)*2+1) = [(i-1)*x1.^(i-2), zeros(length(x2),1); ...
zeros(length(x1),1), -(i-1)*x2.^(i-2)];
end
% Limit of derivatives
b = zeros(2*length(y), 1);
% Constrained LSQ fit
options = optimoptions('lsqlin','Algorithm','active-set');
w_mono = lsqlin(A,y,-D,b,[],[],[],[],[], options);
w = lsqlin(A,y);
end
Finally a plot of the fitting (Have used a new simulation, but fit also works on given dummy data).
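To see what the constraint matrix encodes, it can help to build A and D for a tiny case and compare D*w with derivatives worked out by hand. The sketch below is only an illustration: the three data points and the weight vector w are made up, and lsqlin is not called, so no Optimization Toolbox is needed. It follows the column layout of the updated function (bias first, then x1^p and x2^p pairs).

```matlab
% Sketch: inspect the design and constraint matrices for order_fit = 2.
x1 = [0; 1; 2];                            % hypothetical data points
x2 = [3; 4; 5];
order_fit = 2;
n = numel(x1);

A = ones(n, 2*order_fit + 1);              % first column is the bias term
D = zeros(2*n, 2*order_fit + 1);
for p = 1:order_fit
    cols = 2*p:2*p+1;                      % columns for x1^p and x2^p
    A(:, cols) = [x1.^p, x2.^p];
    D(:, cols) = [p*x1.^(p-1),  zeros(n,1); ...
                  zeros(n,1),  -p*x2.^(p-1)];
end

% Made-up weights: f = 1 + 2*x1 - 3*x2 + 0.5*x1^2 - 0.25*x2^2
w = [1; 2; -3; 0.5; -0.25];
f = A*w;                                   % fitted values at the data points
g = D*w;                                   % rows 1..n: df/dx1, rows n+1..2n: -df/dx2

feasible = all(-D*w <= 0);                 % the inequality passed to lsqlin
disp([g(1:n), g(n+1:end)]);
```

Here df/dx1 = 2 + x1 and -df/dx2 = 3 + 0.5*x2, so every entry of g is positive and this particular w satisfies the constraints.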

How to draw random numbers from a gamma distribution without the Statistics Toolbox?

I am varying the signal strength for synthetic images. I need the signal to vary between 0 and 0.1, but I need to do this with a gamma distribution so that more of the values fall around the .01/.02 range. The problem is that I am using the 2010 version of Matlab without the Statistics Toolbox, so the gamrnd function is not part of its library.
Any and all help is greatly appreciated.
You can use the Inverse transform sampling method to convert a uniform distribution to any other distribution:
P = rand(1000);
X = gaminv(P(:),2,2); % with k = 2 and theta = 2
Here is a little demonstration:
for k = [1 3 9]
for theta = [0.5 1 2]
X = gaminv(P(:),k,theta);
histogram(X,50)
hold on
end
end
Which gives:
Edit:
Without the Statistics Toolbox, you can use Marsaglia's simple transformation-rejection method to generate random numbers from a gamma distribution with rand and randn:
N = 10000; % no. of tries
% distribution parameters:
a = 0.5;
b = 0.1;
% Marsaglia's simple transformation-rejection:
d = a - 1/3;
x = randn(N,1);
U = rand(N,1);
v = (1+x./sqrt(9*d)).^3;
accept = (v>0) & (log(U)<(0.5*x.^2+d-d*v+d*log(v))); % v must be positive, otherwise log(v) is complex
Y = d*(v(accept)).*b;
Now Y is distributed like gamma(a,b). We can test the result using the gamrnd function:
n = size(Y,1);
X = gamrnd(a,b,n,1);
And the histograms of Y, and X are:
However, keep in mind that gamma distribution might not fit your needs because it has no specific upper bound (i.e. goes to infinity). So you may want to use another (bounded) distribution, like beta divided by 10.
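The rejection sampler can also be checked without gamrnd by comparing sample moments with the theoretical ones: a gamma distribution with shape a and scale b has mean a*b and variance a*b^2. The following sketch uses hypothetical parameters a = 2 and b = 0.01 (mean 0.02, chosen only to match the range mentioned in the question) and tests v > 0 before taking the logarithm.

```matlab
% Sketch: Marsaglia-style gamma sampling with a moment check, no toolbox.
rng(1);                                    % fixed seed, for reproducibility
N = 200000;                                % number of tries
a = 2;                                     % shape (hypothetical value)
b = 0.01;                                  % scale (hypothetical value)

d = a - 1/3;
x = randn(N, 1);
U = rand(N, 1);
v = (1 + x./sqrt(9*d)).^3;

% Only candidates with v > 0 are eligible; test the rest of the rule on those
accept = v > 0;
accept(accept) = log(U(accept)) < 0.5*x(accept).^2 + d ...
                 - d*v(accept) + d*log(v(accept));
Y = d*v(accept)*b;                         % accepted samples, scaled by b

fprintf('accepted %d of %d\n', numel(Y), N);
fprintf('mean %.5f (theory %.5f)\n', mean(Y), a*b);
fprintf('var  %.3e (theory %.3e)\n', var(Y), a*b^2);
```

Because some candidates are rejected, Y has fewer than N elements; generate more tries than you need and keep the first n accepted values.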

calculating sum of two triangular random variables (Matlab)

I would like to calculate the sum of two triangular random variables,
P(x1+x2 < y)
Is there a faster way to implement the sum of two triangular random variables in Matlab?
EDIT: It seems there's possibly a much easier way, as shown in this minitab demonstration. So it's not impossible. It doesn't explain how the PDF was calculated, sadly. Still looking into how I can do this in matlab.
EDIT2: Following advice, I'm using conv function in Matlab to develop the PDF of the sum of two random variables:
clear all;
clc;
pd1 = makedist('Triangular','a',85,'b',90,'c',100);
pd2 = makedist('Triangular','a',90,'b',100,'c',110);
x = linspace(85,290,200);
x1 = linspace(85,100,200);
x2 = linspace(90,110,200);
pdf1 = pdf(pd1,x1);
pdf2 = pdf(pd2,x2);
z = median(diff(x))*conv(pdf1,pdf2,'same');
p1 = trapz(x1,pdf1) %probability P(x1<y)
p2 = trapz(x2,pdf2) %probability P(x2<y)
p12 = trapz(x,z) %probability P(x1+x2 <y)
hold on;
plot(x1,pdf1) %plot pdf of dist. x1
plot(x2,pdf2) %plot pdf of dist. x2
plot(x,z) %plot pdf of x1+x2
hold off;
However this code has two problems:
The PDF of X1+X2 integrates to much more than 1.
The PDF of X1+X2 varies widely depending on the range of x. Intuitively, since 210 is the sum of the upper limits "c" of the two individual triangular distributions (100 + 110), shouldn't P(X1+X2 < 210) equal 1? Also, since the lower limits "a" are 85 and 90, shouldn't P(X1+X2 < 85) equal 0?
The pdf of the sum of independent variables is the convolution of the pdfs of the variables. So you need to compute the convolution of two variables with triangular pdfs. A triangle is piecewise linear, so the convolution will be piecewise quadratic.
There are a few ways to go about it. If a numerical result is acceptable: discretize the pdfs and compute the convolution of the discretized pdfs. I believe there is a function conv in Matlab for that. If not, you can take the fast Fourier transform (via fft), compute the product point by point, then take the inverse transform (ifft if I remember correctly) since fft(convolution(f, g)) = fft(f) fft(g). You will need to be careful about normalization if you use either conv or fft.
If you must have an exact result, the convolution is just an integral, and if you're careful with the limits of integration, you can figure it out by hand. I don't know if the Matlab symbolic toolbox is available to you, and if so, I don't know if it can handle integrals of functions defined piecewise.
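The equivalence between conv and the fft route can be illustrated on two short vectors. This is a sketch with made-up inputs; the one detail that matters is zero-padding both transforms to length numel(f)+numel(g)-1, since otherwise the product of the transforms corresponds to a circular convolution.

```matlab
% Sketch: linear convolution via conv and via fft/ifft give the same result.
f = [1 2 3 2 1];                           % made-up discretized functions
g = [0 1 4 1];

n = numel(f) + numel(g) - 1;               % length of the full convolution

viaConv = conv(f, g);
viaFft  = real(ifft(fft(f, n).*fft(g, n)));% fft(f, n) zero-pads f to length n

maxDiff = max(abs(viaConv - viaFft));
fprintf('largest difference: %.2e\n', maxDiff);
```

For sampled pdfs on a grid with spacing dx, either result still has to be multiplied by dx to approximate the convolution integral.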
Below is the proper implementation for future users. Many thanks to Robert Dodier for guidance.
clear all;
clc;
min1 = 85;
max1 = 100;
min2 = 90;
max2 = 110;
y = 210;
pd1 = makedist('Triangular','a',min1,'b',90,'c',max1);
pd2 = makedist('Triangular','a',min2,'b',100,'c',max2);
dx = 0.01; % to ensure constant spacing
x1 = min1:dx:max1; % Could include some of the region where
x2 = min2:dx:max2; % the pdf is 0, but we don't have to.
x12 = linspace(...
x1(1) + x2(1) , ...
x1(end) + x2(end) , ...
length(x1)+length(x2)-1);
[c,index] = min(abs(x12-y));
x_short = linspace(min1+min2,x12(index),index);
pdf1 = pdf(pd1,x1);
pdf2 = pdf(pd2,x2);
pdf12 = conv(pdf1,pdf2)*dx;
zz = pdf12(1:index);
zz(index) = 0;
p1 = trapz(x1,pdf1)
p2 = trapz(x2,pdf2)
p12 = trapz(x_short,zz)
plot(x1,pdf1,x2,pdf2,x12,pdf12)
hold on;
fill(x_short,zz,'blue') % plot x1+x2
hold off;
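For readers without the Statistics Toolbox, the same computation can be sketched with a hand-written triangular pdf. The anonymous function tri below is my own helper, not a MATLAB built-in; the limits are those of the question. The checks are that the convolved pdf integrates to 1 and that its mean equals the sum of the two individual means, (a+b+c)/3 each.

```matlab
% Sketch: pdf of the sum of two triangular variables, base MATLAB only.
% tri is a hypothetical helper: triangular pdf with lower a, mode b, upper c.
tri = @(x, a, b, c) (x >= a & x <= b).*2.*(x - a)./((c - a)*(b - a)) ...
                  + (x >  b & x <= c).*2.*(c - x)./((c - a)*(c - b));

dx = 0.01;                                 % common, constant grid spacing
x1 = 85:dx:100;
x2 = 90:dx:110;

pdf12 = conv(tri(x1, 85, 90, 100), tri(x2, 90, 100, 110))*dx;
x12   = (x1(1) + x2(1)) + dx*(0:numel(pdf12) - 1);   % support of the sum

total   = trapz(x12, pdf12);               % should be close to 1
meanSum = trapz(x12, x12.*pdf12);          % should be close to 575/3

cdf12  = cumtrapz(x12, pdf12);             % running integral, i.e. the CDF
y      = 195;                              % example threshold (arbitrary)
pBelow = interp1(x12, cdf12, y);           % P(X1 + X2 < y)
fprintf('total %.4f, mean %.3f, P(sum < %g) = %.4f\n', total, meanSum, y, pBelow);
```

Using cumtrapz gives the whole CDF at once, so P(X1+X2 < y) for any y between 175 and 210 is a single interp1 call.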

Equally spaced points in a contour

I have a set of 2D points (not ordered) forming a closed contour, and I would like to resample them to 14 equally spaced points. It is a contour of a kidney on an image. Any ideas?
One intuitive approach (IMO) is to create an independent variable for both x and y. Base it on arc length, and interpolate on it.
% close the contour, temporarily
xc = [x(:); x(1)];
yc = [y(:); y(1)];
% current spacing may not be equally spaced
dx = diff(xc);
dy = diff(yc);
% distances between consecutive coordinates
dS = sqrt(dx.^2+dy.^2);
dS = [0; dS]; % including start point
% arc length, going along (around) snake
d = cumsum(dS); % here is your independent variable
perim = d(end);
Now you have an independent variable and you can interpolate to create N segments:
N = 14;
ds = perim / N;
dSi = ds*(0:N).'; %' your NEW independent variable, equally spaced
dSi(end) = dSi(end)-.005; % appease interp1
xi = interp1(d,xc,dSi);
yi = interp1(d,yc,dSi);
xi(end)=[]; yi(end)=[];
Try it using imfreehand:
figure, imshow('cameraman.tif');
h = imfreehand(gca);
xy = h.getPosition; x = xy(:,1); y = xy(:,2);
% run the above solution ...
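The arc-length approach can be tried without an image or the Image Processing Toolbox. The sketch below is an illustration under my own assumptions: the contour is an ellipse sampled at deliberately uneven but ordered parameter values, and the new positions are taken as perim*(0:N-1)/N, which avoids the small offset needed above when the end point is included. Like the answer above, it assumes the points are already ordered along the contour.

```matlab
% Sketch: resample an ordered closed contour to N points equally spaced
% in arc length, using a synthetic ellipse as test data.
t = 2*pi*linspace(0, 1, 201).^2;           % uneven parameter spacing
t(end) = [];                               % drop the duplicate of the start point
x = 3*cos(t(:));
y = 2*sin(t(:));

% Close the contour and accumulate distances along it
xc = [x; x(1)];
yc = [y; y(1)];
d  = [0; cumsum(hypot(diff(xc), diff(yc)))];   % arc length at each vertex
perim = d(end);

N   = 14;
dSi = perim*(0:N-1).'/N;                   % equally spaced arc-length positions
xi  = interp1(d, xc, dSi);
yi  = interp1(d, yc, dSi);

onEllipse = (xi/3).^2 + (yi/2).^2;         % close to 1 for points on the ellipse
plot(xc, yc, '-', xi, yi, 'o'); axis equal
```

The spacing is equal along the polyline through the original points, so a coarse contour gives only approximately equal spacing along the true curve.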
Say your contour is defined by independent vector x and dependent vector y.
You can get your resampled x vector using linspace:
new_x = linspace(min(x),max(x),14); %14 to get 14 equally spaced points
Then use interp1 to get new_y values at each new_x point:
new_y = interp1(x,y,new_x);
There are a few interpolation methods to choose from - default is linear. See interp1 help for more info.