How to plot and estimate an empirical CDF in MATLAB

This question has been raised several times before, but mine differs a little from the earlier ones. I have a table of x values and relative frequencies:
x      y
150    1
250    2
350    8
450    30
550    18
650    16
750    5
I don't really understand what the built-in MATLAB function [f,x] = ecdf(y) does. I use it to estimate and plot an empirical distribution function,
but the result is clearly not correct: if you build a histogram from the data (x and y), the resulting ECDF does not describe that distribution.
So my question is: how do I correctly construct the ECDF from the table (x values with an array of relative frequencies), and then estimate and plot the cumulative distribution function directly from it?
My code for plotting the histogram and the ECDF:
%% data
y = [1; 2; 8; 30; 18; 16; 5];
x = [150; 250; 350; 450; 550; 650; 750];
%% hist and polygon
figure(1)
bar(x,y,'LineWidth',1,...
'FaceColor',[0.0745098039215686 0.623529411764706 1],...
'EdgeColor',[0.149019607843137 0.149019607843137 0.149019607843137],...
'BarWidth',1,...
'BarLayout','stacked');
hold on
plot(x,y,'-o','Color','red','LineWidth',1)
hold off
%% ecdf
[ff,x] = ecdf(y);
x_e = [0;x];
figure(2)
stairs(x_e,ff,'Marker','o','LineWidth',1,'Color',[0.0745098039215686 0.623529411764706 1]);
set(gca,'GridAlpha',0.25,'GridLineStyle','--','MinorGridLineStyle','--',...
'XGrid','on','XMinorGrid','on','YGrid','on');
xlim([0 780]);

You should not use the ecdf function here, because it takes the raw data values as input. Your inputs, on the other hand, seem to be the population values and their absolute frequencies. So you only need to:
1. normalize the frequencies to make them relative, and then
2. compute their cumulative sum.
When plotting, I suggest you add an initial and a final population value, with cumulative frequencies 0 and 1 respectively, for a clearer graph.
x = [150; 250; 350; 450; 550; 650; 750];
y = [1; 2; 8; 30; 18; 16; 5]; % example data
cdf = cumsum(y./sum(y)); % normalize, then compute cumulative sum
stairs([100; x; 900], [0; cdf; 1], 'linewidth', .8), grid on % note two extra values
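Once the cumulative sums are available, the same table can also be used to evaluate the CDF at arbitrary points and to read off quantiles. The following is a minimal sketch using only base MATLAB; the query points q and the names F, Fq, and xMedian are illustrative assumptions, not part of the original answer.

```matlab
% Frequency table from the question
x = [150; 250; 350; 450; 550; 650; 750];
y = [1; 2; 8; 30; 18; 16; 5];

% Empirical CDF at the tabulated x values
F = cumsum(y) / sum(y);

% Evaluate the step function at arbitrary query points (hypothetical values):
% F(q) is the fraction of observations with value <= q
q  = [100 450 500 750];
Fq = arrayfun(@(v) sum(y(x <= v)) / sum(y), q);

% Smallest tabulated x whose cumulative frequency reaches 0.5
xMedian = x(find(F >= 0.5, 1, 'first'));
```

For this table the cumulative frequency jumps from 11/80 to 41/80 at x = 450, so that is where it first reaches 0.5.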

Vectors must be the same length error in Curve Fitting in Matlab

I'm having problems in curve fitting my randomized data for the function
Here is my code
N = 100;
mu = 5; stdev = 2;
x = mu+stdev*randn(N,1);
bin=mu-6*stdev:0.5:mu+6*stdev;
f=hist(x,bin);
plot(bin,f,'bo'); hold on;
x_ = x(1):0.1:x(end);
y_ = (1./sqrt(8.*pi)).*exp(-((x_-mu).^2)./8);
plot(x_,y_,'b-'); hold on;
It seems like I'm having vector size problems since it is giving me the error
Error using plot
Vectors must be the same length.
Note that I simplified y_ since mu and the standard deviation are known.
Well first of all some adjustments to your question:
You are not trying to do curve fitting. What you are trying to do (in my opinion) is to overlay a probability density function on a histogram obtained by drawing random points from the same distribution (a normal distribution with parameters (mu, sigma)). These two curves should indeed overlay, as they represent the same thing; one is analytical and the other is obtained numerically.
As noted in the hist documentation, hist is not recommended and you should use histogram instead.
First step: Generating your random data
Knowing the distribution is the Normal distribution, we can use MATLAB's random function to do that :
N = 150;
rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,N,1);
Second step: Plot the histogram
Because we don't just want a count of the elements in each bin, but a feel of the probability density function, we can use the 'Normalization' 'pdf' arguments
Nbins = 25;
f=histogram(r,Nbins,'Normalization','pdf');
hold on
Here I'd rather specify a number of bins than specifying the bins themselves, because you never know in advance how far from the mean your data is going to be.
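The effect of the 'pdf' normalization can be checked numerically: the bar heights multiplied by the bin widths should sum to 1. This is a small sketch, assuming histcounts (the non-plotting counterpart of histogram) and using randn in place of random so that no toolbox is required; the variable names p, edges, and area are illustrative.

```matlab
rng('default')                     % for reproducibility
mu = 5; sigma = 2; N = 150;
r = mu + sigma*randn(N,1);         % normal samples, base MATLAB only

% Bin the data and normalize the bar heights as a probability density
Nbins = 25;
[p, edges] = histcounts(r, Nbins, 'Normalization', 'pdf');

% Total area under the bars; should be 1 up to rounding error
area = sum(p .* diff(edges));
```

Because the area is 1, the analytical density can be drawn on the same axes without any rescaling.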
Last step: overlay the probability density function over the histogram
The histogram being already consistent with a probability density function, it is sufficient to just overlay the density function:
x_ = linspace(min(r),max(r),100);
y_ = (1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'b-');
[Figures: histogram with overlaid density for N = 150, for N = 1500, and for N = 150,000 with Nbins = 50]
If for some obscure reason you want to use the old hist() function
The old hist() function can't handle normalization, so you'll have to do it by hand, by normalizing your density function to fit your histogram:
N = 1500;
% rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,1,N);
Nbins = 50;
[~,centers]=hist(r,Nbins);
hist(r,Nbins); hold on
% Width of bins
Widths = diff(centers);
x_ = linspace(min(r),max(r),100);
y_ = N*mean(Widths)*(1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'r-');

LevelList in contour plots

There is some information I could find neither in the documentation nor in forums:
My Z spans 5 orders of magnitude, from 0.002 all the way to 100-ish. How can I plot these values correctly?
Is it possible to specify the order of magnitude rather than the exact number? In LevelList somehow, I mean. E.g. I want a level at 10^2, which can mean 100 or 190, or 131.34.
Code:
[C,h] = contour(beta,alpha,Coupling)
clabel(C,h)
axis([0 3 0 3])
Let's say you had some random data
% Data Order of magnitude base 10
a = [0.0964 % O(1e-1)
0.0157 % O(1e-2)
0.0970 % O(1e-1)
0.9571 % O(1e+0)
0.4853 % O(1e+0)
0.8002 % O(1e+0)
1.4188 % O(1e+0)
4.2176 % O(1e+1)
9.1573] % O(1e+1)
Where the orders of magnitude are given by
orders = round(log10(a));
You can replace your z values with this formula z2 = 10.^round(log10(z)) to define them by their magnitude. Then produce a contour plot with the distinct orders of magnitude just as you did before, but using z2 not z.
For your example:
CouplingMagnitudes = 10.^(round(log10(Coupling)));
[C,h] = contour(beta, alpha, CouplingMagnitudes)
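If the goal is one contour line per decade rather than rounded data, another option is to pass logarithmically spaced levels to contour directly. This is a hedged sketch, not the answerer's method: the Coupling surface below is synthetic stand-in data, and the exponents -2:1 are an assumption chosen to fit its range.

```matlab
% Synthetic stand-in for the real data, spanning roughly 0.002 to 74
[beta, alpha] = meshgrid(linspace(0.05, 3, 60));
Coupling = 100*exp(-3*(beta + alpha)) + 0.002;

% One contour level per order of magnitude: 0.01, 0.1, 1, 10
levels = 10.^(-2:1);
[C, h] = contour(beta, alpha, Coupling, levels);
clabel(C, h)
axis([0 3 0 3])
```

The levels in use can afterwards be inspected or changed through the LevelList property of the returned contour object h.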

Plotting the implicit function x+y - log(x) - log(y) -2 = 0 on MATLAB

I wanted to plot the above function on Matlab so I used the following code
ezplot('-log(x)-log(y)+x+y-2',[-10 10 -10 10]);
However I'm just getting a blank screen. But clearly there is at least the point (1,1) that satisfies the equation.
I don't think there is a problem with the plotter settings, as I'm getting graphs for functions like
ezplot('-log(y)+x+y-2',[-10 10 -10 10]);
I don't have enough rep to embed pictures :)
If we use solve on your function, we can see that there are two points where your function is equal to zero. These points are at (1, 1) and (0.3203 + 1.3354i, pi)
syms x y
result = solve(-log(x)-log(y)+x+y-2, x, y);
result.x
% -wrightOmega(log(1/pi) - 2 + pi*(1 - 1i))
% 1
result.y
% pi
% 1
If we look closely at your function, we can see that the values are actually complex
[xx,yy] = meshgrid(-10:0.01:10, -10:0.01:10); % numeric grid; xx/yy are reused below
values = -log(xx)-log(yy)+xx+yy-2;
whos values
% Name Size Bytes Class Attributes
% values 2001x2001 64064016 double complex
It seems as though in older versions of MATLAB, ezplot handled complex functions by only considering the real component of the data. As such, this would yield the following plot
However, newer versions consider the magnitude of the data and the zeros will only occur when both the real and imaginary components are zero. Of the two points where this is true, only one of these points is real and is able to be plotted; however, the relatively coarse sampling of ezplot isn't able to display that single point.
You could use contourc to determine the location of this point
imagesc(abs(values), 'XData', [-10 10], 'YData', [-10 10]);
axis equal
hold on
cmat = contourc(abs(values), [0 0]);
xvalues = xx(1, cmat(1,2:end));
yvalues = yy(cmat(2,2:end), 1);
plot(xvalues, yvalues, 'r*')
This is because x = y = 1 is the only real solution to the given equation.
Note that the minimum value of x - log(x) for x > 0 is 1, and it is attained at x = 1. The same is true for y - log(y). So -log(x)-log(y)+x+y is always greater than 2 except at x = y = 1, where it is exactly equal to 2.
As your equation has only one real solution, which is a single point, there is no line on the plot.
To visualize this, let's plot the equation
ezplot('-log(x)-log(y)+x+y-C',[-10 10 -10 10]);
for various values of C.
% choose a set of values between 5 and 2
C = logspace(log10(5), log10(2), 20);
% plot the equation with various values of C
figure
for ic=1:length(C)
ezplot(sprintf('-log(x)-log(y)+x+y-%f', C(ic)),[0 10 0 10]);
hold on
end
title('-log(x)-log(y)+x+y-C = 0, for 2 <= C <= 5');
Note that the largest curve is obtained for C = 5. As the value of C is decreased, the curve also becomes smaller, until at C = 2 it completely vanishes.
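The claim that x - log(x) has minimum 1 at x = 1 can also be checked numerically. This is a minimal sketch using fminbnd from base MATLAB; the search interval [0.01, 10] and the grid resolution are arbitrary assumptions.

```matlab
% One-variable building block of the equation
g = @(t) t - log(t);

% Locate the minimum of g on a positive interval
[tMin, gMin] = fminbnd(g, 0.01, 10);

% Left-hand side x + y - log(x) - log(y) on a grid of positive values
[X, Y] = meshgrid(linspace(0.01, 10, 500));
lhs = g(X) + g(Y);
smallest = min(lhs(:));   % never drops below 2, up to rounding
```

Since the left-hand side never goes below 2 on the grid, the level set at 2 cannot be a curve, which agrees with the shrinking contours above.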

Multivariate Normal Distribution Matlab, probability area

I have 2 arrays: one with x-coordinates, the other with y-coordinates.
Both are normally distributed, as the result of a Monte Carlo simulation. I know how to find sigma and mu for both arrays and get a 95% confidence interval:
[mu,sigma]=normfit(x_array);
hist(x_array);
x=norminv([0.025 0.975],mu,sigma)
However, the two arrays are correlated with each other. To plot the probability distribution of the combined arrays, I use the multivariate normal distribution. In MATLAB this gives me:
[MuX,SigmaX]=normfit(x_array);
[MuY,SigmaY]=normfit(y_array);
mu = [MuX MuY];
Sigma=cov(x_array,y_array);
x1 = MuX-4*SigmaX:5:MuX+4*SigmaX; x2 = MuY-4*SigmaY:5:MuY+4*SigmaY;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
surf(x1,x2,F);
caxis([min(F(:))-.5*range(F(:)),max(F(:))]);
set(gca,'Ydir','reverse')
xlabel('x0-as'); ylabel('y0-as'); zlabel('Probability Density');
So far so good. Now I want to calculate the 95% probability area. I am looking for a function like mndinv, analogous to norminv. However, such a function doesn't exist in MATLAB, which makes sense because there are endless possibilities... Does somebody have a tip about how to get a 95% probability area? Thanks in advance.
For the bivariate case you can add the ellipse that encloses 95% of the probability mass, which plays the role that norminv plays in one dimension. This ellipse is uniquely identified; for a proof, see the first source listed below.
% Suppose you know the distribution params, or you got them from normfit()
mu = [3, 7];
sigma = [1, 2.5
2.5 9];
% X/Y values for plotting grid
x = linspace(mu(1)-3*sqrt(sigma(1)), mu(1)+3*sqrt(sigma(1)),100);
y = linspace(mu(2)-3*sqrt(sigma(end)), mu(2)+3*sqrt(sigma(end)),100);
% Z values
[X1,X2] = meshgrid(x,y);
Z = mvnpdf([X1(:) X2(:)],mu,sigma);
Z = reshape(Z,length(y),length(x));
% Plot
h = pcolor(x,y,Z);
set(h,'LineStyle','none')
hold on
% Add level set
alpha = 0.05;
r = sqrt(-2*log(alpha));
rho = sigma(2)/sqrt(sigma(1)*sigma(end));
M = [sqrt(sigma(1)) rho*sqrt(sigma(end))
0 sqrt(sigma(end)-sigma(end)*rho^2)];
theta = 0:0.1:2*pi;
f = bsxfun(@plus, r*[cos(theta)', sin(theta)']*M, mu);
plot(f(:,1), f(:,2),'--r')
Sources
https://upload.wikimedia.org/wikipedia/commons/a/a2/Cumulative_function_n_dimensional_Gaussians_12.2013.pdf
https://en.wikipedia.org/wiki/Multivariate_normal_distribution
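As a sanity check on the ellipse radius r = sqrt(-2*log(alpha)) used above, one can draw samples from the same distribution and count how many fall inside the ellipse. This is a Monte Carlo sketch using only base MATLAB; the sample size n and the names R, d2, and fracInside are assumptions made for illustration.

```matlab
rng(0)                                   % for reproducibility
mu    = [3, 7];
sigma = [1, 2.5; 2.5, 9];
alpha = 0.05;

% Draw correlated normal samples using the Cholesky factor (R'*R = sigma)
n = 1e5;
R = chol(sigma);
X = bsxfun(@plus, randn(n,2)*R, mu);

% Squared Mahalanobis distance of each sample from the mean
D  = bsxfun(@minus, X, mu);
d2 = sum((D/sigma).*D, 2);

% Fraction of samples inside the ellipse d2 <= r^2 = -2*log(alpha)
fracInside = mean(d2 <= -2*log(alpha));
```

The fraction should come out close to 1 - alpha = 0.95, with the remaining difference due to sampling noise.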
To get the numerical value of F where the top part lies, you should use top5=prctile(F(:),95). This returns the value of F that separates the bottom 95% of the grid values from the top 5%.
Then you can get just the top 5% with
Ftop=zeros(size(F));
Ftop=F>top5;
Ftop=Ftop.*F;
%// optional: Ftop(Ftop==0)=NaN;
surf(x1,x2,Ftop,'LineStyle','none');
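A different, hedged variation is to pick the threshold by accumulated probability mass rather than by a percentile of the grid values: sort the density values in descending order and find the level at which their integral reaches 95%. The sketch below is not the answerer's method; it assumes the example parameters mu and Sigma from the ellipse answer, a grid covering about five standard deviations in each direction, and it evaluates the bivariate normal density by hand so that no toolbox is needed.

```matlab
mu    = [3, 7];
Sigma = [1, 2.5; 2.5, 9];

% Grid covering about +/- 5 standard deviations in each direction
x1 = linspace(mu(1) - 5,  mu(1) + 5,  201);
x2 = linspace(mu(2) - 15, mu(2) + 15, 201);
[X1, X2] = meshgrid(x1, x2);

% Bivariate normal density evaluated on the grid
D = [X1(:) - mu(1), X2(:) - mu(2)];
F = exp(-0.5*sum((D/Sigma).*D, 2)) / (2*pi*sqrt(det(Sigma)));
F = reshape(F, size(X1));

% Accumulate probability mass starting from the highest density values
cellArea = (x1(2) - x1(1)) * (x2(2) - x2(1));
Fsorted  = sort(F(:), 'descend');
mass     = cumsum(Fsorted) * cellArea;

% Density level whose super-level set holds about 95% of the mass
level = Fsorted(find(mass >= 0.95, 1, 'first'));
```

The region F >= level then approximates the 95% probability area, and level can be passed to contour as a single contour height to draw its boundary.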

Defining windows to find multiple slopes

I need to define several windows on an experimental plot over which slopes can be found. For example, x runs from 0 to 400. I want to find the derivative over each window of 50 in x (i.e. 0 to 50, 50 to 100, and so on), and then average all the derivatives (8 derivatives in this example). Thanks for any help!
Assuming you have a vector y of measurements and want to compute the derivative by taking the difference between entry 1 and 50, 51 and 100, and so on, you could do the following:
% generate a signal
x=1:400;
y = x.^2;
nSamples = length(y);
% define number of segments and window size
N = 8;
Winsize = ceil(nSamples/N);
% preallocate the vector of slopes and compute the slopes
slopes = zeros(1,N);
for ii=1:N
slopes(ii) = (y(min(nSamples,Winsize*ii))-y(1+Winsize*(ii-1)))/(x(min(nSamples,Winsize*ii))-x(1+Winsize*(ii-1)));
end
% take the average slope value
Averageslope = mean(slopes);
However, since you are using MATLAB anyway, you could also just take the average derivative of the vector, which should yield a much more accurate average when dealing with noisy data:
% generate a signal
x=1:400;
y = x.^2;
slope = mean(diff(y)./diff(x)); % element-wise division, one slope per sample interval
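For noisy measurements, another option is to fit a straight line to each window by least squares and average the fitted slopes. This is a hedged sketch using polyfit; the synthetic signal (true slope 2 plus noise), the window edges, and the variable names are assumptions made for illustration.

```matlab
rng(0)                                   % for reproducibility
x = 0:400;
y = 2*x + 5 + 0.5*randn(size(x));        % hypothetical noisy signal

% Window boundaries: 0-50, 50-100, ..., 350-400
edges  = 0:50:400;
slopes = zeros(1, numel(edges) - 1);

for k = 1:numel(slopes)
    inWin = x >= edges(k) & x <= edges(k+1);   % samples in this window
    p = polyfit(x(inWin), y(inWin), 1);        % least-squares line fit
    slopes(k) = p(1);                          % slope coefficient
end

avgSlope = mean(slopes);
```

Each fitted slope uses every sample in its window, so a single noisy endpoint has less influence than in an endpoint difference.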