MATLAB pdf estimation (ksdensity) not working

I'm estimating a probability density function (pdf) in MATLAB. The code looks like this:
xi = -2:0.1:2;
a1 = normpdf(xi, 1, 0.3);
a2 = normpdf(xi, -1, 0.3);
subplot(211);
plot(xi, a1+a2);
[f, xs] = ksdensity(a1+a2);
subplot(212);
plot(xs, f);
and the plots look like this:
You can see the estimation is not working at all.
So what's wrong here? By the way, are there other pdf estimation methods in MATLAB?

Is this closer to what you expect?
The ksdensity function expects a vector of samples from the distribution, whereas you were feeding it the values of the probability density function.
>> xi = -3:0.1:3;
>> p1 = normpdf(xi, 1, 0.3);
>> p2 = normpdf(xi,-1, 0.3);
>> subplot(211)
>> plot(xi, 0.5*p1+0.5*p2)
>> a1 = 1 + 0.3 * randn(10000,1); % construct the same distribution
>> a2 = -1 + 0.3 * randn(10000,1); % construct the same distribution
>> [f, xs] = ksdensity([a1;a2]);
>> subplot(212)
>> plot(xs, f)

ksdensity gives you the estimated probability density (evaluated at 100 points by default) of the input values. Your input a1+a2 has values that range between 0 and about 1.5, with a large portion of those close to 0 and a smaller portion near 1.5. The second plot you see reflects this distribution.
If you want the two plots to look similar, pass ksdensity a vector whose elements are concentrated near -1 and 1, as in the sketch below.
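A minimal sketch of that (my own illustration, assuming the same mixture of two Gaussians centred at -1 and 1 with standard deviation 0.3): draw samples from the mixture and hand the samples, not the density values, to ksdensity.
samples = [-1 + 0.3*randn(5000,1); 1 + 0.3*randn(5000,1)]; % samples concentrated near -1 and 1
[f, xs] = ksdensity(samples);
plot(xs, f) % now resembles the analytic curve plot(xi, a1+a2), up to scale
Note the scale: a1+a2 is the sum of two densities and integrates to 2, while the ksdensity estimate integrates to 1.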

Related

Methods for smoothing contour lines

I have a database P with columns X, Y and Z:
x=0:1:50;
r=3.*rand(1,51);
P=[cos(x')+r',sin(x')+r',sin(x'+r').*cos(x')+r'];
P = sortrows(P,[1,2]);
N = 500;
xv = linspace(min(P(:,1)), max(P(:,1)), N);
yv = linspace(min(P(:,2)), max(P(:,2)), N);
[X,Y] = ndgrid(xv, yv);
Z = griddata(P(:,1), P(:,2), P(:,3), X, Y);
contourf(X, Y, Z, 35)
With the code above, I get the following subplot (right):
This "angularity" arises due to the addition of a vector r of random values ​​to the data. How to smooth out this angularity and make the graph smoother?
I tried increasing N to 2500. It did not give a significant improvement (in fact, the result stops changing noticeably after N=1500).
To smooth your 2D data you can use a 2D convolution, with the conv2 function.
With your example data:
n = 10 ;
kernel = ones(n)/n.^2 ;
Zs = conv2(Z, kernel,'same') ;
contourf(X, Y, Zs, 35) ;
title(sprintf('Filter size: n=%d',n))
will yield:
You have to adjust the filter size (the parameter n) until you get the desired result. For example with n=20 you will get:
And for n=50:
A few things to keep in mind:
Since you have NaN values in your matrix Z, the filtering will erode the border of your initial domain. The stronger the filtering/smoothing (higher n), the more erosion you will notice, until very little non-NaN data is left.
For it to be a smoothing operation and not some other transform or filtering, the convolution kernel has to be built so that (i) all the elements have equal value, and (ii) the elements sum to 1.
A quick demonstration with n=2:
>> n=2
n =
2
>> kernel = ones(n)/n.^2
kernel =
0.25 0.25
0.25 0.25
>> sum(sum(kernel))
ans =
1
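If the border erosion caused by the NaN values becomes a problem, one common workaround (a sketch of my own, not part of the original answer) is normalized convolution: convolve the data with the NaNs zeroed out, convolve a validity mask with the same kernel, and divide the two results.
n = 10 ;
kernel = ones(n)/n.^2 ;
valid = ~isnan(Z) ; % mask of valid samples
Z0 = Z ; Z0(~valid) = 0 ; % zero out NaNs before convolving
Zs = conv2(Z0, kernel, 'same') ./ conv2(double(valid), kernel, 'same') ;
Zs(~valid) = NaN ; % optionally keep the original NaN footprint
contourf(X, Y, Zs, 35)
Each output value is then a weighted average over the valid neighbours only, so the smoothing no longer shrinks the usable domain.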

Matlab: 2D Discrete Fourier Transform and Inverse

I'm trying to run a program in MATLAB to obtain the direct and inverse DFT for a grey scale image, but I'm not able to recover the original image after applying the inverse. I'm getting complex numbers as my inverse output. It's like I'm losing information. Any ideas on this? Here is my code:
%2D discrete Fourier transform
%Image Dimension
M=3;
N=3;
f=zeros(M,N);
f(2,1:3)=1;
f(3,1:3)=0.5;
f(1,2)=0.5;
f(3,2)=1;
f(2,2)=0;
figure;imshow(f,[0 1],'InitialMagnification','fit')
%Direct transform
for u=0:1:M-1
for v=0:1:N-1
for x=1:1:M
for y=1:1:N
F(u+1,v+1)=f(x,y)*exp(-2*pi*(1i)*((u*(x-1)/M)+(v*(y-1)/N)));
end
end
end
end
Fab=abs(F);
figure;imshow(Fab,[0 1],'InitialMagnification','fit')
%Inverse Transform
for x=0:1:M-1
for y=0:1:N-1
for u=1:1:M
for v=1:1:N
z(x+1,y+1)=(1/M*N)*F(u,v)*exp(2*pi*(1i)*(((u-1)*x/M)+((v-1)*y/N)));
end
end
end
end
figure;imshow(real(z),[0 1],'InitialMagnification','fit')
There are a couple of issues with your code:
You are not applying the definition of the DFT (or IDFT) correctly: you need to sum over the original variable(s) to obtain the transform. In 2D the forward transform is F(u,v) = sum over x of sum over y of f(x,y)*exp(-2*pi*1i*(u*x/M + v*y/N)); notice the sums.
In the IDFT the normalization constant should be 1/(M*N) (not 1/M*N).
Note also that the code could be made much more compact by vectorization, avoiding the loops, or just by using the fft2 and ifft2 functions. I assume you want to compute it manually and "low-level" to verify the results.
The code, with the two corrections, is as follows. The modifications are marked with comments.
M=3;
N=3;
f=zeros(M,N);
f(2,1:3)=1;
f(3,1:3)=0.5;
f(1,2)=0.5;
f(3,2)=1;
f(2,2)=0;
figure;imshow(f,[0 1],'InitialMagnification','fit')
%Direct transform
F = zeros(M,N); % initialize to 0
for u=0:1:M-1
for v=0:1:N-1
for x=1:1:M
for y=1:1:N
F(u+1,v+1) = F(u+1,v+1) + ...
f(x,y)*exp(-2*pi*(1i)*((u*(x-1)/M)+(v*(y-1)/N))); % add term
end
end
end
end
Fab=abs(F);
figure;imshow(Fab,[0 1],'InitialMagnification','fit')
%Inverse Transform
z = zeros(M,N);
for x=0:1:M-1
for y=0:1:N-1
for u=1:1:M
for v=1:1:N
z(x+1,y+1) = z(x+1,y+1) + (1/(M*N)) * ... % corrected scale factor
F(u,v)*exp(2*pi*(1i)*(((u-1)*x/M)+((v-1)*y/N))); % add term
end
end
end
end
figure;imshow(real(z),[0 1],'InitialMagnification','fit')
Now the original and recovered image differ only by very small values, of the order of eps, due to the usual floating-point inaccuracies:
>> f-z
ans =
1.0e-15 *
Columns 1 through 2
0.180411241501588 + 0.666133814775094i -0.111022302462516 - 0.027755575615629i
0.000000000000000 + 0.027755575615629i 0.277555756156289 + 0.212603775716506i
0.000000000000000 - 0.194289029309402i 0.000000000000000 + 0.027755575615629i
Column 3
-0.194289029309402 - 0.027755575615629i
-0.222044604925031 - 0.055511151231258i
0.111022302462516 - 0.111022302462516i
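As a quick sanity check (my own addition, assuming the corrected f, F and z from the code above are still in the workspace), the hand-written transforms can be compared against MATLAB's built-in fft2 and ifft2, which use the same convention:
>> max(abs(F(:) - reshape(fft2(f), [], 1)))  % of the order of eps
>> max(abs(z(:) - reshape(ifft2(F), [], 1))) % of the order of eps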
Firstly, the biggest error is that you are computing the Fourier transform incorrectly. When computing F, you need to be summing over x and y, which you are not doing. Here's how to rectify that:
F = zeros(M, N);
for u=0:1:M-1
for v=0:1:N-1
for x=1:1:M
for y=1:1:N
F(u+1,v+1)=F(u+1,v+1) + f(x,y)*exp(-2*pi*(1i)*((u*(x-1)/M)+(v*(y-1)/N)));
end
end
end
end
Secondly, in the inverse transform, your bracketing is incorrect. It should be 1/(M*N), not (1/M*N): with M = N = 3, (1/M*N) is parsed as (1/M)*N and evaluates to 1 rather than 1/9.
As an aside, at the cost of a bit more memory, you can speed up the computation by not nesting so many loops. Namely, when computing the FFT, do the following instead
x = (1:1:M)'; % x is a column vector
y = (1:1:N) ; % y is a row vector
for u = 0:1:M-1
for v = 0:1:N-1
F2(u+1,v+1) = sum(f .* exp(-2i * pi * (u*(x-1)/M + v*(y-1)/N)), 'all');
end
end
To take this method to the extreme, i.e. not using any loops at all, you would do the following (though this is not recommended, since you would lose code readability and the memory cost grows quickly: the intermediate array inside the sum is M x N x M x N).
x = (1:1:M)'; % x is in dimension 1
y = (1:1:N) ; % y is in dimension 2
u = permute(0:1:M-1, [1, 3, 2]); % x-freqs in dimension 3
v = permute(0:1:N-1, [1, 4, 3, 2]); % y-freqs in dimension 4
% sum the exponential terms in x and y, which are in dimensions 1 and 2.
% If you are using r2018a or older, the below summation should be
% sum(sum(..., 1), 2)
% instead of
% sum(..., [1,2])
F3 = sum(f .* exp(-2i * pi * (u.*(x-1)/M + v.*(y-1)/N)), [1, 2]);
% The resulting array F3 is 1 x 1 x M x N, to make it M x N, simply shiftdim or squeeze
F3 = squeeze(F3);

Integration of bivariate lognormal density function

rho = 0.8;
ff = @(x, y) (exp(-(((log(x)-10).^2 - 2.* rho .* (log(x)-10) .* (log(y)-10)+(log(y)-10).^2)./(2 .* (1-rho.^2))))./(2.*pi.*sqrt(1-rho.^2).*x.*y));
syms x y
vpaintegral(vpaintegral(ff, x, [0 inf]), y, [0 inf])
Why is the above integration of bivariate lognormal density function in Matlab not 1?
Note: the log transformation of this lognormal bivariate random variable is a bivariate normal random variable with a mean (10, 10), and covariance matrix (1, rho, rho, 1).
Using integral2:
we get 0.9994,
% MATLAB R2019a
rho = 0.8;
ff = @(x, y) (exp(-(((log(x)-10).^2-2.*rho.*(log(x)-10).*(log(y)-10)+(log(y)-10).^2)./(2.*(1-rho.^2))))./(2.*pi.*sqrt(1-rho.^2).*x.*y));
area = integral2(ff,0,inf,0,inf) % area = 0.9994
but adjusting the tolerance gives the desired result.
area = integral2(ff,0,inf,0,inf,'Method','iterated','AbsTol',0,'RelTol',1e-10)
ans = 1.0000
format long
area
ans = 0.999999999999998
Not too shabby.
Using vpaintegral from the Symbolic Toolbox:
You can also adjust the tolerance for vpaintegral.
Using a Relative Error Tolerance of 1e-4 got the job done. This parameter greatly affects computation time.
syms x y
area = vpaintegral(vpaintegral(ff, x, [0 inf],'RelTol', 1e-4, 'AbsTol', 0), y, [0 inf],'RelTol', 1e-4, 'AbsTol', 0)
area = 1.0
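As a cross-check (my own addition, not part of either answer), the substitution u = log(x), v = log(y) turns the lognormal density into the bivariate normal density stated in the question, whose total mass is 1 by construction; a minimal sketch:
% The Jacobian 1/(x*y) of the substitution cancels the x*y factor in ff,
% leaving the bivariate normal density with mean (10,10) and unit variances.
rho = 0.8;
fn = @(u,v) exp(-((u-10).^2 - 2.*rho.*(u-10).*(v-10) + (v-10).^2)./(2.*(1-rho.^2))) ./ (2.*pi.*sqrt(1-rho.^2));
integral2(fn, 0, 20, 0, 20) % essentially all of the mass lies in [0,20]^2; returns ~1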

Testing for Unimodal (Unimodality) or Bimodal (Bimodality) Distribution in MATLAB

Is there a way in MATLAB to check whether the histogram distribution is unimodal or bimodal?
EDIT
Do you think Hartigan's Dip Statistic would work? I tried passing an image to it, and got the value 0. What does that mean?
And when passing an image, does it test the distribution of the image's histogram over the gray levels?
Thanks.
Here is a script using Nic Price's implementation of Hartigan's Dip Test to identify unimodal distributions. The tricky point was to calculate xpdf, which is not a probability density function, but rather a sorted sample.
p_value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. In this case the null hypothesis is that the distribution is unimodal.
close all; clear all;
function [x2, n, b] = compute_xpdf(x)
% Note: save this function in its own file (compute_xpdf.m), or place it at the
% end of the script (local functions in scripts require R2016b or newer).
x2 = reshape(x, 1, numel(x));
[n, b] = hist(x2, 40);
% This is definitely not a probability density function, but a sorted
% (and downsampled) sample, which is what the dip test expects.
x2 = sort(x2);
% downsampling to speed up computations
x2 = interp1(1:length(x2), x2, 1:1000:length(x2));
end
nboot = 500;
sample_size = [256 256];
% Unimodal
sample2d = normrnd(0.0, 10.0, sample_size);
[xpdf, n, b] = compute_xpdf(sample2d);
[dip, p_value, xlow, xup] = HartigansDipSignifTest(xpdf, nboot);
figure;
subplot(1,2,1);
bar(b, n) % bin centres on the x-axis, counts on the y-axis
title(sprintf('Probability of unimodal %.2f', p_value))
% Bimodal
sample2d = sign(sample2d) .* (abs(sample2d) .^ 0.5);
[xpdf, n, b] = compute_xpdf(sample2d);
[dip, p_value, xlow, xup] = HartigansDipSignifTest(xpdf, nboot);
subplot(1,2,2);
bar(b, n)
title(sprintf('Probability of unimodal %.2f', p_value))
print -dpng modality.png
There are many different ways to do what you are asking. In the most literal sense, "bimodal" means there are two peaks. Usually though, you want the "two peaks" to be separated by some reasonable distance, and you want them to each contain a reasonable proportion of the total counts. Only you know what is "reasonable" for your situation, but the following approach might help.
Create a histogram of the intensities
Form the cumulative distribution with cumsum
For different values of the "cut" between distributions (25%, 30%, 50%, …), compute the mean and standard deviation of the two distributions (above and below the cut).
Compute the distance between the means divided by the sum of the standard deviations of the two distributions
That quantity will be a maximum at the "best cut"
You have to decide what size of that quantity represents "bimodal" for you. Here is some code that demonstrates what I am talking about. It generates bimodal distributions of increasing severity: two Gaussians, with an increasing delta between them (in steps equal to the standard deviation). I compute the quantity described above, and plot it for a range of different values of delta. I then fit a parabola through this curve over a range corresponding to +-1 sigma of the entire distribution. As you can see, when the distribution becomes more bimodal, two things happen:
The curvature of this curve flips (it goes from a valley to a peak)
The maximum increases (it is about 1.33 for a Gaussian).
You can look at these quantities for some of your own distributions, and decide where you want to put the cutoff.
% test for bimodal distribution
close all
for delta = 0:10:50
a1 = randn(100,100) * 10 + 25;
a2 = randn(100,100) * 10 + 25 + delta;
a3 = [a1(:); a2(:)];
[h hb] = hist(a3, 0:100);
cs = cumsum(h);
llimi = find(cs < 0.2 * max(cs(:)));
ulimi = find(cs > 0.8 * max(cs(:)));
llim = hb(llimi(end));
ulim = hb(ulimi(1));
cuts = linspace(llim, ulim, 20);
dmean = mean(a3);
dstd = std(a3);
for ci = 1:numel(cuts)
d1 = a3(a3<cuts(ci));
d2 = a3(a3>=cuts(ci));
m(ci,1) = mean(d1);
m(ci, 2) = mean(d2);
s(ci, 1) = std(d1);
s(ci, 2) = std(d2);
end
q = (m(:, 2) - m(:, 1)) ./ sum(s, 2);
figure;
plot(cuts, q);
title(sprintf('delta = %d', delta))
% compute curvature of plot around mean:
xlims = dmean + [-1 1] * dstd;
indx = find(cuts < xlims(2) & cuts > xlims(1)); % element-wise &, not the scalar && operator
pf = polyfit(cuts(indx), q(indx).', 2); % q is a column vector, cuts is a row; match orientations
m = polyval(pf, dmean);
fprintf(1, 'coefficients: a = %.2e, peak = %.2f\n', pf(1), m);
end
Output values:
coefficients: a = 1.37e-03, peak = 1.32
coefficients: a = 1.01e-03, peak = 1.34
coefficients: a = 2.85e-04, peak = 1.45
coefficients: a = -5.78e-04, peak = 1.70
coefficients: a = -1.29e-03, peak = 2.08
coefficients: a = -1.58e-03, peak = 2.48
Sample plots:
And the histogram for delta = 40:

Numerical integration over non-uniform grid in matlab. Is there any function?

I've got function values in a vector f and also a vector x containing the corresponding values of the argument. I need to find the definite integral of f. But the argument vector x is not uniform. Is there any function in MATLAB that deals with integration over non-uniform grids?
Taken from the help for trapz:
Z = trapz(X,Y) computes the integral of Y with respect to X using
the trapezoidal method. X and Y must be vectors of the same
length, or X must be a column vector and Y an array whose first
non-singleton dimension is length(X). trapz operates along this
dimension.
As you can see x does not have to be uniform.
For instance:
x = sort(rand(100,1)); %# Create random values of x in [0,1]
y = x;
trapz( x, y)
Returns:
ans =
0.4990
Another example:
x = sort(rand(100,1)); %# Create random values of x in [0,1]
y = x.^2;
trapz( x, y)
returns:
ans =
0.3030
Depending on your function (and how x is distributed), you might get more accuracy by doing a spline interpolation through your data first:
pp = spline(x,y);
quadgk(@(t) ppval(pp,t), [range])
That's the quick-n-dirty way. There is a faster and more direct approach, but it is fugly and much less transparent:
result = sum(sum(...
bsxfun(@times, pp.coefs, 1./(4:-1:1)) .*... % coefficients of primitive
bsxfun(@power, diff(pp.breaks).', 4:-1:1)... % all 4 powers of shifted x-values
));
As an example why all this could be useful, I borrow the example from here. The exact answer should be
>> pi/2/sqrt(2)*(17-40^(3/4))
ans =
1.215778726893561e+00
Defining
>> x = [0 sort(3*rand(1,5)) 3];
>> y = (x.^3.*(3-x)).^(1/4)./(5-x);
we find
>> trapz(x,y)
ans =
1.142392438652055e+00
>> pp = spline(x,y);
>> tic; quadgk(@(t) ppval(pp,t), 0, 3), toc
ans =
1.213866446458034e+00
Elapsed time is 0.017472 seconds.
>> tic; result = sum(sum(...
bsxfun(@times, pp.coefs, 1./(4:-1:1)) .*... % coefficients of primitive
bsxfun(@power, diff(pp.breaks).', 4:-1:1)... % all 4 powers of shifted x-values
)), toc
result =
1.213866467945575e+00
Elapsed time is 0.002887 seconds.
So trapz underestimates the value by more than 0.07. With the latter two methods, the error is an order of magnitude less. Also, the less-readable version of the spline approach is an order of magnitude faster.
So, armed with this knowledge: choose wisely :)
You can do Gaussian quadrature over each subinterval [x(i), x(i+1)] and sum the pieces to get the complete integral, as in the sketch below.
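A minimal sketch of that idea (my own illustration: since only samples of f are available, it evaluates a spline interpolant at the Gauss points, and it assumes x is sorted in ascending order):
% Two-point Gauss-Legendre quadrature on each subinterval [x(i), x(i+1)],
% applied to a spline interpolant of the sampled data.
pp = spline(x, y); % interpolant of the non-uniform samples
a = x(1:end-1); b = x(2:end); % subinterval endpoints
h = (b - a)/2; c = (a + b)/2; % half-widths and midpoints
g = 1/sqrt(3); % 2-point Gauss-Legendre nodes sit at the midpoint +- h/sqrt(3)
I = sum( h .* ( ppval(pp, c - g*h) + ppval(pp, c + g*h) ) ) % both weights equal 1 (times h)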