Convolution of two dependent distributions in MATLAB

Assume that I have two discrete random variables X and Y.
X = {1,3,3,5,7,7,7,9,9,9,9,9}
and
Y = {5,5,9,9,10,12,13}
Where their empirical CDFs are given as:
F_x(1) = 0.0833, F_x(3) = 0.25, F_x(5) = 0.33, F_x(7) = 0.5833 and F_x(9) = 1
and
F_y(5) = 0.2857, F_y(9) = 0.5714, F_y(10) = 0.7143, F_y(12) = 0.8571 and F_y(13) = 1
Assuming their joint distribution is
H(x,y) = F_x(x) * F_y(y)
which amounts to assuming that X and Y are independent.
How can I calculate Z = X + Y and its distribution F(z) in MATLAB?
Note: I gave the H(x,y) as a simple product function for the simplicity, but it can be anything in reality which actually models the dependency between X and Y.

Given continuous probability density functions F_X and F_Y, with a joint probability density function F_{X,Y}, we can compute F_{X+Y}(z) as the integral of F_{X,Y} over the line z=x+y. If the distributions are discrete, the integral becomes a sum: accumulate the joint probabilities over the part of the plane given by x+y<=z to get the cumulative distribution, then take its derivative (the difference between consecutive values).
This is fairly simple to do in MATLAB. Let's start with OP's data:
F_x = [0.0833,0.0833,0.25,0.25,0.33,0.33,0.5833,0.5833,1]; % CDF
F_x = diff([0,F_x]); % PDF
F_y = [0,0,0,0,0.2857,0.2857,0.2857,0.2857,0.5714,0.7143,0.7143,0.8571,1]; % CDF
F_y = diff([0,F_y]); % PDF
H = F_x.' .* F_y; % example joint PDF
Now we sum F_cum(z) = sum(H(x,y)) over all x and y with x+y<=z, and then take the derivative F = diff([0,F_cum]):
[m,n] = size(H);
F_cum = zeros(1,m+n-1);
for z = 1:numel(F_cum)
s = 0;
for x = 1:numel(F_x)
y = z-x+1;
y = min(y,n); % clip at n; for y<1 the range 1:y is empty and adds nothing
s = s + sum(H(x,1:y));
end
F_cum(z) = s;
end
F = diff([0,F_cum]);
Note that we defined y=z-x+1, meaning z=y+x-1. Thus F(1) corresponds to z=2. This is the lowest possible value that can come out of the sum of the two distributions, which we defined to start at 1.
The above can be simplified by padding H with zeros and shifting each row by one additional element. This lines up the line z=x+y on a column of the matrix, allowing us to use a trivial sum projection:
H = [H,zeros(m)];
for ii=2:m
H(ii,:) = circshift(H(ii,:),ii-1);
end
F_cum = cumsum(sum(H,1));
F_cum = F_cum(1:end-1); % last element we don't need
F2 = diff([0,F_cum]);
But because diff([0,cumsum(F)]) == F (up to numerical precision), we can skip those two operations:
F3 = sum(H,1);
F3 = F3(1:end-1); % last element we don't need
All three results agree up to numerical precision: all(abs(F-F2)<1e-15) and all(abs(F-F3)<1e-16) both hold.
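As a cross-check, the same projection onto the lines z=x+y can be sketched with accumarray, which bins every element of the joint PMF by its anti-diagonal index. The small PMFs p_x and p_y below are hypothetical and are taken as independent only so that the result can be compared against conv; the accumarray line itself does not rely on independence.

```matlab
% Hypothetical small example: PMFs on the supports 1..3 and 1..2
p_x = [0.2, 0.5, 0.3];
p_y = [0.6, 0.4];
H = p_x.' .* p_y;                 % joint PMF (independence assumed only for the check below)

[m,n] = size(H);
[xi,yi] = ndgrid(1:m,1:n);        % xi(i,j) = i, yi(i,j) = j
% Bin every H(x,y) by x+y-1, so element k of p_z corresponds to z = k+1
p_z = accumarray(xi(:)+yi(:)-1, H(:)).';

z   = 2:(m+n);                    % values that Z = X + Y can take
F_z = cumsum(p_z);                % CDF of Z at those values

% For independent X and Y this should match an ordinary convolution
err = max(abs(p_z - conv(p_x,p_y)));
```

Here p_z(k) is the probability that Z equals z(k), and F_z(end) should be 1 for any valid joint PMF.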

Related

Matlab function for cumulative power

Is there a function in MATLAB that generates the following matrix for a given scalar r:
1 r r^2 r^3 ... r^n
0 1 r r^2 ... r^(n-1)
0 0 1 r ... r^(n-2)
...
0 0 0 0 ... 1
where each row behaves somewhat like a power analog of the CUMSUM function?
You can compute each term directly using implicit expansion and element-wise power, and then apply triu:
n = 5; % size
r = 2; % base
result = triu(r.^max((1:n)-(1:n).',0));
Or, maybe a little faster because it doesn't compute unwanted powers:
n = 5; % size
r = 2; % base
t = (1:n)-(1:n).';
u = find(t>=0);
t = t(u);
result = zeros(n);
result(u) = r.^t;
Using cumprod and triu:
% parameters
n = 5;
r = 2;
% Create a square matrix filled with 1:
A = ones(n);
% Assign the upper triangular part shifted by one with r
A(triu(A,1)==1)=r;
% cumprod along the second dimension and get only the upper triangular part
A = triu(cumprod(A,2))
Well, cumsum accumulates the sum of a vector, but you are asking for a specially designed matrix, so the comparison is a bit problematic.
There may be a built-in function for this if it is a common special case of a triangular matrix (my mathematical knowledge is limited here, sorry), but we can also build it quite easily and efficiently:
N = 10;
r = 2;
% allocate array
ary = ones(1,N);
% initialize array
ary(2) = r;
for i = 3:N
ary(i) = ary(i-1)*r;
end
% build matrix i.e. copy the array
M = eye(N);
for i = 1:N
M(i,i:end) = ary(1:end-i+1);
end
This assumes that you want a matrix of size NxN and that r is the base whose powers you want to compute.
FIX: a previous version had M(i,i:end) = ary(i:end); in line 13, but the assignment must always start at the first position of ary.
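Since every diagonal of the requested matrix is constant, it is an upper triangular Toeplitz matrix. A minimal sketch using the built-in toeplitz function, with the same n and r as in the answers above:

```matlab
n = 5; % size
r = 2; % base
% First column is [1;0;...;0], first row is [1 r r^2 ... r^(n-1)]
result = toeplitz([1, zeros(1,n-1)], r.^(0:n-1));
```

The first element of both arguments is 1, so toeplitz has no conflict on the diagonal to resolve.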

Is there a correlation ratio in MATLAB?

Is there any function in Matlab which calculates the correlation ratio?
Here is an implementation I tried to do, but the results are not right.
function cr = correlation_ratio(X, Y, L)
ni = zeros(1, L);
sigmai = ni;
for i = 0:(L-1)
Yn = Y(X == i);
ni(1, i+1) = numel(Yn);
m = (1/ni(1, i+1))*sum(Yn);
sigmai(1, i+1) = (1/ni(1, i+1))*sum((Yn - m).^2);
end
n = sum(ni);
prod = ni.*sigmai;
cr = (1-(1/n)*sum(prod))^0.5;
This is the equation on the Wikipedia page:
\eta^2 = \frac{\sum_x n_x (\bar{y}_x - \bar{y})^2}{\sum_{x,i} (y_{x,i} - \bar{y})^2}
where:
\eta is the correlation ratio,
y_{x,i} are the sample values (x is the class label, i the sample index),
\bar{y}_x is the mean of sample values for class x,
\bar{y} is the mean for all samples across all classes, and
n_x is the number of samples in class x.
This is how I interpreted it into code:
function eta = correlation_ratio(X, Y)
X = X(:); % make sure we've got column vectors, simplifies things below a bit
Y = Y(:);
L = max(X);
mYx = zeros(1, L+1); % we'll write mean per class here
nx = zeros(1, L+1); % we'll write number of samples per class here
for i = unique(X).'
Yn = Y(X == i);
if ~isempty(Yn) % a class with a single sample still contributes to the sum
mYx(i+1) = mean(Yn);
nx(i+1) = numel(Yn);
end
end
mY = mean(Y); % mean across all samples
eta = sqrt(sum(nx .* (mYx - mY).^2) / sum((Y-mY).^2));
The loop could be replaced with accumarray.
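One way the accumarray variant could look is sketched below. It assumes, like the function above, that X holds nonnegative integer class labels; the data values are made up for illustration.

```matlab
% Hypothetical data: X holds nonnegative integer class labels, Y the sample values
X = [0; 0; 1; 1; 2; 2];
Y = [1; 3; 2; 4; 10; 12];

idx = X(:) + 1;                   % shift labels so they are valid subscripts
nx  = accumarray(idx, 1);         % number of samples per class
sYx = accumarray(idx, Y(:));      % sum of Y per class
has = nx > 0;                     % skip labels that never occur
mYx = sYx(has) ./ nx(has);        % per-class means
mY  = mean(Y(:));                 % mean across all samples
eta = sqrt(sum(nx(has) .* (mYx - mY).^2) / sum((Y(:) - mY).^2));
```

For this data the class means are 2, 3 and 11, and eta^2 works out to 292/310.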

adaptive elliptical structuring element in MATLAB

I'm trying to create an adaptive elliptical structuring element for an image to dilate or erode it. I wrote this code, but unfortunately all of the structuring elements come out as ones(2*M+1).
I = input('Enter the input image: ');
M = input('Enter the maximum allowed semi-major axes length: ');
% determining ellipse parameters from eigenvalue decomposition of LST
row = size(I,1);
col = size(I,2);
SE = cell(row,col);
padI = padarray(I,[M M],'replicate','both');
padrow = size(padI,1);
padcol = size(padI,2);
for m = M+1:padrow-M
for n = M+1:padcol-M
a = (l2(m-M,n-M)+eps/l1(m-M,n-M)+l2(m-M,n-M)+2*eps)*M;
b = (l1(m-M,n-M)+eps/l1(m-M,n-M)+l2(m-M,n-M)+2*eps)*M;
if e1(m-M,n-M,1)==0
phi = pi/2;
else
phi = atan(e1(m-M,n-M,2)/e1(m-M,n-M,1));
end
% defining structuring element for each pixel of image
x0 = m;
y0 = n;
se = zeros(2*M+1);
row_se = 0;
for i = x0-M:x0+M
row_se = row_se+1;
col_se = 0;
for j = y0-M:y0+M
col_se = col_se+1;
x = j-y0;
y = x0-i;
if ((x*cos(phi)+y*sin(phi))^2)/a^2+((x*sin(phi)-y*cos(phi))^2)/b^2 <= 1
se(row_se,col_se) = 1;
end
end
end
SE{m-M,n-M} = se;
end
end
a and b are the semi-major and semi-minor axis lengths, and phi is the angle between a and the x axis.
I used 2 MATLAB functions to compute the Local Structure Tensor of the image, and then its eigenvalues and eigenvectors for each pixel. These are the matrices l1, l2, e1 and e2.
This is the bit of your code I didn't understand:
a = (l2(m-M,n-M)+eps/l1(m-M,n-M)+l2(m-M,n-M)+2*eps)*M;
b = (l1(m-M,n-M)+eps/l1(m-M,n-M)+l2(m-M,n-M)+2*eps)*M;
I simplified the expression for b to (just removing the indexing):
b = (l1+eps/l1+l2+2*eps)*M;
For l1 and l2 in the normal range we get:
b =(approx)= (l1+0/l1+l2+2*0)*M = (l1+l2)*M;
Thus, b can easily be larger than M, which I don't think is your intention. The eps in this case also doesn't protect against division by zero, which is typically the purpose of adding eps: if l1 is zero, eps/l1 is Inf.
Looking at this expression, it seems to me that you intended this instead:
b = (l1+eps)/(l1+l2+2*eps)*M;
Here, you're adding eps to each of the eigenvalues, making them guaranteed non-zero (the structure tensor is symmetric, positive semi-definite). Then you're dividing l1 by the sum of eigenvalues, and multiplying by M, which leads to a value between 0 and M for each of the axes.
So, this seems to be a case of misplaced parentheses.
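The difference between the two expressions is easiest to see in a flat image region, where both eigenvalues are zero. The values below are hypothetical and only illustrate the arithmetic:

```matlab
M = 5; l1 = 0; l2 = 0;                 % hypothetical flat region: both eigenvalues are zero
b_wrong = (l1+eps/l1+l2+2*eps)*M;      % eps/0 is Inf, so the whole expression is Inf
b_fixed = (l1+eps)/(l1+l2+2*eps)*M;    % eps/(2*eps)*M = M/2, safely between 0 and M
```

With an infinite axis length every pixel of the window satisfies the ellipse inequality, which is consistent with structuring elements that are all ones.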
Just for the record, this is what you need in your code:
a = (l2(m-M,n-M)+eps) / (l1(m-M,n-M)+l2(m-M,n-M)+2*eps) * M; % parentheses added around numerator and denominator
b = (l1(m-M,n-M)+eps) / (l1(m-M,n-M)+l2(m-M,n-M)+2*eps) * M;
Note that you can simplify your code by defining, outside of the loops:
[se_x,se_y] = meshgrid(-M:M,M:-1:-M); % y decreases down the rows, matching y = x0-i in the loops
The inner two loops, over i and j, to construct se can then be written simply as:
se = ((se_x.*cos(phi)+se_y.*sin(phi)).^2)./a.^2 + ...
((se_x.*sin(phi)-se_y.*cos(phi)).^2)./b.^2 <= 1;
(Note the .* and .^ operators, these do element-wise multiplication and power.)
A further slight improvement comes from realizing that phi is first computed from e1(m,n,1) and e1(m,n,2), and then used in calls to cos and sin. If we assume that the eigenvector is properly normalized, then (up to a common sign, which the squared terms make irrelevant)
cos(phi) == e1(m,n,1)
sin(phi) == e1(m,n,2)
But you can always make sure they are normalized:
cos_phi = e1(m-M,n-M,1);
sin_phi = e1(m-M,n-M,2);
len = hypot(cos_phi,sin_phi);
cos_phi = cos_phi / len;
sin_phi = sin_phi / len;
se = ((se_x.*cos_phi+se_y.*sin_phi).^2)./a.^2 + ...
((se_x.*sin_phi-se_y.*cos_phi).^2)./b.^2 <= 1;
Considering trigonometric operations are fairly expensive, this should speed up your code a bit.
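Putting the pieces together for a single pixel, a self-contained sketch might look as follows. The eigenvalues l1, l2 and the eigenvector v are hypothetical stand-ins for one entry of the structure tensor results; the grid is built so that y decreases down the rows, as in the question's y = x0-i.

```matlab
M  = 5;                                % maximum allowed semi-axis length
l1 = 4; l2 = 1;                        % hypothetical eigenvalues for one pixel
v  = [3, 4];                           % hypothetical first eigenvector, not normalized

a = (l2+eps)/(l1+l2+2*eps)*M;          % axis lengths, each between 0 and M
b = (l1+eps)/(l1+l2+2*eps)*M;

len = hypot(v(1),v(2));                % normalize so the components are cos and sin
cos_phi = v(1)/len;
sin_phi = v(2)/len;

[se_x,se_y] = meshgrid(-M:M,M:-1:-M);  % x grows to the right, y grows upward
se = ((se_x.*cos_phi+se_y.*sin_phi).^2)./a.^2 + ...
     ((se_x.*sin_phi-se_y.*cos_phi).^2)./b.^2 <= 1;
```

The result is a logical (2*M+1)-by-(2*M+1) mask that always contains the center pixel and is no longer all ones.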

Optimized matrix operation of matrix with repeating elements in Matlab

I want to get exp() of a large matrix (A) with values that repeat at different indices. To speed-up the exp() operation I only perform it on the unique values of A and then reassemble the matrix. However the reassembly of the matrix is quite slow. The following code provides a working example:
% definition of a grid
gridSp = 5:5:35*5;
X = repmat(gridSp,35,1);
Z = repmat(gridSp',1,35);
% calculation of the distances
locMat = [X(:) Z(:)];
dist=sqrt(bsxfun(@minus,locMat(:,1),locMat(:,1)').^2 +...
bsxfun(@minus,locMat(:,2),locMat(:,2)').^2);
sizeDist = size(dist);
uniqueDist = unique(dist,'stable');
[~, Locb] = ismember(dist,uniqueDist);
nn_A = exp(1i*2*pi*rand(sizeDist(1),100));
H_A = zeros(size(nn_A));
freq = linspace(10^-3,10,100);
psdA = 4096*length(freq).*10.*4.*22.6./((1 + 6.*freq*22.6).^(5/3));
for jj=1:100
b = exp(-8.8*uniqueDist*sqrt((freq(jj)/15).^2 + 10^-7));
b = b.*psdA(jj);
A = b(Locb);
droptol = max(A(:))*10^-10;
if min(A(:))<droptol
A = sparse(A);
HH_A = ichol(A,struct('type','ict','shape','lower','droptol',droptol));
else
HH_A = chol(A,'lower');
end
H_A(:,jj) = HH_A*nn_A(:,jj);
end
Especially the reassembly of the matrix
A = b(Locb);
and the conversion of the matrix to sparse
A = sparse(A);
in the last for-loop take up a lot of time. Is there a quicker way to do this? Interestingly:
B = A + A;
is much faster than
A = b(Locb);
I have to perform these operations far more often than the 100 iterations in the example.
Here is a condensed version of the code, as requested:
% definition of a grid
gridSp = 5:5:28*5;
X = repmat(gridSp,35,1);
Z = repmat(gridSp',1,35);
% calculation of the distances
locMat = [X(:) Z(:)];
dist=sqrt(bsxfun(@minus,locMat(:,1),locMat(:,1)').^2 +bsxfun(@minus,locMat(:,2),locMat(:,2)').^2);
uniqueDist = unique(dist,'stable');
[~, Locb] = ismember(dist,uniqueDist);
for jj=1:100
b = exp(jj.*uniqueDist);
A = b(Locb);
end
In your example, dist is only 980 x 980, in which case you are better off just performing a dense matrix operation, i.e.
for jj=1:100
A=exp(jj*dist);
end
which is 2 times faster than
for jj=1:100
b = exp(jj.*uniqueDist);
A = b(Locb);
end
for your given example.
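If the unique-value approach is kept anyway, for instance when the per-element function is much more expensive than exp, the setup can be simplified: the third output of unique already provides the index map, so the separate ismember call is not needed. The sketch below uses a hypothetical small grid and decay constant, and relies on implicit expansion (R2016b or later). It does not remove the cost of the indexing step b(ic) itself.

```matlab
% Hypothetical small grid, only to keep the sketch quick to run
gridSp = 5:5:20;
[X,Z] = meshgrid(gridSp);
locMat = [X(:) Z(:)];
dist = sqrt((locMat(:,1)-locMat(:,1).').^2 + (locMat(:,2)-locMat(:,2).').^2);

% ic maps every element of dist to its position in uniqueDist
[uniqueDist,~,ic] = unique(dist);
ic = reshape(ic, size(dist));

jj = 3;                                % one iteration of the loop
b  = exp(-0.01*jj*uniqueDist);         % hypothetical decay constant
A  = b(ic);                            % reassembled matrix, same size as dist
err = max(max(abs(A - exp(-0.01*jj*dist))));
```

Because b is a vector and ic is a matrix, b(ic) takes the shape of ic, so no further reshape is required.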

Normalizing a histogram in matlab

I have a histogram
hist(A, 801)
that currently resembles a normal curve but with max value at y = 1500, and mean at x = 0.5. I want to normalize it, so I tried
h = hist(A, 801)
h = h ./ sum(h)
bar(h)
Now I get a normal curve with max at y = .03, but a mean at x = 450.
How do I scale the frequencies so that the sum is 1, while retaining the same x range?
A is derived from
A = walk(50000, 800, .05, 2, .25, 0)
where
function [X_new] = walk(N_sim, N, mu, T, sigma, X_init)
delt = T/N;
up = sigma*sqrt(delt);
down = -sigma*sqrt(delt);
p = 1./2.*(1.+mu/sigma*sqrt(delt));
X_new = zeros(N_sim,1);
X_new(1:N_sim,1) = X_init;
ptest = zeros(N_sim,1);
for i = 1:N
ptest(:,1) = rand(N_sim,1);
ptest(:,1) = (ptest(:,1) <= p);
X_new(:,1) = X_new(:,1) + ptest(:,1)*up + (1.-ptest(:,1))*down;
end
The sum is 1 with your code as it stands.
You may want the integral to equal 1 instead (so that you can compare with the theoretical pdf). In that case:
[h, c] = hist(A, 801); % c contains bin centers. They are equally spaced
h = h / sum(h) / (c(2)-c(1)); % normalize to area 1
trapz(c,h) % compute integral. Should be approximately 1
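The shifted x axis in the question comes from bar(h), which plots against the bin index 1..801. Passing the bin centers to bar should keep the original x range. A minimal sketch, where A is a hypothetical stand-in for the output of walk:

```matlab
rng(0);                                % fixed seed, only for reproducibility
A = 0.5 + 0.1*randn(50000,1);          % hypothetical stand-in for the output of walk
[h, c] = hist(A, 801);                 % counts and bin centers
h = h / sum(h) / (c(2)-c(1));          % normalize to unit area
bar(c, h, 1);                          % plot against bin centers instead of bin indices
area = trapz(c, h);                    % should be close to 1
```

The bin centers in c lie between min(A) and max(A), so the plot spans the same x range as the original hist(A, 801) call.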