Normalizing a histogram in matlab - matlab

I have a histogram
hist(A, 801)
that currently resembles a normal curve but with max value at y = 1500, and mean at x = 0.5. I want to normalize it, so I tried
h = hist(A, 801)
h = h ./ sum(h)
bar(h)
now I get a normal curve with max at y = .03, but a mean at x = 450.
how do I decrease the frequency so the sum is 1, while retaining the same x range?
A is derived from
A = walk(50000, 800, .05, 2, .25, 0)
where
function [X_new] = walk(N_sim, N, mu, T, sigma, X_init)
delt = T/N;
up = sigma*sqrt(delt);
down = -sigma*sqrt(delt);
p = 1./2.*(1.+mu/sigma*sqrt(delt));
X_new = zeros(N_sim,1);
X_new(1:N_sim,1) = X_init;
ptest = zeros(N_sim,1);
for i = 1:N
ptest(:,1) = rand(N_sim,1);
ptest(:,1) = (ptest(:,1) <= p);
X_new(:,1) = X_new(:,1) + ptest(:,1)*up + (1.-ptest(:,1))*down;
end

The sum is 1 with your code as it stands.
You may want integral equal to 1 (so that you can compare with the theoretical pdf). In that case:
[h, c] = hist(A, 801); %// c contains bin centers. They are equally spaced
h = h / sum(h) / (c(2)-c(1)); %// normalize to area 1
trapz(c,h) %// compute integral. Should be approximately 1

Related

Convolution of two dependent distributions in MATLAB

Assume that I have two discrete random variables X and Y.
X = {1,3,3,5,7,7,7,9,9,9,9,9}
and
Y = {5,5,9,9,10,12,13}
Where their empirical CDFs are given as:
F_x(1) = 0.0833, F_x(3) = 0.25, F_x(5) = 0.33, F_x(7) = 0.5833 and F_x(9) = 1
and
F_y(5) = 0.2857, F_y(9) = 0.5714, F_y(10) = 0.7143, F_y(12) = 0.8571 and F_y(13) = 1
Assuming their joint distribution is
H(x,y) = F_x(x) * F_y(y)
which is actually the "assumption" of X and Y are independent.
How can i calculate the Z = X + Y and F(z) in MATLAB ?
Note: I gave the H(x,y) as a simple product function for the simplicity, but it can be anything in reality which actually models the dependency between X and Y.
Given continuous probability density functions FX and FY, with a joint probability density function FX,Y, we can compute FX+Y as the integral of FX,Y over the line z=x+y. If the probability density functions are discrete, the integral above should be written as the derivative of the integral over the part of the plane given by z<=x+y.
This is fairly simple to do in MATLAB. Let's start with OP's data:
F_x = [0.0833,0.0833,0.25,0.25,0.33,0.33,0.5833,0.5833,1]; % CDF
F_x = diff([0,F_x]); % PDF
F_y = [0,0,0,0,0.2857,0.2857,0.2857,0.2857,0.5714,0.7143,0.7143,0.8571,1]; % CDF
F_y = diff([0,F_y]); % PDF
H = F_x.' .* F_y; % example joint PDF
Now we sum F_cum(z) = sum(H(x,y)) for all values z<=x+y, and then take the derivative F = diff([0,F_cum]):
[m,n] = size(H);
F_cum = zeros(1,m+n-1);
for z = 1:numel(F_cum)
s = 0;
for x = 1:numel(F_x)
y = z-x+1;
y = max(min(y,n),1); % avoid out of bounds indexing
s = s + sum(H(x,1:y));
end
F_cum(z) = s;
end
F = diff([0,F_cum]);
Note that we defined y=z-x+1, meaning z=y+x-1. Thus F(1) corresponds to z=2. This is the lowest possible value that can come out of the sum of the two distributions, which we defined to start at 1.
The above can be simplified by padding H with zeros and shifting each row by one additional element. This lines up the line z=x+y on a column of the matrix, allowing us to use a trivial sum projection:
H = [H,zeros(m)];
for ii=2:m
H(ii,:) = circshift(H(ii,:),ii-1);
end
F_cum = cumsum(sum(H,1));
F_cum = F_cum(1:end-1); % last element we don't need
F2 = diff([0,F_cum]);
But because diff([0,cumsum(F)]) == F (up to numerical precision), we can skip those two operations:
F3 = sum(H,1);
F3 = F3(1:end-1); % last element we don't need
(all(abs(F-F2)<1e-15) and all(abs(F-F3)<1e-16))

Is there a correlation ratio in MATLAB?

Is there any function in Matlab which calculates the correlation ratio?
Here is an implementation I tried to do, but the results are not right.
function cr = correlation_ratio(X, Y, L)
ni = zeros(1, L);
sigmai = ni;
for i = 0:(L-1)
Yn = Y(X == i);
ni(1, i+1) = numel(Yn);
m = (1/ni(1, i+1))*sum(Yn);
sigmai(1, i+1) = (1/ni(1, i+1))*sum((Yn - m).^2);
end
n = sum(ni);
prod = ni.*sigmai;
cr = (1-(1/n)*sum(prod))^0.5;
This is the equation on the Wikipedia page:
where:
η is the correlation ratio,
yx,i are the sample values (x is the class label, i the sample index),
yx (with the bar on top) is the mean of sample values for class x,
y (with the bar on top) is the mean for all samples across all classes, and
nx is the number of samples in class x.
This is how I interpreted it into code:
function eta = correlation_ratio(X, Y)
X = X(:); % make sure we've got column vectors, simplifies things below a bit
Y = Y(:);
L = max(X);
mYx = zeros(1, L+1); % we'll write mean per class here
nx = zeros(1, L+1); % we'll write number of samples per class here
for i = unique(X).'
Yn = Y(X == i);
if numel(Yn)>1
mYx(i+1) = mean(Yn);
nx(i+1) = numel(Yn);
end
end
mY = mean(Y); % mean across all samples
eta = sqrt(sum(nx .* (mYx - mY).^2) / sum((Y-mY).^2));
The loop could be replaced with accumarray.

Precompute weights for multidimensional linear interpolation

I have a non-uniform rectangular grid along D dimensions, a matrix of logical values V on the grid, and a matrix of query data points X. The number of grid points differs across dimensions.
I run the interpolation multiple times for the same grid G and query X, but for different values V.
The goal is to precompute the indexes and weights for the interpolation and to reuse them, because they are always the same.
Here is an example in 2 dimensions, in which I have to compute indexes and values every time within the loop, but I want to compute them only once before the loop. I keep the data types from my application (mostly single and logical gpuArrays).
% Define grid
G{1} = single([0; 1; 3; 5; 10]);
G{2} = single([15; 17; 18; 20]);
% Steps and edges are reduntant but help make interpolation a bit faster
S{1} = G{1}(2:end)-G{1}(1:end-1);
S{2} = G{2}(2:end)-G{2}(1:end-1);
gpuInf = 1e10;
% It's my workaround for a bug in GPU version of discretize in Matlab R2017a.
% It throws an error if edges contain Inf, realmin, or realmax. Seems fixed in R2017b prerelease.
E{1} = [-gpuInf; G{1}(2:end-1); gpuInf];
E{2} = [-gpuInf; G{2}(2:end-1); gpuInf];
% Generate query points
n = 50; X = gpuArray(single([rand(n,1)*14-2, 14+rand(n,1)*7]));
[G1, G2] = ndgrid(G{1},G{2});
for i = 1 : 4
% Generate values on grid
foo = #(x1,x2) (sin(x1+rand) + cos(x2*rand))>0;
V = gpuArray(foo(G1,G2));
% Interpolate
V_interp = interpV(X, V, G, E, S);
% Plot results
subplot(2,2,i);
contourf(G1, G2, V); hold on;
scatter(X(:,1), X(:,2),50,[ones(n,1), 1-V_interp, 1-V_interp],'filled', 'MarkerEdgeColor','black'); hold off;
end
function y = interpV(X, V, G, E, S)
y = min(1, max(0, interpV_helper(X, 1, 1, 0, [], V, G, E, S) ));
end
function y = interpV_helper(X, dim, weight, curr_y, index, V, G, E, S)
if dim == ndims(V)+1
M = [1,cumprod(size(V),2)];
idx = 1 + (index-1)*M(1:end-1)';
y = curr_y + weight .* single(V(idx));
else
x = X(:,dim); grid = G{dim}; edges = E{dim}; steps = S{dim};
iL = single(discretize(x, edges));
weightL = weight .* (grid(iL+1) - x) ./ steps(iL);
weightH = weight .* (x - grid(iL)) ./ steps(iL);
y = interpV_helper(X, dim+1, weightL, curr_y, [index, iL ], V, G, E, S) +...
interpV_helper(X, dim+1, weightH, curr_y, [index, iL+1], V, G, E, S);
end
end
I found a way to do this and posting it here because (as of now) two more people are interested. It takes only a slight modification to my original code (see below).
% Define grid
G{1} = single([0; 1; 3; 5; 10]);
G{2} = single([15; 17; 18; 20]);
% Steps and edges are reduntant but help make interpolation a bit faster
S{1} = G{1}(2:end)-G{1}(1:end-1);
S{2} = G{2}(2:end)-G{2}(1:end-1);
gpuInf = 1e10;
% It's my workaround for a bug in GPU version of discretize in Matlab R2017a.
% It throws an error if edges contain Inf, realmin, or realmax. Seems fixed in R2017b prerelease.
E{1} = [-gpuInf; G{1}(2:end-1); gpuInf];
E{2} = [-gpuInf; G{2}(2:end-1); gpuInf];
% Generate query points
n = 50; X = gpuArray(single([rand(n,1)*14-2, 14+rand(n,1)*7]));
[G1, G2] = ndgrid(G{1},G{2});
[W, I] = interpIW(X, G, E, S); % Precompute weights W and indexes I
for i = 1 : 4
% Generate values on grid
foo = #(x1,x2) (sin(x1+rand) + cos(x2*rand))>0;
V = gpuArray(foo(G1,G2));
% Interpolate
V_interp = sum(W .* single(V(I)), 2);
% Plot results
subplot(2,2,i);
contourf(G1, G2, V); hold on;
scatter(X(:,1), X(:,2), 50,[ones(n,1), 1-V_interp, 1-V_interp],'filled', 'MarkerEdgeColor','black'); hold off;
end
function [W, I] = interpIW(X, G, E, S)
global Weights Indexes
Weights=[]; Indexes=[];
interpIW_helper(X, 1, 1, [], G, E, S, []);
W = Weights; I = Indexes;
end
function [] = interpIW_helper(X, dim, weight, index, G, E, S, sizeV)
global Weights Indexes
if dim == size(X,2)+1
M = [1,cumprod(sizeV,2)];
Weights = [Weights, weight];
Indexes = [Indexes, 1 + (index-1)*M(1:end-1)'];
else
x = X(:,dim); grid = G{dim}; edges = E{dim}; steps = S{dim};
iL = single(discretize(x, edges));
weightL = weight .* (grid(iL+1) - x) ./ steps(iL);
weightH = weight .* (x - grid(iL)) ./ steps(iL);
interpIW_helper(X, dim+1, weightL, [index, iL ], G, E, S, [sizeV, size(grid,1)]);
interpIW_helper(X, dim+1, weightH, [index, iL+1], G, E, S, [sizeV, size(grid,1)]);
end
end
To do the task the whole process of interpolation ,except computing the interpolated values, should be done. Here is a solution translated from the Octave c++ source. Format of the input is the same as the frst signature of the interpn function except that there is no need to the v array. Also Xs should be vectors and should not be of the ndgrid format. Both the outputs W (weights) and I (positions) have the size (a ,b) that a is the number of neighbors of a points on the grid and b is the number of requested points to be interpolated.
function [W , I] = lininterpnw(varargin)
% [W I] = lininterpnw(X1,X2,...,Xn,Xq1,Xq2,...,Xqn)
n = numel(varargin)/2;
x = varargin(1:n);
y = varargin(n+1:end);
sz = cellfun(#numel,x);
scale = [1 cumprod(sz(1:end-1))];
Ni = numel(y{1});
index = zeros(n,Ni);
x_before = zeros(n,Ni);
x_after = zeros(n,Ni);
for ii = 1:n
jj = interp1(x{ii},1:sz(ii),y{ii},'previous');
index(ii,:) = jj-1;
x_before(ii,:) = x{ii}(jj);
x_after(ii,:) = x{ii}(jj+1);
end
coef(2:2:2*n,1:Ni) = (vertcat(y{:}) - x_before) ./ (x_after - x_before);
coef(1:2:end,:) = 1 - coef(2:2:2*n,:);
bit = permute(dec2bin(0:2^n-1)=='1', [2,3,1]);
%I = reshape(1+scale*bsxfun(#plus,index,bit), Ni, []).'; %Octave
I = reshape(1+sum(bsxfun(#times,scale(:),bsxfun(#plus,index,bit))), Ni, []).';
W = squeeze(prod(reshape(coef(bsxfun(#plus,(1:2:2*n).',bit),:).',Ni,n,[]),2)).';
end
Testing:
x={[1 3 8 9],[2 12 13 17 25]};
v = rand(4,5);
y={[1.5 1.6 1.3 3.5,8.1,8.3],[8.4,13.5,14.4,23,23.9,24.2]};
[W I]=lininterpnw(x{:},y{:});
sum(W.*v(I))
interpn(x{:},v,y{:})
Thanks to #SardarUsama for testing and his useful comments.

Calculate a tricky sum in Matlab

With the following variables:
m = 1:4; n = 1:32;
phi = linspace(0, 2*pi, 100);
theta = linspace(-pi, pi, 50);
S_mn = <a 4x32 coefficient matrix, corresponding to m and n>;
how do I compute the sum over m and n of S_mn*exp(1i*(m*theta + n*phi)), i.e.
I've thought of things like
[m, n] = meshgrid(m,n);
[theta, phi] = meshgrid(theta,phi);
r_mn = S_mn.*exp(1i*(m.*theta + n.*phi));
thesum = sum(r_mn(:));
but that requires theta and phi to have the same number of elements as m and n, and it gives me just one element in return - I want a matrix the the size of meshgrid(theta,phi), regardless of the sizes of theta and phi (i.e. I want to be able to evaluate the sum as a function of theta and phi).
How do I do this calculation in matlab?
Since I don't know what S is...
S = randn(4,32);
[m,n] = ndgrid(1:4,1:32);
fun = #(theta,phi) sum(sum(S.*exp(sqrt(-1)*(m*theta + n*phi))));
Works fine for me.
fun(pi,3*pi/2)
ans =
-15.8643373238676 - 1.45785698818839i
If you now wish to do this for a large set of values phi and theta, a pair of loops now are the trivial solution. Or, you can do it all in one computation, although the arrays will get larger. Still not hard. WTP?
You do realize that both meshgrid and ndgrid take more than just two arguments? So it is time to learn how to use bsxfun, and then squeeze.
[m,n,theta,phi] = ndgrid(1:4,1:32,linspace(-pi, pi, 50),linspace(0, 2*pi, 100));
res = bsxfun(#times,S,exp(sqrt(-1)*(m.*theta + n.*phi)));
res = squeeze(sum(sum(res,1),2));
Or do this, which will be a bit faster. The previous computation took my machine .07 seconds. This last one took .05, so some savings by using bsxfun heavily.
m = (1:4)';
n = 1:32;
[theta,phi] = ndgrid(linspace(-pi, pi, 50),linspace(0, 2*pi, 100));
theta = reshape(theta,[1,1,size(theta)]);
phi = reshape(phi,size(theta));
res = bsxfun(#plus,bsxfun(#times,m,theta*sqrt(-1)),bsxfun(#times,n,phi*sqrt(-1)));
res = bsxfun(#times,S,exp(res));
res = squeeze(sum(sum(res,1),2));
If you need to do the above 2000 times, so it should take 100 seconds to do. WTP? Get some coffee and relax.
First save the size of each variable:
size_m = size(m);
size_n = size(n);
size_theta = size(theta);
size_phi = size(phi);
Use ngrid function like this:
[theta, phi, m, n] = ngrid(theta, phi, m, n)
This will give you an array with 4 dimensions (one for each of your variables: theta, phi, m, n). Now you can calculate this:
m.*theta + n.*phi
Now you need to make S_mn have 4 dimensions with sizes size_theta, size_phi, size_m, size_n like this:
S_tpmn = repmat(S_mn, [size_theta size_phi size_m size_n]);
Now you can calculate your sum like this:
aux_sum = S_tpmn.*exp(1i*(m.*theta + n.*phi));
Finally you can sum along the last 2 dimensions (m and n) to get an array with 2 dimensions with size size_theta by size_phi:
final_sum = sum(sum(aux_sum, 4), 3);
Note: I don't have access to Matlab right now, so I can't test if this actually works.
There are several ways you could go about this.
One way is to create a function(-handle) that returns the sum as a function of theta and phi, and then use arrayfun to do the sums. Another is to fully vectorize the computation, though that will use more memory.
The arrayfun version:
[m, n] = meshgrid(m,n);
sumHandle = #(theta,phi)sum(reshape(...
S_mn.*exp(1i(m*theta + n*phi)),...
[],1))
[theta, phi] = meshgrid(theta,phi);
sumAsFunOfThetaPhi = arrayfun(sumHandle,theta,phi);
The vectorized version:
[m, n] = meshgrid(m,n);
m = permute(m(:),[2 4 1 3]); %# vector along dim 3
n = permute(n(:),[2 3 4 1]); %# vector along dim 4
S_mn = repmat( permute(S_mn,[3 4 1 2]), length(theta),length(phi));
theta = theta(:); %# vector along dim 1 (phi is along dim 2 b/c of linspace)
fullSum = S_mn.* exp( 1i*(...
bsxfun(#plus,...
bsxfun(#times, m, theta),...
bsxfun(#times, n, phi),...
)));
sumAsFunOfThetaPhi = sum(sum( fullSum, 3),4);

Least squares circle fitting using MATLAB Optimization Toolbox

I am trying to implement least squares circle fitting following this paper (sorry I can't publish it). The paper states, that we could fit a circle, by calculating the geometric error as the euclidean distance (Xi'') between a specific point (Xi) and the corresponding point on the circle (Xi'). We have three parametres: Xc (a vector of coordinates the center of circle), and R (radius).
I came up with the following MATLAB code (note that I am trying to fit circles, not spheres as it is indicated on the images):
function [ circle ] = fit_circle( X )
% Kör paraméterstruktúra inicializálása
% R - kör sugara
% Xc - kör középpontja
circle.R = NaN;
circle.Xc = [ NaN; NaN ];
% Kezdeti illesztés
% A köz középpontja legyen a súlypont
% A sugara legyen az átlagos négyzetes távolság a középponttól
circle.Xc = mean( X );
d = bsxfun(#minus, X, circle.Xc);
circle.R = mean(bsxfun(#hypot, d(:,1), d(:,2)));
circle.Xc = circle.Xc(1:2)+random('norm', 0, 1, size(circle.Xc));
% Optimalizáció
options = optimset('Jacobian', 'on');
out = lsqnonlin(#ort_error, [circle.Xc(1), circle.Xc(2), circle.R], [], [], options, X);
end
%% Cost function
function [ error, J ] = ort_error( P, X )
%% Calculate error
R = P(3);
a = P(1);
b = P(2);
d = bsxfun(#minus, X, P(1:2)); % X - Xc
n = bsxfun(#hypot, d(:,1), d(:,2)); % || X - Xc ||
res = d - R * bsxfun(#times,d,1./n);
error = zeros(2*size(X,1), 1);
error(1:2:2*size(X,1)) = res(:,1);
error(2:2:2*size(X,1)) = res(:,2);
%% Jacobian
xdR = d(:,1)./n;
ydR = d(:,2)./n;
xdx = bsxfun(#plus,-R./n+(d(:,1).^2*R)./n.^3,1);
ydy = bsxfun(#plus,-R./n+(d(:,2).^2*R)./n.^3,1);
xdy = (d(:,1).*d(:,2)*R)./n.^3;
ydx = xdy;
J = zeros(2*size(X,1), 3);
J(1:2:2*size(X,1),:) = [ xdR, xdx, xdy ];
J(2:2:2*size(X,1),:) = [ ydR, ydx, ydy ];
end
The fitting however is not too good: if I start with the good parameter vector the algorithm terminates at the first step (so there is a local minima where it should be), but if I perturb the starting point (with a noiseless circle) the fitting stops with very large errors. I am sure that I've overlooked something in my implementation.
For what it's worth, I implemented these methods in MATLAB a while ago. However, I did it apparently before I knew about lsqnonlin etc, as it uses a hand-implemented regression. This is probably slow, but may help to compare with your code.
function [x, y, r, sq_error] = circFit ( P )
%# CIRCFIT fits a circle to a set of points using least sqaures
%# P is a 2 x n matrix of points to be fitted
per_error = 0.1/100; % i.e. 0.1%
%# initial estimates
X = mean(P, 2)';
r = sqrt(mean(sum((repmat(X', [1, length(P)]) - P).^2)));
v_cen2points = zeros(size(P));
niter = 0;
%# looping until convergence
while niter < 1 || per_diff > per_error
%# vector from centre to each point
v_cen2points(1, :) = P(1, :) - X(1);
v_cen2points(2, :) = P(2, :) - X(2);
%# distacnes from centre to each point
centre2points = sqrt(sum(v_cen2points.^2));
%# distances from edge of circle to each point
d = centre2points - r;
%# computing 3x3 jacobean matrix J, and solvign matrix eqn.
R = (v_cen2points ./ [centre2points; centre2points])';
J = [ -ones(length(R), 1), -R ];
D_rXY = -J\d';
%# updating centre and radius
r_old = r; X_old = X;
r = r + D_rXY(1);
X = X + D_rXY(2:3)';
%# calculating maximum percentage change in values
per_diff = max(abs( [(r_old - r) / r, (X_old - X) ./ X ])) * 100;
%# prevent endless looping
niter = niter + 1;
if niter > 1000
error('Convergence not met in 1000 iterations!')
end
end
x = X(1);
y = X(2);
sq_error = sum(d.^2);
This is then run with:
X = [1 2 5 7 9 3];
Y = [7 6 8 7 5 7];
[x_centre, y_centre, r] = circFit( [X; Y] )
And plotted with:
[X, Y] = cylinder(r, 100);
scatter(X, Y, 60, '+r'); axis equal
hold on
plot(X(1, :) + x_centre, Y(1, :) + y_centre, '-b', 'LineWidth', 1);
Giving: