I have points distributed over a square according to some point process, probably Poisson point process. I want to divide the square into smaller squares and calculate the number of points in each sub-square. Is there an easy way to do it in Matlab by probably a built-in function?
As Nras said, the built-in command hist3 does exactly what you want. To demonstrate its use, I generated points with uniformly distributed polar radius and polar angle:
n = 100000;
r = rand(n,1);
theta = 2*pi*rand(n,1);
points = [r.*cos(theta), r.*sin(theta)];
hist3(points,[15,15]); % [15,15] is the number of bins in each direction
The graphic output is below. If you want the actual counts instead of a picture, replace the last command with
counts = hist3(points,[15,15]);
Related
I am trying to randomly generate uniformly distributed vectors, which are of Euclidian length of 1. By uniformly distributed I mean that each entry (coordinate) of the vectors is uniformly distributed.
More specifically, I would like to create a set of, say, 1000 vectors (lets call them V_i, with i=1,…,1000), where each of these random vectors has unit Euclidian length and the same dimension V_i=(v_1i,…,v_ni)' (let’s say n = 5, but the algorithm should work with any dimension). If we then look on the distribution of e.g. v_1i, the first element of each V_i, then I would like that this is uniformly distributed.
In the attached MATLAB example you see that you cannot simply draw random vectors from a uniform distribution and then normalize the vectors to Euclidian length of 1, as the distribution of the elements across the vectors is then no longer uniform.
Is there a way to generate this set of vectors such, that the distribution of the single elements across the vector-set is uniform?
Thank you for any ideas.
PS: MATLAB is our Language of choice, but solutions in any languages are, of course, welcome.
clear all
rng('default')
nvar=5;
sample = 1000;
x = zeros(nvar,sample);
for ii = 1:sample
y=rand(nvar,1);
x(:,ii) = y./norm(y);
end
hist(x(1,:))
figure
hist(x(2,:))
figure
hist(x(3,:))
figure
hist(x(4,:))
figure
hist(x(5,:))
What you want cannot be accomplished.
Vectors with a length of 1 sit on a circle (or sphere or hypersphere depending on the number of dimensions). Let's focus on the 2D case, if it cannot be done there, it will be clear that it cannot be done with more dimensions either.
Because the points are on a circle, their x and y coordinates are dependent, the one can be computed based on the other. Thus, the distributions of x and y coordinates cannot be defined independently. We can define the distribution of the one, generate random values for it, but the other coordinate must be computed from the first.
Let's make points on a half circle with a uniform x coordinate (can be extended to a full circle by adding a random sign to the y coordinate):
N = 1000;
x = 2 * rand(N,1) - 1;
y = sqrt(1 - x.^2);
plot(x,y,'.')
axis equal
histogram(y)
The plot generates shows a clearly non-uniform distribution, with many more samples generated near y=1 than near y=0. If we add a random sign to the y-coordinate we'd have more samples near y=1 and y=-1 than near y=0.
Consider the following draws for a 2x1 vector in Matlab with a probability distribution that is a mixture of two Gaussian components.
P=10^3; %number draws
v=1;
%First component
mu_a = [0,0.5];
sigma_a = [v,0;0,v];
%Second component
mu_b = [0,8.2];
sigma_b = [v,0;0,v];
%Combine
MU = [mu_a;mu_b];
SIGMA = cat(3,sigma_a,sigma_b);
w = ones(1,2)/2; %equal weight 0.5
obj = gmdistribution(MU,SIGMA,w);
%Draws
RV_temp = random(obj,P);%Px2
% Transform each component of RV_temp into a uniform in [0,1] by estimating the cdf.
RV1=ksdensity(RV_temp(:,1), RV_temp(:,1),'function', 'cdf');
RV2=ksdensity(RV_temp(:,2), RV_temp(:,2),'function', 'cdf');
Now, if we check whether RV1 and RV2 are uniformly distributed on [0,1] by doing
ecdf(RV1)
ecdf(RV2)
we can see that RV1 is uniformly distributed on [0,1] (the empirical cdf is close to the 45 degree line) while RV2 is not.
I don't understand why. It seems that the more distant are mu_a(2)and mu_b(2), the worse the job done by ksdensity with a reasonable number of draws. Why?
When you have a mixture of N(0.5,v) and N(8.2,v) then the range of the generated data is larger than if you had expectation which were closer, like N(0,v) and N(0,v), as you have in the other dimension. Then you ask ksdensity to approximate a function using P points inside this range.
Like in standard linear interpolation, the denser the points the better approximation of the function (inside the range), this is the same case here. Thus in the N(0.5,v) and N(8.2,v) where the points are "sparse" (or sparser, is that a word?) the approximation is worse than in the N(0,v) and N(0,v) where the points are denser.
As a small side note, are there any reason that you do not apply ksdensity directly on the bivariate data? Also I cannot reproduce your comment where you say that 5e2points are also good. Final comment, 1e3 is typically prefered over 10^3.
I think this is simply about the number of samples you're using. For the first example, the means of the two Gaussians are relatively close, hence a thousand samples are enough to obtain a cdf really close the the U[0,1] cdf. On the second vector though, you have a higher difference, and need more samples. With 100000 samples, I obtained the following result:
With 1000 I obtained this:
Which is clearly farther from the Uniform cdf function. Try to increase the number of samples to a million and check if the result is again getting closer.
I have to plot 10 frequency distributions on one graph. In order to keep things tidy, I would like to avoid making a histogram with bins and would prefer having lines that follow the contour of each histogram plot.
I tried the following
[counts, bins] = hist(data);
plot(bins, counts)
But this gives me a very inexact and jagged line.
I read about ksdensity, which gives me a nice curve, but it changes the scaling of my y-axis and I need to be able to read the frequencies from the y-axis.
Can you recommend anything else?
You're using the default number of bins for your histogram and, I will assume, for your kernel density estimation calculations.
Depending on how many data points you have, that will certainly not be optimal, as you've discovered. The first thing to try is to calculate the optimum bin width to give the smoothest curve while simultaneously preserving the underlying PDF as best as possible. (see also here, here, and here);
If you still don't like how smooth the resulting plot is, you could try using the bins output from hist as a further input to ksdensity. Perhaps something like this:
[kcounts,kbins] = ksdensity(data,bins,'npoints',length(bins));
I don't have your data, so you may have to play with the parameters a bit to get exactly what you want.
Alternatively, you could try fitting a spline through the points that you get from hist and plotting that instead.
Some code:
data = randn(1,1e4);
optN = sshist(data);
figure(1)
[N,Center] = hist(data);
[Nopt,CenterOpt] = hist(data,optN);
[f,xi] = ksdensity(data,CenterOpt);
dN = mode(diff(Center));
dNopt = mode(diff(CenterOpt));
plot(Center,N/dN,'.-',CenterOpt,Nopt/dNopt,'.-',xi,f*length(data),'.-')
legend('Default','Optimum','ksdensity')
The result:
Note that the "optimum" bin width preserves some of the fine structure of the distribution (I had to run this a couple times to get the spikes) while the ksdensity gives a smooth curve. Depending on what you're looking for in your data, that may be either good or bad.
How about interpolating with splines?
nbins = 10; %// number of bins for original histogram
n_interp = 500; %// number of values for interpolation
[counts, bins] = hist(data, nbins);
bins_interp = linspace(bins(1), bins(end), n_interp);
counts_interp = interp1(bins, counts, bins_interp, 'spline');
plot(bins, counts) %// original histogram
figure
plot(bins_interp, counts_interp) %// interpolated histogram
Example: let
data = randn(1,1e4);
Original histogram:
Interpolated:
Following your code, the y axis in the above figures gives the count, not the probability density. To get probability density you need to normalize:
normalization = 1/(bins(2)-bins(1))/sum(counts);
plot(bins, counts*normalization) %// original histogram
plot(bins_interp, counts_interp*normalization) %// interpolated histogram
Check: total area should be approximately 1:
>> trapz(bins_interp, counts_interp*normalization)
ans =
1.0009
I'm trying to write a script so that one can put his hand on the screen, click a few points with ginput, and have matlab generate an outline of the persons hand using splines. However, I'm quite unsure how you can have splines connect points that result from your clicks, as they of course are described by some sort of parametrization. How can you use the spline command built into matlab when the points aren't supposed to be connected 'from left to right'?
The code I have so far is not much, it just makes a box and lets you click some points
FigHandle = figure('Position', [15,15, 1500, 1500]);
rectangle('Position',[0,0,40,40])
daspect([1,1,1])
[x,y] = ginput;
So I suppose my question is really what to do with x and y so that you can spline them in such a way that they are connected 'chronologically'. (And, in the end, connecting the last one to the first one)
look into function cscvn
curve = cscvn(points)
returns a parametric variational, or natural, cubic spline curve (in ppform) passing through the given sequence points(:j), j = 1:end.
An excellent example here:
http://www.mathworks.com/help/curvefit/examples/constructing-spline-curves-in-2d-and-3d.html
I've found an alternative for using the cscvn function.
Using a semi-arclength parametrisation, I can create the spline from the arrays x and y as follows:
diffx = diff(x);
diffy = diff(y);
t = zeros(1,length(x)-1);
for n = 1:length(x)-1
t(n+1) = t(n) + sqrt(diffx(n).^2+diffy(n).^2);
end
tj = linspace(t(1),t(end),300);
xj = interp1(t,x,tj,'spline');
yj = interp1(t,y,tj,'spline');
plot(x,y,'b.',xj,yj,'r-')
This creates pretty decent outlines.
What this does is use the fact that a curve in the plane can be approximated by connecting a finite number of points on the curve using line segments to create a polygonal path. Using this we can parametrize the points (x,y) in terms of t. As we only have a few points to create t from, we create more by adding linearly spaced points in between. Using the function interp1, we then find the intermediate values of x and y that correspond to these linearly spaced t, ti.
Here is an example of how to do it using linear interpolation: Interpolating trajectory from unsorted array of 2D points where order matters. This should get you to the same result as plot(x,y).
The idea in that post is to loop through each consecutive pair of points and interpolate between just those points. You might be able to adapt this to work with splines, you need to give it 4 points each time though which could cause problems since they could double back.
To connect the start and end though just do this before interpolating:
x(end+1) = x(1);
y(end+1) = y(1);
I wanted to generate a set of coordinates distributed uniformly at random within a ball of radius R. Is there any way to do this in Matlab without for loops, in a matrix-like form?
Thanks
UPDATE:
I'm sorry for the confusion. I only need to generate n points uniformly at random over a circle of radius R, not a sphere.
the correct answer is here http://mathworld.wolfram.com/DiskPointPicking.html. The distribution is known as "Disk point picking"
I was about to mark this as a duplicate of a previous question on generating uniform distribution of points in a sphere, but I think you deserve the benefit of doubt here, as although there's a matlab script in the question, most of that thread is python.
This little function given in the question (and I'm pasting it directly from there), is what you need.
function X = randsphere(m,n,r)
% This function returns an m by n array, X, in which
% each of the m rows has the n Cartesian coordinates
% of a random point uniformly-distributed over the
% interior of an n-dimensional hypersphere with
% radius r and center at the origin. The function
% 'randn' is initially used to generate m sets of n
% random variables with independent multivariate
% normal distribution, with mean 0 and variance 1.
% Then the incomplete gamma function, 'gammainc',
% is used to map these points radially to fit in the
% hypersphere of finite radius r with a uniform % spatial distribution.
% Roger Stafford - 12/23/05
X = randn(m,n);
s2 = sum(X.^2,2);
X = X.*repmat(r*(gammainc(s2/2,n/2).^(1/n))./sqrt(s2),1,n);
To learn why you can't just use uniform random variable for all three co-ordinates as one might think is the correct way, give this article a read.
For the sake of completeness, here is some MATLAB code for a point-culling solution. It generates a set of random points within a unit cube, removes points that are outside a unit sphere, and scales the coordinate points up to fill a sphere of radius R:
XYZ = rand(1000,3)-0.5; %# 1000 random 3-D coordinates
index = (sum(XYZ.^2,2) <= 0.25); %# Find the points inside the unit sphere
XYZ = 2*R.*XYZ(index,:); %# Remove points and scale the coordinates
One key drawback to this point-culling method is that it makes it difficult to generate a specific number of points. For example, if you want to generate 1000 points within your sphere, how many do you have to create in the cube before culling them? If you scale up the number of points generated in the cube by a factor of 6/pi (i.e. the ratio of the volume of a unit cube to a unit sphere), then you can get close to the number of desired points in the sphere. However, since we're dealing with (pseudo)random numbers after all, we can never be absolutely certain we will generate enough points that fall in the sphere.
In short, if you want to generate a specific number of points, I'd try out one of the other solutions suggested. Otherwise, the point-culling solution is nice and simple.
Not sure if I understand your question correctly, but can't you just generate any random number inside a sphere by setting φ, θ and r, assigned to random numbers?