I am trying to generate a set of points, where groups of m points are evenly distributed over a large area. I have solved the problem (solution below), but I am looking for a more elegant or at least faster solution.
Say we have 9 points we want to place in groups of 3 in an area specified by x=[0,5] and y=[0,5]. Then I first generate a mesh in this area
meshx = 0:0.01:5;
meshy = 0:0.01:5;
[X,Y] = meshgrid(meshx,meshy);
X = X(:); Y = Y(:);
Then to place the 9/3=3 groups evenly I apply kmeans clustering
idx = kmeans([X,Y],3);
Then for each cluster, I can now draw a random sample of 3 points, which I save to a list:
pos = zeros(9,2);
for i = 1:max(idx)
spaceX = X(idx==i);
spaceY = Y(idx==i);
%on = convhulln([spaceX,spaceY]);
%plot(spaceX(on),spaceY(on),'black')
%hold on
sample = datasample([spaceX,spaceY],3,1);
%plot(sample(:,1),sample(:,2),'black*')
%hold on
pos((i-1)*3+1:i*3,:) = sample;
end
If you uncomment the comments, then the code will also plot the clusters and the location of points within. My problem is as mentioned to primarily avoid having to cluster a rather fine uniform grid to make the code more efficient.
Instead of kmeans you can use atan2 :
x= -10:10;
idx=ceil((bsxfun(#atan2,x,x.')+pi)*(3/(2*pi)));
imshow(idx,[])
Related
Suppose that I have generated some data in matlab as follows:
n = 100;
x = randi(n,[n,1]);
y = rand(n,1);
data = [x y];
plot(x,y,'rx')
axis([0 100 0 1])
Now I want to generate an algorithm to classify all these data into some clusters(which are arbitrary) in a way such that a point be a member of a cluster only if the distance between this point and at least one of the members of the cluster be less than 10.How could I generate the code?
The clustering method you are describing is DBSCAN. Note that this algorithm will find only one cluster in provided data, since it's very unlikely that there is a point in the dataset so that its distance to all other points is more than 10.
If this is really what you want, you can use ِDBSCAN, or the one posted in FE, if you are using versions older than 2019a.
% Generating random points, almost similar to the data provided by OP
data = bsxfun(#times, rand(100, 2), [100 1]);
% Adding more random points
for i=1:5
mu = rand(1, 2)*100 -50;
A = rand(2)*5;
sigma = A*A'+eye(2)*(1+rand*2);%[1,1.5;1.5,3];
data = [data;mvnrnd(mu,sigma,20)];
end
% clustering using DBSCAN, with epsilon = 10, and min-points = 1 as
idx = DBSCAN(data, 10, 1);
% plotting clusters
numCluster = max(idx);
colors = lines(numCluster);
scatter(data(:, 1), data(:, 2), 30, colors(idx, :), 'filled')
title(['No. of Clusters: ' num2str(numCluster)])
axis equal
The numbers in above figure shows the distance between closest pairs of points in any two different clusters.
The Matlab built-in function clusterdata() works well for what you're asking.
Here is how to apply it to your example:
% number of points
n = 100;
% create the data
x = randi(n,[n,1]);
y = rand(n,1);
data = [x y];
% the number of clusters you want to create
num_clusters = 5;
T1 = clusterdata(data,'Criterion','distance',...
'Distance','euclidean',...
'MaxClust', num_clusters)
scatter(x, y, 100, T1,'filled')
In this case, I used 5 clusters and used the Euclidean distance to be the metric to group the data points, but you can always change that (see documentation of clusterdata())
See the result below for 5 clusters with some random data.
Note that the data is skewed (x-values are from 0 to 100, and y-values are from 0 to 1), so the results are also skewed, but you could always normalize your data.
Here is a way using the connected components of graph:
D = pdist2(x, y) < 10;
D(1:size(D,1)+1:end) = 0;
G = graph(D);
C = conncomp(G);
The connected components is vector that shows the cluster numbers.
Use pdist2 to compute distance matrix of x and y.
Use the distance matrix to create a logical adjacency matrix that shows two point are neighbors if distance between them is less than 10.
Set the diagonal elements of the adjacency matrix to 0 to eliminate self loops.
Create a graph from the adjacency matrix.
Compute the connected components of graph.
Note that using pdist2 for large datasets may not be applicable and you need to use other methods to form a sparse adjacency matrix.
I notified after posing my answer the answer provided by #saastn suggested to use DBSCAN algorithm that nearly follows the same approach.
I am generating two histograms using the histogram function from Matlab that are both normalized using the probability argument.
However, once I generate two histograms as shown below, I'd like to be able to find the exact point at which the histograms would cross paths, assuming the histograms were drawn using lines instead of bars. Unfortunately, this form of histogram doesn't allow for lines, it just has bars. There is a hist function which can be manipulated in Matlab to draw a histogram as lines instead of bars, however, it doesn't easily normalize.
Hence, ideally, I'd like to use histogram() to plot the 2 histograms and find where they cross. See image below:
Here's an example of how the graphs can be created:
x = randn(2000,1);
y = 1 + randn(5000,1);
h1 = histogram(x);
hold on
h2 = histogram(y);
h1.Normalization = 'probability';
h1.BinWidth = 0.25;
h2.Normalization = 'probability';
h2.BinWidth = 0.25;
Now from here, I want to find the point where the two histograms cross paths. Note, the intersection value is the intersection (in the mathematical sense). This is not what I'm looking for. I'm looking for the x coordinate of where the two histograms cross at their outer boundaries. For example, in the attached image, the answer would be ~2.5.
From your example data, with a simple modification:
x = randn(2000,1);
y = 1 + randn(5000,1);
h1 = histogram(x);
hold on
h2 = histogram(y);
h1.Normalization = 'probability';
h1.BinWidth = 0.25;
h1.BinLimits=[min([x(:); y(:)]) max([x(:); y(:)])];
h2.Normalization = 'probability';
h2.BinWidth = 0.25;
h2.BinLimits=[min([x(:); y(:)]) max([x(:); y(:)])];
data1=h1.Values;
data2=h2.Values;
intersection_value=find(data2>data1,1); % this is the index, bad variable name
I would like to populate random points on a 2D plot, in such a way that the points fall in proximity of a "C" shaped polyline.
I managed to accomplish this for a rather simple square shaped "C":
This is how I did it:
% Marker color
c = 'k'; % Black
% Red "C" polyline
xl = [8,2,2,8];
yl = [8,8,2,2];
plot(xl,yl,'r','LineWidth',2);
hold on;
% Axis settings
axis equal;
axis([0,10,0,10]);
set(gca,'xtick',[],'ytick',[]);
step = 0.05; % Affects point quantity
coeff = 0.9; % Affects point density
% Top Horizontal segment
x = 2:step:9.5;
y = 8 + coeff*randn(size(x));
scatter(x,y,'filled','MarkerFaceColor',c);
% Vertical segment
y = 1.5:step:8.5;
x = 2 + coeff*randn(size(y));
scatter(x,y,'filled','MarkerFaceColor',c);
% Bottom Horizontal segment
x = 2:step:9.5;
y = 2 + coeff*randn(size(x));
scatter(x,y,'filled','MarkerFaceColor',c);
hold off;
As you can see in the code, for each segment of the polyline I generate the scatter point coordinates artificially using randn.
For the previous example, splitting the polyline into segments and generating the points manually is fine. However, what if I wanted to experiment with a more sophisticated "C" shape like this one:
Note that with my current approach, when the geometric complexity of the polyline increases so does the coding effort.
Before going any further, is there a better approach for this problem?
A simpler approach, which generalizes to any polyline, is to run a loop over the segments. For each segment, r is its length, and m is the number of points to be placed along that segment (it closely corresponds to the prescribed step size, with slight deviation in case the step size does not evenly divide the length). Note that both x and y are subject to random perturbation.
for n = 1:numel(xl)-1
r = norm([xl(n)-xl(n+1), yl(n)-yl(n+1)]);
m = round(r/step) + 1;
x = linspace(xl(n), xl(n+1), m) + coeff*randn(1,m);
y = linspace(yl(n), yl(n+1), m) + coeff*randn(1,m);
scatter(x,y,'filled','MarkerFaceColor',c);
end
Output:
A more complex example, using coeff = 0.4; and xl = [8,4,2,2,6,8];
yl = [8,6,8,2,4,2];
If you think this point cloud is too thin near the endpoints, you can artifically lengthen the first and last segments before running the loop. But I don't see the need: it makes sense that the fuzzied curve is thinning out at the extremities.
With your original approach, two places with the same distance to a line can sampled with a different probability, especially at the corners where two lines meet. I tried to fix this rephrasing the random experiment. The random experiment my code does is: "Pick a random point. Accept it with a probability of normpdf(d)<rand where d is the distance to the next line". This is a rejection sampling strategy.
xl = [8,4,2,2,6,8];
yl = [8,6,8,2,4,2];
resolution=50;
points_to_sample=200;
step=.5;
sigma=.4; %lower value to get points closer to the line.
xmax=(max(xl)+2);
ymax=(max(yl)+2);
dist=zeros(xmax*resolution+1,ymax*resolution+1);
x=[];
y=[];
for n = 1:numel(xl)-1
r = norm([xl(n)-xl(n+1), yl(n)-yl(n+1)]);
m = round(r/step) + 1;
x = [x,round(linspace(xl(n)*resolution+1, xl(n+1)*resolution+1, m*resolution))];
y = [y,round(linspace(yl(n)*resolution+1, yl(n+1)*resolution+1, m*resolution))];
end
%dist contains the lines:
dist(sub2ind(size(dist),x,y))=1;
%dist contains the normalized distance of each rastered pixel to the line.
dist=bwdist(dist)/resolution;
pseudo_pdf=normpdf(dist,0,sigma);
%scale up to have acceptance rate of 1 for most likely pixels.
pseudo_pdf=pseudo_pdf/max(pseudo_pdf(:));
sampled_points=zeros(0,2);
while size(sampled_points,1)<points_to_sample
%sample a random point
sx=rand*xmax;
sy=rand*ymax;
%accept it if criteria based on normal distribution matches.
if pseudo_pdf(round(sx*resolution)+1,round(sy*resolution)+1)>rand
sampled_points(end+1,:)=[sx,sy];
end
end
plot(xl,yl,'r','LineWidth',2);
hold on
scatter(sampled_points(:,1),sampled_points(:,2),'filled');
I have generated a data set in matlab then some outliers embedding in the data. I would like to plot it and since I'm new in matlab I don't know how to specify the outliers from inliers by different sign or different color. The points which are outlyingness with respect to the x axis, y axis and both of them. This is the matlab codes for that;
pd = makedist('Normal');
rng(38)
a = random(pd,100,1);
b = datasample(1:100,40,'Replace',false);
pd1 = makedist('Normal','mu',10*sqrt(2),'sigma',0.1);
a(b)=random(pd1,40,1);
a=reshape(a,[50,2]);
plot(a(:,1),a(:,2),'O')
I would be appreciated if you could help me.
In this example I assumed that the points which distance along OX axis is greater than 3 are outliers and marked them red (whereas normal points are marked blue):
centroid = mean(a);
distx = a(:,1) - centroid(1);
disty = a(:,2) - centroid(2);
outliers_x = distx > 3;
plot(centroid(1), centroid(2), 'xk')
hold on
plot(a(outliers_x,1),a(outliers_x,2),'or')
plot(a(~outliers_x,1),a(~outliers_x,2),'ob')
hold off
Note that I've also displayed the centroid as a black "X" mark.
hold on/hold off are used to "stack" several plots (or images) together
You may want to read hold() reference. Also here you'll find which markers and colors are available.
To answer to my question I have written the following codes, in order to specify 4 groups of observations with different color.
pd = makedist('Normal');
rng(38)
a = random(pd,100,1);
b = datasample(1:100,40,'Replace',false);
pd1 = makedist('Normal','mu',10*sqrt(2),'sigma',0.1);
a(b)=random(pd1,40,1);
a=reshape(a,[50,2]);
hold all;
aa=(a >= 10 | a >= 10);
rep=repmat(0, 1, 50);
aaa=[rep',aa];
n=50;
for i=1:n; plot(a(i,1),a(i,2),'o','col',aaa(i,:));
end
I am having difficulty with calculating 2D area of contours produced from a Kernel Density Estimation (KDE) in Matlab. I have three variables:
X and Y = meshgrid which variable 'density' is computed over (256x256)
density = density computed from the KDE (256x256)
I run the code
contour(X,Y,density,10)
This produces the plot that is attached. For each of the 10 contour levels I would like to calculate the area. I have done this in some other platforms such as R but am having trouble figuring out the correct method / syntax in Matlab.
C = contourc(density)
I believe the above line would store all of the values of the contours allowing me to calculate the areas but I do not fully understand how these values are stored nor how to get them properly.
This little script will help you. Its general for contour. Probably working for contour3 and contourf as well, with adjustments of course.
[X,Y,Z] = peaks; %example data
% specify certain levels
clevels = [1 2 3];
C = contour(X,Y,Z,clevels);
xdata = C(1,:); %not really useful, in most cases delimters are not clear
ydata = C(2,:); %therefore further steps to determine the actual curves:
%find curves
n(1) = 1; %n: indices where the certain curves start
d(1) = ydata(1); %d: distance to the next index
ii = 1;
while true
n(ii+1) = n(ii)+d(ii)+1; %calculate index of next startpoint
if n(ii+1) > numel(xdata) %breaking condition
n(end) = []; %delete breaking point
break
end
d(ii+1) = ydata(n(ii+1)); %get next distance
ii = ii+1;
end
%which contourlevel to calculate?
value = 2; %must be member of clevels
sel = find(ismember(xdata(n),value));
idx = n(sel); %indices belonging to choice
L = ydata( n(sel) ); %length of curve array
% calculate area and plot all contours of the same level
for ii = 1:numel(idx)
x{ii} = xdata(idx(ii)+1:idx(ii)+L(ii));
y{ii} = ydata(idx(ii)+1:idx(ii)+L(ii));
figure(ii)
patch(x{ii},y{ii},'red'); %just for displaying purposes
%partial areas of all contours of the same plot
areas(ii) = polyarea(x{ii},y{ii});
end
% calculate total area of all contours of same level
totalarea = sum(areas)
Example: peaks (by Matlab)
Level value=2 are the green contours, the first loop gets all contour lines and the second loop calculates the area of all green polygons. Finally sum it up.
If you want to get all total areas of all levels I'd rather write some little functions, than using another loop. You could also consider, to plot just the level you want for each calculation. This way the contourmatrix would be much easier and you could simplify the process. If you don't have multiple shapes, I'd just specify the level with a scalar and use contour to get C for only this level, delete the first value of xdata and ydata and directly calculate the area with polyarea
Here is a similar question I posted regarding the usage of Matlab contour(...) function.
The main ideas is to properly manipulate the return variable. In your example
c = contour(X,Y,density,10)
the variable c can be returned and used for any calculation over the isolines, including area.