I need to spectral clustering for two donuts shape dataset.(Matlab) - matlab

I have tried hours but I cannot find solution.
I have "two Donuts" Data sample (variable "X")
you can download file below link
donut dataset(rings.mat)
which spreads to 2D shape like below image
First 250pts are located inside donuts and last 750 pts are located outside donuts.
and I need to perform spectral clustering.
I made (similarity matrix "W") with Gaussian similarity distance.
and I made degree matrix by sum of each raw of "W"
and then I computed eigen value(E) and eigen Vector(V)
and the shape of "V" is not good.
what is wrong with my trial???
I cannot figure out.
load rings.mat
[D, N] = size(X); % data stored in X
%initial plot data
figure; hold on;
for i=1:N,
plot(X(1,i), X(2,i),'o');
end
% perform spectral clustering
W = zeros(N,N);
D = zeros(N,N);
sigma = 1;
for i=1:N,
for j=1:N,
xixj2 = (X(1,i)-X(1,j))^2 + (X(2,i)-X(2,j))^2 ;
W(i,j) = exp( -1*xixj2 / (2*sigma^2) ) ; % compute weight here
% if (i==j)
% W(i,j)=0;
% end;
end;
D(i,i) = sum(W(i,:)) ;
end;
L = D - W ;
normL = D^-0.5*L*D^-0.5;
[u,s,v] = svd(normL);

If you use the Laplacian like it is in your code (the "real" laplacian), then to cluster your points into two sets you will want the eigenvector corresponding to second smallest eigenvalue.
The intuitive idea is to connect all of your points to each other with springs, where the springs are stiffer if the points are near each other, and less stiff for points far away. The eigenvectors of the Laplacian are the modes of vibration if you hit your spring network with a hammer and watch it oscillate - smaller eigenvalues corresponding to lower frequency "bulk" modes, and larger eigenvalues corresponding to higher frequency oscillations. You want the eigenvalue corresponding to the second smallest eigenvalue, which will be like the second mode in a drum, with a positive clustered together, and negative part clustered together.
Now there is some confusion in the comments about whether to use the largest or smallest eigenvalue, and it is because the laplacian in the paper linked there by dave is slightly different, being the identity minus your laplacian. So there they want the largest ones, whereas you want the smallest. The clustering in the paper is also a bit more advanced, and better, but not as easy to implement.
Here is your code, modified to work:
load rings.mat
[D, N] = size(X); % data stored in X
%initial plot data
figure; hold on;
for i=1:N,
plot(X(1,i), X(2,i),'o');
end
% perform spectral clustering
W = zeros(N,N);
D = zeros(N,N);
sigma = 0.3; % <--- Changed to be smaller
for i=1:N,
for j=1:N,
xixj2 = (X(1,i)-X(1,j))^2 + (X(2,i)-X(2,j))^2 ;
W(i,j) = exp( -1*xixj2 / (2*sigma^2) ) ; % compute weight here
% if (i==j)
% W(i,j)=0;
% end;
end;
D(i,i) = sum(W(i,:)) ;
end;
L = D - W ;
normL = D^-0.5*L*D^-0.5;
[u,s,v] = svd(normL);
% New code below this point
cluster1 = find(u(:,end-1) >= 0);
cluster2 = find(u(:,end-1) < 0);
figure
plot(X(1,cluster1),X(2,cluster1),'.b')
hold on
plot(X(1,cluster2),X(2,cluster2),'.r')
hold off
title(sprintf('sigma=%d',sigma))
Here is the result:
Now notice that I changed sigma to be smaller - from 1.0 to 0.3. When I left it at 1.0, I got the following result:
which I assume is because with sigma=1, the points in the inner cluster were able to "pull" on the outer cluster (which they are about distance 1 away from) enough so that it was more energetically favorable to split both circles in half like a solid vibrating drum, rather than have two different circles.

Related

Creating histograms of distance from the origin for 2D Random Walkers

Let's say you can show the distribution in space of the the positions of a large number of random walkers at three different time points. This was provided an answer to my previous question and with some tweaks is beautiful.
clc;
close all;
M = 1000; % The amount of random walks.
steps = [100,200,300]; % here we analyse the step 10,200 and 1000
cc = hsv(length(steps)); % manage the color of the plot
%generation of each random walk
x = sign(randn(max(steps),M));
y = sign(randn(max(steps),M));
xs = cumsum(x);
xval = xs(steps,:);
ys = cumsum(y);
yval = ys(steps,:);
hold on
for n=1:length(steps)
plot(xval(n,:),yval(n,:),'o','markersize',1,'color',cc(n,:),'MarkerFaceColor',cc(n,:));
end
legend('100','200','300')
axis square
grid on;
Now to the question, could I in some way use the hist() and subplot() functions to show the distance from the origin of the random walkers at three separate time points, or more I guess, but three for simplicity.
I'm not sure how to go about this beyond producing distributions of random walkers at the three time points themselves so far.
I hope that I've understand your question, I think that you want to use the bar plot with the stack option.
I've used the answer of #LuisMendo to my question to increase the code efficiency.
steps = [10,200,1000]; % the steps
M = 5000; % Number of random walk
DV = [-1 1]; % Discrete value
p = .5; % probability of DV(2)
% Using the #LuisMendo binomial solution:
for ii = 1:length(steps)
xval(ii,:) = (DV(2)-DV(1))*binornd(steps(ii), p, M, 1)+DV(1)*steps(ii);
yval(ii,:) = (DV(2)-DV(1))*binornd(steps(ii), p, M, 1)+DV(1)*steps(ii);
end
[x, cen] = hist(sqrt(xval.^2+yval.^2).'); %where `sqrt(xval.^2+yval.^2)` is the euclidian distance
bar(cen,x,'stacked');
legend('10','200','1000')
axis square
grid on;
Increase the # of bins in the histogram function to increase the plot precision.
Results:

Morlet wavelet transformation function returns nonsensical plot

I have written a matlab function (Version 7.10.0.499 (R2010a)) to evaluate incoming FT signal and calculate the morlet wavelet for the signal. I have a similar program, but I needed to make it more readable and closer to mathematical lingo. The output plot is supposed to be a 2D plot with colour showing the intensity of the frequencies. My plot seems to have all frequencies the same per time. The program does make an fft per row of time for each frequency, so I suppose another way to look at it is that the same line repeats itself per step in my for loop. The issue is I have checked with the original program, which does return the correct plot, and I cannot locate any difference beyond what I named the values and how I organized the code.
function[msg] = mile01_wlt(FT_y, f_mn, f_mx, K, N, F_s)
%{
Fucntion to perform a full wlt of a morlet wavelett.
optimization of the number of frequencies to be included.
FT_y satisfies the FT(x) of 1 envelope and is our ft signal.
f min and max enter into the analysis and are decided from
the f-image for optimal values.
While performing the transformation there are different scalings
on the resulting "intensity".
Plot is made with a 2D array and a colour code for intensity.
version 05.05.2016
%}
%--------------------------------------------------------------%
%{
tableofcontents:
1: determining nr. of analysis f, prints and readies f's to be used.
2: ensuring correct orientation of FT_y
3:defining arrays
4: declaring waveletdiagram and storage of frequencies
5: for-loop over all frequencies:
6: reducing file to manageable size by truncating time.
7: marking plot to highlight ("randproblemer")
8: plotting waveletdiagram
%}
%--------------------------------------------------------------%
%1: determining nr. of analysis f, prints and readies f's to be used.
DF = floor( log(f_mx/f_mn) / log(1+( 1/(8*K) ) ) ) + 1;% f-spectre analysed
nr_f_analysed = DF %output to commandline
f_step = (f_mx/f_mn)^(1/(DF-1)); % multiplicative step for new f_a
f_a = f_mn; %[Hz] frequency of analysis
T = N/F_s; %[s] total time sampled
C = 2.0; % factor to scale Psi
%--------------------------------------------------------------%
%2: ensuring correct orientation of FT_y
siz = size(FT_y);
if (siz(2)>siz(1))
FT_y = transpose(FT_y);
end;
%--------------------------------------------------------------%
%3:defining arrays
t = linspace(0, T*(N-1)/N, N); %[s] timespan
f = linspace(0, F_s*(N-1)/N, N); %[Hz] f-specter
%--------------------------------------------------------------%
%4: declaring waveletdiagram and storage of frequencies
WLd = zeros(DF,N); % matrix of DF rows and N columns for storing our wlt
f_store = zeros(1,DF); % horizontal array for storing DF frequencies
%--------------------------------------------------------------%
%5: for-loop over all frequencies:
for jj = 1:DF
o = (K/f_a)*(K/f_a); %factor sigma
Psi = exp(- 0*(f-f_a).*(f-f_a)); % FT(\psi) for 1 envelope
Psi = Psi - exp(-K*K)*exp(- o*(f.*f)); % correctional element
Psi = C*Psi; %factor. not set in stone
%next step fits 1 row in the WLd (3 alternatives)
%WLd(jj,:) = abs(ifft(Psi.*transpose(FT_y)));
WLd(jj,:) = sqrt(abs(ifft(Psi.*transpose(FT_y))));
%WLd(jj,:) = sqrt(abs(ifft(Psi.*FT_y))); %for different array sizes
%and emphasizes weaker parts.
%prep for next round
f_store (jj) = f_a; % storing used frequencies
f_a = f_a*f_step; % determines the next step
end;
%--------------------------------------------------------------%
%6: reducing file to manageable size by truncating time.
P = floor( (K*F_s) / (24*f_mx) );%24 not set in stone
using_every_P_point = P %printout to cmdline for monitoring
N_P = floor(N/P);
points_in_time = N_P %printout to cmdline for monitoring
% truncating WLd and time
WLd2 = zeros(DF,N_P);
for jj = 1:DF
for ii = 1:N_P
WLd2(jj,ii) = WLd(jj,ii*P);
end
end
t_P = zeros(1,N_P);
for ii = 1:N_P % set outside the initial loop to reduce redundancy
t_P(ii) = t(ii*P);
end
%--------------------------------------------------------------%
%7: marking plot to highlight boundary value problems
maxval = max(WLd2);%setting an intensity
mxv = max(maxval);
% marks in wl matrix
for jj= 1:DF
m = floor( K*F_s / (P*pi*f_store(jj)) ); %finding edges of envelope
WLd2(jj,m) = mxv/2; % lower limit
WLd2(jj,N_P-m) = mxv/2;% upper limit
end
%--------------------------------------------------------------%
%8: plotting waveletdiagram
figure;
imagesc(t_P, log10(f_store), WLd2, 'Ydata', [1 size(WLd2,1)]);
set(gca, 'Ydir', 'normal');
xlabel('Time [s]');
ylabel('log10(frequency [Hz])');
%title('wavelet power spectrum'); % for non-sqrt inensities
title('sqrt(wavelet power spectrum)'); %when calculating using sqrt
colorbar('location', 'southoutside');
msg = 'done.';
There are no error message, so I am uncertain what exactly I am doing wrong.
Hope I followed all the guidelines. Otherwise, I apologize.
edit:
my calling program:
% establishing parameters
N = 2^(16); % | number of points to sample
F_s = 3.2e6; % Hz | samplings frequency
T_t = N/F_s; % s | length in seconds of sample time
f_c = 2.0e5; % Hz | carrying wave frequency
f_m = 8./T_t; % Hz | modulating wave frequency
w_c = 2pif_c; % Hz | angular frequency("omega") of carrying wave
w_m = 2pif_m; % Hz | angular frequency("omega") of modulating wave
% establishing parameter arrays
t = linspace(0, T_t, N);
% function variables
T_h = 2*f_m.*t; % dimless | 1/2 of the period for square signal
% combined carry and modulated wave
% y(t) eq. 1):
y_t = 0.5.*cos(w_c.*t).*(1+cos(w_m.*t));
% y(t) eq. 2):
% y_t = 0.5.*cos(w_c.*t)+0.25*cos((w_c+w_m).*t)+0.25*cos((w_c-w_m).*t);
%square wave
sq_t = cos(w_c.*t).*(1 - mod(floor(t./T_h), 2)); % sq(t)
% the following can be exchanged between sq(t) and y(t)
plot(t, y_t)
% plot(t, sq_t)
xlabel('time [s]');
ylabel('signal amplitude');
title('plot of harmonically modulated signal with carrying wave');
% title('plot of square modulated signal with carrying wave');
figure()
hold on
% Fourier transform and plot of freq-image
FT_y = mile01_fftplot(y_t, N, F_s);
% FT_sq = mile01_fftplot(sq_t, N, F_s);
% Morlet wavelet transform and plot of WLdiagram
%determining K, check t-image
K_h = 57*4; % approximation based on 1/4 of an envelope, harmonious
%determining f min and max, from f-image
f_m = 1.995e5; % minimum frequency. chosen to showcase all relevant f
f_M = 2.005e5; % maximum frequency. chosen to showcase all relevant f
%calling wlt function.
name = 'mile'
msg = mile01_wlt(FT_y, f_m, f_M, K_h, N, F_s)
siz = size(FT_y);
if (siz(2)>siz(1))
FT_y = transpose(FT_y);
end;
name = 'arnt'
msg = arnt_wltransf(FT_y, f_m, f_M, K_h, N, F_s)
The time image has a constant frequency, but the amplitude oscillates resempling a gaussian curve. My code returns a sharply segmented image over time, where each point in time holds only 1 frequency. It should reflect a change in intensity across the spectra over time.
hope that helps and thanks!
I found the error. There is a 0 rather than an o in the first instance of Psi. Thinking I'll maybe rename the value as sig or something. besides this the code works. sorry for the trouble there

3D points linear regression Matlab

I have a set of 3D points (x,y,z) and I would like to fit a straight line using Least absolute deviation method to those data.
I found a function from the internet which works pretty well with 2D data, how could I modify this to adapt 3D data points?
function B = L1LinearRegression(X,Y)
% Determine size of predictor data
[n m] = size(X);
% Initialize with least-squares fit
B = [ones(n,1) X] \ Y;
% Least squares regression
BOld = B;
BOld(1) = BOld(1) + 1e-5;
% Force divergence
% Repeat until convergence
while (max(abs(B - BOld)) > 1e-6) % Move old coefficients
BOld = B; % Calculate new observation weights (based on residuals from old coefficients)
W = sqrt(1 ./ max(abs((BOld(1) + (X * BOld(2:end))) - Y),1e-6)); % Floor to avoid division by zero
% Calculate new coefficients
B = (repmat(W,[1 m+1]) .* [ones(n,1) X]) \ (W .* Y);
end
Thank you very much!
I know that this is not answer to the question but rather to different problem leading to the question.
We can use fit function several times.
% XYZ=[x(:),y(:),z(:)]; % suppose we have data in this format
M=size(XYZ,1); % read size of our data
t=((0:M-1)/(M-1))'; % create arbitrary parameter t
% fit all coordinates as function x_i=a_i*t+b_i
fitX=fit(t,XYZ(:,1),'poly1');
fitY=fit(t,XYZ(:,2),'poly1');
fitZ=fit(t,XYZ(:,3),'poly1');
temp=[0;1]; % define the interval where the line shall be plotted
%Evaluate and plot the line coordinates
Line=[feval(fitX(temp)),feval(fitY(temp)),feval(fitZ(temp))];
plot(Line)
The advantage is that this work for any cloud, even if it is parallel to any axis. another advantage is that you are not limitted only to polynomes of 1st order, you can choose any function for different axis and fit any 3D curve.

Generating N random points with certain predefined distance between them

I have to create highway scenario in MATLAB. I have to generate random points (i.e. vehicles) on highway. By using randn() command, random points are overlapping on each other. I want to generate random points such that a minimum distance between random points is maintained.
Could anybody help me in generating this kind of scenario..
You might consider Poisson disc (a.k.a. disk) sampling. Basically, Poisson-disc sampling produces points that are tightly-packed, but no closer to each other than a specified minimum distance, resulting in a more natural pattern.
My matlab is rusty, sorry, no code, but links
http://www.cs.sandia.gov/~samitch/papers/cccg-present.pdf
https://www.jasondavies.com/poisson-disc/
This is not an elegant solution, but it satisfies your minimum distance constraint.
% Highway dimensions
lx = 1000;
ly = 1000;
% Minimum distance
d = 100;
% Number of points to generate
n = 50;
points = [rand(1, 2) .* [lx ly]];
d2 = d ^ 2;
% Keep adding points until we have n points.
while (size(points, 1) < n)
% Randomly generate a new point
point = rand(1, 2) .* [lx ly];
% Calculate squared distances to all other points
dist2 = sum((points - repmat(point, size(points, 1), 1)) .^ 2, 2);
% Only add this point if it is far enough away from all others.
if (all(dist2 > d2))
points = [points; point];
end
end
plot(points(:,1), points(:,2), 'o')

MATLAB - Plotting random points in a circle using a congruential RNG

I am aiming at plotting some random numbers in a circle using MATLAB. My code:
c = 3; p = 31; x = [7];
% generating random numbers (z) in the range [0,1) using
% congruential random number generator (multiplicative)
for i = 2:200;
x(i) = mod(c*x(i-1),p);
end;
z = x/p;
% plot unit circle
hold on;
theta = 0:pi/50:2*pi;
plot(cos(theta),sin(theta),'.');
hold off;
% plotting random points in the unit circle using in-built rand function
phi = 2*pi*rand(1,200);
r = 1*sqrt(rand(1,200));
% plotting random points using the RNG above
% phi = 2*pi*z;
% r = 1*sqrt(z);
hold on;
x = 0 + r.*cos(phi);
y = 0 + r.*sin(phi);
plot(x,y,'r*');
hold off;
clear;
The problem I am facing is that both z and rand consist of random numbers in the range [0,1). However, when I plot using rand I get the ideal result -
while z gives me a helix sorta thing -
What could be the problem?
Besides Ander's good point about the RNG there is also the problem of using z for both phi and r. Check it by using z=rand(200,1) and then creating your plot:
gives the same result as you had before. If you let z be different for both, you get "true" randomness, to some extend in your RNG. I used this RNG:
c = 991;
p = 997;
x=zeros(400,1);
x(1,1) = 7;
for ii = 2:400;
x(ii,1) = mod(c*x(ii-1),p);
end;
z = x/p;
phi2 = 2*pi*z(1:200,1);
r2 = 1*sqrt(z(201:400,1));
where I let your RNG run a bit longer and then used the first 200 for phi and the last 200 for r:
As you can see there's still some kind of swirl visible, but that's due to your RNG. The larger you pick your c and p the less that will be.
Just to show you how pretty your RNG becomes by setting c=3 and p=31 and using the full 400 range of z as above. Isn't that a great swirl?
Easy! Your random number generator is good to some excent.
Random number generators based on prime division do generally have a period. After a number of samples they repeat themselves.
In your case, try to plot(z)
You will notice that that set of numbers is periodic, and has a period of 31. Coincidence?
I THINK NOT!
Thus, remember that when you want to generate pseudorandom numbers, you need a p bigger than the amount of samples you want to generate.
For example if we choose another set of coprime numbers to generate z
c = 991; p = 997;
The plot(z) will be:
And the final plot: