Fitting the Cluster Number Based on the Centroid


I'm working with k-means in MATLAB. Here is my code:
load cobat.txt; % read the file
k=input('Enter a number: '); % determine the number of cluster
isRand=0; % 0 -> sequential initialization
% 1 -> random initialization
[maxRow, maxCol]=size(cobat);
if maxRow<=k,
y=[cobat, (1:maxRow)']; % no more rows than clusters: each row becomes its own cluster
else
% initial value of centroid
if isRand,
p = randperm(size(cobat,1)); % random initialization
for i=1:k
c(i,:)=cobat(p(i),:) ;
end
else
for i=1:k
c(i,:)=cobat(i,:); % sequential initialization
end
end
temp=zeros(maxRow,1); % initialize as zero vector
u=0;
while 1,
d=DistMatrix3(cobat,c); % calculate the distance
[z,g]=min(d,[],2); % set the matrix g group
if g==temp, % if the iteration doesn't change anymore
break; % stop the iteration
else
temp=g; % copy the matrix to the temporary variable
end
for i=1:k
f=find(g==i);
if f % calculate the new centroid
c(i,:)=mean(cobat(find(g==i),:),1)
end
end
c
sort(c)
end
y=[cobat,g]
end % closes the outer "if maxRow<=k"
"cobat" is my data file. It looks like this:
65 80 55
45 75 78
36 67 66
65 78 88
79 80 72
77 85 65
76 77 79
65 67 88
85 76 88
56 76 65
"c" holds the centroid (the center) of each cluster, and "g" holds the cluster number assigned to each row. The problem is that I want the cluster numbers to run from small to large according to the centroids (c). I tried sort(c), but that has no effect on the cluster numbers in g.
Sorting g directly does not give what I want either. I want the cluster numbers to follow the order of the centroids. For example, when I run the code with k=3, the final centroids are:
73.0000 79.0000 70.6667 %C 1
58.3333 73.3333 84.6667 %C 2
36.0000 67.0000 66.0000 %C 3
When I sort it, the cluster numbers are "sorted" along with it:
36.0000 67.0000 66.0000 %C 3
58.3333 73.3333 70.6667 %C 2
73.0000 79.0000 84.6667 %C 1
What I want is for the cluster numbers to be reassigned, like this:
36.0000 67.0000 66.0000 %C 1
58.3333 73.3333 70.6667 %C 2
73.0000 79.0000 84.6667 %C 3
The numbers are reassigned, not just sorted, so the result of the line 'y=[cobat,g]' changes too.
This seems easy, but it is tricky and I couldn't figure it out. Does anyone have an idea how to solve it?
Thank you.

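Use the sort indices returned by sortrows (its second output) to reorder the centroids, then map every old label in g to its new position:
[c,index] = sortrows(c); % sort the centroids by their first column; index(j) is the old label of new cluster j
newLabel = zeros(size(c,1),1);
newLabel(index) = 1:size(c,1); % inverse permutation: old label -> new label
g = newLabel(g); % relabel the points so g matches the sorted centroids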
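To check the idea on the numbers from the question, here is a small sketch. The centroids are the ones posted above; the label vector g is the one those centroids imply for cobat (rows 2, 4 and 8 average to centroid 2, row 3 is centroid 3). The names cSorted, newLabel and gNew are only illustrative.

```matlab
% Final centroids from the question (k = 3), in the order k-means produced them.
c = [73.0000 79.0000 70.6667;
     58.3333 73.3333 84.6667;
     36.0000 67.0000 66.0000];
% Labels implied by those centroids: rows 2, 4 and 8 of cobat average to
% centroid 2, row 3 is centroid 3, and the remaining rows form cluster 1.
g = [1 2 3 2 1 1 1 2 1 1].';

k = size(c, 1);
[cSorted, index] = sortrows(c);  % whole rows reordered by first column, ascending
newLabel = zeros(k, 1);
newLabel(index) = 1:k;           % old label index(j) is mapped to new label j
gNew = newLabel(g);              % relabelled points, still aligned with cobat
```

Here index comes out as [3;2;1], so old cluster 3 becomes cluster 1 and old cluster 1 becomes cluster 3, and [cobat, gNew] shows the new numbering. Note that sortrows keeps each centroid row intact, whereas sort(c) sorts every column independently and mixes coordinates from different centroids.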

Related

Matlab - finding the values in a vector making a neighborhood chain

I have a vector that has values, say a=[10 20 42 90] and what I am trying to do is to find the neighbors in the range of 30 and replace these values with their means. For example, for the a vector, the value of 20 is a neighbor of 10. Additionally, 42 is also a neighbor of 10 through 20, because it is a neighbor's neighbor but 90 is not a neighboring value and it is not reachable from 10 with a neighborhood size of 30.
So I want to replace all 10, 20 and 42 with their means and obtain the vector a=[24 90].
If a=[10 20 42 66 155], then the resulting vector would be a=[34.5 155].
How do I achieve that?

a=[10 20 42 66 155]; % sample data
r = 30; % sample range
a = accumarray(cumsum([r+1 abs(diff(a))]>r).',a,[],@mean).';
Ungolfed and commented version:
a=[10 20 42 66 155]; % sample data
r = 30; % range
% difference between consecutive elements; the first difference is set above r so the first element starts a group
d = [r+1 abs(diff(a))];
% one label per group: the label increases whenever a gap exceeds r
L = cumsum(d>r);
% calculate mean of each group
a = accumarray(L.',a,[],@mean).';
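As a quick trace of the intermediate values, here is the same idea run on the first example from the question. This is only a sketch; it assumes the input is sorted in ascending order, since the method compares consecutive elements. The name m is illustrative.

```matlab
a = [10 20 42 90];        % first example from the question
r = 30;                   % neighbourhood range
d = [r+1 abs(diff(a))];   % [31 10 22 48]
L = cumsum(d > r);        % [1 1 1 2]: a gap larger than r starts a new group
m = accumarray(L(:), a(:), [], @mean).';   % [24 90]
```

The values 10, 20 and 42 share label 1 because no gap between neighbours exceeds 30, while the gap of 48 before 90 opens group 2.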

Huge number of initializations of a small vector without a for-loop?

I was given the following code to initialize the vector A1.
Sc=[75 80 85];
Sp=[60 65 70];
C=[10 20 30 40 50 60;
11 21 31 41 51 61];
% KK=1000000;
% for k=1:KK
k=1;
A1=[];
A1=[-C(k,1)+max(60-Sc(1),0) -C(k,2)+max(60-Sc(2),0) -C(k,3)+max(60-Sc(3),0)
-C(k,4)+max(Sp(1)-60,0) -C(k,5)+max(Sp(2)-60,0) -C(k,6)+max(Sp(3)-60,0)];
%end; %KK
The code above works, but it isn't optimal: it is very long, and I have to rewrite it whenever the vector's length, n, changes. In my task the length of A1 is even and typically lies in the range 6<=n<=16. However, I need to initialize A1 a huge number of times, k<=10^6.
I'd like to rewrite the code. My attempt is below.
n= size(C,2);
M=60;
%KK=1000000;
% for k=1:KK
k=1;
AU=[];AD=[];
for i=1:n
if i<=n/2
AU=[AU, -C(k,i)+max(M-Sc(i),0)];
else
AD=[AD, -C(k,i)+max(Sp(i-n/2)-M,0)];
end %if
end % i
A1=[AU; AD]
% end; % KK
Question: is it possible to rewrite the code without a for-loop? Does that make sense when the vector's length, n, is much smaller than the number of initializations, k?

This is a vectorized form:
Sc=[75 80 85];
Sp=[60 65 70];
C=[10 20 30 40 50 60;
11 21 31 41 51 61];
kk=2;%1000000;
n= 6;
M=60;
[K , I] = meshgrid(1:kk,1:n);
Idx = sub2ind([kk,n], K, I);
condition = I <= n/2;
AU = -C(Idx(condition)) + max(M-Sc(I(condition)),0).';
AD = -C(Idx(~condition)) + max(Sp(I(~condition) - n/2)-M,0).';
A1 = [AU AD].'
Note: I changed kk from 1000000 to 2 because C has only 2 rows and cannot be indexed with row numbers greater than 2. So this makes sense if C has 1000000 rows.
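A possibly simpler variant is sketched below. It assumes, as in the question, that C has one row per initialization k and that n is even. Because the max(...) offsets do not depend on k, they can be computed once and added to every row with bsxfun. The names h, offU and offD are illustrative.

```matlab
Sc = [75 80 85];
Sp = [60 65 70];
C  = [10 20 30 40 50 60;
      11 21 31 41 51 61];   % one row per initialization k
M  = 60;
h  = size(C, 2) / 2;        % half length; n is assumed even

% Offsets do not depend on k, so compute them once.
offU = max(M - Sc, 0);      % 1-by-h
offD = max(Sp - M, 0);      % 1-by-h

% bsxfun adds the offset row to every row of C at once.
AU = bsxfun(@plus, -C(:, 1:h),     offU);   % row k is the upper row of A1 for k
AD = bsxfun(@plus, -C(:, h+1:end), offD);   % row k is the lower row of A1 for k

k  = 1;
A1 = [AU(k, :); AD(k, :)];  % same A1 as the original code gives for k = 1
```

With this layout all k initializations are held in AU and AD at once, and A1 for any particular k is just two row lookups.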

3D histogram and conditional coloring

I have a series of ordered points (X, Y, Z) and I want to plot a 3D histogram, any suggestions?
I'm trying to follow this tutorial http://www.mathworks.com/help/stats/hist3.html , but there the points are random and generated by a function. My case is simpler, since I already know the points.
Furthermore, depending on the value of the Z coordinate, I'd like to colour it differently, e.g. max value green, min value red. This is similar to the question "Conditional coloring of histogram graph in MATLAB", only in 3D.
So, if I have a series of points:
X = [32 64 32 12 56 76 65]
Y = [160 80 70 48 90 80 70]
Z = [80 70 90 20 45 60 12]
Can you help me with the code for 3D histogram with conditional coloring?
So far the code looks like this:
X = [32 64 32 12 56 76 65];
Y= [160 80 70 48 90 80 70];
Z= [80 70 90 20 45 60 12];
A = full( sparse(X',Y',Z'));
figure;
h = bar3(A); % get handle to graphics
for k=1:numel(h),
z=get(h(k),'ZData'); % old data - need for its NaN pattern
nn = isnan(z);
nz = kron( A(:,k),ones(6,4) ); % map color to height 6 faces per data point
nz(nn) = NaN; % used saved NaN pattern for transparent faces
set(h(k),'CData', nz); % set the new colors
end
colorbar;
Now I just have to clear the lines and design the chart to make it look useful. But how would it be possible to make a bar3 without the entire mesh on 0 level?

Based on this answer, all you need to do is rearrange your data to match the Z format of that answer. After that you might need to remove the edge lines and possibly clear the zero-height bars.
% Step 1: rearrange your data
X = [32 64 32 12 56 76 65];
Y= [160 80 70 48 90 80 70];
Z= [80 70 90 20 45 60 12];
A = full( sparse(X',Y',Z'));
% Step 2: Use the code from the link to plot the 3D histogram
figure;
h = bar3(A); % get handle to graphics
set(h,'edgecolor','none'); % Hopefully this will remove the lines (from https://www.mathworks.com/matlabcentral/newsreader/view_thread/281581)
for k=1:numel(h),
z=get(h(k),'ZData'); % old data - need for its NaN pattern
nn = isnan(z);
nz = kron( A(:,k),ones(6,4) ); % map color to height 6 faces per data point
nz(nn) = NaN; % used saved NaN pattern for transparent faces
nz(nz==0) = NaN; % This bit makes all the zero height bars have no colour
set(h(k),'CData', nz); % set the new colors. Note in later versions you can do h(k).CData = nz
end
colorbar;
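The question also asks for red at the minimum and green at the maximum, which the code above leaves to the default colormap. One way to get that, sketched here with an assumed 64 colour steps and an illustrative variable name cmap, is to build a custom red-to-green map:

```matlab
nColors = 64;                          % assumed number of colour steps
t = linspace(0, 1, nColors).';         % 0 at the minimum, 1 at the maximum
cmap = [1 - t, t, zeros(nColors, 1)];  % rows are [R G B]: red fades into green
```

Calling colormap(cmap) after the plotting loop should then colour the lowest bars red and the tallest green, since the CData values set in the loop are the bar heights.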

Saving quantized coefficients to file

I am trying to read an image, DCT transform it, quantize it, and then save the quantized coefficients to a file that will be read into a fractal encoding program.
When I decode the file (with the quantized coefficients), all I get is some grey screen. Is this due to the contents of the file or am I missing out on something else?
%% LOSSY COMPRESSION-DECOMPRESSION USING DISCRETE COSINE TRANSFORM TECHNIQUE.
function[]=dct11(filename,n,m)
% "filename" is the string of characters including Image name and its
% extension.
% "n" denotes the number of bits per pixel.
% "m" denotes the number of most significant bits (MSB) of DCT Coefficients.
% Matrix Initializations.
N=8; % Block size for which DCT is Computed.
M=8;
I=imread('Lenna.bmp'); % Reading the input image file and storing intensity values in 2-D matrix I.
I_dim=size(I); % Finding the dimensions of the image file.
I_Trsfrm.block=zeros(N,M); % Initialising the DCT Coefficients Structure Matrix "I_Trsfrm" with the required dimensions.
Norm_Mat=[16 11 10 16 24 40 51 61 % Normalization matrix (8 X 8) used to Normalize the DCT Matrix.
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99];
save('LenaInitial.dat','I');
%% PART-1: COMPRESSION TECHNIQUE.
% Computing the Quantized & Normalized Discrete Cosine Transform.
% Y(k,l)=(2/root(NM))*c(k)*c(l)*sigma(i=0:N-1)sigma(j=0:M-1)y(i,j)cos(pi(2i+1)k/(2N))cos(pi(2j+1)l/(2M))
% where c(u)=1/root(2) if u=0
% = 1 if u>0
for a=1:I_dim(1)/N
for b=1:I_dim(2)/M
for k=1:N
for l=1:M
prod=0;
for i=1:N
for j=1:M
prod=prod+double(I(N*(a-1)+i,M*(b-1)+j))*cos(pi*(k-1)*(2*i-1)/(2*N))*cos(pi*(l-1)*(2*j-1)/(2*M));
end
end
if k==1
prod=prod*sqrt(1/N);
else
prod=prod*sqrt(2/N);
end
if l==1
prod=prod*sqrt(1/M);
else
prod=prod*sqrt(2/M);
end
I_Trsfrm(a,b).block(k,l)=prod;
end
end
% Normalizing the DCT Matrix and Quantizing the resulting values.
I_Trsfrm(a,b).block=round(I_Trsfrm(a,b).block./Norm_Mat);
% save ('LenaCompressed1.txt');
end
end
%Andrew added this
% save ('LenaCompressed.txt');
% zig-zag coding of the each 8 X 8 Block.
for a=1:I_dim(1)/N
for b=1:I_dim(2)/M
I_zigzag(a,b).block=zeros(1,0);
freq_sum=2:(N+M);
counter=1;
for i=1:length(freq_sum)
if i<=((length(freq_sum)+1)/2)
if rem(i,2)~=0
x_indices=counter:freq_sum(i)-counter;
else
x_indices=freq_sum(i)-counter:-1:counter;
end
index_len=length(x_indices);
y_indices=x_indices(index_len:-1:1); % Creating reverse of the array as "y_indices".
for p=1:index_len
if I_Trsfrm(a,b).block(x_indices(p),y_indices(p))<0
bin_eq=dec2bin(bitxor(2^n-1,abs(I_Trsfrm(a,b).block(x_indices(p),y_indices(p)))),n);
else
bin_eq=dec2bin(I_Trsfrm(a,b).block(x_indices(p),y_indices(p)),n);
end
I_zigzag(a,b).block=[I_zigzag(a,b).block,bin_eq(1:m)];
end
else
counter=counter+1;
if rem(i,2)~=0
x_indices=counter:freq_sum(i)-counter;
else
x_indices=freq_sum(i)-counter:-1:counter;
end
index_len=length(x_indices);
y_indices=x_indices(index_len:-1:1); % Creating reverse of the array as "y_indices".
for p=1:index_len
if I_Trsfrm(a,b).block(x_indices(p),y_indices(p))<0
bin_eq=dec2bin(bitxor(2^n-1,abs(I_Trsfrm(a,b).block(x_indices(p),y_indices(p)))),n);
else
bin_eq=dec2bin(I_Trsfrm(a,b).block(x_indices(p),y_indices(p)),n);
end
I_zigzag(a,b).block=[I_zigzag(a,b).block,bin_eq(1:m)];
end
end
end
end
end
% Clearing unused variables from Memory space
clear I_Trsfrm prod;
clear x_indices y_indices counter;
% Run-Length Encoding the resulting code.
for a=1:I_dim(1)/N
for b=1:I_dim(2)/M
% Computing the Count values for the corresponding symbols and
% saving them in "I_run" structure.
count=0;
run=zeros(1,0);
sym=I_zigzag(a,b).block(1);
j=1;
block_len=length(I_zigzag(a,b).block);
for i=1:block_len
if I_zigzag(a,b).block(i)==sym
count=count+1;
else
run.count(j)=count;
run.sym(j)=sym;
j=j+1;
sym=I_zigzag(a,b).block(i);
count=1;
end
if i==block_len
run.count(j)=count;
run.sym(j)=sym;
end
end
% Computing the codelength needed for the count values.
dim=length(run.count); % calculates number of symbols being encoded.
maxvalue=max(run.count); % finds the maximum count value in the count array of run structure.
codelength=log2(maxvalue)+1;
codelength=floor(codelength);
% Encoding the count values along with their symbols.
I_runcode(a,b).code=zeros(1,0);
for i=1:dim
I_runcode(a,b).code=[I_runcode(a,b).code,dec2bin(run.count(i),codelength),run.sym(i)];
end
end
end
% Saving the Compressed Code to Disk.
save ('LenaCompressed.txt','I_runcode');
% Clearing unused variables from Memory Space.
clear I_zigzag run;

Andrew, you use MATLAB's save statement to save the coefficients into a file ending with .txt. What does the "fractal encoding program" assume about the file format?
Your file will not be a text file. Check out the documentation of save. Perhaps you can use the '-ascii' flag.
Otherwise, you may have to write that file of coefficients yourself. To get you started, I'd say fprintf is worth a look.
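As a starting point, here is a minimal sketch of writing one block of integer coefficients as plain text with fprintf and reading it back. The block contents, the file name and the one-row-per-line layout are assumptions for illustration only; the layout your fractal encoder actually expects may differ.

```matlab
% Hypothetical 8-by-8 block of quantized coefficients (stand-in data).
blk = magic(8) - 32;

fname = fullfile(tempdir, 'coeffs_example.txt');   % example path
fid = fopen(fname, 'w');
if fid == -1
    error('Could not open %s for writing.', fname);
end
% fprintf consumes data column by column, so transpose to write row by row.
rowFormat = [repmat('%d ', 1, size(blk, 2) - 1), '%d\n'];
fprintf(fid, rowFormat, blk.');
fclose(fid);

% Read the file back to check the round trip.
fid = fopen(fname, 'r');
back = fscanf(fid, '%d', [size(blk, 2), size(blk, 1)]).';
fclose(fid);
```

Unlike save without flags, which writes a binary MAT-file regardless of the .txt extension, this produces a file you can open in a text editor and inspect.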

Contour plot coloured by clustering of points matlab

I have two vectors which are paired values
size(X)=1e4 x 1; size(Y)=1e4 x 1
Is it possible to draw some kind of contour plot where the contours follow the density of points, i.e. highest clustering = red, with a colour gradient elsewhere?
If you need more clarification please ask.
Regards,
EXAMPLE DATA:
X=[53 58 62 56 72 63 65 57 52 56 52 70 54 54 59 58 71 66 55 56];
Y=[40 33 35 37 33 36 32 36 35 33 41 35 37 31 40 41 34 33 34 37 ];
scatter(X,Y,'ro');
Thank you for everyone's help. I also remembered that we can use hist3:
x={0:0.38/4:0.38}; % # How many bins in x direction
y={0:0.65/7:0.65}; % # How many bins in y direction
ncount=hist3([X Y],'Edges',[x y]);
pcolor(ncount./sum(sum(ncount)));
colorbar
Anyone know why edges in hist3 have to be cells?

This is basically a question about estimating the probability density function generating your data and then visualizing it in a good and meaningful way I'd say. To that end, I would recommend using a more smooth estimate than the histogram, for instance Parzen windowing (a generalization of the histogram method).
In the code below, I use your example dataset and estimate the probability density on a grid spanning the range of your data. There are three variables you need to adjust for your original data: Border, Sigma and stepSize.
Border = 5;
Sigma = 5;
stepSize = 1;
X=[53 58 62 56 72 63 65 57 52 56 52 70 54 54 59 58 71 66 55 56];
Y=[40 33 35 37 33 36 32 36 35 33 41 35 37 31 40 41 34 33 34 37 ];
D = [X' Y'];
N = length(X);
Xrange = [min(X)-Border max(X)+Border];
Yrange = [min(Y)-Border max(Y)+Border];
%Setup coordinate grid
[XX YY] = meshgrid(Xrange(1):stepSize:Xrange(2), Yrange(1):stepSize:Yrange(2));
YY = flipud(YY);
%Parzen parameters and function handle
pf1 = @(C1,C2) (1/N)*(1/((2*pi)*Sigma^2)).*...
exp(-( (C1(1)-C2(1))^2+ (C1(2)-C2(2))^2)/(2*Sigma^2));
PPDF1 = zeros(size(XX));
%Populate coordinate surface
[R C] = size(PPDF1);
NN = length(D);
for c=1:C
for r=1:R
for d=1:N
PPDF1(r,c) = PPDF1(r,c) + ...
pf1([XX(1,c) YY(r,1)],[D(d,1) D(d,2)]);
end
end
end
%Normalize data
m1 = max(PPDF1(:));
PPDF1 = PPDF1 / m1;
%Set up visualization
set(0,'defaulttextinterpreter','latex','DefaultAxesFontSize',20)
fig = figure(1);clf
stem3(D(:,1),D(:,2),zeros(N,1),'b.');
hold on;
%Add PDF estimates to figure
s1 = surfc(XX,YY,PPDF1);shading interp;alpha(s1,'color');
sub1=gca;
view(2)
axis([Xrange(1) Xrange(2) Yrange(1) Yrange(2)])
Note that this visualization is actually 3-dimensional.
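For 1e4 points the triple loop above may become slow. A possible speed-up, sketched here with the same Gaussian kernel and the same Border, Sigma and stepSize values, is to evaluate each kernel on the whole grid at once so that only the loop over data points remains. The name P is illustrative, and the grid is not flipped here.

```matlab
X = [53 58 62 56 72 63 65 57 52 56 52 70 54 54 59 58 71 66 55 56];
Y = [40 33 35 37 33 36 32 36 35 33 41 35 37 31 40 41 34 33 34 37];
Border = 5; Sigma = 5; stepSize = 1;   % same settings as above
N = numel(X);
[XX, YY] = meshgrid(min(X)-Border:stepSize:max(X)+Border, ...
                    min(Y)-Border:stepSize:max(Y)+Border);

P = zeros(size(XX));
for d = 1:N
    % Gaussian kernel centred on data point d, evaluated on the whole grid.
    P = P + exp(-((XX - X(d)).^2 + (YY - Y(d)).^2) / (2 * Sigma^2));
end
P = P / (N * 2 * pi * Sigma^2);   % same normalization as pf1
```

The result can be passed to surfc or contourf(XX, YY, P) in the same way as PPDF1.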

See this 4 minute video on the mathworks site:
http://blogs.mathworks.com/videos/2010/01/22/advanced-making-a-2d-or-3d-histogram-to-visualize-data-density/
I believe this should provide very close to exactly the functionality you require.

I would divide the area the plot covers into a grid and then count the number of points in each square of the grid. Here's an example of how that could be done.
% Get random data with high density
X=randn(1e4,1);
Y=randn(1e4,1);
Xmin=min(X);
Xmax=max(X);
Ymin=min(Y);
Ymax=max(Y);
% guess of grid size, could be divided into nx and ny
n=floor((length(X))^0.25);
% Create x and y-axis
x=linspace(Xmin,Xmax,n);
y=linspace(Ymin,Ymax,n);
dx=x(2)-x(1);
dy=y(2)-y(1);
griddata=zeros(n);
for i=1:length(X)
% Calculate which bin the point is positioned in
indexX=floor((X(i)-Xmin)/dx)+1;
indexY=floor((Y(i)-Ymin)/dy)+1;
griddata(indexX,indexY)=griddata(indexX,indexY)+1;
end
contourf(x,y,griddata.') % transpose: contourf expects rows to follow y and columns to follow x
Edit: The video in the answer by Marm0t uses the same technique but probably explains it in a better way.
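The same counting can also be done without the loop by using accumarray. The sketch below uses the example data from the question and an assumed n = 4 bins per axis. Unlike the loop above, it splits each axis into n equal bins and clamps the maximum into the last one. The names ix, iy and counts are illustrative.

```matlab
X = [53 58 62 56 72 63 65 57 52 56 52 70 54 54 59 58 71 66 55 56].';
Y = [40 33 35 37 33 36 32 36 35 33 41 35 37 31 40 41 34 33 34 37].';
n = 4;   % assumed number of bins per axis

% Map each coordinate to a bin index in 1..n; the maximum goes in the last bin.
ix = min(floor((X - min(X)) / (max(X) - min(X)) * n) + 1, n);
iy = min(floor((Y - min(Y)) / (max(Y) - min(Y)) * n) + 1, n);

% Rows follow y and columns follow x, the layout contourf expects.
counts = accumarray([iy ix], 1, [n n]);
```

contourf(counts) then draws the density directly, with no transpose needed.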
