Binary file write/read operation in MATLAB

I am trying to write some 2D coordinates to a binary file. However, what I read back from the file is quite different from the original data. Details are given below.
For example, I have 45 (X,Y) points. Both X and Y are integers less than 600. The simulation requires each of them to be stored in two bytes (16 bits), with the upper 2 bits of each 16-bit word reserved (for X, the reserved bits hold .mrk, which is 1 or 2; for Y, they are simply 0). The remaining 14 bits can represent values up to 16383. I tried writing the data in several ways:
in_tmp is a structure consisting of the number of points (.nm), the reserved mark of each point (.mrk) and the point coordinates (.coor).
for i=1:in_tmp.nm
x1 = dec2bin(in_tmp.coor(i,1));
y1 = dec2bin(in_tmp.coor(i,2));
t1 = in_tmp.mrk(i);
if(t1==1)
t2 = '01';
t2b = 1;
elseif(t1==2)
t2 = '10';
t2b = 2;
end
lenx = 16-length(x1);
leny = 16-length(y1);
x1hl = strcat(t2, '00000000000000'); % High and low
y1hl = '0000000000000000';
x1a = strcat(x1hl(1:lenx), num2str(x1));
y1a = strcat(y1hl(1:leny), num2str(y1));
y1a(1:2) = '00';
% x1b = in_tmp.coor(i,1);
% y1b = in_tmp.coor(i,2);
% fwrite(fp1, t2b, 'ubit2');
% fwrite(fp1, x1b, 'ubit14');
%
% fwrite(fp1, 0, 'ubit2');
% fwrite(fp1, y1b, 'ubit14');
fwrite(fp1, bin2dec(x1a), 'uint16');
fwrite(fp1, bin2dec(y1a), 'uint16');
% fwrite(fp1, bin2dec(x1a(1:8)), 'uint8');
% fwrite(fp1, bin2dec(x1a(9:end)), 'uint8');
% fwrite(fp1, bin2dec(y1a(1:8)), 'uint8');
% fwrite(fp1, bin2dec(y1a(9:end)), 'uint8');
% x1c = in_tmp.coor(i,1);
% y1c = in_tmp.coor(i,2);
%
% x1hex = dec2hex(x1c);
% y1hex = dec2hex(y1c);
% if(length(x1hex)>2)
% x1h = x1hex(1:end-2);
% x1l = x1hex(end-1:end);
% else
% x1h = dec2hex(0);
% x1l = x1hex;
% end
%
% tx1h = dec2bin(hex2dec(x1h));
% l1 = length(tx1h);
% bin0 = dec2bin(128); % '10000000'
% if(t1==1)
% bin0(end-l1+1:end) = tx1h;
% bin0(1)=0;
% bin0(2)=1;
%
% elseif(t1==2)
% bin0(end-l1+1:end) = tx1h;
% end
% x1h = bin2dec(tx1h);
%
% if(length(y1hex)>2)
% y1h = y1hex(1:end-2);
% y1l = y1hex(end-1:end);
% else
% y1h = dec2hex(0);
% y1l = y1hex;
% end
% fwrite(fp1, x1h, 'uint8');
% fwrite(fp1, hex2dec(x1l), 'uint8');
% fwrite(fp1, hex2dec(y1h), 'uint8');
% fwrite(fp1, hex2dec(y1l), 'uint8');
end
This is how I read it:
for i=1:mt.nm % nm points.
mred(i,6) = fread(fp1, 1, 'uint8'); % Raw X coordinates.
mred(i,7) = fread(fp1, 1, 'uint8'); % upper 2 bits are reserved info.
tmpx = [dec2bin(mred(i,6)), dec2bin(mred(i,7))];
if(length(tmpx)==16)
mred(i,4) = bin2dec(tmpx(1:2)); % Real Mark.
mred(i,1) = bin2dec(tmpx(3:end)); % Real X.
elseif(length(tmpx)==15)
mred(i,4) = bin2dec(tmpx(1)); % Real Type.
mred(i,1) = bin2dec(tmpx(2:end)); % Real X.
else
mred(i,4) = bin2dec(tmpx(1:2)); % Type unknown.
mred(i,1) = bin2dec(tmpx(3:end)); % Real X.
end
mred(i,8) = fread(fp1, 1, 'uint8'); % Y coordinates.
mred(i,9) = fread(fp1, 1, 'uint8'); % upper 2 bits are reserved.
tmpy = [dec2bin(mred(i,8)), dec2bin(mred(i,9))];
if(length(tmpy)==16)
mred(i,10) = bin2dec(tmpy(1:2)); % Real reserved.
mred(i,2) = bin2dec(tmpy(3:end)); % Real Y.
elseif(length(tmpy)==15)
mred(i,10) = bin2dec(tmpy(1)); % Real reserved.
mred(i,2) = bin2dec(tmpy(2:end)); % Real Y.
else
mred(i,10) = -1; % Reserved unknown.
mred(i,2) = bin2dec(tmpy); % Real Y.
end
end
The read() code works well for files produced by a given piece of software implemented in C++, which generates coordinate series in this format. I wrote read() to extract the information from the binary files that software generates. Now I want to implement write() in MATLAB using the same format, but read() fails to recover what I wrote to the binary file. Can anybody help? Thanks.

The problem is probably (at least in part) an endianness issue.
On Intel architectures (little-endian), reading two bytes as uint8 and concatenating their binary representations will not, in general, give the same result as reading the two bytes as a single uint16.
The following script shows that swapping the order of the two uint8 values does the trick.
I changed some variable names and made things more concise for readability.
% declare an input int and the header (in binary string rep)
v = 68;
t2 = '01';
% write to file
x1 = dec2bin(v);
lenx = 16-length(x1);
x1hl = strcat(t2, '00000000000000'); % High and low
x1a = strcat(x1hl(1:lenx), num2str(x1));
% x1a = 0100000001000100
fp1=fopen('temp','w+');
fwrite(fp1, bin2dec(x1a), 'uint16');
% read the file
frewind(fp1);
vtest = fread(fp1, 1, 'uint16'); % upper 2 bits are reserved info.
dec2bin(vtest)
% ans = 100000001000100
% now read *two* bytes as two uint8
frewind(fp1);
byte1 = fread(fp1, 1, 'uint8'); % Raw X coordinates.
byte2 = fread(fp1, 1, 'uint8'); % upper 2 bits are reserved info.
tmpx = [dec2bin(byte2,8) dec2bin(byte1,8)]; % <-- swap the order; pad each byte to 8 bits so leading zeros are kept
% tmpx = 0100000001000100
Alternatively, just read the whole 2 bytes as a uint16 and work from there.
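The uint16 route can be sketched with bit operations instead of binary strings. This is a minimal sketch, not the original poster's code: the names mark, x, word and fname are made up for the example, and it assumes the 2-bit mark sits in the top two bits of the 16-bit word as described in the question. Because fwrite and fread both use the machine's native byte order by default, the round trip is consistent on one machine; which byte order the C++ software expects is not stated in the question, so check that separately (fopen accepts a machine-format argument such as 'ieee-le' or 'ieee-be').

```matlab
% Pack a 2-bit mark and a 14-bit coordinate into one 16-bit word,
% write it, read it back as uint16, and unpack with bit operations.
mark = 1;                               % example mark (1 or 2)
x    = 68;                              % example coordinate (< 16384)
word = bitor(bitshift(mark, 14), x);    % mark goes into the top two bits

fname = tempname;                       % hypothetical scratch file
fid = fopen(fname, 'w');
fwrite(fid, word, 'uint16');
fclose(fid);

fid = fopen(fname, 'r');
v = fread(fid, 1, 'uint16');            % read the whole 2-byte word at once
fclose(fid);
delete(fname);

markRead = bitshift(v, -14);            % top 2 bits
xRead    = bitand(v, 16383);            % low 14 bits (16383 = 2^14 - 1)
```

Working on the numeric value avoids the pitfall that dec2bin drops leading zeros unless a field width is given.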

Related

How to plot very low negative values in Matlab

I want to plot the data shown below.
I want to use semilogy (or something else, if you have suggestions) because the data goes so far negative that, on a linear scale, all the positive data appears to be zero.
Of course, semilogy doesn't work with negative data. What can I do instead? The goal is for both positive and negative data to be visibly different from zero in the plot.
I saw this question (Positive & Negitive Log10 Scale Y axis in Matlab), but is there a simpler way?
Another issue I have with the semilogy command is that the data are plotted as if they run from November to April, whereas they really run from January to June!
%% Date vector
Y = [];
for year = 2008:2016
Y = vertcat(Y,[year;year]);
end
M = repmat([01;07],9,1);
D = [01];
vector = datetime(Y,M,D);
%% Data
operatingValue=...
1.0e+05 *...
[0.020080000000000, 0.000010000000000, 0.000430446606112, 0.000286376498540, 0.000013493575572, 0.000008797774209;...
0.020080000000000, 0.000020000000000, 0.000586846360023, 0.000445575962649, 0.000118642085670, 0.000105982759202;...
0.020090000000000, 0.000010000000000, 0.000304503221392, 0.000168068072591, -0.000004277640797, 0.000006977580173;...
0.020090000000000, 0.000020000000000, 0.000471819542315, 0.000318827321824, 0.000165018495621, 0.000188500216550;...
0.020100000000000, 0.000010000000000, 0.000366527395452, 0.000218539902929, 0.000032265798656, 0.000038839492621;...
0.020100000000000, 0.000020000000000, 0.000318807172600, 0.000170892065948, -0.000093830970932, -0.000096575559444;...
0.020110000000000, 0.000010000000000, 0.000341114962826, 0.000187311222835, -0.000118595282218, -0.000135188693035;...
0.020110000000000, 0.000020000000000, 0.000266317725166, 0.000128625220303, -0.000314547081599, -0.000392868178754;...
0.020120000000000, 0.000010000000000, 0.000104302824558, -0.000000079359646, -0.001817533087893, -0.002027417507676;...
0.020120000000000, 0.000020000000000, 0.000093484465168, -0.000019260661622, -0.002180826237198, -0.001955577709102;...
0.020130000000000, 0.000010000000000, 0.000052921606827, -0.000175185193313, -4.034665389612666, -4.573270848282296;...
0.020130000000000, 0.000020000000000, 0.000027218083520, -0.000167098897097, 0, 0;...
0.020140000000000, 0.000010000000000, 0.000044907412504, -0.000106127286095, -0.012248660549809, -0.010693498138601;...
0.020140000000000, 0.000020000000000, 0.000061663936450, -0.000070280400096, -0.015180683545658, -0.008942771925367;...
0.020150000000000, 0.000010000000000, 0.000029214681162, -0.000190870890021, 0, 0;...
0.020150000000000, 0.000020000000000, 0.000082672707169, -0.000031566292849, -0.003226048850797, -0.003527284081616;...
0.020160000000000, 0.000010000000000, 0.000084562787728, -0.000024916156477, -0.001438488940835, -0.000954872893879;...
0.020160000000000, 0.000020000000000, 0.000178181932848, 0.000054988621755, -0.000172520970578, -0.000139835312255]
figure;
semilogy( datenum(vector), operatingValue(:,3), '-+', datenum(vector), operatingValue(:,4), '-o',...
datenum(vector), operatingValue(:,5), '-*', datenum(vector), operatingValue(:,6), '-x',...
'LineWidth',1.2 ), grid on;
dateaxis('x', 12);
Save the function symlog in your directory.
function symlog(varargin)
% SYMLOG bi-symmetric logarithmic axes scaling
% SYMLOG applies a modified logarithm scale to the specified or current
% axes that handles negative values while maintaining continuity across
% zero. The transformation is defined in an article from the journal
% Measurement Science and Technology (Webber, 2012):
%
% y = sign(x)*(log10(1+abs(x)/(10^C)))
%
% where the scaling constant C determines the resolution of the data
% around zero. The smallest order of magnitude shown on either side of
% zero will be 10^ceil(C).
%
% SYMLOG(ax=gca, var='xyz', C=0) applies this scaling to the axes named
% by letter in the specified axes using the default C of zero. Any of the
% inputs can be omitted in which case the default values will be used.
%
% SYMLOG uses the UserData attribute of the specified axes to record the
% current transformation applied so that subsequent calls to symlog
% operate on the original data rather than the newly transformed data.
%
% Example:
% x = linspace(-50,50,1e4+1);
% y1 = x;
% y2 = sin(x);
%
% subplot(2,4,1)
% plot(x,y1,x,y2)
%
% subplot(2,4,2)
% plot(x,y1,x,y2)
% set(gca,'XScale','log') % throws warning
%
% subplot(2,4,3)
% plot(x,y1,x,y2)
% set(gca,'YScale','log') % throws warning
%
% subplot(2,4,4)
% plot(x,y1,x,y2)
% set(gca,'XScale','log','YScale','log') % throws warning
%
% subplot(2,4,6)
% plot(x,y1,x,y2)
% symlog('x')
%
% s = subplot(2,4,7);
% plot(x,y1,x,y2)
% symlog(s,'y') % can but don't have to provide s.
%
% subplot(2,4,8)
% plot(x,y1,x,y2)
% symlog() % no harm in letting symlog operate in z axis, too.
%
% Created by:
% Robert Perrotta
%
% Referencing:
% Webber, J. Beau W. "A Bi-Symmetric Log Transformation for Wide-Range
% Data." Measurement Science and Technology 24.2 (2012): 027001.
% Retrieved 6/28/2016 from
% https://kar.kent.ac.uk/32810/2/2012_Bi-symmetric-log-transformation_v5.pdf
% default values
ax = []; % don't call gca unless needed
var = 'xyz';
C = 0;
% user-specified values
for ii = 1:length(varargin)
switch class(varargin{ii})
case 'matlab.graphics.axis.Axes'
ax = varargin{ii};
case 'char'
var = varargin{ii};
case {'double','single'}
C = varargin{ii};
otherwise
error('Don''t know what to do with input %d (type %s)!',ii,class(varargin{ii}))
end
end
if isempty(ax) % user did not specify a value
ax = gca;
end
% execute once per axis
if length(var) > 1
for ii = 1:length(var)
symlog(ax,var(ii),C);
end
return
end
% From here on we redefine C to be 10^C
C = 10^C;
% Axes must be in linear scaling
set(ax,[var,'Scale'],'linear')
% Check for existing transformation
userdata = get(ax,'UserData');
if isfield(userdata,'symlog') && isfield(userdata.symlog,lower(var))
lastC = userdata.symlog.(lower(var));
else
lastC = [];
end
userdata.symlog.(lower(var)) = C; % update with new value
set(ax,'UserData',userdata)
if strcmpi(get(ax,[var,'LimMode']),'manual')
lim = get(ax,[var,'Lim']);
lim = sign(lim).*log10(1+abs(lim)/C);
set(ax,[var,'Lim'],lim)
end
% transform all objects in this plot into logarithmic coordinates
transform_graph_objects(ax, var, C, lastC);
% transform axes labels to match
t0 = max(abs(get(ax,[var,'Lim']))); % MATLAB's automatically-chosen limits
t0 = sign(t0)*C*(10.^(abs(t0))-1);
t0 = sign(t0).*log10(abs(t0));
t0 = ceil(log10(C)):ceil(t0); % use C to determine lowest resolution
t1 = 10.^t0;
mt1 = nan(1,8*(length(t1))); % 8 minor ticks between each tick
for ii = 1:length(t0)
scale = t1(ii)/10;
mt1(8*(ii-1)+(1:8)) = t1(ii) - (8:-1:1)*scale;
end
% mirror over zero to get the negative ticks
t0 = [fliplr(t0),-inf,t0];
t1 = [-fliplr(t1),0,t1];
mt1 = [-fliplr(mt1),mt1];
% the location of our ticks in the transformed space
t1 = sign(t1).*log10(1+abs(t1)/C);
mt1 = sign(mt1).*log10(1+abs(mt1)/C);
lbl = cell(size(t0));
for ii = 1:length(t0)
if t1(ii) == 0
lbl{ii} = '0';
% uncomment to display +/- 10^0 as +/- 1
% elseif t0(ii) == 0
% if t1(ii) < 0
% lbl{ii} = '-1';
% else
% lbl{ii} = '1';
% end
elseif t1(ii) < 0
lbl{ii} = ['-10^{',num2str(t0(ii)),'}'];
elseif t1(ii) > 0
lbl{ii} = ['10^{',num2str(t0(ii)),'}'];
else
lbl{ii} = '0';
end
end
set(ax,[var,'Tick'],t1,[var,'TickLabel'],lbl)
set(ax,[var,'MinorTick'],'on',[var,'MinorGrid'],'on')
rl = get(ax,[var,'Ruler']);
try
set(rl,'MinorTick',mt1)
catch err
if strcmp(err.identifier,'MATLAB:datatypes:onoffboolean:IncorrectValue')
set(rl,'MinorTickValues',mt1)
else
rethrow(err)
end
end
function transform_graph_objects(ax, var, C, lastC)
% transform all lines in this plot
lines = findobj(ax,'Type','line');
for ii = 1:length(lines)
x = get(lines(ii),[var,'Data']);
if ~isempty(lastC) % undo previous transformation
x = sign(x).*lastC.*(10.^abs(x)-1);
end
x = sign(x).*log10(1+abs(x)/C);
set(lines(ii),[var,'Data'],x)
end
% transform all Patches in this plot
patches = findobj(ax,'Type','Patch');
for ii = 1:length(patches)
x = get(patches(ii),[var,'Data']);
if ~isempty(lastC) % undo previous transformation
x = sign(x).*lastC.*(10.^abs(x)-1);
end
x = sign(x).*log10(1+abs(x)/C);
set(patches(ii),[var,'Data'],x)
end
% transform all Rectangles in this plot
rectangles = findobj(ax,'Type','Rectangle');
for ii = 1:length(rectangles)
q = get(rectangles(ii),'Position'); % [x y w h]
switch var
case 'x'
x = [q(1) q(1)+q(3)]; % [x x+w]
case 'y'
x = [q(2) q(2)+q(4)]; % [y y+h]
end
if ~isempty(lastC) % undo previous transformation
x = sign(x).*lastC.*(10.^abs(x)-1);
end
x = sign(x).*log10(1+abs(x)/C);
switch var
case 'x'
q(1) = x(1);
q(3) = x(2)-x(1);
case 'y'
q(2) = x(1);
q(4) = x(2)-x(1);
end
set(rectangles(ii),'Position',q)
end
Then plot your data and call symlog(gca,'y',-1.7) at the end:
plot( datenum(vector), operatingValue(:,3), '-+', datenum(vector), operatingValue(:,4), '-o',...
datenum(vector), operatingValue(:,5), '-*', datenum(vector), operatingValue(:,6), '-x',...
'LineWidth',1.2 ), grid on;
symlog(gca,'y',-1.7)
Here is your plot with positive and negative values:
Hope this solves your problem.
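To see what the transformation does to the numbers, here is a small sketch of the forward formula from the symlog help text and the inverse that the function uses to undo a previous scaling. The sample values in x are made up for illustration; C = 10^(-1.7) matches the call above.

```matlab
% Bi-symmetric log transform and its inverse, applied to a few sample values.
C = 10^(-1.7);                               % same scaling constant as symlog(gca,'y',-1.7)
x = [-4.5e5 -12 -0.02 0 0.02 12 4.5e5];      % made-up values spanning many decades

y     = sign(x) .* log10(1 + abs(x)/C);      % forward: compresses large magnitudes, keeps sign
xBack = sign(y) .* C .* (10.^abs(y) - 1);    % inverse: recovers the original values

disp([x; y; xBack]);
```

The transform is odd-symmetric and strictly increasing, and it maps 0 to 0, which is why positive and negative data both stay visible around zero.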

Reverse 'buffer' function in Matlab

MATLAB's buffer function partitions a vector into a matrix where each column is a segment of the vector (time series in my problem). These segments can be overlapping, and the overlap does not need to be 50%.
I was wondering if there is a reverse operation where one would get back a vector after doing some operations on the matrix? I was thinking of a generic solution where the overlap is not 50%.
I have searched the question archive and couldn't find any answer.
Thanks
You can use this simple function I wrote. There is also a simple example commented that you can run and test.
function invbuff = invbuffer(X_buff0, noverlap, L)
% Example:
% % % A = (1:40)';
% % % N_over = 2;
% % % N_window = 15;
% % % L = length(A);
% % % Abuff0 = buffer(A, N_window, N_over);
% % % Abuff = Abuff0(:, 1:end-0);
% % % invbuff = invbuffer(Abuff, N_over, L);
invbuff0 = [];
for jj=1:size(X_buff0,2)
vec00 = X_buff0(:,jj);
vec00(1:noverlap) = []; % remove overlapping (or it is zero padding of first frame)
invbuff0 = [invbuff0; vec00];
end
invbuff = invbuff0;
invbuff(L+1:end) = []; % remove zero padding of last frame
% sum(sum([A - invbuff])); % == 0
end
Good luck!
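The same idea can be written without a loop. The sketch below is an illustration under stated assumptions, not a replacement for the function above: since buffer needs the Signal Processing Toolbox, the frames are built by hand to mimic the layout the function's comments describe (noverlap zeros in front of the first frame, zero padding at the end of the last one), and the names hop, padded, idx and rec are chosen here for the example.

```matlab
% Build overlapping frames by hand, then undo the framing in vectorized form.
A = (1:40)';            % example signal
N = 15;                 % frame length
p = 2;                  % samples of overlap between consecutive frames
L = numel(A);
hop = N - p;            % new samples contributed by each frame
nFrames = ceil(L/hop);

% Leading zeros for the first frame, trailing zeros to fill the last frame
padded = [zeros(p,1); A; zeros(nFrames*hop - L, 1)];
idx = bsxfun(@plus, (1:N)', (0:nFrames-1)*hop);   % N x nFrames index matrix
X = padded(idx);                                  % one frame per column

% Inverse: drop the first p rows (the overlap), stack columns, trim the padding
Y = X(p+1:end, :);
rec = Y(:);
rec = rec(1:L);
```

Note that this simply discards the overlapping samples of each frame. If the frames were modified and the overlapping parts should be blended, an overlap-add scheme would be needed instead.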

Segmenting cursive character (Arabic OCR)

I want to segment an Arabic word into single characters. Based on the histogram/profile, I assume that I can segment the characters by cutting at the baseline (where the columns have similar pixel values).
Unfortunately, I am still stuck on writing code that works.
% Original Code by Soumyadeep Sinha
% Saving each single segmented character as one file
function [segm] = trysegment (a)
myFolder = 'D:\1. Thesis FINISH!!!\Data set\trial';
level = graythresh (a);
bw = im2bw (a, level);
b = imcomplement (bw);
i= padarray(b,[0 10]);
verticalProjection = sum(i, 1);
set(gcf, 'Name', 'Trying Segmentation for Cursive', 'NumberTitle', 'Off')
subplot(2, 2, 1);imshow(i);
subplot(2,2,3);
plot(verticalProjection, 'b-'); %histogram show by this code
% hist(reshape(input,[],3),1:max(input(:)));
grid on;
% % t = verticalProjection;
% % t(t==0) = inf;
% % mayukh = min(t)
% 0 where there is background, 1 where there are letters
letterLocations = verticalProjection > 0;
% Find Rising and falling edges
d = diff(letterLocations);
startingColumns = find(d>0);
endingColumns = find(d<0);
% Extract each region
y=1;
for k = 1 : length(startingColumns)
% Get sub image of just one character...
subImage = i(:, startingColumns(k):endingColumns(k));
% se = strel('rectangle',[2 4]);
% dil = imdilate(subImage, se);
th = bwmorph(subImage,'thin',Inf);
n = imresize (th, [64 NaN], 'bilinear');
figure, imshow (n);
[L,num] = bwlabeln(n);
for z= 1 : num
bw= ismember(L, z);
% Construct filename for this particular image.
baseFileName = sprintf('char %d.png', y);
y=y+1;
% Prepend the folder to make the full file name.
fullFileName = fullfile(myFolder, baseFileName);
% Do the write to disk.
imwrite(bw, fullFileName);
% subplot(2,2,4);
% pause(2);
% imshow(bw);
end
% y=y+1;
end;
segm = (n);
The word image is as follows:
Why doesn't the code work?
Do you have any recommendation for other code,
or a suggested algorithm that segments cursive characters well?
Thanks in advance.
Replace this code part from the posted code
% 0 where there is background, 1 where there are letters
letterLocations = verticalProjection > 0;
% Find Rising and falling edges
d = diff(letterLocations);
startingColumns = find(d>0);
endingColumns = find(d<0);
with the new code part
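% Columns whose projection exceeds one third of the maximum count as letter body
threshold=max(verticalProjection)/3;
thresholdedProjection=verticalProjection > threshold;
count=0;
startingColumnsIndex=0;
% Use col (not i) as the loop variable: i holds the padded image in the posted code
for col=1:length(thresholdedProjection)
if thresholdedProjection(col)
if(count>0)
% Cut in the middle of the low-projection run that just ended
startingColumnsIndex=startingColumnsIndex+1;
startingColumns(startingColumnsIndex)= col-floor(count/2);
count=0;
end
else
count=count+1;
end
end
endingColumns=[startingColumns(2:end)-1 col-floor(count/2)];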
No changes needed for the rest of the code.
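To check what the thresholding logic produces, it can be run on a tiny made-up projection vector instead of a real image. In this sketch the ten values in verticalProjection are invented for illustration, and the loop variable is called col so that it cannot clash with an image variable named i:

```matlab
% Run the threshold-based cut detection on a synthetic column projection.
verticalProjection = [0 0 5 6 1 1 6 7 0 0];      % made-up column sums
threshold = max(verticalProjection)/3;           % 7/3: thin connecting strokes fall below it
thresholdedProjection = verticalProjection > threshold;

count = 0;                                       % length of the current low-projection run
startingColumns = [];
for col = 1:length(thresholdedProjection)
    if thresholdedProjection(col)
        if count > 0
            % A low run just ended: place the cut in its middle
            startingColumns(end+1) = col - floor(count/2);
            count = 0;
        end
    else
        count = count + 1;
    end
end
% Each segment ends just before the next one starts; the last one ends
% in the middle of the trailing low run
endingColumns = [startingColumns(2:end)-1, col - floor(count/2)];

disp([startingColumns; endingColumns]);
```

Here the low columns 5 and 6 play the role of the baseline stroke joining two letters, and the cut falls between them.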

phase assignment in an OFDM signal

I have an OFDM signal whose power spectrum covers only half the bandwidth I am meant to have. I have been told that the phase assignment is causing it, but I have been tweaking it for days and still don't have the right answer.
prp=1e-6;
fstep=1/prp;
M = 4; % QPSK signal constellation
k = log2(M); % bits per symbol
fs=4e9;
Ns=floor(prp*fs);
no_of_data_points = (Ns/2);
no_of_points=no_of_data_points;
no_of_ifft_points = (Ns); % 256 points for the FFT/IFFT
no_of_fft_points = (Ns);
nsamp = 1; % Oversampling rate
fl = 0.5e9;
fu = 1.5e9;
Nf=(fu-fl)/fstep;
phin=zeros(Nf,1);
dataIn = randint(no_of_data_points*k*2,1,2); % Generate vector of binary
data_source = randsrc(1, no_of_data_points*k*2, 0:M-1);
qpsk_modulated_data= modulate(modem.qammod(M),data_source);
modu_data= qpsk_modulated_data(:)/sqrt(2);
[theta, rho] = cart2pol(real(modu_data), imag(modu_data));
A=angle(modu_data);
theta=radtodeg(theta);
figure(3);
plot(modu_data,'o');%plot constellation without noise
axis([-2 2 -2 2]);
grid on;
xlabel('real'); ylabel('imag');
%% E:GENERTION
phin = zeros(Nf,1);
phin(1:Nf,1)=theta(1:Nf);
No = fl/fstep;
Vn = zeros(Ns,1);
for r = 1:Nf
Vn(r+No,1) = 1*phin(r,1);
% Vn(r+No,2) = 1*phin(r,2);
end
%%
%------------------------------------------------------
%E. Serial to parallel conversion
%------------------------------------------------------
par_data = reshape(Vn,2,no_of_data_points);
%%
% F. IFFT Transform each period's spectrum (represented by a row of
% time domain via IFFT
time_domain_matrix =ifft(par_data.',Ns);
You are only considering the real part of the signal.
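One possible reading of this answer, offered as an assumption rather than as the answerer's stated intent: the posted code stores the phase angles themselves (real numbers, in degrees) in Vn, whereas a subcarrier with phase theta is normally represented by the complex value exp(1j*theta). A minimal sketch under that assumption, using a made-up small size (Ns = 64, occupied bins 9 to 24) and fixed QPSK-like phases in place of the question's data. Mirroring the complex conjugate into the negative-frequency bins is what makes the IFFT output real:

```matlab
% Assign complex phases to a band of subcarriers and build a real time signal.
Ns  = 64;                                   % hypothetical FFT size
k   = (9:24)';                              % occupied bins (index 1 is DC)
phi = (pi/4) * (2*mod((0:numel(k)-1)', 4) + 1);   % phases pi/4, 3pi/4, 5pi/4, 7pi/4

V = zeros(Ns, 1);
V(k) = exp(1j*phi);                         % unit-amplitude subcarriers carrying the phases
V(Ns - k + 2) = conj(V(k));                 % Hermitian mirror in the negative frequencies

v = ifft(V);                                % time-domain symbol
maxImag = max(abs(imag(v)));                % should be at rounding-error level
v = real(v);
```

With the full complex values in place, the spectrum magnitude is flat over the occupied band rather than following the numeric values of the angles.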

Regarding visualization of movement of the data points in training of the Self-Organizing Map (SOM) using Simulink

I have implemented the Self-Organizing Map (SOM) algorithm in MATLAB. Suppose each data point is represented in 2-dimensional space. I want to visualize the movement of each data point during the training phase, i.e. to see, at fixed intervals, how the points move and eventually form clusters as the algorithm progresses. I believe this can be done through simulation in MATLAB, but I don't know how to incorporate my MATLAB code for visualization.
I developed a code example that visualizes clustering of multi-dimensional data using all possible 2-D projections of the data. It may not be the best approach to visualization (there are techniques developed for this, and SOM itself may be used for it), especially for a high number of dimensions, but when the number of possible projections, n(n-1)/2 for n dimensions, is not too high, it is quite a good visualizer.
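To get a feel for how many 2-D projections that is, the unordered pairs of dimensions can be enumerated directly. A small sketch (nDim, pairs and nPanels are names chosen here for illustration), assuming one scatter panel per pair of distinct dimensions, which is how the plotting code further down lays out its axes:

```matlab
% Enumerate every pair of dimensions that gets its own 2-D scatter panel.
nDim  = 5;                          % number of data dimensions in the example
pairs = nchoosek(1:nDim, 2);        % one row per unordered pair of dimensions
nPanels = size(pairs, 1);           % equals nDim*(nDim-1)/2
disp(pairs);
```

The count grows quadratically with the number of dimensions, so the grid stays readable only for a handful of dimensions.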
Cluster Algorithm 
Since I needed access to the code so that I could save the cluster means and cluster labels for each iteration, I used a fast kmeans algorithm available at FEX by Mo Chen, but I had to adapt it so I could have this access. The adapted code is the following:
function [label,m] = litekmeans(X, k)
% Perform k-means clustering.
% X: d x n data matrix
% k: number of seeds
% Written by Michael Chen (sth4nth@gmail.com).
n = size(X,2);
last = 0;
iter = 1;
label{iter} = ceil(k*rand(1,n)); % random initialization
checkLabel = label{iter};
m = {};
while any(checkLabel ~= last)
[u,~,checkLabel] = unique(checkLabel); % remove empty clusters
k = length(u);
E = sparse(1:n,checkLabel,1,n,k,n); % transform label into indicator matrix
curM = X*(E*spdiags(1./sum(E,1)',0,k,k)); % compute m of each cluster
m{iter} = curM;
last = checkLabel';
[~,checkLabel] = max(bsxfun(@minus,curM'*X,dot(curM,curM,1)'/2),[],1); % assign samples to the nearest centers
iter = iter + 1;
label{iter} = checkLabel;
end
% Get last clusters centers
m{iter} = curM;
% If to remove empty clusters:
%for k=1:iter
% [~,~,label{k}] = unique(label{k});
%end
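The assignment line in the loop may look cryptic. Since the squared distance is ||x - m||^2 = ||x||^2 - 2*(m'*x - ||m||^2/2), and ||x||^2 does not depend on the center, the nearest center is the one that maximizes m'*x - ||m||^2/2. The sketch below checks this on a tiny made-up data set (X, M and the label variable names are chosen here for illustration) against an explicit distance computation:

```matlab
% Nearest-center assignment: inner-product trick versus explicit distances.
X = [0 0.1 5 5.2; 0 0.2 5 4.9];     % d x n data, two obvious groups (made up)
M = [0 5; 0 5];                     % d x k cluster centers (made up)

% Trick used in litekmeans: maximize m'*x - ||m||^2/2 over the centers
score = bsxfun(@minus, M'*X, dot(M, M, 1)'/2);    % k x n
[~, labelFast] = max(score, [], 1);

% Reference: explicit squared Euclidean distances
D = zeros(size(M,2), size(X,2));
for c = 1:size(M,2)
    D(c,:) = sum(bsxfun(@minus, X, M(:,c)).^2, 1);
end
[~, labelSlow] = min(D, [], 1);

disp([labelFast; labelSlow]);
```

The trick replaces a loop over centers with a single matrix product.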
Gif Creation
I also used @Amro's MATLAB video tutorial for the gif creation.
Distinguishable Colors
I used this great FEX by Tim Holy for making the cluster colors easier to distinguish.
Resulting code
My resulting code is as follows. I had some issues because the number of clusters would change from one iteration to the next, which caused the scatter plot update to delete all cluster centers without raising any error. Since I didn't notice that, I tried to work around the scatter function with every obscure method I could find on the web (by the way, I found a really nice scatter plot alternative here), but fortunately I worked out what was happening when I came back to this today. Here is the code I wrote for it; feel free to use and adapt it, but please keep my reference if you do.
function varargout=kmeans_test(data,nClusters,plotOpts,dimLabels,...
bigXDim,bigYDim,gifName)
%
% [label,m,figH,handles]=kmeans_test(data,nClusters,plotOpts,...
% dimLabels,bigXDim,bigYDim,gifName)
% Demonstrate kmeans algorithm iterative progress. Inputs are:
%
% -> data (rand(5,100)): the data to use.
%
% -> nClusters (7): number of clusters to use.
%
% -> plotOpts: struct holding the following fields:
%
% o leftBase: the percentage distance from the left
%
% o rightBase: the percentage distance from the right
%
% o bottomBase: the percentage distance from the bottom
%
% o topBase: the percentage distance from the top
%
% o FontSize: FontSize for axes labels.
%
% o widthUsableArea: Total width occupied by axes
%
% o heigthUsableArea: Total heigth occupied by axes
%
% -> bigXDim (1): the big subplot x dimension
%
% -> bigYDim (2): the big subplot y dimension
%
% -> dimLabels: If you want to specify dimensions labels
%
% -> gifName: gif file name to save
%
% Outputs are:
%
% -> label: Sample cluster center number for each iteration
%
% -> m: cluster center mean for each iteration
%
% -> figH: figure handle
%
% -> handles: axes handles
%
%
% - Creation Date: Fri, 13 Sep 2013
% - Last Modified: Mon, 16 Sep 2013
% - Author(s):
% - W.S.Freund <wsfreund_at_gmail_dot_com>
%
% TODO List (?):
%
% - Use input parser
% - Adapt it to be able to cluster any algorithm function.
% - Use arrows indicating cluster centers movement before moving them.
% - Drag and drop small axes to big axes.
%
% Pre-start
if nargin < 7
gifName = 'kmeansClusterization.gif';
if nargin < 6
bigYDim = 2;
if nargin < 5
bigXDim = 1;
if nargin < 4
nDim = size(data,1);
maxDigits = numel(num2str(nDim));
dimLabels = mat2cell(sprintf(['Dim %0' num2str(maxDigits) 'd'],...
1:nDim),1,zeros(1,nDim)+4+maxDigits);
if nargin < 3
plotOpts = struct('leftBase',.05,'rightBase',.02,...
'bottomBase',.05,'topBase',.02,'FontSize',10,...
'widthUsableArea',.87,'heigthUsableArea',.87);
if nargin < 2
nClusters = 7;
if nargin < 1
center1 = [1; 0; 0; 0; 0];
center2 = [0; 1; 0; 0; 0];
center3 = [0; 0; 1; 0; 0];
center4 = [0; 0; 0; 1; 0];
center5 = [0; 0; 0; 0; 1];
center6 = [0; 0; 0; 0; 1.5];
center7 = [0; 0; 0; 1.5; 1];
data = [...
bsxfun(@plus,center1,.5*rand(5,20)) ...
bsxfun(@plus,center2,.5*rand(5,20)) ...
bsxfun(@plus,center3,.5*rand(5,20)) ...
bsxfun(@plus,center4,.5*rand(5,20)) ...
bsxfun(@plus,center5,.5*rand(5,20)) ...
bsxfun(@plus,center6,.2*rand(5,20)) ...
bsxfun(@plus,center7,.2*rand(5,20)) ...
];
end
end
end
end
end
end
end
% NOTE of advice: It seems that Matlab does not test while on
% refreshdata if the dimension of the inputs are equivalent for the
% XData, YData and CData while using scatter. Because of this I wasted
% a lot of time trying to debug what was the problem, trying many
% workaround since my cluster centers would disappear for no reason.
% Draw axes:
nDim = size(data,1);
figH = figure;
set(figH,'Units', 'normalized', 'Position',...
[0, 0, 1, 1],'Color','w','Name',...
'k-means example','NumberTitle','Off',...
'MenuBar','none','Toolbar','figure',...
'Renderer','zbuffer');
% Create distinguishable colors matrix:
colorMatrix = distinguishable_colors(nClusters);
% Create axes, deploy them on handles matrix more or less how they
% will be positioned:
[handles,horSpace,vertSpace] = ...
createAxesGrid(5,5,plotOpts,dimLabels);
% Add main axes
bigSubSize = ceil(nDim/2);
bigSubVec(bigSubSize^2) = 0;
for k = 0:nDim-bigSubSize
bigSubVec(k*bigSubSize+1:(k+1)*bigSubSize) = ...
... %(nDim-bigSubSize+k)*nDim+1:(nDim-bigSubSize+k)*nDim+(nDim-bigSubSize+1);
bigSubSize+nDim*k:nDim*(k+1);
end
handles(bigSubSize,bigSubSize) = subplot(nDim,nDim,bigSubVec,...
'FontSize',plotOpts.FontSize,'box','on');
bigSubplotH = handles(bigSubSize,bigSubSize);
horSpace(bigSubSize,bigSubSize) = bigSubSize;
vertSpace(bigSubSize,bigSubSize) = bigSubSize;
set(bigSubplotH,'NextPlot','add',...
'FontSize',plotOpts.FontSize,'box','on',...
'XAxisLocation','top','YAxisLocation','right');
% Squeeze axes through space to optimize space usage and improve
% visualization capability:
[leftPos,botPos,subplotWidth,subplotHeight]=setCustomPlotArea(...
handles,plotOpts,horSpace,vertSpace);
pColorAxes = axes('Position',[leftPos(end) botPos(end) ...
subplotWidth subplotHeight],'Parent',figH);
pcolor([1:nClusters+1;1:nClusters+1])
% image(reshape(colorMatrix,[1 size(colorMatrix)])); % Used image to
% check if the upcoming buggy behaviour would be fixed. I was not
% lucky, though...
colormap(pColorAxes,colorMatrix);
% Change XTick positions to its center:
set(pColorAxes,'XTick',.5:1:nClusters+.5);
set(pColorAxes,'YTick',[]);
% Change its label to cluster number:
set(pColorAxes,'XTickLabel',[nClusters 1:nClusters-1]); % FIXME At
% least on my matlab I have to use this buggy way to set XTickLabel.
% Am I doing something wrong? Since I dunno why this is caused, I just
% change the code so that it looks the way it should look, but this is
% quite strange...
xlabel(pColorAxes,'Clusters Colors','FontSize',plotOpts.FontSize);
% Now iterate through data and get cluster information:
[label,m]=litekmeans(data,nClusters);
nIters = numel(m)-1;
scatterColors = colorMatrix(label{1},:);
annH = annotation('textbox',[leftPos(1),botPos(1) subplotWidth ...
subplotHeight],'String',sprintf('Start Conditions'),'EdgeColor',...
'none','FontSize',18);
% Creates dimData_%d variables for first iteration:
for curDim=1:nDim
curDimVarName = genvarname(sprintf('dimData_%d',curDim));
eval([curDimVarName,'= m{1}(curDim,:);']);
end
% clusterColors will hold the colors for the total number of clusters
% on each iteration:
clusterColors = colorMatrix;
% Draw cluster information for first iteration:
for curColumn=1:nDim
for curLine=curColumn+1:nDim
% Big subplot data:
if curColumn == bigXDim && curLine == bigYDim
curAxes = handles(bigSubSize,bigSubSize);
curScatter = scatter(curAxes,data(curColumn,:),...
data(curLine,:),16,'filled');
set(curScatter,'CDataSource','scatterColors');
% Draw cluster centers
curColumnVarName = genvarname(sprintf('dimData_%d',curColumn));
curLineVarName = genvarname(sprintf('dimData_%d',curLine));
eval(['curScatter=scatter(curAxes,' curColumnVarName ',' ...
curLineVarName ',100,colorMatrix,''^'',''filled'');']);
set(curScatter,'XDataSource',curColumnVarName,'YDataSource',...
curLineVarName,'CDataSource','clusterColors')
end
% Small subplots data:
curAxes = handles(curLine,curColumn);
% Draw data:
curScatter = scatter(curAxes,data(curColumn,:),...
data(curLine,:),16,'filled');
set(curScatter,'CDataSource','scatterColors');
% Draw cluster centers
curColumnVarName = genvarname(sprintf('dimData_%d',curColumn));
curLineVarName = genvarname(sprintf('dimData_%d',curLine));
eval(['curScatter=scatter(curAxes,' curColumnVarName ',' ...
curLineVarName ',100,colorMatrix,''^'',''filled'');']);
set(curScatter,'XDataSource',curColumnVarName,'YDataSource',...
curLineVarName,'CDataSource','clusterColors');
if curLine==nDim
xlabel(curAxes,dimLabels{curColumn});
set(curAxes,'XTick',xlim(curAxes));
end
if curColumn==1
ylabel(curAxes,dimLabels{curLine});
set(curAxes,'YTick',ylim(curAxes));
end
end
end
refreshdata(figH,'caller');
% Preallocate gif frame. From Amro's tutorial here:
% https://stackoverflow.com/a/11054155/1162884
f = getframe(figH);
[f,map] = rgb2ind(f.cdata, 256, 'nodither');
mov = repmat(f, [1 1 1 nIters+4]);
% Add one frame at start conditions:
curFrame = 1;
% Add three frames without movement at start conditions
f = getframe(figH);
mov(:,:,1,curFrame) = rgb2ind(f.cdata, map, 'nodither');
for curIter = 1:nIters
curFrame = curFrame+1;
% Change label text
set(annH,'String',sprintf('Iteration %d',curIter));
% Update cluster point colors
scatterColors = colorMatrix(label{curIter+1},:);
% Update cluster centers:
for curDim=1:nDim
curDimVarName = genvarname(sprintf('dimData_%d',curDim));
eval([curDimVarName,'= m{curIter+1}(curDim,:);']);
end
% Update cluster colors:
nClusterIter = size(m{curIter+1},2);
clusterColors = colorMatrix(1:nClusterIter,:);
% Update graphics:
refreshdata(figH,'caller');
% Update cluster colors:
if nClusterIter~=size(m{curIter},2) % If number of cluster
% of current iteration differs from previous iteration (or start
% conditions in case we are at first iteration) we redraw colors:
pcolor([1:nClusterIter+1;1:nClusterIter+1])
% image(reshape(colorMatrix,[1 size(colorMatrix)])); % Used image to
% check if the upcomming buggy behaviour would be fixed. I was not
% lucky, though...
colormap(pColorAxes,clusterColors);
% Change XTick positions to its center:
set(pColorAxes,'XTick',.5:1:nClusterIter+.5);
set(pColorAxes,'YTick',[]);
% Change its label to cluster number:
set(pColorAxes,'XTickLabel',[nClusterIter 1:nClusterIter-1]);
xlabel(pColorAxes,'Clusters Colors','FontSize',plotOpts.FontSize);
end
f = getframe(figH);
mov(:,:,1,curFrame) = rgb2ind(f.cdata, map, 'nodither');
end
set(annH,'String','Convergence Conditions');
for curFrame = nIters+2:nIters+4
% Add three frames without movement at convergence conditions
f = getframe(figH);
mov(:,:,1,curFrame) = rgb2ind(f.cdata, map, 'nodither');
end
imwrite(mov, map, gifName, 'DelayTime',.5, 'LoopCount',inf)
varargout = cell(1,nargout);
if nargout > 0
varargout{1} = label;
if nargout > 1
varargout{2} = m;
if nargout > 2
varargout{3} = figH;
if nargout > 3
varargout{4} = handles;
end
end
end
end
end
function [leftPos,botPos,subplotWidth,subplotHeight] = ...
setCustomPlotArea(handles,plotOpts,horSpace,vertSpace)
%
% -> handles: axes handles
%
% -> plotOpts: struct holding the following fields:
%
% o leftBase: the percentage distance from the left
%
% o rightBase: the percentage distance from the right
%
% o bottomBase: the percentage distance from the bottom
%
% o topBase: the percentage distance from the top
%
% o widthUsableArea: Total width occupied by axes
%
% o heigthUsableArea: Total heigth occupied by axes
%
% -> horSpace: the axes units size (integers only) that current axes
% should occupy in the horizontal (considering that other occupied
% axes handles are empty)
%
% -> vertSpace: the axes units size (integers only) that current axes
% should occupy in the vertical (considering that other occupied
% axes handles are empty)
%
nHorSubPlot = size(handles,1);
nVertSubPlot = size(handles,2);
if nargin < 4
horSpace(nHorSubPlot,nVertSubPlot) = 0;
horSpace = horSpace+1;
if nargin < 3
vertSpace(nHorSubPlot,nVertSubPlot) = 0;
vertSpace = vertSpace+1;
end
end
subplotWidth = plotOpts.widthUsableArea/nHorSubPlot;
subplotHeight = plotOpts.heigthUsableArea/nVertSubPlot;
totalWidth = (1-plotOpts.rightBase) - plotOpts.leftBase;
totalHeight = (1-plotOpts.topBase) - plotOpts.bottomBase;
gapHeigthSpace = (totalHeight - ...
plotOpts.heigthUsableArea)/(nVertSubPlot);
gapWidthSpace = (totalWidth - ...
plotOpts.widthUsableArea)/(nHorSubPlot);
botPos(nVertSubPlot) = plotOpts.bottomBase + gapWidthSpace/2;
leftPos(1) = plotOpts.leftBase + gapHeigthSpace/2;
botPos(nVertSubPlot-1:-1:1) = botPos(nVertSubPlot) + (subplotHeight +...
gapHeigthSpace)*(1:nVertSubPlot-1);
leftPos(2:nHorSubPlot) = leftPos(1) + (subplotWidth +...
gapWidthSpace)*(1:nHorSubPlot-1);
for curLine=1:nHorSubPlot
for curColumn=1:nVertSubPlot
if handles(curLine,curColumn)
set(handles(curLine,curColumn),'Position',[leftPos(curColumn)...
botPos(curLine) horSpace(curLine,curColumn)*subplotWidth ...
vertSpace(curLine,curColumn)*subplotHeight]);
end
end
end
end
function [handles,horSpace,vertSpace] = ...
createAxesGrid(nLines,nColumns,plotOpts,dimLabels)
handles = zeros(nLines,nColumns);
% Those hold the axes size units:
horSpace(nLines,nColumns) = 0;
vertSpace(nLines,nColumns) = 0;
for curColumn=1:nColumns
for curLine=curColumn+1:nLines
handles(curLine,curColumn) = subplot(nLines,...
nColumns,curColumn+(curLine-1)*nColumns);
horSpace(curLine,curColumn) = 1;
vertSpace(curLine,curColumn) = 1;
curAxes = handles(curLine,curColumn);
if feature('UseHG2')
colormap(handle(curAxes),colorMatrix);
end
set(curAxes,'NextPlot','add',...
'FontSize',plotOpts.FontSize,'box','on');
if curLine==nLines
xlabel(curAxes,dimLabels{curColumn});
else
set(curAxes,'XTick',[]);
end
if curColumn==1
ylabel(curAxes,dimLabels{curLine});
else
set(curAxes,'YTick',[]);
end
end
end
end
Example
Here is an example using 5 dimensions, using the code:
center1 = [1; 0; 0; 0; 0];
center2 = [0; 1; 0; 0; 0];
center3 = [0; 0; 1; 0; 0];
center4 = [0; 0; 0; 1; 0];
center5 = [0; 0; 0; 0; 1];
center6 = [0; 0; 0; 0; 1.5];
center7 = [0; 0; 0; 1.5; 1];
data = [...
bsxfun(@plus,center1,.5*rand(5,20)) ...
bsxfun(@plus,center2,.5*rand(5,20)) ...
bsxfun(@plus,center3,.5*rand(5,20)) ...
bsxfun(@plus,center4,.5*rand(5,20)) ...
bsxfun(@plus,center5,.5*rand(5,20)) ...
bsxfun(@plus,center6,.2*rand(5,20)) ...
bsxfun(@plus,center7,.2*rand(5,20)) ...
];
[label,m,figH,handles]=kmeans_test(data,20);