How do I implement k-means clustering in a CIBR system? - matlab

I'm trying to perform content-based image retrieval (CBIR) using k-means clustering. I use the PCA function princomp() with a feature vector length of 190.
I have 500 test images in color taken from here. There's 5 categories in total. When I run my code I only get 3 clusters and the images look very different. What am I doing wrong?
Here is my code:
% construction of feature vector
set = [hsvHist autoCorrelogram Moments_Couleur meanAmplitude msEnergy OndelettesMoments];
% add name of image
dataset(k, :) = [set str2double(name)];
handlse.CaracVt = dataset.';
dlmwrite('f:/hellonewday', handlse.CaracVt);
% ACP function
[COEFF, SCORE, latent] = princomp(handlse.CaracVt());
laten = cumsum(latent)./sum(latent)
list = []; o = 1; c = 0;
for kn = 1:length(laten)
if (isempty(list))
list(o, :) = laten(kn);
o = o + 1;
else
for i = 1:length(list)
kki = abs(laten(kn) - list(i));
if (kki > 0.006)
c = c + 1;
end;
end;
if (c == length(list))
list(o, :) = laten(kn);
o = o + 1;
end;
end;
c = 0;
end;
handlse.NmbreCluster = length(list);
disp('The amount of clusters is: ');
disp(handlse.NmbreCluster);
handlse.CaracVt = handlse.CaracVt.';
mat = handlse.CaracVt;
mat(:, end) = [];
[kMeansClusters, c] = kmeans(mat, handlse.NmbreCluster);
dlmwrite('f:/clusters', kMeansClusters);
dlmwrite('f:/centres', c);
disp('kMeansClusters for each image: ');
disp(kMeansClusters);
c = c.';
ko = 1;
for i = 1:handlse.NmbreCluster
array = zeros(1, 191);
c(i, 191) = 0;
array(:, 1) = c(i);
for kp = 1:length(kMeansClusters)
if (i == kMeansClusters(kp))
ko = ko + 1;
array(ko, :) = handlse.CaracVt(kp, :);
end
end;
myArray{i} = array;
ko = 1;
end;
disp(myArray);
uisave('myArray', 'dataset');

Related

How to correct grid search?

Trying to find the optimal hyperparameters for my svm model using a grid search, but it simply returns 1 for the hyperparameters.
function evaluations = inner_kfold_trainer(C,q,k,features_xy,labels)
features_xy_flds = kdivide(features_xy, k);
labels_flds = kdivide(labels, k);
evaluations = zeros(k,3);
for i = 1:k
fprintf('Fold %i of %i\n',i,k);
train_data = cell2mat(features_xy_flds(1:end ~= i));
train_labels = cell2mat(labels_flds(1:end ~= i));
test_data = cell2mat(features_xy_flds(i));
test_labels = cell2mat(labels_flds(i));
%AU1
train_labels = train_labels(:,1);
test_labels = test_labels(:,1);
[k,~] = size(test_labels);
%train
sv = fitcsvm(train_data,train_labels, 'KernelFunction','polynomial', 'PolynomialOrder',q,'BoxConstraint',C);
sv.predict(test_data);
%Calculate evaluative measures
%svm_outputs = zeros(k,1);
sv_predictions = sv.predict(test_data);
[precision,recall,F1] = evaluation(sv_predictions,test_labels);
evaluations(i,1) = precision;
evaluations(i,2) = recall;
evaluations(i,3) = F1;
end
save('eval.mat', 'evaluations');
end
an inner-fold cross validation function
and below the grid function where something seems to be going wrong
function [q,C] = grid_search(features_xy,labels,k)
% n x n grid
n = 3;
q_grid = linspace(1,19,n);
C_grid = linspace(1,59,n);
tic
evals = zeros(n,n,3);
for i = 1:n
for j = 1:n
fprintf('## i=%i, j=%i ##\n', i, j);
svm_results = inner_kfold_trainer(C_grid(i), q_grid(j),k,features_xy,labels);
evals(i,j,:) = mean(svm_results(:,:));
% precision only
%evals(i,j,:) = max(svm_results(:,1));
toc
end
end
f = evals;
% retrieving the best value of the hyper parameters, to use in the outer
% fold
[M1,I1] = max(f);
[~,I2] = max(M1(1,1,:));
index = I1(:,:,I2);
C = C_grid(index(1))
q = q_grid(index(2))
end
When I run grid_search(features_xy,labels,8) for example, I get C=1 and q=1, for any k(the no. of folds) value. Also features_xy is a 500*98 matrix.

My approximate entropy script for MATLAB isn't working

This is my Approximate entropy Calculator in MATLAB. https://en.wikipedia.org/wiki/Approximate_entropy
I'm not sure why it isn't working. It's returning a negative value.Can anyone help me with this? R1 being the data.
FindSize = size(R1);
N = FindSize(1);
% N = input ('insert number of data values');
%if you want to put your own N in, take away the % from the line above
and
%insert the % before the N = FindSize(1)
%m = input ('insert m: integer representing length of data, embedding
dimension ');
m = 2;
%r = input ('insert r: positive real number for filtering, threshold
');
r = 0.2*std(R1);
for x1= R1(1:N-m+1,1)
D1 = pdist2(x1,x1);
C11 = (D1 <= r)/(N-m+1);
c1 = C11(1);
end
for i1 = 1:N-m+1
s1 = sum(log(c1));
end
phi1 = (s1/(N-m+1));
for x2= R1(1:N-m+2,1)
D2 = pdist2(x2,x2);
C21 = (D2 <= r)/(N-m+2);
c2 = C21(1);
end
for i2 = 1:N-m+2
s2 = sum(log(c2));
end
phi2 = (s2/(N-m+2));
Ap = phi1 - phi2;
Apen = Ap(1)
Following the documentation provided by the Wikipedia article, I developed this small function that calculates the approximate entropy:
function res = approximate_entropy(U,m,r)
N = numel(U);
res = zeros(1,2);
for i = [1 2]
off = m + i - 1;
off_N = N - off;
off_N1 = off_N + 1;
x = zeros(off_N1,off);
for j = 1:off
x(:,j) = U(j:off_N+j);
end
C = zeros(off_N1,1);
for j = 1:off_N1
dist = abs(x - repmat(x(j,:),off_N1,1));
C(j) = sum(~any((dist > r),2)) / off_N1;
end
res(i) = sum(log(C)) / off_N1;
end
res = res(1) - res(2);
end
I first tried to replicate the computation shown the article, and the result I obtain matches the result shown in the example:
U = repmat([85 80 89],1,17);
approximate_entropy(U,2,3)
ans =
-1.09965411068114e-05
Then I created another example that shows a case in which approximate entropy produces a meaningful result (the entropy of the first sample is always less than the entropy of the second one):
% starting variables...
s1 = repmat([10 20],1,10);
s1_m = mean(s1);
s1_s = std(s1);
s2_m = 0;
s2_s = 0;
% datasample will not always return a perfect M and S match
% so let's repeat this until equality is achieved...
while ((s1_m ~= s2_m) && (s1_s ~= s2_s))
s2 = datasample([10 20],20,'Replace',true,'Weights',[0.5 0.5]);
s2_m = mean(s2);
s2_s = std(s2);
end
m = 2;
r = 3;
ae1 = approximate_entropy(s1,m,r)
ae2 = approximate_entropy(s2,m,r)
ae1 =
0.00138568170752751
ae2 =
0.680090884817465
Finally, I tried with your sample data:
fid = fopen('O1.txt','r');
U = cell2mat(textscan(fid,'%f'));
fclose(fid);
m = 2;
r = 0.2 * std(U);
approximate_entropy(U,m,r)
ans =
1.08567461184858

Why do I get such a bad loss in my implementation of k-Nearest Neighbor?

I'm trying to implement k-NN in matlab. I have a matrix of 214 x's that have 9 columns of attributes with the 10th column being the label. I want to measure loss with a 0-1 function on 10 cross-validation tests. I have the following code:
function q3(file)
data = knnfile(file);
loss(data(:,1:9),'KFold',data(:,10))
losses = zeros(25,3);
new_data = data;
new_data(:,10) = [];
sdd = std(new_data);
meand = mean(new_data);
for s = 1:214
for q = 1:9
new_data(s,q) = (new_data(s,q) - meand(q)) / sdd(q);
end
end
new_data = [new_data data(:,10)];
for k = 1:25
loss1 = 0;
loss2 = 0;
for j = 0:9
index = floor(214/10)*j+1;
curd1 = data([1:index-1,index+21:end],:);
curd2 = new_data([1:index-1,index+21:end],:);
for l = 0:20
c1 = knn(curd1,k,data(index+l,:));
c2 = knn(curd2,k,new_data(index+l,:));
loss1 = loss1 + (c1 ~= data(index+l,10));
loss2 = loss2 + (c2 ~= new_data(index+l,10));
end
end
losses(k,1) = k;
losses(k,2) = 100*loss1/210;
losses(k,3) = 100*loss2/210;
end
function cluster = knn(Data,k,x)
distances = zeros(193,2);
for i = 1:size(Data,1)
row = Data(i,:);
d = norm(row(1:size(row,2)-1) - x(1:size(x,2)-1));
distances(i,:) = [d row(10)];
end
distances = sortrows(distances,1);
cluster = mode(distances(1:k,2));
I'm getting 40%+ loss with almost no correlation to k and I'm sure that something here is wrong but I'm not quite sure.
Any help would be appreciated!

Neural network input mistmatch (Iris Dataset)

I currently have an error that i can't pass this is the short code and everything needed in order to have a general idea about my problem
clear;
close all; clear ;
load fisheriris;
m = meas;
d = num2cell(m);
d(:,5) = species(:,1);
c = cvpartition(d(:,5),'kfold',10);
CeDam = cell(10,1);
CeVrem = cell(10,1);
for i=1:10
CeDam{i} = [d(test(c,i),1) d(test(c,i),2) d(test(c,i),3) d(test(c,i),4)]';
end
for i=1:10
CeVrem{i} = d(test(c,i),5)';
end
for i = 1:10
a = CeVrem{i};
[n,m] = size(a);
for j = 1:n
for k = 1:m
if isequal(a(j,k),'setosa') a{n,m} = [1 0 0];
elseif isequal(a(j,k),'versicolor') a{n,m} = [0 1 0];
else a{j,k} = [0,0,1];
end
end
end
CeVrem{i} = a;
end
net = newff(cell2mat(minmax(CeDam{1})),[3 3 3],{'logsig','logsig','logsig',},'trainlm');
net.LW{2,1} = net.LW{2,1}*0.5;
net.b{2} = net.b{2}*2;
net.performFcn = 'mse';
net.trainParam.epochs = 100;
err = 0;
i = 1;
j = 1;
while i <= 10
while j <= 10
if i~=j net = train(net,CeDam{j},CeVrem{j});
end
j=j+1;
end
end
in the train part of the algorithm it gives me an input mistmatch which is very odd for me.
The error messages:
Error using trainlm (line 109) Number of inputs does not match
net.numInputs.
Error in network/train (line 106) [net,tr] =
feval(net.trainFcn,net,X,T,Xi,Ai,EW,net.trainParam);
i managed to fix everything after much work here is the code that works for anyone having the same problem in the future gl :D :).
clear;
close all; clear ;
load fisheriris;
m = meas;
d = num2cell(m);
d(:,5) = species(:,1);
c = cvpartition(d(:,5),'kfold',10);
CeDam = cell(10,1);
CeVrem = cell(10,1);
for i=1:10
CeDam{i} = [m(test(c,i),1) m(test(c,i),2) m(test(c,i),3) m(test(c,i),4)]';
end
for i=1:10
CeVrem{i} = d(test(c,i),5);
end
for i = 1:10
a = CeVrem{i}';
[n,m] = size(a);
b = zeros(3,m);
for j = 1:n
for k = 1:m
if isequal(a(j,k),{'setosa'}) b(1,k) = 1; b(2,k) = 0; b(3,k) = 0;
elseif isequal(a(j,k),{'versicolor'}) b(1,k) = 0; b(2,k) = 1; b(3,k) = 0;
else b(1,k) = 0; b(2,k) = 0; b(3,k) = 1;
end
end
end
CC{i} = b;
end
CC = CC';
net = newff(minmax(CeDam{1}),[3 3 3],{'logsig','logsig','logsig'},'trainlm');
net.LW{2,1} = net.LW{2,1}*0.6;
net.b{2} = net.b{2}*2;
net.performFcn = 'mse';
net.trainParam.epochs = 100;
errglob = 0;
i = 1;
j = 1;
while i <= 10
while j <= 10
if i~=j net = train(net,CeDam{j},CC{j});
end
j=j+1;
end
y=sim(net,CeDam{i});
y=round(y);
e = y - CC{i};
errcur=mse(net,CC{i},y);
errglob = errglob + mse(net,CC{i},y);
fprintf('Avem o eroare de %.2f pe foldul %d \n',errcur,i)
i=i+1;
end
errglob/10
this thread can be closed thx :)
I think you got some problems with mixing up cell and array formats...
Try to replace:
net = train(net,CeDam{j},CeVrem{j});
by:
net = train(net,cell2mat(CeDam{j}),cell2mat(CeVrem{j}')');
AND: please remove your infinite loops in i, by adding i=i+1; or replace the while loops by more natural for loops, e.g.
for i = 1:10
for j = 1:10
if i~=j
net = train(net,cell2mat(CeDam{j}),cell2mat(CeVrem{j}')');
end
end
end
AND: Where are you using your i inside the loop? I guess something is missing there...

Filter points using hist in matlab

I have a vector. I want to remove outliers. I got bin and no of values in that bin. I want to remove all points based on the number of elements in each bin.
Data:
d1 =[
360.471912914169
505.084636471948
514.39429429184
505.285068055647
536.321181755858
503.025854206322
534.304229816684
393.387035881967
396.497969729985
520.592172434431
421.284713703215
420.401106087984
537.05330275495
396.715779872694
514.39429429184
404.442344469518
476.846474245118
599.020867750031
429.163139144079
514.941744277933
445.426761656729
531.013596812737
374.977332648255
364.660115724218
538.306752697753
519.042387479096
1412.54699036882
405.571202133485
516.606049132218
2289.49623498271
378.228766753667
504.730621222846
358.715764917016
462.339366699398
512.429858614816
394.778786157514
366
498.760463549388
366.552861126468
355.37022947906
358.308526273099
376.745272034036
366.934599077274
536.0901883079
483.01740134285
508.975480745389
365.629593988233
536.368800360349
557.024236456548
366.776498701866
501.007025898839
330.686029339009
508.395475983019
429.563732174866
2224.68806802212
534.655786464525
518.711297351426
534.304229816684
514.941744277933
420.32368479542
367.129404978681
525.626188464768
388.329756778952
1251.30895065927
525.626188464768
412.313764019587
513.697381733643
506.675438520558
1517.71183364959
550.276294237722
543.359917550053
500.639590923451
395.129864728041];
Histogram computation:
[nelements,centers] = hist(d1);
nelements=55 13 0 0 1 1 1 0 0 2
I want to remove all points apearing less than 5 (in nelements). It means only first 2 elements in nelements( 55, 13 ) remains.
Is there any function in matlab.
You can do it along these lines:
threshold = 5;
bin_halfwidth = (centers(2)-centers(1))/2;
keep = ~any(abs(bsxfun(#minus, d1, centers(nelements<threshold))) < bin_halfwidth , 2);
d1_keep = d1(keep);
Does this do what you want?
binwidth = centers(2)-centers(1);
centersOfRemainingBins = centers(nelements>5);
remainingvals = false(length(d1),1);
for ii = 1:length(centersOfRemainingBins )
remainingvals = remainingvals | (d1>centersOfRemainingBins (ii)-binwidth/2 & d1<centersOfRemainingBins (ii)+binwidth/2);
end
d_out = d1(remainingvals);
I don't know Matlab function for this problem, but I think, that function with follow code is what are you looking for:
sizeData = size(data);
function filter_hist = filter_hist(data, binCountRemove)
if or(max(sizeData) == 0, binCountRemove < 1)
disp('Error input!');
filter_hist = [];
return;
end
[n, c] = hist(data);
sizeN = size(n);
intervalSize = c(2) - c(1);
if sizeData(1) > sizeData(2)
temp = transpose(data);
else
temp = data;
end
for i = 1:1:max(sizeN)
if n(i) < binCountRemove
a = c(i) - intervalSize / 2;
b = c(i) + intervalSize / 2;
sizeTemp = size(temp);
removeInds = [];
k = 0;
for j = 1:1:max(sizeTemp)
if and(temp(j) > a, less_equal(temp(j), b) == 1)
k = k + 1;
removeInds(k) = j;
end
end
temp(removeInds) = [];
end
end
filter_hist = transpose(temp);
%Determines when 'a' less or equal to 'b' by accuracy
function less_equal = less_equal(a, b)
delta = 10^-6; %Accuracy
if a < b
less_equal = 1;
return;
end
if abs(b - a) < delta
less_equal = 1;
return;
end
less_equal = 0;
You can do something like this
nelements=nelements((nelements >5))