I have two matrices, Xval (predicted values) and Sv (validation targets): one holds the classifier output and the other the validation data for the same samples. Each row is a one-hot vector giving the class, e.g. [0 0 1 0 0 0 0 0 0 0] represents digit 3 (a 1 in the position of that digit). I would like to know whether the confusion matrix can be calculated in a vectorized way or with a built-in function. Both matrices are 12000x10. The code that generates both matrices is this:
load data;
load test;
[N, m] = size(X);
X = [ones(N, 1) X];
[Nt, mt] = size(Xt);
Xt = [ones(Nt, 1) Xt];
new_order = randperm(N);
X = X(new_order,: );
S = S(new_order,: );
part = 0.8;
Xtr = X(1: (part * N),: );
Xv = X((part * N + 1): N,: );
Str = S(1: (part * N),: );
Sv = S((part * N + 1): N,: );
v_c = [];
v_tx_acerto = [];
tx_acerto_max = 0;
c = 250;
w = (X'*X+c*eye(m+1))\X' * S;
Xval = Xv*w;
for i=1:12000
aux = Xval(i,:);
aux(aux == max(aux)) = 1;
aux(aux<1) = 0;
Xval(i,:) = aux;
end
There are built-in functions, confusionmat and plotconfusion. But if you want full control, you can write a simple function yourself, e.g.:
function [CMat_rel,CMat_abs] = ConfusionMatrix(Cprd,Cact)
Cprd_uq = unique(Cprd);
Cact_uq = unique(Cact);
NumPrd = length(Cprd_uq);
NumAct = length(Cact_uq);
% assert(NumPrd == NumAct)
% allocate memory
CMat_abs = NaN(NumPrd,NumAct);
CMat_rel = NaN(NumPrd,NumAct);
for j = 1:NumAct
lgAct = Cact == Cact_uq(j);
SumAct = sum(lgAct);
for i = 1:NumAct
lgPrd = Cprd == Cact_uq(i);
Num = sum( lgPrd(lgAct) == true );
CMat_abs(i,j) = Num;
CMat_rel(i,j) = Num/SumAct;
end
end
end
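Since the question asks for a vectorized approach, here is a minimal sketch of one, assuming every row of both matrices is a valid one-hot vector (exactly one 1 per row, no ties). The small 3-class matrices below are made-up illustration data, not the 12000x10 matrices from the question. The product of the transposed actual matrix with the predicted matrix counts how often each actual/predicted pair occurs; the accumarray variant is a cross-check on class indices.

```matlab
% Hypothetical one-hot data: 5 samples, 3 classes (rows are samples).
Sv   = [1 0 0; 0 1 0; 0 0 1; 0 1 0; 1 0 0];   % actual classes
Xval = [1 0 0; 0 0 1; 0 0 1; 0 1 0; 0 1 0];   % predicted classes

% C(i,j) = number of samples whose actual class is i and predicted class is j.
C = Sv' * Xval;

% Cross-check: convert one-hot rows to class indices and count the pairs.
[~, actual]    = max(Sv,   [], 2);
[~, predicted] = max(Xval, [], 2);
C2 = accumarray([actual predicted], 1, [3 3]);
```

The class indices obtained with max could also be passed to confusionmat if that function is available in your installation.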
This is my approximate entropy calculator in MATLAB (https://en.wikipedia.org/wiki/Approximate_entropy).
I'm not sure why it isn't working: it returns a negative value. Can anyone help me with this? R1 is the data.
FindSize = size(R1);
N = FindSize(1);
% N = input ('insert number of data values');
% If you want to put your own N in, take away the % from the line above
% and insert a % before the N = FindSize(1) line.
% m = input('insert m: integer representing length of data, embedding dimension ');
m = 2;
% r = input('insert r: positive real number for filtering, threshold ');
r = 0.2*std(R1);
for x1= R1(1:N-m+1,1)
D1 = pdist2(x1,x1);
C11 = (D1 <= r)/(N-m+1);
c1 = C11(1);
end
for i1 = 1:N-m+1
s1 = sum(log(c1));
end
phi1 = (s1/(N-m+1));
for x2= R1(1:N-m+2,1)
D2 = pdist2(x2,x2);
C21 = (D2 <= r)/(N-m+2);
c2 = C21(1);
end
for i2 = 1:N-m+2
s2 = sum(log(c2));
end
phi2 = (s2/(N-m+2));
Ap = phi1 - phi2;
Apen = Ap(1)
Following the documentation provided by the Wikipedia article, I developed this small function that calculates the approximate entropy:
function res = approximate_entropy(U,m,r)
N = numel(U);
res = zeros(1,2);
for i = [1 2]
off = m + i - 1;
off_N = N - off;
off_N1 = off_N + 1;
x = zeros(off_N1,off);
for j = 1:off
x(:,j) = U(j:off_N+j);
end
C = zeros(off_N1,1);
for j = 1:off_N1
dist = abs(x - repmat(x(j,:),off_N1,1));
C(j) = sum(~any((dist > r),2)) / off_N1;
end
res(i) = sum(log(C)) / off_N1;
end
res = res(1) - res(2);
end
I first tried to replicate the computation shown in the article, and the result I obtain matches the result shown in the example:
U = repmat([85 80 89],1,17);
approximate_entropy(U,2,3)
ans =
-1.09965411068114e-05
Then I created another example that shows a case in which approximate entropy produces a meaningful result (the entropy of the first sample is always less than the entropy of the second one):
% starting variables...
s1 = repmat([10 20],1,10);
s1_m = mean(s1);
s1_s = std(s1);
s2_m = 0;
s2_s = 0;
% datasample will not always return a perfect M and S match
% so let's repeat this until equality is achieved...
while ((s1_m ~= s2_m) || (s1_s ~= s2_s))
s2 = datasample([10 20],20,'Replace',true,'Weights',[0.5 0.5]);
s2_m = mean(s2);
s2_s = std(s2);
end
m = 2;
r = 3;
ae1 = approximate_entropy(s1,m,r)
ae2 = approximate_entropy(s2,m,r)
ae1 =
0.00138568170752751
ae2 =
0.680090884817465
Finally, I tried with your sample data:
fid = fopen('O1.txt','r');
U = cell2mat(textscan(fid,'%f'));
fclose(fid);
m = 2;
r = 0.2 * std(U);
approximate_entropy(U,m,r)
ans =
1.08567461184858
I would like to divide an image into 8 by 6 blocks, compute the average red, green and blue values of each block, and store those averages in an array. For example, if the image were divided into 4 blocks, the resulting array would be:
A = [average_red, average_green, average_blue,average_red, ...
average_green, average_blue,average_red, average_green, ...
average_blue,average_red, average_green, average_blue,...
average_red, average_green, average_blue,]
The loop I have created looks very complicated and takes a long time to run, and I'm not even sure it works properly because I have no idea how to check it. Is there a simpler way to implement this?
Here is the loop:
[rows, columns, ~] = size(img);
rBlock = 6;
cBlock = 8;
NumberOfBlocks = rBlock * cBlock;
bRow = ceil(rows/rBlock);
bCol = ceil(columns/cBlock);
row = bRow;
col = bCol;
r = zeros(row*col,1);
g = zeros(row*col,1);
b = zeros(row*col,1);
n = 1;
cl = 1;
rw = 1;
for x = 1:NumberOfBlocks
for i = cl : col
for j = rw : row
% some code
end
end
%some code
if i == columns && j ~= rows
cl = 1;
rw = j - (bRow -1);
col = (col - col) + bCol;
row = row + bRaw;
elseif a == columns && c == rows
display('done');
else
cl = i + 1;
rw = j - (bRow -1);
col = col + col;
row = row + row;
end
end
Because there are only 48 blocks, you can use a simple for loop that iterates over the blocks (I think it's going to be fast enough).
Here is my code:
%Build test image
img = double(imresize(imread('peppers.png'), [200, 300]));
[rows, columns, ~] = size(img);
rBlock = 6;
cBlock = 8;
NumberOfBlocks = rBlock * cBlock;
bRow = ceil(rows/rBlock);
bCol = ceil(columns/cBlock);
idx = 1;
A = zeros(1, rBlock*cBlock*3);
for y = 0:rBlock-1
for x = 0:cBlock-1
%Block (y,x) boundaries: (x0,y0) to (x1,y1)
x0 = x*bCol+1;
y0 = y*bRow+1;
x1 = min(x0+bCol-1, columns); %Limit x1 to columns
y1 = min(y0+bRow-1, rows); %Limit y1 to rows
redMean = mean2(img(y0:y1, x0:x1, 1)); %Mean of red pixel in block (y,x)
greenMean = mean2(img(y0:y1, x0:x1, 2)); %Mean of green pixel in block (y,x)
blueMean = mean2(img(y0:y1, x0:x1, 3)); %Mean of blue pixel in block (y,x)
%Fill 3 elements of array A.
A(idx) = redMean;
A(idx+1) = greenMean;
A(idx+2) = blueMean;
%Advance index by 3.
idx = idx + 3;
end
end
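One way to check that such a loop works is to run it on a synthetic image whose block averages are known in advance. The sketch below is only an illustration under stated assumptions: the 12x16 image size and the constant colour values are made up, the size is chosen so the blocks divide evenly, and plain mean is used instead of mean2 so that no toolbox is needed.

```matlab
% Hypothetical test image: every pixel is (R,G,B) = (10,20,30).
rows = 12; columns = 16;
rBlock = 6; cBlock = 8;
bRow = rows/rBlock;        % block height (divides evenly here)
bCol = columns/cBlock;     % block width  (divides evenly here)
img = zeros(rows, columns, 3);
img(:,:,1) = 10; img(:,:,2) = 20; img(:,:,3) = 30;

A = zeros(1, rBlock*cBlock*3);
idx = 1;
for y = 0:rBlock-1
    for x = 0:cBlock-1
        % Extract block (y,x) with all three colour planes.
        block = img(y*bRow+(1:bRow), x*bCol+(1:bCol), :);
        % Average over rows, then columns, leaving one value per plane.
        A(idx:idx+2) = squeeze(mean(mean(block, 1), 2))';
        idx = idx + 3;
    end
end

% Reshape so that each row is one block: [red green blue].
B = reshape(A, 3, []).';
```

Every row of B should then equal the known colour, which makes a wrong index or a swapped channel easy to spot.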
I ran this code with a 517x11 feature matrix and a 517x1 label matrix. But once the dimensions of the matrices change, the code no longer runs. How can I fix this?
The error is:
Subscripted assignment dimension mismatch.
in this line:
edges(k,j) = quantlevels(a);
Here is my code:
function [features,weights] = MI(features,labels,Q)
if nargin <3
Q = 12;
end
edges = zeros(size(features,2),Q+1);
for k = 1:size(features,2)
minval = min(features(:,k));
maxval = max(features(:,k));
if minval==maxval
continue;
end
quantlevels = minval:(maxval-minval)/500:maxval;
N = histc(features(:,k),quantlevels);
totsamples = size(features,1);
N_cum = cumsum(N);
edges(k,1) = -Inf;
stepsize = totsamples/Q;
for j = 1:Q-1
a = find(N_cum > j.*stepsize,1);
edges(k,j) = quantlevels(a);
end
edges(k,j+2) = Inf;
end
S = zeros(size(features));
for k = 1:size(S,2)
S(:,k) = quantize(features(:,k),edges(k,:))+1;
end
I = zeros(size(features,2),1);
for k = 1:size(features,2)
I(k) = computeMI(S(:,k),labels,0);
end
[weights,features] = sort(I,'descend');
%% EOF
function [I,M,SP] = computeMI(seq1,seq2,lag)
if nargin <3
lag = 0;
end
if(length(seq1) ~= length(seq2))
error('Input sequences are of different length');
end
lambda1 = max(seq1);
symbol_count1 = zeros(lambda1,1);
for k = 1:lambda1
symbol_count1(k) = sum(seq1 == k);
end
symbol_prob1 = symbol_count1./sum(symbol_count1)+0.000001;
lambda2 = max(seq2);
symbol_count2 = zeros(lambda2,1);
for k = 1:lambda2
symbol_count2(k) = sum(seq2 == k);
end
symbol_prob2 = symbol_count2./sum(symbol_count2)+0.000001;
M = zeros(lambda1,lambda2);
if(lag > 0)
for k = 1:length(seq1)-lag
loc1 = seq1(k);
loc2 = seq2(k+lag);
M(loc1,loc2) = M(loc1,loc2)+1;
end
else
for k = abs(lag)+1:length(seq1)
loc1 = seq1(k);
loc2 = seq2(k+lag);
M(loc1,loc2) = M(loc1,loc2)+1;
end
end
SP = symbol_prob1*symbol_prob2';
M = M./sum(M(:))+0.000001;
I = sum(sum(M.*log2(M./SP)));
function y = quantize(x, q)
x = x(:);
nx = length(x);
nq = length(q);
y = sum(repmat(x,1,nq)>repmat(q,nx,1),2);
I've run the function several times without getting any error.
As input for "seq1" and "seq2" I used arrays such as 1:10 and 11:20.
A possible error might arise in the loops
for k = 1:lambda1
symbol_count1(k) = sum(seq1 == k);
end
if "seq1" and "seq2" are defined as matrices, since sum will then return an array while
symbol_count1(k)
is expected to be a single value.
Another possible error might arise if seq1 and seq2 are not integers, since they are used as indices in
M(loc1,loc2) = M(loc1,loc2)+1;
Hope this helps.
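If non-integer values are indeed the cause, one possible workaround is to map the values to consecutive positive integers before passing them on. This is a minimal sketch, assuming the labels are a numeric column vector; the label values and the name labelIdx are made up for illustration.

```matlab
% Hypothetical labels that cannot be used directly as indices.
labels = [0.5; 2; 0.5; 7; 2];

% The third output of unique gives, for each element, the position of its
% value in the sorted list of distinct values, i.e. an index in 1..K.
[classes, ~, labelIdx] = unique(labels);
labelIdx = labelIdx(:);   % force a column vector
K = numel(classes);       % number of distinct classes
```

labelIdx can then be used wherever positive integer symbols are expected, for example as seq2.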
How can I store all the vectors s in a cell array so that I can use them later? The number of vectors s is not fixed; it depends on the condition while sigma > sigma_min. Can anyone help me?
A_pinv = A'* inv(A * A');
s = A_pinv * X
sigma = 2*max(abs(s));
sigma_min = 0.0001;
sigma_decrease_factor = 0.5;
while sigma>sigma_min
for i = 1:L
delta = s.*exp(-abs(s).^2/sigma^2);
s = s - 0.5*delta;
s = s - A_pinv*(A*s - X);
end
sigma = sigma * sigma_decrease_factor;
end
I haven't tested it, but I think it will work:
count = 0;
Data = {};
while sigma>sigma_min
count = count + 1;
for i = 1:L
delta = s.*exp(-abs(s).^2/sigma^2);
s = s - 0.5*delta;
s = s - A_pinv*(A*s - X);
end
Data{count} = s;
sigma = sigma * sigma_decrease_factor;
end
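For a self-contained version of the same idea, the sketch below fills in the missing inputs. A, X and L are hypothetical values chosen only so that the code runs on its own, and pinv is used in place of A'*inv(A*A'). The last line shows one way to use the stored vectors afterwards.

```matlab
% Hypothetical inputs (not from the question).
A = [1 0 1; 0 1 1];
X = [1; 2];
L = 3;

A_pinv = pinv(A);                 % pseudoinverse of A
s = A_pinv * X;
sigma = 2*max(abs(s));
sigma_min = 0.0001;
sigma_decrease_factor = 0.5;

count = 0;
Data = {};
while sigma > sigma_min
    count = count + 1;
    for i = 1:L
        delta = s.*exp(-abs(s).^2/sigma^2);
        s = s - 0.5*delta;
        s = s - A_pinv*(A*s - X);
    end
    Data{count} = s;              % store the vector for this sigma
    sigma = sigma * sigma_decrease_factor;
end

% All stored vectors have the same length, so they can be put side by side:
allS = [Data{:}];                 % one column per stored vector
```

Growing the cell array inside the loop is fine here because the number of iterations is small and not known in advance.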
I have a vector from which I want to remove outliers. I have computed the histogram bins and the number of values in each bin, and I want to remove points based on the number of elements in their bin.
Data:
d1 =[
360.471912914169
505.084636471948
514.39429429184
505.285068055647
536.321181755858
503.025854206322
534.304229816684
393.387035881967
396.497969729985
520.592172434431
421.284713703215
420.401106087984
537.05330275495
396.715779872694
514.39429429184
404.442344469518
476.846474245118
599.020867750031
429.163139144079
514.941744277933
445.426761656729
531.013596812737
374.977332648255
364.660115724218
538.306752697753
519.042387479096
1412.54699036882
405.571202133485
516.606049132218
2289.49623498271
378.228766753667
504.730621222846
358.715764917016
462.339366699398
512.429858614816
394.778786157514
366
498.760463549388
366.552861126468
355.37022947906
358.308526273099
376.745272034036
366.934599077274
536.0901883079
483.01740134285
508.975480745389
365.629593988233
536.368800360349
557.024236456548
366.776498701866
501.007025898839
330.686029339009
508.395475983019
429.563732174866
2224.68806802212
534.655786464525
518.711297351426
534.304229816684
514.941744277933
420.32368479542
367.129404978681
525.626188464768
388.329756778952
1251.30895065927
525.626188464768
412.313764019587
513.697381733643
506.675438520558
1517.71183364959
550.276294237722
543.359917550053
500.639590923451
395.129864728041];
Histogram computation:
[nelements,centers] = hist(d1);
nelements=55 13 0 0 1 1 1 0 0 2
I want to remove all points that fall in bins with fewer than 5 elements (in nelements). That means only the points in the first 2 bins (counts 55 and 13) remain.
Is there a function for this in MATLAB?
You can do it along these lines:
threshold = 5;
bin_halfwidth = (centers(2)-centers(1))/2;
keep = ~any(abs(bsxfun(@minus, d1, centers(nelements<threshold))) < bin_halfwidth , 2);
d1_keep = d1(keep);
Does this do what you want?
binwidth = centers(2)-centers(1);
centersOfRemainingBins = centers(nelements>=5);
remainingvals = false(length(d1),1);
for ii = 1:length(centersOfRemainingBins )
remainingvals = remainingvals | (d1>centersOfRemainingBins (ii)-binwidth/2 & d1<centersOfRemainingBins (ii)+binwidth/2);
end
d_out = d1(remainingvals);
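Another way to express the same filtering is to ask for the bin index of every point directly, which avoids comparing against bin centers. The following is a minimal sketch, assuming histcounts is available (it is part of newer MATLAB releases) and using 10 equal-width bins between the minimum and maximum, similar to what hist does by default. The short vector d is made-up data, not the d1 from the question.

```matlab
% Hypothetical data: six clustered values and two far-away points.
d = [1; 1.1; 1.2; 1.3; 1.4; 1.5; 9; 10];
threshold = 5;

% Ten equal-width bins spanning the data range.
edges = linspace(min(d), max(d), 11);

% N holds the count per bin, bin holds the bin index of each point.
[N, ~, bin] = histcounts(d, edges);

% Keep a point only if its bin contains at least 'threshold' points.
counts = N(:);
keep = counts(bin) >= threshold;
d_keep = d(keep);
```

Here only the six clustered values survive, because the bins containing 9 and 10 each hold a single point.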
I don't know of a MATLAB function for this problem, but I think a function like the following is what you are looking for:
function filter_hist = filter_hist(data, binCountRemove)
sizeData = size(data);
if or(max(sizeData) == 0, binCountRemove < 1)
disp('Error input!');
filter_hist = [];
return;
end
[n, c] = hist(data);
sizeN = size(n);
intervalSize = c(2) - c(1);
if sizeData(1) > sizeData(2)
temp = transpose(data);
else
temp = data;
end
for i = 1:1:max(sizeN)
if n(i) < binCountRemove
a = c(i) - intervalSize / 2;
b = c(i) + intervalSize / 2;
sizeTemp = size(temp);
removeInds = [];
k = 0;
for j = 1:1:max(sizeTemp)
if and(temp(j) > a, less_equal(temp(j), b) == 1)
k = k + 1;
removeInds(k) = j;
end
end
temp(removeInds) = [];
end
end
filter_hist = transpose(temp);
%Determines when 'a' less or equal to 'b' by accuracy
function less_equal = less_equal(a, b)
delta = 10^-6; %Accuracy
if a < b
less_equal = 1;
return;
end
if abs(b - a) < delta
less_equal = 1;
return;
end
less_equal = 0;
You can do something like this:
nelements=nelements((nelements >5))