What is the better way to change the percentages of the training and the testing during the splitting process? - matlab

With using the PCA technique and the Yale database, I'm trying to work on face recognition within Matlab by randomly splitting the training process to 20% and the testing process to 80%. It is given an
Index in position 2 exceeds array bounds (must not exceed 29)
error. The following is the code, hoping to get help:
dataset = load('yale_FaceDataset.mat');
trainSz = round(dataset.samples*0.2);
testSz = round(dataset.samples*0.8);
trainSetCell = cell(1,trainSz*dataset.classes);
testSetCell = cell(1,testSz*dataset.classes);
j = 1;
k = 1;
m = 1;
for i = 1:dataset.classes
% training set
trainSetCell(k:k+trainSz-1) = dataset.images(j:j+trainSz-1);
trainLabels(k:k+trainSz-1) = dataset.labels(j:j+trainSz-1);
k = k+trainSz;
% test set
testSetCell(m:m+testSz-1) = dataset.images(j+trainSz:j+dataset.samples-1);
testLabels(m:m+testSz-1) = dataset.labels(j+trainSz:j+dataset.samples-1);
m = m+testSz;
j = j+dataset.samples;
end
% convert the data from a cell into a matrix format
numImgs = length(trainSetCell);
trainSet = zeros(numImgs,numel(trainSetCell{1}));
for i = 1:numImgs
trainSet(i,:) = reshape(trainSetCell{i},[],1);
end
numImgs = length(testSetCell);
testSet = zeros(numImgs,numel(testSetCell{1}));
for i = 1:numImgs
testSet(i,:) = reshape(testSetCell{i},[],1);
end
%% applying PCA
% compute the mean face
mu = mean(trainSet)';
% centre the training data
trainSet = trainSet - (repmat(mu,1,size(trainSet,1)))';
% generate the eigenfaces(features of the training set)
eigenfaces = pca(trainSet);
% set the number of principal components
Ncomponents = 100;
% Out of the generated components, we keep "Ncomponents"
eigenfaces = eigenfaces(:,1:Ncomponents);
% generate training features
trainFeatures = eigenfaces' * trainSet';
% Subspace projection
% centre features
testSet = testSet - (repmat(mu,1,size(testSet,1)))';
% subspace projection
testFeatures = inv(eigenfaces'*eigenfaces) * eigenfaces' * testSet';
mdl = fitcdiscr(trainFeatures',trainLabels);
labels = predict(mdl,testFeatures');
% find the images that were recognised and their respect. labels
correctRec = find(testLabels == labels');
correctLabels = labels(correctRec);
% find the images that were NOT recognised and their respect. labels
falseRec = find(testLabels ~= labels');
falseLabels = labels(falseRec);
% compute and display the recognition rate
result = length(correctRec)/length(testLabels)*100;
fprintf('The recognition rate is: %0.3f \n',result);
% divide the images into : recognised and unrecognised
correctTest = testSetCell(correctRec);
falseTest = testSetCell(falseRec);
% display some recognised samples and their respective labels
imgshow(correctTest(1:8),correctLabels(1:8));
% display all unrecognised samples and their respective labels
imgshow(falseTest(1:length(falseTest)), falseLabels(1:length(falseTest)));

it would be nice, if you provide also the line-number and the full message of the error and if you would strip your code to the essential. I guess, the PCA-stuff is not necessary here, as the error is raised probably in your loop. That is because you are incrementing j by j = j+dataset.samples; and take this in the next loop-set for indexing j:j+trainSz-1, which now must exceed dataset.samples...
Nevertheless, there is no randomness in the indexing. It is easiest if you use the built-in cvpartition-function:
% split data
cvp = cvpartition(Lbl,'HoldOut',.2);
lgTrn = cvp.training;
lgTst = cvp.test;
You may provide the number of classes as first input (Lbl in this case) or the actual class vector to let cvpartition pick random subsets that reflect the original distribution of the individual classes.

Related

How to calculate confusion matrix for object detection/recognition?

I have solved one image recognition problem of total 32 categories. I got the result and calculated the average precision of it. I need to draw the confusion matrix.
I am working on YOLOv2 and have created a confusion matrix for detection network. Hope this helps :)
testObjects are true labels, predLabels are predicted labels
TestData is imageDatastore() of imdsTest
testObjects = testData.UnderlyingDatastores{1, 1}.Files ; %'C:\Users\admin\Desktop\Img_Data\Flower1\Flower101.jpg'
testObjects = erase(testObjects,fullfile(pwd,imgFolderName)); %'\Flower1\Flower101.jpg'
testObjects = categorical(extractBetween(testObjects, "\","\")); % Flower1 - array
predLabels = zeros(2,1);
predLabels = categorical(predLabels); % Prelocation
for iPred = 1:length(testObjects)
[~, idxx] = max(cell2mat(detectionResults.Scores(iPred))); % max of all the bounding box scores
multiLabels = detectionResults.Labels{iPred,1}; % find label of max score
if isempty(multiLabels) == 1
predLabels(iPred,1) = {'NaN'};
predLabels(iPred,1) = standardizeMissing(predLabels(iPred,1),{'NaN'});
else
predLabels(iPred,1) = (multiLabels(idxx,1));
end
end
predLabels = removecats(predLabels);
plotconfusion (testObjects,predLabels) %confusionchart(testAsts,predLabels)

Updating histogram in a for-loop without growing y-data

I have had zero luck finding this elsewhere on the site, so here's my problem. I loop through about a thousand mat files, each with about 10,000 points of data. I'm trying to create an overall histogram of this data, but it's not very feasible to concatenate all this data to give to hist.
I was hoping to be able to create an N and Bin variable each loop using hist (y), then N and Bin would be recalculated on the next loop iteration by using hist(y_new). And so on and so on. That way the source data doesn't grow and when the loop finally ends, I can just use bar(). If this method wouldn't work, then I am very open-minded to other solutions.
Also, it is probably not safe to assume that the x data will remain constant throughout each iteration. I'm using 2012a.
Thanks for any help!!
I think the best solution here is to loop through your files twice: once to set the bins and once to do the histogram. But, if this is impossible in your case, here's a one shot solution that requires you to set the bin width beforehand.
clear; close all;
rng('default') % for reproducibility
% make example data
N = 10; % number of data files
M = 5; % length of data files
xs = cell(1,N);
for i = 1:N
xs{i} = trnd(1,1,M);
end
% parameters
width = 2;
% main
for i = 1:length(xs)
x = xs{i}; % "load data"
range = [min(x) max(x)];
binsPos = 0:width:range(2)+width;
binsNeg = fliplr( 0:-width:range(1)-width );
newBins = [binsNeg(1:end-1) binsPos];
newCounts = histc(x, newBins);
newCounts(end) = []; % last bin should always be zero, see help histc
if i == 1
counts = newCounts;
bins = newBins;
else
% combine new and old counts
allBins = min(bins(1), newBins(1)) : width : max(bins(end), newBins(end));
allCounts = zeros(1,length(allBins)-1);
allCounts(find(allBins==bins(1)) : find(allBins==bins(end-1))) = counts;
allCounts(find(allBins==newBins(1)) : find(allBins==newBins(end-1))) = ...
allCounts(find(allBins==newBins(1)) : find(allBins==newBins(end-1))) + newCounts;
bins = allBins;
counts = allCounts;
end
end
% check
figure
bar(bins(1:end-1) + width/2, counts)
xFull = [xs{:}];
[fullCounts] = histc(xFull, bins);
fullCounts(end) = [];
figure
bar(bins(1:end-1) + width/2, fullCounts)

Dimensions issus

Finding maximum values of wave heights and wave lengths
dwcL01 though dwcL10 is arrays of <3001x2 double> with output from a numerical wave model.
Part of my script:
%% Plotting results from SWASH
% Examination of phase velocity on deep water with different number of layers
% Wave height 3 meters, wave peroid 8 sec on a depth of 30 meters
clear all; close all; clc;
T=8;
L0=1.56*T^2;
%% Loading results tabels.
load dwcL01.tbl; load dwcL02.tbl; load dwcL03.tbl; load dwcL04.tbl;
load dwcL05.tbl; load dwcL06.tbl; load dwcL07.tbl; load dwcL08.tbl;
load dwcL09.tbl; load dwcL10.tbl;
M(:,:,1) = dwcL01; M(:,:,2) = dwcL02; M(:,:,3) = dwcL03; M(:,:,4) = dwcL04;
M(:,:,5) = dwcL05; M(:,:,6) = dwcL06; M(:,:,7) = dwcL07; M(:,:,8) = dwcL08;
M(:,:,9) = dwcL09; M(:,:,10) = dwcL10;
%% Finding position of wave crest using diff and sign.
for ii=1:10
Tp(:,1,ii) = diff(sign(diff([M(1,2,ii);M(:,2,ii)]))) < 0;
Wc(:,:,ii) = M(Tp,1,ii);
L(:,ii) = diff(Wc(:,1,ii))
end
The loop
for ii=1:10
Tp(:,1,ii) = diff(sign(diff([M(1,2,ii);M(:,2,ii)]))) < 0;
Wc(:,:,ii) = M(Tp,1,ii);
L(:,ii) = diff(Wc(:,1,ii))
end
Works fine for ii = 1 Getting the following error for ii = 2
Index exceeds matrix dimensions.
Error in mkPlot (line 19)
Wc(:,:,i) = M(Tp,:,i);
Don't have the same number of wave crests for the different set ups, naturally M(Tp,1,ii) will have different dimensions. How do I work around this issue? Can it be done in a for loop? please feel free to email me or other wise ask for further information.
The problem is that Tp is a three dimensional array. I need to call the Tp(:,:,ii) corresponding to the present scenario. Together with this and defining Wc as a cell I solve my issue.
for ii = 1:10
Tp(:,1,ii) = diff(sign(diff([M(1,2,ii);M(:,2,ii)]))) < 0;
Wc{:,:,ii} = M(Tp(:,:,ii),1,ii);
L{:,ii} = diff(cell2mat(Wc(ii)));
end

PCA codes input, use for PalmPrint Recognition

I'm new to Matlab.
I'm trying to apply PCA function(URL listed below)into my palm print recognition program to generate the eigenpalms. My palm print grey scale images dimension are 450*400.
Before using it, I was trying to study these codes and add some codes to save the eigenvector as .mat file. Some of the %comments added by me for my self understanding.
After a few days of studying, I still unable to get the answers.
I decided to ask for helps.I have a few questions to ask regarding this PCA.m.
PCA.m
What is the input of the "options" should be? of "PCA(data,details,options)"
(is it an integer for reduced dimension? I was trying to figure out where is the "options" value passing, but still unable to get the ans. The msgbox of "h & h2", is to check the codes run until where. I was trying to use integer of 10, but the PCA.m processed dimension are 400*400.)
The "eigvector" that I save as ".mat" file is ready to perform Euclidean distance classifier with other eigenvector? (I'm thinking that eigvector is equal to eigenpalm, like in face recognition, the eigen faces. I was trying to convert the eigenvector matrix back to image, but the image after PCA process is in Black and many dots on it)
mySVD.m
In this function, there are two values that can be changed, which are MAX_MATRIX_SIZE set by 1600 and EIGVECTOR_RATIO set by 0.1%. May I know these values will affect the results? ( I was trying to play around with the values, but I cant see the different. My palm print image dimension is set by 450*400, so the Max_matrix_size should set at 180,000?)
** I hope you guys able to understand what I'm asking, please help, Thanks guys (=
Original Version : http://www.cad.zju.edu.cn/home/dengcai/Data/code/PCA.m
mySVD: http://www.cad.zju.edu.cn/home/dengcai/Data/code/mySVD.m
% Edited Version by me
function [eigvector, eigvalue] = PCA(data,details,options)
%PCA Principal Component Analysis
%
% Usage:
% [eigvector, eigvalue] = PCA(data, options)
% [eigvector, eigvalue] = PCA(data)
%
% Input:
% data - Data matrix. Each row vector of fea is a data point.
% fea = finite element analysis ?????
% options.ReducedDim - The dimensionality of the reduced subspace. If 0,
% all the dimensions will be kept.
% Default is 0.
%
% Output:
% eigvector - Each column is an embedding function, for a new
% data point (row vector) x, y = x*eigvector
% will be the embedding result of x.
% eigvalue - The sorted eigvalue of PCA eigen-problem.
%
% Examples:
% fea = rand(7,10);
% options=[]; %store an empty matrix in options
% options.ReducedDim=4;
% [eigvector,eigvalue] = PCA(fea,4);
% Y = fea*eigvector;
%
% version 3.0 --Dec/2011
% version 2.2 --Feb/2009
% version 2.1 --June/2007
% version 2.0 --May/2007
% version 1.1 --Feb/2006
% version 1.0 --April/2004
%
% Written by Deng Cai (dengcai AT gmail.com)
%
if (~exist('options','var'))
%A = exist('name','kind')
% var = Checks only for variables.
%http://www.mathworks.com/help/matlab/matlab_prog/symbol-reference.html#bsv2dx9-1
%The tilde "~" character is used in comparing arrays for unequal values,
%finding the logical NOT of an array,
%and as a placeholder for an input or output argument you want to omit from a function call.
options = [];
end
h2 = msgbox('not yet');
ReducedDim = 0;
if isfield(options,'ReducedDim')
%tf = isfield(S, 'fieldname')
h2 = msgbox('checked');
ReducedDim = options.ReducedDim;
end
[nSmp,nFea] = size(data);
if (ReducedDim > nFea) || (ReducedDim <=0)
ReducedDim = nFea;
end
if issparse(data)
data = full(data);
end
sampleMean = mean(data,1);
data = (data - repmat(sampleMean,nSmp,1));
[eigvector, eigvalue] = mySVD(data',ReducedDim);
eigvalue = full(diag(eigvalue)).^2;
if isfield(options,'PCARatio')
sumEig = sum(eigvalue);
sumEig = sumEig*options.PCARatio;
sumNow = 0;
for idx = 1:length(eigvalue)
sumNow = sumNow + eigvalue(idx);
if sumNow >= sumEig
break;
end
end
eigvector = eigvector(:,1:idx);
end
%dt get from C# program, user ID and name
evFolder = 'ev\';
userIDName = details; %get ID and Name
userIDNameWE = strcat(userIDName,'\');%get ID and Name with extension
filePath = fullfile('C:\Users\***\Desktop\Data Collection\');
userIDNameFolder = strcat(filePath,userIDNameWE); %ID and Name folder
userIDNameEVFolder = strcat(userIDNameFolder,evFolder);%EV folder in ID and Name Folder
userIDNameEVFile = strcat(userIDNameEVFolder,userIDName); % EV file with ID and Name
if ~exist(userIDNameEVFolder, 'dir')
mkdir(userIDNameEVFolder);
end
newFile = strcat(userIDNameEVFile,'_1');
searchMat = strcat(newFile,'.mat');
if exist(searchMat, 'file')
filePattern = strcat(userIDNameEVFile,'_');
D = dir([userIDNameEVFolder, '*.mat']);
Num = length(D(not([D.isdir])))
Num=Num+1;
fileName = [filePattern,num2str(Num)];
save(fileName,'eigvector');
else
newFile = strcat(userIDNameEVFile,'_1');
save(newFile,'eigvector');
end
You pass options in a structure, for instance:
options.ReducedDim = 2;
or
options.PCARatio =0.4;
The option ReducedDim selects the number of dimensions you want to use to represent the final projection of the original matrix. For instance if you pick option.ReducedDim = 2 you use only the two eigenvectors with largest eigenvalues (the two principal components) to represent your data (in effect the PCA will return the two eigenvectors with largest eigenvalues).
PCARatio instead allows you to pick the number of dimensions as the first eigenvectors with largest eigenvalues that account for fraction PCARatio of the total sum of eigenvalues.
In mySVD.m, I would not increase the default values unless you expect more than 1600 eigenvectors to be necessary to describe your dataset. I think you can safely leave the default values.

Kmean plotting in matlab

I am on a project thumb recognition system on matlab. I implemented Kmean Algorithm and I got results as well. Actually now I want to plot the results like here they done. I am trying but couldn't be able to do so. I am using the following code.
load training.mat; % loaded just to get trainingData variable
labelData = zeros(200,1);
labelData(1:100,:) = 0;
labelData(101:200,:) = 1;
k=2;
[trainCtr, traina] = kmeans(trainingData,k);
trainingResult1=[];
for i=1:k
trainingResult1 = [trainingResult1 sum(trainCtr(1:100)==i)];
end
trainingResult2=[];
for i=1:k
trainingResult2 = [trainingResult2 sum(trainCtr(101:200)==i)];
end
load testing.mat; % loaded just to get testingData variable
c1 = zeros(k,1054);
c1 = traina;
cluster = zeros(200,1);
for j=1:200
testTemp = repmat(testingData(j,1:1054),k,1);
difference = sum((c1 - testTemp).^2, 2);
[value index] = min(difference);
cluster(j,1) = index;
end
testingResult1 = [];
for i=1:k
testingResult1 = [testingResult1 sum(cluster(1:100)==i)];
end
testingResult2 = [];
for i=1:k
testingResult2 = [testingResult2 sum(cluster(101:200)==i)];
end
in above code trainingData is matrix of 200 X 1054 in which 200 are images of thumbs and 1054 are columns. actually each image is of 25 X 42. I reshaped each image in to row matrix (1 X 1050) and 4 other (some features) columns so total of 1054 columns are in each image. Similarly testingData I made it in the similar manner as I made testingData It is also the order of 200 X 1054. Now my Problem is just to plot the results as they did in here.
After selecting 2 features, you can just follow the example. Start a figure, use hold on, and use plot or scatter to plot the centroids and the data points. E.g.
selectedFeatures = [42,43];
plot(trainingData(trainCtr==1,selectedFeatures(1)),
trainingData(trainCtr==1,selectedFeatures(2)),
'r.','MarkerSize',12)
Would plot the selected feature values of the data points in cluster 1.