"Out of Memory" Matlab - matlab

I have built four standalone executables from four different Matlab functions to make up a face recognition system. I call these executables from different batch scripts and run them on images. In total I have over 300k images. Three of the four executables work fine, but I get an "out of memory" error when I call the standalone executable of the Fisherface function. It simply computes the distinguishing features of each image using Fisher's linear discriminant analysis. The analysis is applied to a huge face matrix that holds the pixel values of over 150,000 images of size 60*60, so the matrix is 150,000*3600.
What I understand is that this happens due to a shortage of contiguous memory in RAM. As a workaround I divided my large image set into a number of subsets, each containing 3000 images. Now, when an input face is provided, it searches for the best matches of that input in each subset and finally sorts out the final list of the 3 best matches with the lowest (Euclidean) distances. This resolved the out-of-memory error, but the recognition rate became much lower: when the discriminant analysis is done on the original face matrix (which I have tested on smaller datasets of 4000-5000 images), it gives a good recognition rate.
I am looking for a way around this problem. I want to perform all the operations on the large matrix. Is there a way to implement the function more memory-efficiently, for example by allocating memory dynamically in Matlab? I hope I have been specific enough in explaining my problem. Below is the code of that particular executable.
function FisherfaceCorenew(matname)
load(matname);
Class_number = size(T,2);
Class_population = 1;
P = Class_population * Class_number; % Total number of training images
%%%%%%%%%%%%%%%%%%%%%%%% Calculating the mean image
m_database = single(mean(T,2));
%%%%%%%%%%%%%%%%%%%%%%%% Calculating the deviation of each image from the mean image
A = T - repmat(m_database,1,P);
L = single(A')*single(A);
[V D] = eig(L); % Diagonal elements of D are the eigenvalues for both L=A'*A and C=A*A'.
%%%%%%%%%%%%%%%%%%%%%%%% Sorting and eliminating small eigenvalues
L_eig_vec = [];
for i = 1 : P
    L_eig_vec = [L_eig_vec V(:,i)];
end
%%%%%%%%%%%%%%%%%%%%%%%% Calculating the eigenvectors of covariance matrix 'C'
V_PCA = single(A) * single(L_eig_vec);
%%%%%%%%%%%%%%%%%%%%%%%% Projecting centered image vectors onto eigenspace
ProjectedImages_PCA = [];
for i = 1 : P
    temp = single(V_PCA')*single(A(:,i));
    ProjectedImages_PCA = [ProjectedImages_PCA temp];
end
%%%%%%%%%%%%%%%%%%%%%%%% Calculating the mean of each class in eigenspace
m_PCA = mean(ProjectedImages_PCA,2); % Total mean in eigenspace
m = zeros(P,Class_number);
Sw = zeros(P,P); %new
Sb = zeros(P,P); %new
for i = 1 : Class_number
    m(:,i) = mean( ( ProjectedImages_PCA(:,((i-1)*Class_population+1):i*Class_population) ), 2 )';
    S = zeros(P,P); %new
    for j = ( (i-1)*Class_population+1 ) : ( i*Class_population )
        S = S + (ProjectedImages_PCA(:,j)-m(:,i))*(ProjectedImages_PCA(:,j)-m(:,i))';
    end
    Sw = Sw + S; % Within Scatter Matrix
    Sb = Sb + (m(:,i)-m_PCA) * (m(:,i)-m_PCA)'; % Between Scatter Matrix
end
%%%%%%%%%%%%%%%%%%%%%%%% Calculating the Fisher discriminant basis
% We want to maximise the Between Scatter Matrix, while minimising the
% Within Scatter Matrix. Thus, a cost function J is defined, so that this condition is satisfied.
[J_eig_vec, J_eig_val] = eig(Sb,Sw);
J_eig_vec = fliplr(J_eig_vec);
%%%%%%%%%%%%%%%%%%%%%%%% Eliminating zero eigenvalues and sorting in descending order
for i = 1 : Class_number-1
    V_Fisher(:,i) = J_eig_vec(:,i);
end
%%%%%%%%%%%%%%%%%%%%%%%% Projecting images onto the Fisher linear space
for i = 1 : Class_number*Class_population
    ProjectedImages_Fisher(:,i) = V_Fisher' * ProjectedImages_PCA(:,i);
end
save fisherdata.mat m_database V_PCA V_Fisher ProjectedImages_Fisher;
end

It's not easy to help you, because we can't see the sizes of your matrices.
At the very least, you could use Matlab's clear command as soon as you no longer need a variable (e.g. A).
You could also apply single() once, when you create A, instead of in every expression:
A = single(T - repmat(m_database,1,P));
And then
L = A'*A;
You could also run the Matlab profiler with memory statistics enabled to see where your memory demand comes from.
Another option is to use sparse matrices, or to reduce to even smaller data types such as uint8 where that is appropriate for some of the data.
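As a rough illustration (sketch only, assuming the .mat file holds the face matrix T with one image per column, as in the question), the start of the pipeline could look like this with memory profiling, single precision from the start, and clear calls:
profile -memory on            % enable memory statistics in the profiler (semi-documented flag; may vary by release)
S = load(matname, 'T');       % load only the variable that is needed
T = single(S.T);              % keep the matrix in single precision from the start
clear S
m_database = mean(T, 2);      % mean face
A = T - repmat(m_database, 1, size(T,2));   % deviation of each image from the mean
clear T                       % release the original matrix as soon as it is no longer needed
L = A'*A;                     % same computation as in the question, now already in single
profile viewer                % inspect memory allocated and freed per line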


How can I modify a set of vectors to have the same size?

I'm trying to extract HOG features for mathematical symbol classification (I will use an SVM classifier). I get a 1xn vector per image, and then I have to put all the vectors into a single matrix. The problem is that the size of the feature vector is different for each image, so I can't concatenate them.
Is there a way to make all the vectors the same size?
Thank you in advance.
Here is the code:
rep1 = 'D:\mémoire MASTER\data';
ext = '*.tif';
chemin = fullfile(rep1, ext);
list = dir(chemin);
for i = 1:length(list)
    I = imread(fullfile(rep1, list(i).name), ext(3:end));
    if size(I,3) == 3 % RGB image
        I = rgb2gray(I);
    end
    I1 = imbinarize(I);
    % Extract HOG features data
    HOG_feat = extractHOGFeatures(I1,'CellSize', [2 2]);
    HOG_feat1 = HOG_feat';
end
You can pad each one with zeros to be as long as the longest one:
e.g. to put two vectors, v1 and v2, into a matrix M:
M = zeros(2,max(length(v1),length(v2)));
M(1,1:length(v1)) = v1;
M(2,1:length(v2)) = v2;
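For the loop in the question, a sketch of the same idea is to collect the per-image vectors in a cell array first and zero-pad them afterwards (featCell and features are made-up names):
nImg = numel(list);
featCell = cell(nImg, 1);
for i = 1:nImg
    % ... compute HOG_feat for image i exactly as in the question ...
    featCell{i} = HOG_feat;                  % 1-by-n_i row vector
end
maxLen = max(cellfun(@numel, featCell));
features = zeros(nImg, maxLen);              % one zero-padded row per image
for i = 1:nImg
    features(i, 1:numel(featCell{i})) = featCell{i};
end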
You have the problem that all your vectors are a different size. Instead of trying to coerce them to the same size by zero-padding or interpolating (both of which I think are bad ideas), change your computation so that the length of the output vector does not depend on the size of the image.
This is your current code:
HOG_feat = extractHOGFeatures(I1,'CellSize', [2 2]);
% ^^^
% the image is split in cells of 2x2 pixels
2x2 cells are way too small for this method anyway. You could instead divide your image into a set number of cells, say 100 cells:
cellSize = ceil(size(I1)/10);
HOG_feat = extractHOGFeatures(I1,'CellSize', cellSize);
(I’m using ceil in the division because I figure it’s necessary to have an integer size. But I’m not sure whether ceil or floor or round is needed here, and I don’t have access to this function to test it. A bit of trial and error should show which method gives consistent output size.)
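A quick way to test which rounding gives a consistent length is to record the feature length for every image and check that only one value occurs (sketch, reusing rep1 and list from the question):
featLen = zeros(numel(list), 1);
for i = 1:numel(list)
    I = imread(fullfile(rep1, list(i).name));
    if size(I,3) == 3
        I = rgb2gray(I);
    end
    I1 = imbinarize(I);
    cellSize = ceil(size(I1)/10);            % try floor or round here as well
    featLen(i) = numel(extractHOGFeatures(I1, 'CellSize', cellSize));
end
unique(featLen)                              % ideally returns a single value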

optimizing manually-coded k-means in MATLAB?

So I'm writing a k-means script in MATLAB, since the native function doesn't seem to be very efficient, and it seems to be fully operational. It appears to work on the small training set that I'm using (which is a 150x2 matrix fed via text file). However, the runtime is taking exponentially longer for my target data set, which is a 3924x19 matrix.
I'm not the greatest at vectorization, so any suggestions would be greatly appreciated. Here's my k-means script so far (I know I'm going to have to adjust my convergence condition, since it's looking for an exact match, and I'll probably need more iterations for a dataset this large, but I want it to be able to finish in a reasonable time first, before I crank that number up):
clear all;
%take input file (manually specified by user)
disp('Please type input filename (in working directory): ')
target_file = input('filename: ', 's');
%parse and load into matrix
data = load(target_file);
%prompt name of output file for later) UNCOMMENT BELOW TWO LINES LATER
% disp('Please type output filename (to be saved in working directory): ')
% output_name = input('filename:', 's')
%prompt number of clusters
disp('Please type desired number of clusters: ')
c = input ('number of clusters: ');
%specify type of kmeans algorithm ('regular' for regular, 'fuzzy' for fuzzy)
%UNCOMMENT BELOW TWO LINES LATER
% disp('Please specify type (regular or fuzzy):')
% runtype = input('type: ', 's')
%initialize cluster centroid locations within bounds given by data set
%initialize rangemax and rangemin row vectors
%with length same as number of dimensions
rangemax = zeros(1,size(data,2));
rangemin = zeros(1,size(data,2));
%map max and min values for bounds
for dim = 1:size(data,2)
    rangemax(dim) = max(data(:,dim));
    rangemin(dim) = min(data(:,dim));
end
% rangemax
% rangemin
%randomly initialize mu_k (center) locations in (k x n) matrix where k is
%cluster number and n is number of dimensions/coordinates
mu_k = zeros(c,size(data,2));
for k = 1:size(data,2)
    mu_k(k,:) = rangemin + (rangemax - rangemin).*rand(1,1);
end
mu_k
%iterate k-means
%initialize holding variable for distance comparison
comparisonmatrix = [];
%initialize assignment vector
assignment = zeros(size(data,1),1);
%initialize distance holding vector
dist = zeros(1,size(data,2));
%specify convergence threshold
%threshold = 0.001;
for iteration = 1:25
    %save current assignment values to check convergence condition
    hold_assignment = assignment;
    for point = 1:size(data,1)
        %calculate distances from point to centers
        for k = 1:c
            %holding variables
            comparisonmatrix = [data(point,:);mu_k(k,:)];
            dist(k) = pdist(comparisonmatrix);
        end
        %record location of minimum distance (location value will be between 1 and k)
        [minval, location] = min(dist);
        %assign cluster number (analogous to location value)
        assignment(point) = location;
    end
    %check convergence criteria
    if isequal(assignment,hold_assignment)
        break
    end
    %revise mu_k locations
    %count number of each label
    assignment_count = zeros(1,c);
    for i = 1:size(data,1)
        assignment_count(assignment(i)) = assignment_count(assignment(i)) + 1;
    end
    %compute centroids
    point_total = zeros(size(mu_k));
    for row = 1:size(data,1)
        point_total(assignment(row),:) = point_total(assignment(row)) + data(row,:);
    end
    %move mu_k values to centroids
    for center = 1:c
        mu_k(center,:) = point_total(center,:)/assignment_count(center);
    end
end
There are a lot of loops in there, so I feel that there's a lot of optimization to be made. However, I think I've just been staring at this code for far too long, so some fresh eyes could help. Please let me know if I need to clarify anything in the code block.
When the above code block is executed (in context) on the large dataset, it takes 3732.152 seconds, according to MATLAB's profiler, to make the full 25 iterations (I'm assuming it hasn't "converged" according to my criteria yet) for 150 clusters, but about 130 of them return NaNs (130 rows in mu_k).
Profiling will help, but the place to rework your code is to avoid the loop over the number of data points (for point = 1:size(data,1)). Vectorize that.
Inside your for iteration loop, here is a quick partial example:
[nPoints,nDims] = size(data);
% Calculate all high-dimensional distances at once
kdiffs = bsxfun(@minus,data,permute(mu_k,[3 2 1])); % NxDx1 - 1xDxK => NxDxK
distances = sum(kdiffs.^2,2); % no need to do sqrt
distances = squeeze(distances); % Nx1xK => NxK
% Find closest cluster center for each point
[~,ik] = min(distances,[],2); % Nx1
% Calculate the new cluster centers (mean the data)
mu_k_new = zeros(c,nDims);
for i=1:c,
indk = ik==i;
clustersizes(i) = nnz(indk);
mu_k_new(i,:) = mean(data(indk,:))';
end
This isn't the only (or the best) way to do it, but it should be a decent example.
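To show how it could slot in, here is a rough, untested sketch of the full iteration loop with the vectorized assignment step and the convergence check (variable names follow the question and the snippet above):
[nPoints, nDims] = size(data);
mu_k = repmat(rangemin, c, 1) + repmat(rangemax - rangemin, c, 1).*rand(c, nDims);
assignment = zeros(nPoints, 1);
for iteration = 1:25
    hold_assignment = assignment;
    % distance of every point to every center, without an explicit loop over points
    kdiffs = bsxfun(@minus, data, permute(mu_k, [3 2 1]));   % N x D x K
    distances = squeeze(sum(kdiffs.^2, 2));                  % N x K
    [~, assignment] = min(distances, [], 2);                 % closest center per point
    if isequal(assignment, hold_assignment)
        break
    end
    for i = 1:c
        indk = (assignment == i);
        if any(indk)                                         % avoid NaN rows for empty clusters
            mu_k(i,:) = mean(data(indk,:), 1);
        end
    end
end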
Some other comments:
Instead of using input, make this script into a function to handle input arguments cleanly (a minimal skeleton is sketched after this list).
If you want an easy way to specify a file, see uigetfile.
With many MATLAB functions, such as max, min, sum, mean, etc., you can specify a dimension over which the function should operate. This way you can run it on a matrix and compute values for multiple conditions/dimensions at the same time.
Once you get decent performance, consider iterating longer, specifically until the centers no longer change or the number of samples that change clusters becomes small.
The closest cluster for each point, ik, is the same whether you use the Euclidean or the squared Euclidean distance, which is why the sqrt can be skipped.
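For the first two points above, a possible (hypothetical, untested) function skeleton would be:
function [assignment, mu_k] = mykmeans(target_file, c, maxIter)
    % k-means wrapped in a function; the file can be passed in or picked interactively
    if nargin < 1 || isempty(target_file)
        [fname, fpath] = uigetfile('*.txt', 'Select input data file');
        target_file = fullfile(fpath, fname);
    end
    if nargin < 3, maxIter = 25; end
    data = load(target_file);
    % ... run the (vectorized) k-means iterations here for up to maxIter passes ...
    assignment = zeros(size(data,1), 1);     % placeholder outputs
    mu_k = zeros(c, size(data,2));
end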

Multiply an arbitrary number of matrices an arbitrary number of times

I have found several questions/answers for vectorizing and speeding up routines for multiplying a matrix and a vector in a single loop, but I am trying to do something a little more general, namely multiplying an arbitrary number of matrices together, and then performing that operation an arbitrary number of times.
I am writing a general routine for calculating thin-film reflection from an arbitrary number of layers vs optical frequency. For each optical frequency W each layer has an index of refraction N and an associated 2x2 transfer matrix L and 2x2 interface matrix I which depends on the index of refraction and the thickness of the layer. If n is the number of layers, and m is the number of frequencies, then I can vectorize the index into an n x m matrix, but then in order to calculate the reflection at each frequency, I have to do nested loops. Since I am ultimately using this as part of a fitting routine, anything I can do to speed it up would be greatly appreciated.
This should provide a minimum working example:
W = 1260:0.1:1400; %frequency in cm^-1
N = rand(4,numel(W))+1i*rand(4,numel(W)); %dummy complex index of refraction
D = [0 0.1 0.2 0]/1e4; %thicknesses in cm
[n,m] = size(N);
r = zeros(size(W));
for x = 1:m %loop over frequencies
    C = eye(2); % first medium is air
    for y = 2:n %loop over layers
        na = N(y-1,x);
        nb = N(y,x);
        %I = InterfaceMatrix(na,nb); % calculate the 2x2 interface matrix
        I = [1 na*nb;na*nb 1]; % dummy matrix
        %L = TransferMatrix(nb) % calculate the 2x2 transfer matrix
        L = [exp(-1i*nb*W(x)*D(y)) 0; 0 exp(+1i*nb*W(x)*D(y))]; % dummy matrix
        C = C*I*L;
    end
    a = C(1,1);
    c = C(2,1);
    r(x) = c/a; % reflectivity, the answer I want.
end
Running this twice for two different polarizations for a three layer (air/stuff/substrate) problem with 2562 frequencies takes 0.952 seconds while solving the exact same problem with the explicit formula (vectorized) for a three layer system takes 0.0265 seconds. The problem is that beyond 3 layers, the explicit formula rapidly becomes intractable and I would have to have a different subroutine for each number of layers while the above is completely general.
Is there hope for vectorizing this code or otherwise speeding it up?
(edited to add that I've left several things out of the code to shorten it, so please don't try to use this to actually calculate reflectivity)
Edit: In order to clarify, I and L are different for each layer and for each frequency, so they change in each loop. Simply taking the exponent will not work. For a real world example, take the simplest case of a soap bubble in air. There are three layers (air/soap/air) and two interfaces. For a given frequency, the full transfer matrix C is:
C = L_air * I_air2soap * L_soap * I_soap2air * L_air;
and I_air2soap ~= I_soap2air. Thus, I start with L_air = eye(2) and then go down successive layers, computing I_(y-1,y) and L_y, multiplying them with the result from the previous loop, and going on until I get to the bottom of the stack. Then I grab the first and third values, take the ratio, and that is the reflectivity at that frequency. Then I move on to the next frequency and do it all again.
I suspect that the answer is going to somehow involve a block-diagonal matrix for each layer as mentioned below.
I'm not next to a MATLAB installation right now, so this is only a starting point:
Instead of the double loop you can write na*nb as Nab=N(1:end-1,:).*N(2:end,:);
The term in the exponent, nb*W(x)*D(y), can be written as e = N(2:end,:).*(D(2:end)'*W);
The result of I*L is a 2x2 block matrix that has this form:
M = [1, Nab; Nab, 1]*[e-, 0;0, e+] = [e- , Nab*e+ ; Nab*e- , e+]
with e- standing for exp(-1i*e) and e+ for exp(+1i*e).
See kron for how to get the block-matrix form; to vectorize the propagation C = C*I*L, just take M^n.
@Lama put me on the right path by suggesting block matrices, but the ultimate answer ended up being more complicated, so I put it here for posterity. Since the transfer and interface matrices are different for each layer, I leave the loop over the layers in place, but construct a large sparse block matrix where each block represents a frequency.
W = 1260:0.1:1400; %frequency in cm^-1
N = rand(4,numel(W))+1i*rand(4,numel(W)); %dummy complex index of refraction
D = [0 0.1 0.2 0]/1e4; %thicknesses in cm
[n,m] = size(N);
r = zeros(size(W));
C = speye(2*m); % first medium is air
even = 2:2:2*m;
odd = 1:2:2*m-1;
for y = 2:n %loop over layers
    na = N(y-1,:);
    nb = N(y,:);
    % get the reflection and transmission coefficients from subroutines as a vector
    % of length m, one value for each frequency
    %t = Tab(na, nb);
    %r = Rab(na, nb);
    t = rand(size(W)); % dummy vector for MWE
    r = rand(size(W)); % dummy vector for MWE
    % create diagonal and off-diagonal elements. each block is [1 r;r 1]/t
    Id(even) = 1./t;
    Id(odd) = Id(even);
    Io(even) = 0;
    Io(odd) = r./t;
    It = [Io;Id/2].';
    I = spdiags(It,[-1 0],2*m,2*m);
    I = I + I.';
    b = 1i.*(2*pi*D(n).*nb).*W;
    B(even) = -b;
    B(odd) = b;
    L = spdiags(exp(B).',0,2*m,2*m);
    C = C*I*L;
end
a = spdiags(C,0);
a = a(odd).';
c = spdiags(C,-1);
c = c(odd).';
r = c./a; % reflectivity, the answer I want.
With the 3 layer system mentioned above, it isn't quite as fast as the explicit formula, but it's close and probably can get a little faster after some profiling. The full version of the original code clocks at 0.97 seconds, the formula at 0.012 seconds and the sparse diagonal version here at 0.065 seconds.

How can we produce kappa and delta in the following model using Matlab?

I have the following stochastic model describing the evolution of a process (Y) in space and time. Ds and Dt are the domains in space (2D, with x and y axes) and time (1D, with a t axis). This kind of model is usually known as a mixed-effects model or a components-of-variation model.
I am currently generating Y as follows:
%# Time parameters
T=1:1:20; % input
nT=numel(T);
%# Grid and model parameters
nRow=100;
nCol=100;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:1:nCol,1:1:nRow,T);
xPower=0.1;
tPower=1;
noisePower=1;
detConstant=1;
deterministic_mu = detConstant.*(((Grid.Nt).^tPower)./((Grid.Nx).^xPower));
beta_s = randn(nRow,nCol); % mean-zero random effect representing location specific variability common to all times
gammaTemp = randn(nT,1);
for t = 1:nT
    gamma_t(:,:,t) = repmat(gammaTemp(t),nRow,nCol); % mean-zero random effect representing time specific variability common to all locations
end
var = 0.1; % noise has variance = 0.1
for t = 1:nT
    kappa_st(:,:,t) = sqrt(var)*randn(nRow,nCol);
end
for t = 1:nT
    Y(:,:,t) = deterministic_mu(:,:,t) + beta_s + gamma_t(:,:,t) + kappa_st(:,:,t);
end
My questions are:
How to produce delta in the expression for Y and the difference in kappa and delta?
Help explain, through some illustration using Matlab, if I am correctly producing Y?
Please let me know if you need some more information/explanation. Thanks.
First, I rewrote your code to make it a bit more efficient. I see you generate linearly-spaced grids for x,y and t and carry out the computation for all points in this grid. This approach has severe limitations on the maximum attainable grid resolution, since the 3D grid (and all variables defined with it) can consume an awfully large amount of memory if the resolution goes up. If the model you're implementing will grow in complexity and size (it often does), I'd suggest you throw this all into a function accepting matrix/vector inputs for s and t, which will be a bit more flexible in this regard -- processing "blocks" of data that will otherwise not fit in memory will be a lot easier that way.
Then, I generated the delta_st term with rand instead of randn, since the noise should be "white". Now, I'm very unsure about that last one, and I didn't have time to read through the paper you linked to -- can you tell me on what pages I can find the relevant sections for delta_st?
Now, the code:
%# Time parameters
T = 1:1:20; % input
nT = numel(T);
%# Grid and model parameters
nRow = 100;
nCol = 100;
% noise has variance = 0.1
var = 0.1;
xPower = 0.1;
tPower = 1;
noisePower = 1;
detConstant = 1;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:nCol,1:nRow,T);
% deterministic mean
deterministic_mu = detConstant .* Grid.Nt.^tPower ./ Grid.Nx.^xPower;
% mean-zero random effect representing location specific
% variability common to all times
beta_s = repmat(randn(nRow,nCol), [1 1 nT]);
% mean-zero random effect representing time specific
% variability common to all locations
gamma_t = bsxfun(@times, ones(nRow,nCol,nT), randn(1, 1, nT));
% mean zero random effect capturing the spatio-temporal
% interaction not found in the larger-scale deterministic mu
kappa_st = sqrt(var)*randn(nRow,nCol,nT);
% mean zero random effect representing the micro-scale
% spatio-temporal variability that is modelled by white
% noise (i.i.d. at different time steps) in Ds·Dt
delta_st = noisePower * (rand(nRow,nCol,nT)-0.5);
% Final result:
Y = deterministic_mu + beta_s + gamma_t + kappa_st + delta_st;
Your implementation samples beta, gamma and kappa as if they are white (e.g. their values at each (x,y,t) are independent). The descriptions of the terms suggest that this is not meant to be the case. It looks like delta is supposed to capture the white noise, while the other terms capture the correlations over their respective domains. e.g. there is a non-zero correlation between gamma(t_1) and gamma(t_1+1).
If you wish to model gamma as a stationary Gaussian Markov process with variance var_g and correlation cor_g between gamma(t) and gamma(t+1), you can use something like
gamma_t = nan( nT, 1 );
gamma_t(1) = sqrt(var_g)*randn();
K_g = cor_g/var_g;
K_w = sqrt( (1-K_g^2)*var_g );
for t = 2:nT
    gamma_t(t) = K_g*gamma_t(t-1) + K_w*randn();
end
gamma_t = reshape( gamma_t, [ 1 1 nT ] );
The formulas I've used for the gains K_g and K_w in the above code (and for the initialization of gamma_t(1)) produce the desired stationary variance (Var[gamma_t] = var_g) and one-step covariance (Cov[gamma_t, gamma_t+1] = cor_g).
Note that the implementation above assumes that later you will sum the terms using bsxfun to do the "repmat" for you:
Y = bsxfun( @plus, deterministic_mu + kappa_st + delta_st, beta_s );
Y = bsxfun( @plus, Y, gamma_t );
Note that I haven't tested the above code, so you should confirm by sampling that it does actually produce a zero-mean noise process with the specified variance and covariance between adjacent samples. To sample beta the same procedure can be extended into two dimensions, but the principles are essentially the same. I suspect kappa should be similarly modeled as a Markov Gaussian Process, but in all three dimensions and with a lower variance to represent higher-order effects not captured in mu, beta and gamma.
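For example, a quick sanity check of the stationary variance and lag-1 covariance could look like this (parameter values are hypothetical):
var_g = 0.5;  cor_g = 0.3;  nLong = 1e6;     % hypothetical parameters, long test series
g = nan(nLong, 1);
g(1) = sqrt(var_g)*randn();
K_g = cor_g/var_g;
K_w = sqrt((1 - K_g^2)*var_g);
for t = 2:nLong
    g(t) = K_g*g(t-1) + K_w*randn();
end
fprintf('sample variance : %.4f (target %.4f)\n', var(g), var_g);
cc = cov(g(1:end-1), g(2:end));
fprintf('lag-1 covariance: %.4f (target %.4f)\n', cc(1,2), cor_g);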
Delta is supposed to be zero mean stationary white noise. Assuming it to be Gaussian with variance noisePower one would sample it using
delta_st = sqrt(noisePower)*randn( [ nRows nCols nT ] );

KNN algo in matlab

I am working on a thumb recognition system. I need to implement the KNN algorithm to classify my images. According to this, it has only 2 measurements, through which it calculates the distance to find the nearest neighbour, but in my case I have 400 images of 25 x 42, of which 200 are for training and 200 for testing. I have been searching for a few hours but I cannot find a way to compute the distance between the points.
EDIT:
I have reshaped the first 200 images into 1 x 1050 vectors and stored them in a matrix trainingData of size 200 x 1050. Similarly I built testingData.
Here is an illustration code for k-nearest neighbor classification (some functions used require the Statistics toolbox):
%# image size
sz = [25,42];
%# training images
numTrain = 200;
trainData = zeros(numTrain,prod(sz));
for i=1:numTrain
    img = imread( sprintf('train/image_%03d.jpg',i) );
    trainData(i,:) = img(:);
end
%# testing images
numTest = 200;
testData = zeros(numTest,prod(sz));
for i=1:numTest
    img = imread( sprintf('test/image_%03d.jpg',i) );
    testData(i,:) = img(:);
end
%# target class (I'm just using random values. Load your actual values instead)
trainClass = randi([1 5], [numTrain 1]);
testClass = randi([1 5], [numTest 1]);
%# compute pairwise distances between each test instance vs. all training data
D = pdist2(testData, trainData, 'euclidean');
[D,idx] = sort(D, 2, 'ascend');
%# K nearest neighbors
K = 5;
D = D(:,1:K);
idx = idx(:,1:K);
%# majority vote
prediction = mode(trainClass(idx),2);
%# performance (confusion matrix and classification error)
C = confusionmat(testClass, prediction);
err = sum(C(:)) - sum(diag(C))
If you want to compute the Euclidean distance between vectors a and b, just use Pythagoras. In Matlab:
dist = sqrt(sum((a-b).^2));
However, you might want to use pdist to compute it for all combinations of vectors in your matrix at once.
dist = squareform(pdist(myVectors, 'euclidean'));
I'm interpreting columns as instances to classify and rows as potential neighbors. This is arbitrary though and you could switch them around.
If you have a separate test set, you can compute the distance to the instances in the training set with pdist2:
dist = pdist2(trainingSet, testSet, 'euclidean')
You can use this distance matrix to knn-classify your vectors as follows. I'll generate some random data to serve as example, which will result in low (around chance level) accuracy. But of course you should plug in your actual data and results will probably be better.
m = rand(nrOfVectors,nrOfFeatures); % random example data
classes = randi(nrOfClasses, 1, nrOfVectors); % random true classes
k = 3; % number of neighbors to consider, 3 is a common value
d = squareform(pdist(m, 'euclidean')); % distance matrix
[neighborvals, neighborindex] = sort(d,1); % get sorted distances
Take a look at the neighborvals and neighborindex matrices and see if they make sense to you. The first is a sorted version of the earlier d matrix, and the latter gives the corresponding instance numbers. Note that the self-distances (on the diagonal in d) have floated to the top. We're not interested in this (always zero), so we'll skip the top row in the next step.
neighborclasses = classes(neighborindex);             % class labels of the sorted neighbors
assignedClasses = mode(neighborclasses(2:1+k,:),1);
So we assign the most common class among the k nearest neighbors!
You can compare the assigned classes with the actual classes to get an accuracy score:
accuracy = 100 * sum(classes == assignedClasses)/length(classes);
fprintf('KNN Classifier Accuracy: %.2f%%\n', accuracy)
Or make a confusion matrix to see the distribution of classifications:
confusionmat(classes, assignedClasses)
Yes, there is a function for kNN: knnclassify.
Play around with the number of neighbors you want to keep in order to get the best result (use a confusion matrix). This function takes care of the distance, of course.
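For reference, a minimal sketch of how knnclassify fits the data layout used in the first answer above (trainData, trainClass, testData, testClass); note that knnclassify ships with the Bioinformatics Toolbox and has since been superseded by fitcknn in newer releases:
k = 5;                                                % number of neighbours to try
predicted = knnclassify(testData, trainData, trainClass, k);
confusionmat(testClass, predicted)                    % inspect per-class errors
accuracy = 100 * mean(predicted == testClass)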