How to enhance the accuracy of a KNN classifier? - MATLAB

My homework is to write MATLAB code that calculates the accuracy of a KNN classifier on the following data.
Training data: 6 seconds long, 3 channels, 768 samples/trial, 140 trials, fs = 128 Hz. Test data: 3 channels, 1152 samples/trial, 140 trials.
I have written part of the code, but I do not know where to use cross-validation, and the accuracy is very low (65%).
clear all
close all
clc
load('LabelTest.mat');
load('LabelTrain.mat');
load('TestData.mat');
load('TrainData.mat');
% Extract features from the training set
ndx1 = find(LabelTrain == 1);
ndx2 = find(LabelTrain == 2);
TrainClass1 = TrainData(:,:,ndx1);
TrainClass2 = TrainData(:,:,ndx2);
K1 = 1; % channel used for the kurtosis and std features
K2 = 2; % channel used for the sum feature
for i = 1:size(TrainClass1,3)
    FVclass1(i,:) = [kurtosis(TrainClass1(:,K1,i)) std(TrainClass1(:,K1,i)) sum(TrainClass1(:,K2,i))];
    FVclass2(i,:) = [kurtosis(TrainClass2(:,K1,i)) std(TrainClass2(:,K1,i)) sum(TrainClass2(:,K2,i))];
end
FVTrain = [FVclass1; FVclass2];
% Extract the SAME features from the test set -- std here, not mean,
% so that the test features match the training features
for j = 1:size(TestData,3)
    FVTest(j,:) = [kurtosis(TestData(:,K1,j)) std(TestData(:,K1,j)) sum(TestData(:,K2,j))];
end
TR_Label = [ones(1,size(TrainClass1,3)) 2*ones(1,size(TrainClass2,3))];
% Evaluate test accuracy for k = 1..35 (looping backwards preallocates SD)
for k = 35:-1:1
    PredictedClass = knnclassify(FVTest, FVTrain, TR_Label, k); % classification prediction
    PERF = classperf(LabelTest, PredictedClass);
    SD(k) = PERF.CorrectRate; % test accuracy for this k
end
figure
plot(1:35, SD);
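Since the question asks where cross-validation fits: choose k by cross-validating on the training set only, then evaluate that single k once on the held-out test set. A minimal sketch, assuming the same Bioinformatics Toolbox functions used above (crossvalind, knnclassify, classperf) are available:
% Pick k by 10-fold cross-validation on the TRAINING features only
nFolds = 10;
folds = crossvalind('Kfold', numel(TR_Label), nFolds); % fold index per training sample
cvAcc = zeros(35,1);
for k = 1:35
    correct = 0;
    for f = 1:nFolds
        valIdx = (folds == f); % current validation fold
        trIdx = ~valIdx;       % remaining folds used for training
        pred = knnclassify(FVTrain(valIdx,:), FVTrain(trIdx,:), TR_Label(trIdx), k);
        correct = correct + sum(pred(:) == TR_Label(valIdx)');
    end
    cvAcc(k) = correct / numel(TR_Label);
end
[~, bestK] = max(cvAcc);
% Final, single evaluation on the held-out test set
PredictedClass = knnclassify(FVTest, FVTrain, TR_Label, bestK);
PERF = classperf(LabelTest, PredictedClass);
fprintf('best k = %d, test accuracy = %.2f%%\n', bestK, 100*PERF.CorrectRate);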

Related

How to implement KNN using MATLAB and calculate the percentage accuracy

I'm new to MATLAB. My goal is to implement KNN; I have two different txt files, one containing the test data (sample) and the other containing the training data.
So far I think I should do something like this, but I'm not sure how:
load fisheriris
x = meas(:,3:4);
gscatter(x(:,1),x(:,2),species)
newpoint = [5 1.45];
[n,d] = knnsearch(x,newpoint,'k',10);
line(x(n,1),x(n,2),'color',[.5 .5 .5],'marker','o','linestyle','none','markersize',10)
Or maybe this is a simpler way to do it; here the two different sets of data, sample and training, are very clear to me, but it doesn't show the accuracy of the predicted class:
A = [50, 60;
     7, 2;
     13, 12;
     100, 200];
B = [1, 0;
     200, 30;
     19, 10];
G = {'First Row';
     'Second Row';
     'Third Row'};
class = knnclassify(A, B, G);
disp('Result: ');
disp(class);
the matrix looks like this:
Training data:
148.0,50.0,0
187.0,34.0,0
204.0,89.0,0
430.0,161.0,1
427.0,22.0,1
-42.0,469.0,1
more,more,class....
Test data:
290.0,-57.0,0
194.0,-80.0,0
174.0,33.0,0
465.0,691.0,1
270.0,-194.0,1
-56.0,665.0,1
more,more,class....
How can I classify this data using knn and show the predictions for each row so I can calculate the accuracy percentage?
-------EDITED------
I forgot: if I need the accuracy for each class, what should I do?
Here is the updated code using knnclassify
trainData= [148.0,50.0,0; ...
187.0,34.0,0; ...
204.0,89.0,0; ...
430.0,161.0,1; ...
427.0,22.0,1; ...
-42.0,469.0,1 ...
];
testData= [290.0,-57.0,0; ...
194.0,-80.0,0; ...
174.0,33.0,0; ...
465.0,691.0,1; ...
270.0,-194.0,1; ...
-56.0,665.0,1];
% Data
Sample=testData(:,1:2);
Training=trainData(:,1:2);
Group=trainData(:,3);
% Classify
k=1; % number of nearest neighbors used in the classification
Class = knnclassify(Sample, Training, Group,k);
% Display Prediction
fprintf('%.1f %.1f - Real %d, Predicted %d\n',[testData.'; Class.']);
% Calculate percentage accuracy for each class
trueClass=testData(:,3);
classList=unique(trueClass);
for classIndex=1:length(classList)
indexesOfEachClass=find(trueClass==classList(classIndex));
percentageAccuracyEachClass(classIndex,1)=sum(Class(indexesOfEachClass)==trueClass(indexesOfEachClass))/length(indexesOfEachClass)*100;
end
fprintf('\nClass %d Accuracy : %f%%',[classList.'; percentageAccuracyEachClass.']);
% Calculate overall percentage accuracy
dataClassifiedAccurately=Class==trueClass;
percentageAccuracy=sum(dataClassifiedAccurately)/length(dataClassifiedAccurately)*100;
fprintf('\n\nOverall Accuracy : %f%%\n',percentageAccuracy);
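Note that knnclassify belongs to the Bioinformatics Toolbox and was later deprecated. On newer releases (roughly R2014a and later) the same example can be written with fitcknn from the Statistics and Machine Learning Toolbox; a minimal sketch, reusing the Sample/Training/Group variables from above:
mdl = fitcknn(Training, Group, 'NumNeighbors', k); % train the KNN model
Class = predict(mdl, Sample); % predict labels for the test rows
accuracy = mean(Class == testData(:,3)) * 100; % overall accuracy in percent
fprintf('Overall Accuracy : %f%%\n', accuracy);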

How to make sound signal length the same in MATLAB?

I found this speech recognition code that I downloaded from a blog. It works fine: it asks you to record sounds to create a dataset, and then you call a function to train the system using neural networks.
I want to use this code to train using my dataset of 20 words that I want to recognise.
Problem:
I have a dataset of 800 files for twenty words i.e. 40 recordings from different people for each word. I used Windows sound recorder to collect the files.
The problem is that in the code the size of the input file is set to ALWAYS be 8000 samples. My dataset, on the other hand, is not constant: some files are 2 seconds long, some are 3, which means there will be a different number of samples in each file.
If the number of samples per input signal varies, it will probably generate errors.
I want to use my files to train the system.
How do I do that?
Code:
clc; clear all;
load('voicetrainfinal.mat');
Fs = 8000;
for l = 1:20
    clear y1 y2 y3;
    display('record voice');
    pause();
    x = wavrecord(Fs, Fs); % wavrecord(n,Fs) records n samples at a sampling rate of Fs
    maxval = max(x);
    if maxval < 0.04
        display('Threshold value is too large!');
    end
    t = 0.04;
    j = 1;
    for i = 1:8000
        if (abs(x(i)) > t)
            y1(j) = x(i);
            j = j + 1;
        end
    end
    y2 = y1/(max(abs(y1)));
    y3 = [y2, zeros(1, 3120 - length(y2))];
    y = filter([1 -0.9], 1, y3'); % high-pass filter to boost the high-frequency components
    %% frame blocking
    blocklen = 240; % 30 ms block
    overlap = 80;
    block(1,:) = y(1:240);
    for i = 1:18
        block(i+1,:) = y(i*160:(i*160 + blocklen - 1));
    end
    w = hamming(blocklen);
    for i = 1:19
        a = xcorr((block(i,:).*w'), 12); % autocorrelation from lag -12 to 12
        for j = 1:12
            auto(j,:) = fliplr(a(j+1:j+12)); % autocorrelation matrix from lag 0 to 11
        end
        z = fliplr(a(1:12)); % column of autocorrelations for lags 1 to 12
        alpha = pinv(auto)*z';
        lpc(:,i) = alpha;
    end
    wavplay(x, Fs);
    X1 = reshape(lpc, 1, 228);
    a1 = sigmoid(Theta1*[1; X1']);
    h = sigmoid(Theta2*[1; a1]);
    m = max(h);
    p1 = find(h == m);
    if (p1 == 10)
        P = 0
    else
        P = p1
    end
end
In your code you have:
Fs=8000;
wavrecord(n,Fs) % records n samples at a sampling rate Fs
for i=1:8000
if(abs(x(i))>t)
y1(j)=x(i);
j=j+1;
end
end
It seems that instead of recording you are going to import your sound file (here for a .wav file):
[y, Fs] = wavread(filename);
Instead of hardcoding the 8000 value, you can read the length of your file:
n = length(y);
and then just use that n variable in the for loop (the imported signal is y here, not x as in the recording code):
for i = 1:n
    if (abs(y(i)) > t)
        y1(j) = y(i);
        j = j + 1;
    end
end
The rest of the code seems to be independent of that 8000 value.
If you are worried about having non-constant file lengths, compute n_max, the maximum length of all the audio recordings you have, and pad recordings shorter than n_max samples with zeros to make them all n_max long.
n_max = 0;
for file = {'file1', 'file2', ..., 'filen'} % cell array of your file names
    [y, Fs] = wavread(file{1});
    n_max = max(n_max, length(y));
end
Then each time you process a sound vector you can pad it with 0 (harmless for you, because 0 means no sound) like so:
y = [y, zeros(1, n_max - length(y))];
Alternatively, collect every recording into one zero-padded matrix, one file per row:
n = noOfFiles;
for k = 1:n
    M(k, 1:length(filedata{k})) = filedata{k}; % shorter rows stay zero-padded
end
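Putting the pieces together: a minimal end-to-end sketch that reads every file, finds the maximum length, and zero-pads. It assumes the recordings are .wav files in a folder named recordings (the folder name and variable names are illustrative) and an older MATLAB where wavread still exists; use audioread on current releases:
files = dir('recordings/*.wav'); % hypothetical folder holding the 800 recordings
nFiles = numel(files);
filedata = cell(nFiles, 1);
n_max = 0;
for k = 1:nFiles
    [y, Fs] = wavread(fullfile('recordings', files(k).name));
    filedata{k} = y(:)'; % store each signal as a row vector
    n_max = max(n_max, length(y));
end
M = zeros(nFiles, n_max); % one zero-padded recording per row
for k = 1:nFiles
    M(k, 1:length(filedata{k})) = filedata{k};
end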

How to classify a hyperspectral data set with LibSVM and train an SVM with .mat files?

I am trying to classify a hyperspectral dataset using LibSVM.
I have two sets of data:
one .mat file containing 145*145 pixels in 200 bands.
one .mat file containing a 145*145 label map with 16 classes (values from 1 to 16; the background value is 0).
More information at this link: Indain_pines_dataset
My question is: how do I sample a specific number or percentage of pixels per class for training and testing LibSVM (training_label_vector and testing_label_vector) for the 16 classes?
My goal is multiclass classification of this data.
Please help.
I'll give you the general idea:
First, select your samples from the ground truth (the 16-class image):
Idx = cell(16,1);
Sample = zeros(no_samples,16);
for k = 1:16
    Idx{k} = find(HSI(:) == k); % linear indices of class-k pixels (0 is background)
    [~,Sample(:,k)] = datasample(Idx{k},no_samples); % positions sampled within Idx{k}
end
Idx is a cell array with the linear indices of the pixels of each class. The kth column of Sample tells which members of Idx{k} were randomly sampled for training purposes.
Now you want to recover the spectral signatures for the training algorithm.
Hr = reshape(HSI,145*145,200); to get a 2D array out of the HSI.
class = cell(16,1); will contain the training samples for each class.
for k = 1:16
    class{k} = Hr(Idx{k}(Sample(:,k)),:);
end
Now class{1} contains no_samples spectral signatures. I guess that on Indian Pines 10 samples is plenty. Keep in mind that some clusters are very small (in number of pixels) on this dataset. Good luck!
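From there, a hedged sketch of assembling the label and instance vectors and calling LibSVM's MATLAB interface (svmtrain/svmpredict handle multiclass one-vs-one internally). no_samples, Idx, Sample, and Hr come from the snippets above; the RBF parameters -c 1 -g 0.01 are placeholder assumptions, not tuned values:
trainData = []; trainLabel = [];
testData = []; testLabel = [];
for k = 1:16
    picked = false(numel(Idx{k}),1);
    picked(Sample(:,k)) = true; % positions sampled for training within class k
    trainData = [trainData; Hr(Idx{k}(picked),:)];
    trainLabel = [trainLabel; k*ones(no_samples,1)];
    testData = [testData; Hr(Idx{k}(~picked),:)]; % the rest is used for testing
    testLabel = [testLabel; k*ones(sum(~picked),1)];
end
model = svmtrain(trainLabel, double(trainData), '-s 0 -t 2 -c 1 -g 0.01');
[pred, acc, ~] = svmpredict(testLabel, double(testData), model);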

Error Backpropagation - Neural network

I am trying to write code for error back-propagation for a neural network, but my code takes a really long time to execute. I know that training a neural network takes a long time, but it is taking a long time for a single iteration as well.
Multi-class classification problem!
Total number of training samples = 19978
Number of inputs = 513
Number of hidden units = 345
Number of classes = 10
Below is my entire code:
X=horzcat(ones(19978,1),inputMatrix); %Adding bias column -> X is 19978-by-514
M=floor(0.66*(513+10)); %Taking two-thirds of inputs+outputs
Wji=rand(514,M); %514 rows (513 inputs + 1 bias) so that X*Wji is conformable
aj=X*Wji;
zj=tanh(aj); %Hidden Layer output
Wkj=rand(M,10);
ak=zj*Wkj;
akTranspose = ak';
ykTranspose=softmax(akTranspose); %For multi-class classification
yk=ykTranspose'; %Final output
error=0;
%Initializing target variables
t = zeros(19978,10);
t(1:2000,1)=1;
t(2001:4000,2)=1;
t(4001:6000,3)=1;
t(6001:8000,4)=1;
t(8001:10000,5)=1;
t(10001:12000,6)=1;
t(12001:14000,7)=1;
t(14001:16000,8)=1;
t(16001:18000,9)=1;
t(18001:19978,10)=1;
errorArray=zeros(100000,1); %Storing error values to keep track of error per iteration
errorDiff=zeros(100000,1);
for nIterations=1:5
errorOld=error;
aj=X*Wji; %Forward propagating in each iteration
zj=tanh(aj);
ak=zj*Wkj;
akTranspose = ak';
ykTranspose=softmax(akTranspose);
yk=ykTranspose';
error=0;
%Calculating error
for n=1:19978 %for 19978 training samples
for k=1:10 %for 10 classes
error = error + t(n,k)*log(yk(n,k)); %using cross entropy function
end
end
error=-error;
Ediff = error-errorOld;
errorArray(nIterations,1)=error;
errorDiff(nIterations,1)=Ediff;
%Calculating derivative of error wrt weights Wji
derEWji=zeros(514,345);
derEWkj=zeros(345,10);
for i=1:514
    for j=1:M
        derErrorTemp=0;
        for k=1:10
            for n=1:19978
                derErrorTemp=derErrorTemp+Wkj(j,k)*(yk(n,k)-t(n,k));
                %Calculating derivative of E wrt Wkj
                derEWkj(j,k) = derEWkj(j,k)+(yk(n,k)-t(n,k))*zj(n,j);
            end
        end
        for n=1:19978
            %Calculating derivative of E wrt Wji
            derEWji(i,j) = derEWji(i,j)+(1-(zj(n,j)*zj(n,j)))*derErrorTemp;
        end
    end
end
eta = 0.0001; %learning rate
Wji = Wji - eta.*derEWji; %updating weights
Wkj = Wkj - eta.*derEWkj;
end
for-loops are very time-consuming in MATLAB, even with the help of the JIT. Try to vectorize your code rather than organizing it in 3 or even 4 nested loops. For example,
for n=1:19978 %for 19978 training samples
for k=1:10 %for 10 classes
error = error + t(n,k)*log(yk(n,k)); %using cross entropy function
end
end
can be changed to:
error = -sum(sum(t.*log(yk))); % t and yk are both n-by-k arrays you already have; the log and minus sign match the loop above
You can do the same for the rest of your code: use matrix products and element-wise operations on whole arrays for the different cases.
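For instance, both weight gradients can be computed with no loops at all; a sketch assuming the softmax output with cross-entropy loss and tanh hidden layer defined above:
delta_k = yk - t; % 19978-by-10 output-layer error (softmax + cross-entropy)
derEWkj = zj' * delta_k; % (M-by-19978)*(19978-by-10) -> M-by-10
delta_j = (1 - zj.^2) .* (delta_k * Wkj'); % tanh derivative, 19978-by-M
derEWji = X' * delta_j; % (514-by-19978)*(19978-by-M) -> 514-by-M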

using precomputed kernels with libsvm

I'm currently working on classifying images with different image descriptors. Since they have their own metrics, I am using precomputed kernels. So given these NxN kernel matrices (for a total of N images) I want to train and test an SVM. I'm not very experienced with SVMs though.
What confuses me is how to enter the input for training. Using an MxM subset of the kernel (M being the number of training images) trains the SVM with M features. However, if I understood correctly, this limits me to test data with the same number of features. Trying to use a sub-kernel of size MxN causes infinite loops during training; consequently, using more features when testing gives poor results.
Equal-sized training and test sets give reasonable results, but if I want to classify, say, one image, or train with a given number of images per class and test with the rest, it doesn't work at all.
How can I remove the dependency between the number of training images and the number of features, so I can test with any number of images?
I'm using libsvm for MATLAB; the kernels are distance matrices with values in [0,1].
You seem to already have figured out the problem... According to the README file included in the MATLAB package:
To use precomputed kernel, you must include sample serial number as
the first column of the training and testing data.
Let me illustrate with an example:
%# read dataset
[dataClass, data] = libsvmread('./heart_scale');
%# split into train/test datasets
trainData = data(1:150,:);
testData = data(151:270,:);
trainClass = dataClass(1:150,:);
testClass = dataClass(151:270,:);
numTrain = size(trainData,1);
numTest = size(testData,1);
%# radial basis function: exp(-gamma*|u-v|^2)
sigma = 2e-3;
rbfKernel = @(X,Y) exp(-sigma .* pdist2(X,Y,'euclidean').^2);
%# compute kernel matrices between every pairs of (train,train) and
%# (test,train) instances and include sample serial number as first column
K = [ (1:numTrain)' , rbfKernel(trainData,trainData) ];
KK = [ (1:numTest)' , rbfKernel(testData,trainData) ];
%# train and test
model = svmtrain(trainClass, K, '-t 4');
[predClass, acc, decVals] = svmpredict(testClass, KK, model);
%# confusion matrix
C = confusionmat(testClass,predClass)
The output:
*
optimization finished, #iter = 70
nu = 0.933333
obj = -117.027620, rho = 0.183062
nSV = 140, nBSV = 140
Total nSV = 140
Accuracy = 85.8333% (103/120) (classification)
C =
65 5
12 38
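For the asker's actual setup (a precomputed N-by-N distance matrix D rather than a kernel computed from raw features), the same pattern applies: the columns must always correspond to the M training instances, so a single test image simply yields one row of length M+1. A sketch, where D, numeric index vectors trainIdx/testIdx, and a label vector labels are assumed to exist; 1-D is used as a stand-in similarity for distances in [0,1] (note it is not guaranteed to be a valid positive semi-definite kernel):
S = 1 - D; % similarity from a [0,1] distance matrix
K = [ (1:numel(trainIdx))' , S(trainIdx,trainIdx) ]; % train-vs-train, M columns
KK = [ (1:numel(testIdx))' , S(testIdx,trainIdx) ]; % test-vs-train, still M columns
model = svmtrain(labels(trainIdx), K, '-t 4');
[predClass, acc, decVals] = svmpredict(labels(testIdx), KK, model);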