Improving the accuracy of an SVM - MATLAB

I am working on people detection using two different features, HOG and LBP, with an SVM trained on positive and negative samples. I want to ask how to improve the accuracy of the SVM itself, because every time I add more positive and negative samples, the accuracy decreases. Currently I have 1500 positive samples and 700 negative samples.
% extract HOG features
[fpos,fneg] = features(pathPos, pathNeg);
% train SVM
HOG_featV = loadingV(fpos,fneg); % load and label each training example
fprintf('Training SVM..\n');
T = cell2mat(HOG_featV(2,:)); % labels
HOGtP = HOG_featV(3,:)';
C = cell2mat(HOGtP); % each row of C corresponds to a training example
% extract LBP features
[LBPpos,LBPneg] = LBPfeatures(pathPos, pathNeg);
LBP_featV = loadingV(LBPpos, LBPneg);
LBPlabel = cell2mat(LBP_featV(2,:));
LBPtP = LBP_featV(3,:);
M = cell2mat(LBPtP)'; % each row of M corresponds to a training example
% concatenate HOG and LBP features and train an RBF-kernel SVM
featureVector = [C M];
model = svmlearn(featureVector, T', '-t 2 -g 0.3 -c 0.5');
Does anyone know how to find the best C and gamma values to improve SVM accuracy?
Thank you.

To find the best C and gamma values you typically perform cross-validation. In short, you hold out part of the data (in the extreme, leave-one-out with a single sample) and test the SVM on the held-out samples for each combination of the two parameters, which define a 2-D grid. Typically you would test each decade of the parameters over a certain range, for example C = [0.01, 0.1, 1, ..., 10^9] and gamma = [10^-5, 10^-4, ..., 1000]. Optimizing the hyper-parameters this way should improve your SVM accuracy.
Looking again at your question, it seems you are using svmlearn together with the machine learning functions of MATLAB's Statistics Toolbox, so you already have built-in functions for cross-validation. Have a look at: http://www.mathworks.co.uk/help/stats/support-vector-machines-svm.html
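As a minimal sketch of that grid search, here is one way to do it with the Statistics Toolbox function fitcsvm (a different trainer than the svmlearn call in the question, so treat the exact call as an assumption about your setup); featureVector and T are taken from the question's code. Note that MATLAB parameterizes the Gaussian kernel with KernelScale rather than gamma, with KernelScale = 1/sqrt(gamma) giving the equivalent kernel.
Cs     = 10.^(-2:9);   % C = 0.01 ... 1e9, one value per decade
gammas = 10.^(-5:3);   % gamma = 1e-5 ... 1000
best = struct('err', Inf, 'C', NaN, 'g', NaN);
for C = Cs
    for g = gammas
        % 5-fold cross-validated model for this (C, gamma) pair
        cvmodel = fitcsvm(featureVector, T', ...
            'KernelFunction', 'rbf', 'BoxConstraint', C, ...
            'KernelScale', 1/sqrt(g), 'KFold', 5);
        err = kfoldLoss(cvmodel); % cross-validated classification error
        if err < best.err
            best = struct('err', err, 'C', C, 'g', g);
        end
    end
end
fprintf('Best C = %g, gamma = %g (CV error %.4f)\n', best.C, best.g, best.err);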

I followed ASantosRibeiro's method to optimize the parameters before, and it works well.
In addition, you could add more negative samples until the ratio of negative to positive samples reaches about 2:1. The reason is that in a real-time application you scan the whole image window by window, and the negative (background) windows you extract will commonly far outnumber the windows that contain people.
Thus, adding more negative training samples is a straightforward but effective way to improve the overall accuracy (both the false positive and true negative rates).

Related

Neural network y=f(x) regression

Encouraged by some success in MNIST classification, I wanted to solve a "real" problem with some neural networks.
The task seems quite easy:
We have:
some x-values (e.g. 1:1:100)
some y-values (e.g. x.^2)
I want to train a network with one input (for one x-value), one output (for one y-value), and one hidden layer.
Here is my basic procedure:
1. Slice the x-values into batches (e.g. 10 elements per batch)
2. For each batch, compute the outputs of the net, then apply backpropagation and compute the weight and bias updates
3. After each batch, average the computed weight and bias updates and actually update the weights and biases
4. Repeat steps 1-3 multiple times
This procedure worked fine for MNIST, but for the regression it fails completely.
I am wondering if I am doing something fundamentally wrong.
I tried different batch sizes, up to averaging over ALL x-values.
Basically the network does not train well. After manually tweaking the weights and biases (with 2 hidden neurons) I could approximate y = f(x) quite well, but when the network is supposed to learn the parameters itself, it fails.
When I have just one element for x and one for y and train the network, it learns that one specific pair well.
Maybe somebody has a hint for me. Am I misunderstanding regression with neural networks?
So far I assume the code itself is okay, as it worked for MNIST and for the one x/y pair example. I rather think my overall approach (see above) may not be suitable for regression.
Thanks,
Jim
PS: I will post some code tomorrow...
Here comes the code (MATLAB). As I said, it's one hidden layer with two hidden neurons:
% init hyper-parameters
hidden_neurons=2;
input_neurons=1;
output_neurons=1;
learning_rate=0.5;
batchsize=50;
sigmoid = @(z) 1./(1+exp(-z)); % logistic activation (defined here, since base MATLAB has no sigmoid function)
% load data
training_data=d(1:100)/100;
training_labels=v_start(1:100)/255;
% init weights
init_randomly=1;
if init_randomly
% initialize weights and bias with random numbers between -0.5 and +0.5
w1=rand(hidden_neurons,input_neurons)-0.5;
b1=rand(hidden_neurons,1)-0.5;
w2=rand(output_neurons,hidden_neurons)-0.5;
b2=rand(output_neurons,1)-0.5;
else
% initialize with manually determined values
w1=[10;-10];
b1=[-3;-0.5];
w2=[0.2 0.2];
b2=0;
end
for epochs =1:2000 % looping over some epochs
for i = 1:batchsize:length(training_data) % slice training data into batches
batch_data=training_data(i:min(i+batchsize-1,length(training_data))); % current training batch (the -1 keeps batches from overlapping)
batch_labels=training_labels(i:min(i+batchsize-1,length(training_data))); % corresponding label batch
% initialize weight updates for next batch
w2_update=0;
b2_update =0;
w1_update =0;
b1_update =0;
for k = 1: length(batch_data) % looping over one single batch
% extract training sample
x=batch_data(k); % extracting one single training sample
y=batch_labels(k); % extracting expected output of training sample
% forward pass
z1 = w1*x+b1; % sum of first layer
a1 = sigmoid(z1); % activation of first layer (sigmoid)
z2 = w2*a1+b2; % sum of second layer
a2=z2; %activation of second layer (linear)
% backward pass
delta_2=(a2-y); %calculating delta of second layer assuming quadratic cost; derivative of linear unit is equal to 1 for all x.
delta_1=(w2'*delta_2).* (a1.*(1-a1)); % calculating delta of first layer
% calculating the weight and bias updates averaging over one
% batch
w2_update = w2_update +(delta_2*a1') * (1/length(batch_data));
b2_update = b2_update + delta_2 * (1/length(batch_data));
w1_update = w1_update + (delta_1*x') * (1/length(batch_data));
b1_update = b1_update + delta_1 * (1/length(batch_data));
end
% actually updating the weights. Updated weights will be used in
% next batch
w2 = w2 - learning_rate * w2_update;
b2 = b2 - learning_rate * b2_update;
w1 = w1 - learning_rate * w1_update;
b1 = b1 - learning_rate * b1_update;
end
end
Here is the outcome with random initialization, showing the expected output, the output before training, and the output after training:
[figure: training with random init]
One can argue that the blue line is already closer than the black one; in that sense the network has already improved the results. But I am not satisfied.
Here is the result with my manually tweaked values:
[figure: training with pre-init]
The black line is not bad for just two hidden neurons, but my expectation was that such a black line would be the outcome of training starting from random initialization.
Any suggestions what I am doing wrong?
Thanks!
OK, after some research I found some interesting points:
The function I tried to learn seems particularly hard to learn (I am not sure why).
With the same setup I tried to learn some 3rd-degree polynomials, which was successful (cost < 1e-6).
Randomizing the order of training samples seems to improve learning (for the polynomial and my initial function). I know this is well known in the literature, but I had always skipped that part in implementation, so I learned for myself how important it is.
For learning "curvy/wiggly" functions, I found sigmoid works better than ReLU (the output layer is still linear, as suggested for regression).
A learning rate of 0.1 worked fine for the curve fitting I finally wanted to perform.
A larger batch size smooths the cost vs. epochs plot (surprise...).
Initializing weights between -5 and +5 worked better than between -0.5 and +0.5 for my application.
In the end I got quite convincing results for what I intended the network to learn :)
Have you tried a much smaller learning rate? Generally, a learning rate of 0.001 is a good starting point; 0.5 is in most cases way too large.
Also note that your predefined weights lie in an extremely flat region of the sigmoid function (sigmoid(10) ≈ 1, sigmoid(-10) ≈ 0), with the derivative close to 0 at both positions. That means that backpropagating from (or getting to) such a position is extremely difficult. For exactly that reason, some people prefer to use ReLUs instead of sigmoid, since they have a "dead" region only for negative activations.
Also, am I correct in seeing that you only have 100 training samples? You could maybe try a smaller batch size, or increase the number of samples you take. Also don't forget to shuffle your samples after each epoch; plenty of reasons for this are given elsewhere, for example here.
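A minimal sketch of that per-epoch shuffle, using the variable names from your code (place it at the top of the epoch loop):
% reshuffle the training set at the start of each epoch
idx = randperm(length(training_data));
training_data   = training_data(idx);
training_labels = training_labels(idx);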

Why is accuracy so much lower when using fitcecoc() compared to trainImageCategoryClassifier()?

I am trying to use bag of words and fitcecoc() (multiclass SVM) to reproduce results similar to those obtained by using the Image Category Classifier, as seen in the documentation.
% Code from documentation
bag = bagOfFeatures(trainingSet); % create bag of features from trainingSet (an image datastore)
categoryClassifier = trainImageCategoryClassifier(trainingSet, bag);
confMatrix = evaluate(categoryClassifier, validationSet);
This returns accuracy of ~98% on the validation set.
However, when I pass the histogram of visual word occurrences into the multiclass SVM classifier, I get ~2.5% accuracy:
SVM_SURF = fitcecoc(trainFeatures,trainingSet.Labels);
bag = bagOfFeatures(validationSet);
featureMatrix = encode(bag, validationSet); % histogram of visual word occurrences
[pred score cost] = predict(SVM_SURF, featureMatrix)
accuracy = sum(validationSet.Labels == pred)/size(validationSet.Labels,1);
accuracy
Is there an obvious reason why the accuracy is so much lower when the bag of words is passed into fitcecoc() rather than trainImageCategoryClassifier()?
The fitcecoc classifier is a multi-purpose classifier (images, financial data, ...). By configuring the kernel of its binary learners you can get better accuracy rates. However, traditionally, the fitcecoc function gives much better results if you increase the amount of training data.
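As a hedged sketch of that kernel configuration (reusing trainFeatures and trainingSet from the question), you can pass an SVM template to fitcecoc to change the binary learners' kernel and standardize the histogram features:
% configure the binary SVM learners used inside fitcecoc
t = templateSVM('KernelFunction', 'gaussian', 'Standardize', true);
SVM_SURF = fitcecoc(trainFeatures, trainingSet.Labels, 'Learners', t);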

Feedforward neural network classification in Matlab

I have samples from two Gaussian distributions, 10,000 samples from each, and I would like to train a feed-forward neural network on them, but I don't know how many samples I have to take in order to get an optimal decision boundary.
Here is the code, but I don't know exactly where it goes wrong, and the outputs are weird.
x1 = -49:1:50;
x2 = -49:1:50;
[X1, X2] = meshgrid(x1, x2);
Gaussian1 = mvnpdf([X1(:) X2(:)], mean1, var1); % for class A
Gaussian2 = mvnpdf([X1(:) X2(:)], mean2, var2); % for class B
net = feedforwardnet(10);
G1 = reshape(Gaussian1, 10000,1);
G2 = reshape(Gaussian2, 10000,1);
input = [G1, G2];
output = [0, 1];
net = train(net, input, output);
When I run the code it gives me weird results.
If the code is not correct, can someone please suggest how I can get a decision boundary for these two distributions?
I'm pretty sure that the input must be the Gaussian densities (and not the x coordinates). The network has to learn the relationship between the phenomena you are actually interested in (the Gaussian distributions) and the output labels, not between the space that contains those phenomena and the labels. Moreover, if you chose the x coordinates as input, the network would try to learn a relationship between them and the output labels, but the x values are potentially constant: the input data might even be identical for both classes, because you can have very different Gaussian distributions over the same range of x coordinates simply by varying the mean and the variance. The network would then end up confused, because the same input data would map to more than one output label (and you don't want that to happen!).
I hope I was helpful.
P.S.: For completeness, note that a neural network does not fit the data very well if you have a small training set. Also don't forget to validate your model using cross-validation (a good rule of thumb is to use 20% of your data as a validation set and another 20% as a test set, and to train your model only on the remaining 60%).
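A minimal sketch of such a 60/20/20 split (X and y are hypothetical placeholders for a sample matrix with one sample per row and its label vector):
N = size(X, 1);
idx = randperm(N); % shuffle before splitting
nTrain = round(0.6 * N);
nVal = round(0.2 * N);
Xtrain = X(idx(1:nTrain), :); ytrain = y(idx(1:nTrain));
Xval = X(idx(nTrain+1:nTrain+nVal), :); yval = y(idx(nTrain+1:nTrain+nVal));
Xtest = X(idx(nTrain+nVal+1:end), :); ytest = y(idx(nTrain+nVal+1:end));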

KNN Classifier for simple digit recognition

Actually, I have an assignment where I am required to recognize individual decimal digits as part of a text recognition process. I have been given a set of JPEG images of digits, each 160 x 160 pixels. After checking some resources here I managed to write the code below, but:
1) I am not sure whether the way I read the images and resize them into matrices is right.
2) Suppose I have 30 training images for the digits 0-9 (three images per digit) and 10 test images, each containing a single digit. How do I calculate the distance between every test and training image in a loop? In my code the Euclidean distance calculation gives an output of zero.
3) How do I calculate accuracy using a confusion matrix?
% number of training images
Train = 30;
% number of test images
Test = 10;
% storage for the images
tData = uint8(zeros(160,160,30));
tTest = uint8(zeros(160,160,10));
for k=1:Test
s1='im-';
s2=num2str(k);
t = strcat('testy/im-',num2str(k),'.jpg');
im=rgb2gray(imread(t));
I=imresize(im,[160 160]);
tTest(:,:,k)=I;
%case testing if it belongs to zero
for l=1:3
ss1='zero-';
ss2=num2str(l);
t1 = strcat('data/zero-',num2str(l),'.jpg');
im1=rgb2gray(imread(t1));
I1=imresize(im1,[160 160]);
tData(:,:,l)=I1;
% Euclidean distance
distance = sqrt(sum(bsxfun(@minus, tData(:,:,k), tTest(:,:,l)).^2, 2));
[d,index] = sort(distance);
%k=3
% index_close(l) = index(l:3);
%x_close = I(index_close,:);
end
end
First of all, I think 10 test images are not enough.
Just use the function below; data_train is your training data and lab_train contains its labels. Resize your images to smaller sizes!
I think the default distance measure is the Euclidean distance, but you can choose other metrics, such as the city-block distance.
Class = knnclassify(data_test, data_train, lab_train, 11);
fprintf('11-NN Accuracy: %f\n', sum(Class == lab_test')/length(lab_test));
Class = knnclassify(data_test, data_train, lab_train, 1, 'cityblock');
fprintf('1-NN Accuracy (cityblock): %f\n', sum(Class == lab_test')/length(lab_test));
OK, now you have the overall accuracy, but this is not a good measure; it is better to calculate the accuracy separately for each class and then take their mean.
You can calculate the accuracy of a specific class (id) like this:
idLocations = (lab_test == id);
NumberOfId = sum(idLocations);
NumberOfCorrect = sum(lab_test(idLocations) == Class(idLocations));
NumberOfCorrect/NumberOfId % accuracy for class id
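To average this over all classes (same variables as above, assuming lab_test and Class have matching orientation), a small sketch:
classIds = unique(lab_test);
perClassAcc = zeros(numel(classIds), 1);
for c = 1:numel(classIds)
    loc = (lab_test == classIds(c)); % samples of this class
    perClassAcc(c) = sum(lab_test(loc) == Class(loc)) / sum(loc);
end
meanAccuracy = mean(perClassAcc) % mean per-class accuracy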
To answer your questions:
1) Does image resizing affect the accuracy of the whole process?
As you mentioned, your images are already 160 x 160, so imresize will not change them. But if an image were much smaller, say 60 x 60, imresize would interpolate to increase its spatial dimensions, which may affect the structure and shape of the digit. To handle this kind of variability, your training data should have many more samples (at least 50 per class), and some pre-processing should be applied to the data, such as de-skewing the digit images.
2) The Euclidean distance is a good measure but not the best for this kind of problem: since its distribution is spherical, it may give the same distance for two different digits. If you are working in MATLAB, beware of variable casting: you are taking a difference, so both variables should be double (yours are uint8), and this may be one reason for a wrong distance. Also, in the line distance = sqrt(sum(bsxfun(@minus, tData(:,:,k), tTest(:,:,l)).^2, 2)); you are summing the matrix along its second dimension only, so the output is a 160 x 1 vector containing the sum of each row. I think it should be distance = sqrt(sum(sum(bsxfun(@minus, tData(:,:,k), tTest(:,:,l)).^2, 2))); I have just added one more sum to get the sum of squared differences over the whole matrix. Try it and see whether it helps.
3) To check the accuracy of your classifier precisely you need a large training dataset. The confusion matrix is created during cross-validation, where you split your training data into training and testing samples, so you know the true classes of both. Prepare a num_classes x num_classes matrix (in your case 10 x 10), where rows correspond to actual classes and columns to predictions. Take a sample from the test set and predict its class: if your classifier predicts 5 and the sample's actual class is also 5, add 1 to confusion_matrix(5,5); if it predicts 3 instead, add 1 to confusion_matrix(5,3). Finally, sum the diagonal elements of the confusion matrix and divide by the total number of test samples; the result is the accuracy of your classifier, as sketched below.
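A minimal sketch of that bookkeeping (trueLab and predLab are hypothetical vectors of actual and predicted classes, assumed coded 1..10):
num_classes = 10;
confusion_matrix = zeros(num_classes);
for n = 1:length(trueLab)
    r = trueLab(n); % actual class selects the row
    c = predLab(n); % predicted class selects the column
    confusion_matrix(r, c) = confusion_matrix(r, c) + 1;
end
accuracy = sum(diag(confusion_matrix)) / sum(confusion_matrix(:))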
P.S. Try to have at least 50 samples per class, and during cross-validation split the training data in a 90:10 ratio, using 90% of the samples for training and the remaining 10% for testing the classifier.
Hope this helps. Feel free to share your thoughts.
Thank you

How to use SVM in Matlab?

I am new to MATLAB. Is there any sample code for classifying some data (with 41 features) with an SVM and then visualizing the result? I want to classify a data set (which has five classes) using the SVM method.
I read the article "A Practical Guide to Support Vector Classification" and saw some examples. My dataset is KDD99. I wrote the following code:
%% Load Data
[data,colNames] = xlsread('TarainingDataset.xls');
groups = ismember(colNames(:,42),'normal.');
TrainInputs = data;
TrainTargets = groups;
%% Design SVM
C = 100;
svmstruct = svmtrain(TrainInputs,TrainTargets,...
'boxconstraint',C,...
'kernel_function','rbf',...
'rbf_sigma',0.5,...
'showplot','false');
%% Test SVM
[dataTest,colNamesTest] = xlsread('TestDataset.xls');
TestInputs = dataTest;
groups = ismember(colNamesTest(:,42),'normal.');
TestOutputs = svmclassify(svmstruct,TestInputs,'showplot','false');
but I don't know how to get the accuracy or MSE of my classification, and when I set showplot to true in svmclassify, I get this warning:
The display option can only plot 2D training data
Could anyone please help me?
I recommend you use another SVM toolbox, libsvm. The link is as follows:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
After adding it to the MATLAB path, you can train and use your model like this:
model = svmtrain(train_label, train_feature, '-c 1 -g 0.07 -h 0');
% the parameters can be modified
[label, accuracy, probability] = svmpredict(test_label, test_feature, model);
train_label must be a vector; if there are more than two classes (rather than just 0/1), libsvm will train a multi-class SVM automatically.
train_feature is an n*L matrix for n samples. You had better preprocess the features before using them, and in the test part they should be preprocessed in the same way, as sketched below.
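For example, a minimal scaling sketch (variable names follow this answer; the key point is that the test set reuses the training set's statistics):
mu = mean(train_feature, 1); % per-feature mean from training data only
sd = std(train_feature, [], 1);
sd(sd == 0) = 1; % guard against constant features
train_scaled = bsxfun(@rdivide, bsxfun(@minus, train_feature, mu), sd);
test_scaled = bsxfun(@rdivide, bsxfun(@minus, test_feature, mu), sd);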
The accuracy you want is shown when testing finishes, but only for the whole dataset.
If you need the accuracy for positive and negative samples separately, you still have to calculate it yourself from the predicted labels.
Hope this helps!
Your feature space has 41 dimensions, and plotting more than 3 dimensions is impossible.
A good way to better understand your data and the way SVM works is to begin with a linear SVM. This type of SVM is interpretable, which means that each of your 41 features has a weight (or 'importance') associated with it after training. You can then use plot3() with your data restricted to the 3 'best' features from the linear SVM. Note how well your data is separated with those features and choose a basis function and other parameters accordingly.
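A hedged sketch of that idea, reusing TrainInputs and TrainTargets from the question together with the era's svmtrain (note that with the default autoscale option the recovered weights refer to the scaled features):
% train a linear SVM and recover one weight per feature
svmstruct = svmtrain(TrainInputs, TrainTargets, 'kernel_function', 'linear');
w = svmstruct.Alpha' * svmstruct.SupportVectors;
[~, order] = sort(abs(w), 'descend');
f = order(1:3); % indices of the 3 highest-weight features
% scatter the two classes in the space of those 3 features
pos = (TrainTargets == 1);
plot3(TrainInputs(pos, f(1)), TrainInputs(pos, f(2)), TrainInputs(pos, f(3)), 'r.');
hold on; grid on;
plot3(TrainInputs(~pos, f(1)), TrainInputs(~pos, f(2)), TrainInputs(~pos, f(3)), 'b.');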