I have a dataset of 43 examples (data points) and 70'000 features, which means my dataset matrix is (43 x 70'000). The labels contain 4 different values (1-4), i.e. there are 4 classes.
Now, I have done classification with a Deep Belief Network / Neural Network, but I'm only getting an accuracy of around 25% (chance level) with leave-one-out cross-validation. If I use kNN, SVM etc., I get >80% accuracy.
I have used the DeepLearnToolbox for Matlab (https://github.com/rasmusbergpalm/DeepLearnToolbox) and just adapted the Deep Belief Network example from the readme of the toolbox. I have tried different numbers of hidden layers (1-3) and different numbers of hidden nodes (100, 500,...) as well as different learning rates, momentum etc., but the accuracy is still very bad. The feature vectors are scaled to the range [0,1] because this is required by the toolbox.
In detail, I ran the following code (only showing one run of cross-validation):
% Indices of training and test set
train = training(c,m);
test = ~train;
% Train DBN
opts = [];
dbn = [];
dbn.sizes = [500 500 500];
opts.numepochs = 50;
opts.batchsize = 1;
opts.momentum = 0.001;
opts.alpha = 0.15;
dbn = dbnsetup(dbn, feature_vectors_std(train,:), opts);
dbn = dbntrain(dbn, feature_vectors_std(train,:), opts);
%unfold dbn to nn
nn = dbnunfoldtonn(dbn, 4);
nn.activation_function = 'sigm';
nn.learningRate = 0.15;
nn.momentum = 0.001;
%train nn
opts.numepochs = 50;
opts.batchsize = 1;
train_labels = labels(train);
nClass = length(unique(train_labels));
L = zeros(length(train_labels),nClass);
for i = 1:nClass
L(train_labels == i,i) = 1;
end
nn = nntrain(nn, feature_vectors_std(train,:), L, opts);
class = nnpredict(nn, feature_vectors_std(test,:));
feature_vectors_std is the (43 x 70'000) matrix with values scaled to [0,1].
Can somebody infer why I'm getting such bad accuracy?
Because you have many more features than examples in the dataset. In other words: you have a large number of weights and need to estimate all of them, but you can't, because an NN with such a huge structure cannot generalize well from such a small dataset; you need more data to learn that many hidden weights. (In fact the NN may memorize your training set, but it cannot transfer its "knowledge" to the test set.) At the same time, 80% accuracy with methods as simple as SVM and kNN indicates that your data can be described with much simpler rules: an SVM, for example, will have only 70k weights (instead of 70k*first_layer_size + first_layer_size*second_layer_size + ... in the NN), and kNN does not use weights at all.
A complex model is not a silver bullet: the more complex the model you are trying to fit, the more data you need.
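To make the size mismatch concrete, here is a rough parameter count for the network from the question. It is only a sketch: the layer sizes come from the question's code (dbn.sizes = [500 500 500], 4 outputs), one bias per unit is assumed, and the "linear" count assumes a single linear decision function with one weight per feature.

```matlab
% Rough parameter count (weights + biases) for the network in the question.
% Layer sizes: 70000 inputs, three hidden layers of 500 units, 4 outputs.
layerSizes = [70000 500 500 500 4];
nSamples   = 43;

% Each layer contributes (inputs + 1 bias) * outputs parameters.
nnParams = sum((layerSizes(1:end-1) + 1) .* layerSizes(2:end));

% Assumption: one linear decision function, one weight per feature + bias.
linearParams = layerSizes(1) + 1;

fprintf('NN parameters:     %d\n', nnParams);
fprintf('Linear parameters: %d\n', linearParams);
fprintf('NN parameters per training sample: %.0f\n', nnParams / nSamples);
```

Under these assumptions the network has about 35.5 million parameters to estimate from 43 samples, almost all of them in the first layer.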
Obviously, your dataset is too small for the complexity of your network. From the reference:
The complexity of a neural network can be expressed through the number
of parameters. In the case of deep neural networks, this number can be
in the range of millions, tens of millions and in some cases even
hundreds of millions. Let’s call this number P. Since you want to be
sure of the model’s ability to generalize, a good rule of a thumb for
the number of data points is at least P*P.
kNN and SVM are simpler, so they don't need that much data.
That is why they can work better here.
Related
I'm currently training an ANN in MATLAB to optimize a pump. I've got 2000 samples, with the design of the pump as input and the efficiency as output. I've got some good results, but now I want to retrain the model. I want to reweight the samples so that samples with higher efficiency get a higher weight than those with lower efficiency.
How can I weight my samples by efficiency?
Here is a piece of my code:
Mdl_NN1 = fitnet([6 4],training);
Mdl_NN1.layers{2}.transferFcn = 'purelin';
Mdl_NN1.divideParam.trainRatio = 70/100;
Mdl_NN1.divideParam.valRatio = 15/100;
Mdl_NN1.divideParam.testRatio = 15/100;
Mdl_NN1.trainParam.showWindow = true;
[Mdl_NN1,TR] = train(Mdl_NN1,XtrainSet',YtrainSet(:,2)')
The Xtrainset is the design of the pump in 6 parameters and the YtrainSet is just the efficiency.
It's all about curating your data (this applies to NNs in particular). So if you want to weight certain samples more than others, duplicate them for training and remove others that are wrong. This is better than fiddling with the weights or changing the proportions of train/test/validation.
One word of warning: if you duplicate data, make sure that it only appears in the training set. You will get unreliable accuracy estimates if the same examples are used for training and testing. So you might need to set the training/testing/validation data explicitly. Have a look at "divide data for optimal neural network training" in the docs.
Mdl_NN1.divideFcn = 'divideind';
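Mdl_NN1.divideParam.trainInd = trainInd; % vector with training indices (placeholder, define it beforehand)
Mdl_NN1.divideParam.testInd = testInd;   % vector with test indices (placeholder, define it beforehand)
Mdl_NN1.divideParam.valInd = valInd;     % vector with validation indices (placeholder, define it beforehand)
Or, if the data is ordered properly, it may be sufficient to set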
Mdl_NN1.divideFcn = 'divideblock';
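A minimal sketch of the duplication idea, using only base MATLAB. Everything here is illustrative: the efficiency values are made up, maxCopies is an arbitrary choice, and the linear mapping from efficiency to number of copies is just one possible scheme. The held-out sets are fixed before duplicating, so copies can only land in the training set.

```matlab
% Hypothetical stand-in data: 10 samples with made-up efficiencies.
efficiency = [0.91 0.42 0.77 0.85 0.30 0.66 0.95 0.58 0.71 0.49];
nSamples   = numel(efficiency);

% Hold out validation and test samples BEFORE duplicating anything.
valIdx   = [2 6];
testIdx  = [5 9];
trainIdx = setdiff(1:nSamples, [valIdx testIdx]);

% Copies per training sample: 1 for the lowest efficiency, up to
% maxCopies for the highest (maxCopies is an arbitrary assumption).
maxCopies = 3;
e      = efficiency(trainIdx);
copies = 1 + round((maxCopies - 1) * (e - min(e)) / (max(e) - min(e)));

% Repeat each training index according to its copy count.
dupTrainIdx = repelem(trainIdx, copies);

% Reorder the data as [duplicated training | validation | test] and
% build index vectors into that reordered data for 'divideind'.
order    = [dupTrainIdx valIdx testIdx];
nTrain   = numel(dupTrainIdx);
trainInd = 1:nTrain;
valInd   = nTrain + (1:numel(valIdx));
testInd  = nTrain + numel(valIdx) + (1:numel(testIdx));
```

The network would then be trained on the reordered data (for example XtrainSet(order,:)' and YtrainSet(order,2)'), with trainInd, valInd and testInd assigned to divideParam as shown above.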
There are many examples where an RBF SVM is better than a Neural Network.
But is there any 2D data with two classes that can be classified with 100% accuracy by a Neural Network but not by an RBF SVM?
From what I have learned, the data you use to build the predictive model is called the training data, and the training accuracy is how accurate the model is on this training data.
On the other hand, when future data comes in (namely the testing data), the test accuracy is how accurate the model, which depends on the training data only, is on this testing data.
In your case, I suppose you mean the training accuracy. Then no: in theory an RBF SVM can always reach 100% training accuracy when we let sigma -> 0 and make C sufficiently large (provided no two identical points carry different labels).
You can try this with libsvm.
$ ./svm-train -c 100 -g 100 heart_scale # here g (gamma) is proportional to 1/sigma^2, hence we want a large g
..*
optimization finished, #iter = 790
nu = 0.009851
obj = -132.989193, rho = 0.108246
nSV = 270, nBSV = 0
Total nSV = 270
$ ./svm-predict heart_scale heart_scale.model o
Accuracy = 100% (270/270) (classification)
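The reason a large g works can be seen from the kernel matrix itself. The sketch below uses a small made-up 2D point set (an assumption, not the heart_scale data) and the RBF kernel in the form exp(-gamma*||u-v||^2). With a large gamma the kernel matrix is numerically the identity, so every training point only "sees" itself and any labeling of distinct points can be fit.

```matlab
% Small made-up 2-D data set with distinct points.
X     = [0 0; 1 0; 0 1; 1 1; 0.5 0.5];
gamma = 100;                          % large gamma = very narrow kernel

% Pairwise squared Euclidean distances.
sq = sum(X.^2, 2);
D2 = bsxfun(@plus, sq, sq') - 2 * (X * X');

% RBF kernel matrix: K(i,j) = exp(-gamma * ||x_i - x_j||^2).
K = exp(-gamma * D2);

% Largest similarity between two DIFFERENT points.
offDiag    = K - diag(diag(K));
maxOffDiag = max(abs(offDiag(:)));
fprintf('Largest off-diagonal kernel value: %g\n', maxOffDiag);
```

This only explains training accuracy; a kernel this narrow says nothing about how the model behaves on new data.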
I'm trying to use a neural network for a classification problem, but training produces very bad performance. The classification problem:
I have more than 300,000 training samples
Each input is a vector of 32 values (real values)
Each output is a vector of 32 values (0 or 1)
This is how I train the network:
DNN_SIZE = [1000, 1000];
% Initialize DNN
net = feedforwardnet(DNN_SIZE, 'traingda');
net.performParam.regularization = 0.2;
%Set activation functions
for i=1:length(DNN_SIZE)
net.layers{i}.transferFcn = 'poslin';
end
net.layers{end}.transferFcn = 'logsig';
net = train(net, train_inputs, train_outputs);
Note: I have tried different values for DNN_SIZE, including larger and smaller layer sizes and more or fewer hidden layers, but it didn't make a difference.
Note 2: I have tried training the same network using a data set from Matlab's examples (simpleclass_dataset) and I still got bad performance.
The performance of the trained network is very bad: its output is basically 0.5 in every output for every input vector (even though the target outputs during training are always 0 or 1). What am I doing wrong, and how can I fix it?
Thanks.
I have samples from two Gaussian distributions, 10,000 samples from each. I would like to train a feed-forward neural network with these samples, but I don't know how many samples I have to take in order to get an optimal decision boundary.
Here is the code, but I don't know exactly what the solution should be, and the output is weird.
x1 = -49:1:50;
x2 = -49:1:50;
[X1, X2] = meshgrid(x1, x2);
Gaussian1 = mvnpdf([X1(:) X2(:)], mean1, var1); % for class A
Gaussian2 = mvnpdf([X1(:) X2(:)], mean2, var2); % for class B
net = feedforwardnet(10);
G1 = reshape(Gaussian1, 10000,1);
G2 = reshape(Gaussian2, 10000,1);
input = [G1, G2];
output = [0, 1];
net = train(net, input, output);
When I ran the code, it gave me weird results.
If the code is not correct, can someone please suggest how I can get a decision boundary for these two distributions?
I'm pretty sure that the input must be the Gaussian distribution (and not the x coordinates). The NN has to learn the relationship between the phenomena you are interested in (the Gaussian distributions) and the output labels, not between the space that contains those phenomena and the labels. Moreover, if you choose the x coordinates, the NN will try to learn a relationship between them and the output labels, but the x coordinates are potentially constant (i.e., the input data might even be all the same, because you can have very different Gaussian distributions over the same range of x coordinates just by varying the mean and the variance). The NN will then end up confused, because the same input data may map to more than one output label, and you don't want that to happen.
I hope I was helpful.
P.S.: just in case, I should mention that an NN doesn't fit the data very well if you have a small training set. Also, don't forget to validate your model using cross-validation (a good rule of thumb is to use 20% of your data for the cross-validation set, another 20% for the test set, and only the remaining 60% to train your model).
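The 60/20/20 rule of thumb can be sketched as a random index split in base MATLAB. The sample count below assumes the 2 x 10,000 samples from the question, and the variable names are only illustrative.

```matlab
% Random 60/20/20 split into training, validation and test indices.
nSamples = 20000;                 % assumed: 2 x 10,000 samples
perm     = randperm(nSamples);    % random order of all sample indices

nTrain = round(0.6 * nSamples);
nVal   = round(0.2 * nSamples);

trainIdx = perm(1:nTrain);
valIdx   = perm(nTrain+1 : nTrain+nVal);
testIdx  = perm(nTrain+nVal+1 : end);

fprintf('train: %d, validation: %d, test: %d\n', ...
        numel(trainIdx), numel(valIdx), numel(testIdx));
```

Shuffling before splitting matters here because the samples of the two classes are typically stored one class after the other.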
I am working on people detection using two different features, HOG and LBP. I used an SVM to train on the positive and negative samples. I want to ask how to improve the accuracy of the SVM itself, because every time I add more positive and negative samples, the accuracy decreases. Currently I have 1500 positive samples and 700 negative samples.
%extract features
[fpos,fneg] = features(pathPos, pathNeg);
%train SVM
HOG_featV = loadingV(fpos,fneg); % loading and labeling each training example
fprintf('Training SVM..\n');
%L = ones(length(SV),1);
T = cell2mat(HOG_featV(2,:));
HOGtP = HOG_featV(3,:)';
C = cell2mat(HOGtP); % each row of C corresponds to a training example
%extract features from LBP
[LBPpos,LBPneg] = LBPfeatures(pathPos, pathNeg);
LBP_featV = loadingV(LBPpos, LBPneg);
LBPlabel = cell2mat(LBP_featV(2,:));
LBPtP = LBP_featV(3,:);
M = cell2mat(LBPtP)'; % each row of M corresponds to a training example
featureVector = [C M];
model = svmlearn(featureVector, T','-t 2 -g 0.3 -c 0.5');
Does anyone know how to find the best C and gamma values to improve SVM accuracy?
Thank you,
To find the best C and gamma values for improving SVM accuracy, you typically perform cross-validation. In short, you can leave one sample out and test the SVM on that sample using the different parameters (the two parameters define a 2D grid). Typically you would test each decade of the parameters over a certain range. For example: C = [0.01, 0.1, 1, ..., 10^9]; G = [10^-5, 10^-4, ..., 1000]. Optimizing the hyper-parameters in this way should also improve your SVM accuracy.
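The grid search itself is just two nested loops over the decades. In the sketch below, cvAccuracy is a hypothetical placeholder, not a toolbox function: in practice it would train the SVM with the given (C, gamma) on the training folds and return the cross-validated accuracy. A dummy score is used here only so that the loop runs on its own.

```matlab
% Candidate values, one per decade (ranges as in the example above).
Cs     = 10 .^ (-2:9);
gammas = 10 .^ (-5:3);

% HYPOTHETICAL placeholder: replace with a function that returns the
% cross-validated accuracy of an SVM trained with (C, g).
% This dummy score simply peaks at C = 10, gamma = 0.01.
cvAccuracy = @(C, g) -((log10(C) - 1)^2 + (log10(g) + 2)^2);

best = -Inf;
for C = Cs
    for g = gammas
        acc = cvAccuracy(C, g);
        if acc > best
            best      = acc;
            bestC     = C;
            bestGamma = g;
        end
    end
end
fprintf('Best C = %g, best gamma = %g\n', bestC, bestGamma);
```

Once the best grid cell is found, a finer grid around it can be searched in the same way.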
Looking at your question again, it seems you are using the svmlearn of the machine learning toolbox (statistics toolbox) of Matlab. In that case you already have built-in functions for cross-validation. Have a look at: http://www.mathworks.co.uk/help/stats/support-vector-machines-svm.html
I followed ASantosRibeiro's method to optimize the parameters before, and it works well.
In addition, you could try adding more negative samples until the ratio of negatives to positives reaches 2:1. The reason is that in a real-time application you scan the whole image step by step, and the extracted negative windows are usually far more numerous than the windows that contain people.
Thus, adding more negative training samples is a straightforward but effective way to improve overall accuracy (both false positives and true negatives).