I have tried to train a neural network in MATLAB. First, I built the ANN as follows:
net = feedforwardnet([30 20 20 ]);
[net ,tr] = train(net , XTRAIN , temp);
which produces an ANN with the following architecture:
Then I test my neural network as follows:
outputsOfTest = sim(net , XTEST);
outputsOfTest is a vector holding the network's output for the test data. Some of its elements are often negative; for example, outputsOfTest may look like [-.34 1.17 .17].
How should I interpret this output? What do the negative values indicate? Which class does the test sample belong to, based on this output?
Should I take the greatest value as the indicator of the class?
For example, given the output vector [-2 .5 1], the greatest value is 1, so the test sample belongs to class 3.
Or should I take the value with the greatest magnitude (the largest absolute value)? For example, given the output vector [-2 .5 1], the element with the greatest magnitude is the first one, so the test sample belongs to class 1.
Note: sometimes the sum of the elements of outputsOfTest exceeds one; it may reach 2.5. Is this normal?
Your output layer seems to have a linear activation function. Therefore the components of your output vectors are not restricted to lie between 0 and 1. For classification you should use a softmax activation function on the output layer.
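For reference, the softmax function in its standard form, with z_i denoting the raw value of output neuron i and K the number of classes (both symbols are assumed here for illustration):

```latex
\[
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
\]
```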
The use of softmax results in vector components that lie between 0 and 1 and sum to 1 for each vector, so you effectively get a probability distribution over your classes. The MATLAB help has an image showing the effect (left: input, right: after softmax):
There's more information about it in the UFLDL Tutorial.
From what I could find, the following code change might work in Matlab:
net = feedforwardnet([30 20 20]);
net.layers{4}.transferFcn='softmax';
[net ,tr] = train(net , XTRAIN , temp);
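To see how such outputs are read once they are probabilities, here is a minimal sketch in plain MATLAB (no toolbox needed) that applies softmax by hand to the example vector from the question and picks the class with the largest probability. The variable names are illustrative only, and the toolbox's own softmax transfer function may differ in implementation details.

```matlab
% Raw network output for one test sample (example vector from the question)
rawOutput = [-2; 0.5; 1];

% Softmax: exponentiate and normalise; subtracting the maximum first
% avoids overflow and does not change the result
expShifted = exp(rawOutput - max(rawOutput));
probs = expShifted / sum(expShifted);

% The predicted class is the index of the largest probability
[~, predictedClass] = max(probs);

fprintf('probabilities: %s\n', mat2str(probs', 3));
fprintf('predicted class: %d\n', predictedClass);
```

Because softmax is monotonic, the largest raw output and the largest probability always sit at the same index, so the class is chosen by the greatest value, not by the greatest magnitude.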
I’m trying to use a neural network for binary classification. It consists of three layers. The first layer has three input neurons, the hidden layer has two neurons, and the output layer has three neurons that each output a binary value of 1 or 0. Actually each output is usually a floating-point number, but it typically rounds to a whole number.
If the network only outputs vectors of size 3, then shouldn't my input vectors be the same size? Otherwise, for classification, how else do you map the output to the input?
I wrote the neural network in Excel using VBA based on the following article: https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/
So far it works exactly as described in the article. I don’t have access to a machine learning library at the moment so I’ve chosen to give this a try.
For example:
If the output of the network is [n, n ,n], does that mean that my input data has to be [n, n, n] also?
From what I read in here: Neural net input/output
It seems that's the way it should be. I'm not entirely sure though.
Simply put:
For a regression task, the output usually has dimension [1] (if you predict a single value).
For a classification task, the output should have as many dimensions as you have classes (the outputs are probabilities that sum to 1).
So there is no need for the input and output to have equal dimensions. A NN is just a mapping from a space of one dimension to a space of another.
For example:
Regression: we predict house prices. The input is [1, 10] (ten features of the property) and the output is [1] (the price).
Classification: we predict a class (will be sold or not). The input is [1, 11] (the same features plus the listed price) and the output is [1, 2] (the probabilities of class 0, will not be sold, and class 1, will be sold; for example [1; 0], [0; 1] or [0.5; 0.5]). This is binary classification.
Additionally, equal input and output dimensions do occur in more specific tasks, for example autoencoder models (where you encode your data in another dimension and then map it back to the original dimension).
Again, the output dimension is the size of the output for a single sample, not for the whole dataset.
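The point about dimensions can be illustrated with a minimal MATLAB sketch of a single layer that maps 10 input features to 2 class probabilities. The weights are random placeholders rather than a trained model, and the names W, b and x are assumptions made for this example.

```matlab
nFeatures = 10;   % input dimension (e.g. ten property features)
nClasses  = 2;    % output dimension (one entry per class)

x = rand(nFeatures, 1);           % one input sample
W = randn(nClasses, nFeatures);   % placeholder weights, not trained
b = zeros(nClasses, 1);           % placeholder biases

z = W * x + b;                    % 2x1 raw scores
y = exp(z - max(z));              % softmax, shifted for numerical stability
y = y / sum(y);                   % 2x1 probabilities that sum to 1

disp(size(x));   % input size
disp(size(y));   % output size
```

The shape of W is what ties the two sizes together: its column count matches the input and its row count matches the output, so the two can be chosen independently.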
Even with a simple classifier like the nearest neighbour I cannot seem to judge its accuracy and thus cannot improve it.
For example with the code below:
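% For each test image, find the index of its nearest training image
IDX = knnsearch(train_image_feats, test_image_feats);
predicted_categories = cell([size(test_image_feats, 1), 1]);
for i = 1:size(IDX,1)
    % Assign the label of the nearest training image
    predicted_categories{i} = train_labels(IDX(i));
end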
Here train_image_feats is a 300-by-256 matrix in which each row represents an image; test_image_feats has the same structure. train_labels holds the label corresponding to each row of the training matrix.
The book I am following simply says that the above method achieves an accuracy of 19%.
How did the author come to this conclusion? Is there any way to judge the accuracy of my results, be it with this classifier or another?
The author then uses another method of feature extraction and says it improved accuracy by 30%.
How can I find the accuracy, be it graphically or just as a simple percentage?
Accuracy in machine learning and classification is usually calculated by comparing the predicted outputs of your classifier to the ground truth. When you evaluate the classification accuracy of your classifier, you will already have created a predictive model using a training set with known inputs and outputs. At this point, you will have a test set with inputs and outputs that were not used to train the classifier. For the purposes of this post, let's call this the ground truth data set. It helps assess the accuracy of your classifier on inputs that it has not seen before. You take the inputs from your test set and run them through your classifier. You get an output for each input, and we call the collection of these outputs the predicted values.
For each predicted value, you compare to the associated ground truth value and see if it is the same. You add up all of the instances where the outputs match up between the predicted and the ground truth. Adding all of these values up, and dividing by the total number of points in your test set yields the fraction of instances where your model accurately predicted the result in comparison to the ground truth.
In MATLAB, this is really simple to calculate. Suppose that the categories for your model are enumerated from 1 to N, where N is the total number of labels you are classifying with. Let groundTruth be the vector of labels that denotes the ground truth and predictedLabels be the labels generated by your classifier. The accuracy is simply calculated by:
accuracy = sum(groundTruth == predictedLabels) / numel(groundTruth);
accuracyPercentage = 100*accuracy;
The first line of code calculates the accuracy of your model as a fraction. The second line expresses it as a percentage by multiplying the result of the first line by 100. You can use either one to assess accuracy: one is normalized to [0,1] while the other is a percentage from 0% to 100%. groundTruth == predictedLabels compares the two vectors element by element. If the ith value in groundTruth matches the ith value in predictedLabels, the output is 1; if not, it is 0. The result is a vector of 0s and 1s, so summing it counts the matches, which the sum operation does elegantly. We then divide by the total number of points in our test set to obtain the final accuracy of the classifier.
As a toy example, suppose I had 4 labels and my groundTruth and predictedLabels vectors were:
groundTruth = [1 2 3 2 3 4 1 1 2 3 3 4 1 2 3];
predictedLabels = [1 2 2 4 4 4 1 2 3 3 4 1 2 3 3];
The accuracy using the above vectors gives us:
>> accuracy
accuracy =
0.4000
>> accuracyPercentage
accuracyPercentage =
40
This means that we have 40% accuracy, or an accuracy of 0.40. In this example, the predictive model accurately classified only 40% of the test set when each test input was put through the classifier. This makes sense, because only 6 of the 15 outputs (40%) match between the predicted outputs and the ground truth: the 1st, 2nd, 6th, 7th, 10th and 15th elements. There are other ways to assess a classifier, like ROC curves, but when calculating accuracy in machine learning, this is what is usually done.
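If the labels are category names stored in cell arrays of strings rather than numbers (as may be the case for predicted_categories in the question), == does not apply directly. A minimal sketch using strcmp, with made-up labels and variable names purely for illustration:

```matlab
% Hypothetical ground truth and predictions as cell arrays of strings
groundTruthNames = {'kitchen'; 'forest'; 'street'; 'forest'};
predictedNames   = {'kitchen'; 'street'; 'street'; 'forest'};

% strcmp compares the two cell arrays element by element
isCorrect = strcmp(groundTruthNames, predictedNames);

accuracy = mean(isCorrect);   % fraction of matching labels
fprintf('accuracy = %.2f\n', accuracy);
```

Taking the mean of the logical vector is equivalent to summing the matches and dividing by the number of test points.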
I have trained a multi-class SVM classifier with 5 classes, i.e. svm(1)...svm(5).
I then tested with 5 images that were not used during the training of these classifiers.
These 5 images are tested with their respective classifier, i.e. if the 5 images were taken from class one, they are tested against the classifier for class one.
predict = svmclassify(svm(i_t),test_features);
predict is a 5-by-1 vector showing the result:
-1
1
1
1
-1
I sum these and then insert the sum into a diagonal matrix.
Ideally it should be a diagonal matrix with 5 on the diagonal when all images are correctly classified. But the result is very poor; in some cases I even get a negative result. I just want to verify whether this poor result is because my confusion matrix is not accurate or whether I should use some other feature extractor.
Here is the code I wrote:
svm_table = [];
for i_t = 1:numel(svm)
    test_folder = [Path_training folders(i_t).name '\']; %select writer
    feature_count = 1; %Initialize count for feature vector accumulation
    for j_t = 6:10 %these 5 images were not used for training
        [img,map] = imread([test_folder imlist(j_t).name]);
        test_img = imresize(img, [100 100]);
        test_img = imcomplement(test_img);
        %Features extracted here for each image.
        %The feature vector for each image is a 1 x 16 vector.
        test_features(feature_count,:) = Features_extracted;
        %The feature vectors are accumulated in a single matrix. Each row is an image
        feature_count = feature_count + 1; % increment the count
    end
    test_features(isnan(test_features)) = 0; %locate NaN and replace with 0
    %I was getting NaN in some images, which was causing problems with svm, so just replaced with 0
    predict = svmclassify(svm(i_t),test_features); %produce column vector of predictions
    svm_table(end+1,end+1) = sum(predict); %sum them and add to matrix diagonally
end
This is what I am getting. It looks like a confusion matrix, but the result is very poor.
-1 0 0 0 0
0 -1 0 0 0
0 0 3 0 0
0 0 0 1 0
0 0 0 0 1
So I just want to know what is at fault here: my implementation of the confusion matrix, my way of testing the SVM, or my selection of features.
I would like to point out some issues:
You mention that: << These 5 images are then tested with their respective classifier. i.e. If 5 images were taken from class one they are tested against the same class. >>
When classifying, you are never supposed to know the class (category) of the test images. You do need the test labels for calculating metrics such as accuracy, precision and the confusion matrix, but only for that. When you use SVMs to determine which class an example belongs to, you have to try all the SVMs.
There are two popular ways of training and testing multi-class SVMs, namely the one-vs-all and the one-vs-one approach. Read this answer and its corresponding question to understand them in detail.
I don't know if the MATLAB SVM is capable of multi-class classification, but if you use LIBSVM, it uses the one-vs-one approach. It will also do the testing for you correctly. However, if you want to design your own one-vs-one classifier, this is how you should proceed:
Say you have 5 classes; then train a model for every possible pair of classes, 5C2 = 10 pairs ({1,2}, ..., {1,5}, {2,3}, ..., {2,5}, ..., {4,5}). While testing, you have to apply all 10 models and count the votes to decide the final result. For example, say we apply 4 pairwise models ({1 vs 2}, {1 vs 3}, {2 vs 1}, {2 vs 3}) and their outputs are {1,1,0,1} respectively, where 1 means the first class of the pair and 0 the second. That means the 4 predicted classes are {1,1,1,2}, so the final class is 1.
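The voting step can be sketched in plain MATLAB as follows. The pair list comes from nchoosek; the vector of pairwise winners is hard-coded to the four-model example above instead of coming from trained SVMs, so it is an illustrative stand-in only, and all variable names are assumptions.

```matlab
numClasses = 5;

% All unordered class pairs for one-vs-one training: 10 rows for 5 classes
pairs = nchoosek(1:numClasses, 2);
fprintf('number of pairwise models: %d\n', size(pairs, 1));

% Stand-in for the class predicted by each pairwise model on one test
% sample (here the four example predictions {1,1,1,2})
pairwiseWinners = [1 1 1 2];

% Count the votes per class and pick the class with the most votes
votes = accumarray(pairwiseWinners(:), 1, [numClasses 1]);
[~, finalClass] = max(votes);
fprintf('final class: %d\n', finalClass);
```

Note that max returns the first index on a tie, so a real implementation may want an explicit tie-breaking rule.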
Once you have all the predicted labels, you can use the command confusionmat to get the confusion matrix. If you want to build your own, make a 5x5 matrix of zeros and add 1 at the position (actual label, predicted label); i.e. if the actual class was 2 and you predicted 3, add 1 at (2nd row, 3rd column) of the matrix.
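A minimal sketch of this manual construction, using made-up labels for illustration (actualLabels and predictedLabels are hypothetical names). accumarray adds 1 at each (actual, predicted) position in a single call:

```matlab
numClasses = 5;

% Hypothetical labels for 10 test images
actualLabels    = [1 1 2 2 3 3 4 4 5 5];
predictedLabels = [1 2 3 2 3 3 4 1 5 5];

% Rows are actual classes, columns are predicted classes
confMat = accumarray([actualLabels(:) predictedLabels(:)], 1, ...
                     [numClasses numClasses]);
disp(confMat);

% Correct predictions lie on the diagonal
accuracy = trace(confMat) / sum(confMat(:));
fprintf('accuracy = %.2f\n', accuracy);
```

Every entry is a non-negative count and each row sums to the number of test images of that class, which is a quick sanity check on any hand-built confusion matrix.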
Several issues that I can see...
1) What you're using is not really a multi-class SVM. You're taking several different SVM models and applying them to the same test data (not really the same thing). You need to look at the documentation for svmtrain. When you use it, you give it two kinds of data: the training data (parameter vectors for each training image) and the group data (the vector of classes for the images associated with those vectors). What you get is one SVM model that decides between the options. (I usually use libsvm, so I'm not that familiar with MATLAB's SVM implementation, but that should be the gist of it.)
2) Your confusion matrix is derived incorrectly (see: http://en.wikipedia.org/wiki/Confusion_matrix). Start by making a 5x5 matrix of zeros to hold the confusion matrix. Loop through each of your test images and let the SVM model classify the image (it should pick 1 of the 5 possibilities). Add 1 at the proper position of the confusion matrix. So if the image should be classified as a 3 and the SVM classifies it as a 4, you add 1 at position (3,4).
In my project I have two [200x1] vectors that should be used as input to a neural network, trained so that it gives me the subject's gender from fingerprints.
I think I should provide a target vector based on the subject's gender from the ground truth data (like 1 for female and 0 for male), but I am not sure whether this is correct.
Any ideas?
It sounds right. You are in fact training a classifier to distinguish two groups, male and female.
Your input is two vectors, each of size 200x1, and the output should have 2 nodes: y = 1 (male), y = 0 (female).
So just merge the two vectors into one vector and use it as the input. You need to know the corresponding output value (i.e. either 1 or 0) for each element of the input vector. Using indexing, you can combine the two vectors into one, give it to the NN as the input vector (e.g. input size = 400x1) and get two vectors as output.
You can also use the same input/output set with the SVM algorithm to cross-check the result. It is provided in MATLAB and is pretty easy to use.
Example:
v1 = rand(1,200); % I am setting these values as random, but you should use the values you know
v2 = rand(1,200);
input = [v1 v2]';
y = randi(2,400,1)-1; % make a random vector including 0 and 1 for female and male (you should use your data)
net = newff(...) % or whatever NN you are using
net = train(net,input,y) % train the NN using the data
I'm working on doing a logistic regression using MATLAB for a simple classification problem. My covariate is one continuous variable ranging between 0 and 1, while my categorical response is a binary variable of 0 (incorrect) or 1 (correct).
I'm looking to run a logistic regression to establish a predictor that would output the probability of some input observation (e.g. the continuous variable as described above) being correct or incorrect. Although this is a fairly simple scenario, I'm having some trouble running this in MATLAB.
My approach is as follows: I have one column vector X that contains the values of the continuous variable, and another equally-sized column vector Y that contains the known classification of each value of X (e.g. 0 or 1). I'm using the following code:
[b,dev,stats] = glmfit(X,Y,'binomial','link','logit');
However, this gives me nonsensical results with a p = 1.000, coefficients (b) that are extremely high (-650.5, 1320.1), and associated standard error values on the order of 1e6.
I then tried using an additional parameter to specify the size of my binomial sample:
glm = GeneralizedLinearModel.fit(X,Y,'distr','binomial','BinomialSize',size(Y,1));
This gave me results that were more in line with what I expected. I extracted the coefficients, used glmval to create estimates (Y_fit = glmval(b,[0:0.01:1],'logit');), and created an array for the fitting (X_fit = linspace(0,1)). When I overlaid the plots of the original data and the model using figure, plot(X,Y,'o',X_fit,Y_fit,'-'), the resulting plot of the model essentially looked like the lower quarter of the 'S'-shaped curve that is typical of logistic regression plots.
My questions are as follows:
1) Why did my use of glmfit give strange results?
2) How should I go about addressing my initial question: given some input value, what's the probability that its classification is correct?
3) How do I get confidence intervals for my model parameters? glmval should be able to input the stats output from glmfit, but my use of glmfit is not giving correct results.
Any comments and input would be very useful, thanks!
UPDATE (3/18/14)
I found that mnrval seems to give reasonable results. I can use [b_fit,dev,stats] = mnrfit(X,Y+1); where Y+1 simply makes my binary classifier into a nominal one.
I can loop through [pihat,lower,upper] = mnrval(b_fit,loopVal(ii),stats); to get various pihat probability values, where loopVal = linspace(0,1) or some appropriate input range and ii = 1:length(loopVal).
The stats parameter has a great correlation coefficient (0.9973), but the p-values for b_fit are 0.0847 and 0.0845, which I'm not quite sure how to interpret. Any thoughts? Also, why would mnrfit work over glmfit in my example? I should note that the p-values for the coefficients when using GeneralizedLinearModel.fit were both p<<0.001, and the coefficient estimates were quite different as well.
Finally, how does one interpret the dev output from the mnrfit function? The MATLAB documentation states that it is "the deviance of the fit at the solution vector. The deviance is a generalization of the residual sum of squares." Is this useful as a stand-alone value, or is it only meaningful compared to dev values from other models?
It sounds like your data may be linearly separable. In short, since your input data is one-dimensional, that means there is some value xDiv such that all values of x < xDiv belong to one class (say y = 0) and all values of x > xDiv belong to the other class (y = 1).
If your data were two-dimensional this means you could draw a line through your two-dimensional space X such that all instances of a particular class are on one side of the line.
This is bad news for logistic regression (LR) as LR isn't really meant to deal with problems where the data are linearly separable.
Logistic regression is trying to fit a function of the following form:
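For a single input variable x, the standard logistic form can be written as follows (b_0 and b_1 are assumed names for the intercept and slope, corresponding to the two coefficients returned by glmfit):

```latex
\[
y = \frac{1}{1 + e^{-(b_0 + b_1 x)}}
\]
```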
This will only return values of y = 0 or y = 1 when the expression within the exponential in the denominator is at negative infinity or infinity.
Now, because your data is linearly separable, and Matlab's LR function attempts to find a maximum likelihood fit for the data, you will get extreme weight values.
This isn't necessarily a solution, but try flipping the labels on just one of your data points (so for some index t where y(t) == 0 set y(t) = 1). This will cause your data to no longer be linearly separable and the learned weight values will be dragged dramatically closer to zero.
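A minimal sketch in plain MATLAB of the two points above, using made-up data rather than the asker's. It checks for 1-D separability, shows that on separable data the log-likelihood of a logistic curve keeps improving as the slope grows (which is why the fitted weights blow up), and confirms that flipping one label removes the separability. All variable names and values are assumptions for illustration.

```matlab
% Made-up, perfectly separable 1-D data
X = [0.10 0.20 0.35 0.60 0.75 0.90]';
Y = [0 0 0 1 1 1]';

% Separable if one class lies entirely below the other
isSeparable = max(X(Y == 0)) < min(X(Y == 1)) || ...
              max(X(Y == 1)) < min(X(Y == 0));

% Log-likelihood of a logistic curve centred at xDiv for growing slopes
xDiv = 0.475;
slopes = [1 5 20];
logLik = zeros(size(slopes));
for k = 1:numel(slopes)
    p = 1 ./ (1 + exp(-slopes(k) * (X - xDiv)));
    logLik(k) = sum(Y .* log(p) + (1 - Y) .* log(1 - p));
end
disp(logLik);   % increases towards 0 as the slope grows

% Flipping one label breaks the separability
Yflipped = Y;
Yflipped(1) = 1;
isSeparableAfterFlip = max(X(Yflipped == 0)) < min(X(Yflipped == 1)) || ...
                       max(X(Yflipped == 1)) < min(X(Yflipped == 0));
```

Running the separability check on your own X and Y before calling glmfit is a cheap way to confirm whether this is what is happening.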