I have a question about ridge regression in MATLAB. The documentation at http://www.mathworks.com/help/stats/ridge.html says that ridge mean-centers the predictors and scales them to a standard deviation of 1. However, as far as I can see, it doesn't. For example:
Let my x be
x = [1  1  2;
     1  3  5;
     1  9 12;
     1 12 50];
and let my y be
y = [1; 2; 3; 4];
The result does not reflect any normalization of x to zero mean and unit variance. Can anyone clarify what's going on? I would expect ridge to normalize x to zero mean and unit variance and then calculate the coefficients. I was expecting ridge(y,x,0,0) to give me the result of R = inv(x'*x)*x'*y, where x and y are normalized.
The output must be the same; ridge regression only makes the calculation more numerically stable (less sensitive to multicollinearity).
== UPDATE ==
Now I understand better what you are asking :) The documentation says:
b = ridge(y,X,k,scaled) uses the {0,1}-valued flag scaled to determine if the coefficient estimates in b are restored to the scale of the original data. ridge(y,X,k,0) performs this additional transformation.
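You've set both the third and the fourth parameters to 0. That means the ridge parameter k is zero and the coefficients are transformed back to the scale of the original data (with the intercept returned as the first coefficient), which undoes the internal centering and scaling. So the result should be the same as ordinary least squares, i.e. what you get with inv(x'*x)*x'*y (this is what the ridge regression formula becomes when k is set to 0).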
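To see why k = 0 with the coefficients restored to the original scale reproduces ordinary least squares, here is a minimal Python sketch (standard library only; it is not MATLAB's implementation, and the helper names are hypothetical). It standardizes the predictors, solves the ridge normal equations, then undoes the scaling and centering so the intercept comes first, which is the behaviour the documentation describes for scaled = 0. The sketch drops the column of ones from x, because a constant column has zero standard deviation and cannot be standardized.

```python
import statistics

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def ridge_original_scale(X, y, k):
    """Ridge fit on standardized predictors, mapped back to the original scale.
    Returns [b0, b1, ..., bp] with the intercept first."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    mu = [statistics.mean(col) for col in cols]
    sd = [statistics.stdev(col) for col in cols]  # n-1 normalization, like MATLAB's std
    Z = [[(row[j] - mu[j]) / sd[j] for j in range(p)] for row in X]
    ybar = statistics.mean(y)
    yc = [v - ybar for v in y]
    # Solve (Z'Z + k*I) bz = Z'yc
    A = [[sum(Z[i][r] * Z[i][s] for i in range(n)) + (k if r == s else 0.0)
          for s in range(p)] for r in range(p)]
    rhs = [sum(Z[i][r] * yc[i] for i in range(n)) for r in range(p)]
    bz = solve(A, rhs)
    b = [bz[j] / sd[j] for j in range(p)]            # undo the scaling
    b0 = ybar - sum(mu[j] * b[j] for j in range(p))  # undo the centering
    return [b0] + b

def ols_with_intercept(X, y):
    """Ordinary least squares via the normal equations on [1, X]."""
    D = [[1.0] + list(row) for row in X]
    m = len(D[0])
    A = [[sum(d[r] * d[s] for d in D) for s in range(m)] for r in range(m)]
    rhs = [sum(d[r] * v for d, v in zip(D, y)) for r in range(m)]
    return solve(A, rhs)

X = [[1, 2], [3, 5], [9, 12], [12, 50]]  # columns 2 and 3 of the question's x
y = [1, 2, 3, 4]
print(ridge_original_scale(X, y, 0.0))
print(ols_with_intercept(X, y))
```

With k = 0 the two printed coefficient vectors agree up to rounding; with k > 0 the ridge coefficients are shrunk and differ from the least-squares ones.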
Hi, I am running a linear regression of y on x. Both variables are measured in units. I have scaled both variables to lie between 0 and 1; in other words, min-max scaling is applied to y and x. I am getting a significant coefficient of 0.5. What is the appropriate way to interpret 0.5:
1) a 0.5 unit increase in y due to a 0.1 unit increase in x?
OR
2) since both y and x are between 0 and 1, can we interpret it in percentage terms, i.e. a 0.5% increase in y due to a 1% increase in x?
Thanks for your comments and feedback.
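One way to check the interpretation, sketched in Python under the assumption of a simple least-squares fit of y on x: min-max scaling divides each variable by its range, so the slope on the scaled data equals the raw slope times range(x)/range(y). A coefficient of 0.5 then means that moving x by a fraction f of its observed range moves y by 0.5·f of its observed range, rather than a change in original units or a percentage change. The data values below are illustrative only.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def min_max(v):
    """Rescale values to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(v), max(v)
    return [(vi - lo) / (hi - lo) for vi in v]

x = [10.0, 20.0, 30.0, 45.0, 60.0]  # illustrative values in original units
y = [3.0, 4.5, 5.0, 7.5, 8.0]

raw = ols_slope(x, y)
scaled = ols_slope(min_max(x), min_max(y))
ratio = (max(x) - min(x)) / (max(y) - min(y))
print(raw, scaled, raw * ratio)  # scaled equals raw * range(x) / range(y)
```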
I passed 5-fold cross-validation data as cell arrays to the perfcurve function, with positive class 1. It generated 3 curves, as you can see in the diagram, but I was expecting only one curve.
[X,Y,T,AUC,OPTROCPT,SUBY,SUBYNAMES] = perfcurve(Actual_label,Score,1);
plot(X,Y)
Here, Actual_label and Score are 5-by-1 cell arrays, and each cell holds a 70-by-1 vector. The last argument, 1, specifies that the positive class is 1.
P.S.: I am using a one-class SVM, and the fitSVMPosterior function is not appropriate for one-class learning (the MATLAB documentation says the same). Therefore posterior probabilities can't be used here.
When you compute the confidence bounds, X and Y are m-by-3 arrays, where m is the number of fixed X values or thresholds (T values). The first column of Y contains the mean value. The second and third columns contain the lower bound and the upper bound, respectively, of the pointwise confidence bounds. AUC is also a row vector with three elements, following the same convention.
The above explanation is taken from the MATLAB documentation.
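To make the m-by-3 layout concrete, the Python sketch below builds such a table from per-fold TPR values at a few fixed FPR points: column 1 is the mean and columns 2 and 3 are the lower and upper pointwise bounds. Since MATLAB's plot draws one line per column of a matrix, plot(X,Y) with an m-by-3 Y gives three curves, and plot(X(:,1),Y(:,1)) would draw only the mean curve. The TPR numbers are hypothetical, and the normal-approximation bounds (mean ± 1.96 standard errors) are an illustrative choice, not necessarily how perfcurve computes its bounds.

```python
import statistics

def pointwise_bounds(tpr_per_fold, z=1.96):
    """tpr_per_fold[f][i] is fold f's TPR at the i-th fixed FPR value.
    Returns one [mean, lower, upper] row per FPR value (an m-by-3 table)."""
    n_folds = len(tpr_per_fold)
    rows = []
    for i in range(len(tpr_per_fold[0])):
        vals = [fold[i] for fold in tpr_per_fold]
        mean = statistics.mean(vals)
        half = z * statistics.stdev(vals) / n_folds ** 0.5
        # clip the bounds to the valid TPR range [0, 1]
        rows.append([mean, max(0.0, mean - half), min(1.0, mean + half)])
    return rows

fpr_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
tpr_per_fold = [  # hypothetical TPRs for 5 folds at the FPR values above
    [0.0, 0.60, 0.80, 0.90, 1.0],
    [0.0, 0.55, 0.75, 0.95, 1.0],
    [0.0, 0.70, 0.85, 0.90, 1.0],
    [0.0, 0.50, 0.70, 0.85, 1.0],
    [0.0, 0.65, 0.90, 1.00, 1.0],
]
for f, row in zip(fpr_grid, pointwise_bounds(tpr_per_fold)):
    print(f, row)  # column 1 = mean curve, columns 2-3 = bounds
```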
That is expected because you are plotting the ROC curve for each of the 5 folds.
Now if you want to have only one ROC for your classifier, you can either use the 5 trained classifiers to predict the labels of an independent test set or you can average the posterior probabilities of the 5 folds and have one ROC.
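A minimal sketch of the second option, written in Python rather than MATLAB: each of the 5 fold models scores the same test samples, the scores are averaged per sample, and a single ROC curve and AUC (trapezoidal rule) are computed from the averaged scores. The scores and labels are made-up placeholders, and samples with tied scores cross the threshold together.

```python
def roc_curve(labels, scores, positive=1):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low score."""
    n_pos = sum(1 for lab in labels if lab == positive)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # samples with tied scores cross the threshold together
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

# scores[m][j]: score of fold model m for test sample j (hypothetical values)
scores = [
    [0.9, 0.8, 0.3, 0.2],
    [0.7, 0.9, 0.4, 0.1],
    [0.8, 0.6, 0.5, 0.3],
    [0.9, 0.7, 0.2, 0.4],
    [0.6, 0.8, 0.3, 0.2],
]
labels = [1, 1, 0, 0]
avg = [sum(col) / len(col) for col in zip(*scores)]
fpr, tpr = roc_curve(labels, avg)
print(auc(fpr, tpr))
```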
I am working on image classification. I am using a quantity called the prior probability (from Bayes' rule). It lies in the range [0,1], and I need to compute its logarithm. However, as you know, the logarithm of zero is -Inf.
For example, given a pixel x in a 3-by-3 image I and a cost function such as
Cost(x)=30+log(prior(x))
where prior is a 3-by-3 matrix
prior = [0   0   0.5;
         1   1   0.2;
         0.4 0   0];
I = [1 2 3;
     4 5 6;
     7 8 9];
I want to compute the cost of x = 1:
cost(x=1) = 30 + log(0)
Now, log(0) is -Inf, so cost(x=1) is also -Inf. My assumption is that prior = 0 means the given pixel belongs to the background, and prior = 1 means it belongs to the foreground.
My question is how to compute log(prior) so that it satisfies this assumption.
I am using MATLAB. I think log(0) should become a very large negative value, so I simply set it to -9 in my code:
%% Handle log(0): mark zero priors as NaN
prior(prior==0.0) = NaN;
%% Compute log
log_prior = log(prior);
%% Replace the NaNs with -9, assuming e^-9 is close enough to 0
log_prior(isnan(log_prior)) = -9;
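The same flooring idea can be sketched in Python without the NaN detour: take the log only where the prior is positive and substitute the floor elsewhere. The floor of -9 is the value hand-picked in the question, not a principled choice, and it directly controls how strongly a zero prior pulls the cost down. The last line prints log(eps) for double precision (eps = 2^-52) for comparison with the eps approach discussed below.

```python
import math

LOG_FLOOR = -9.0  # hand-picked floor from the question; e**-9 is about 1.2e-4

def floored_log(prior, floor=LOG_FLOOR):
    """Natural log of each prior value, with zeros mapped to `floor` instead of -inf."""
    return [[math.log(p) if p > 0 else floor for p in row] for row in prior]

prior = [[0.0, 0.0, 0.5],
         [1.0, 1.0, 0.2],
         [0.4, 0.0, 0.0]]
log_prior = floored_log(prior)
cost = [[30.0 + lp for lp in row] for row in log_prior]  # Cost(x) = 30 + log(prior(x))
print(cost)
print(math.log(2.0 ** -52))  # log(eps), for comparison
```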
UPDATE: To clarify what I am doing, consider Bayes' rule. My task is to decide whether a given pixel x belongs to the background (BG) or the foreground (FG). This depends on the probability
P(x∈BG|x)=P(x|x∈BG)P(x∈BG)/P(x)
where P(x|x∈BG) is the likelihood, which I assume is approximated by a Gaussian distribution, P(x∈BG) is the prior, and P(x) can be ignored because it is constant.
Using maximum a posteriori (MAP) estimation, we can move the above equation into log space (to get rid of the exponential in the Gaussian):
Cost(x) = log(P(x|x∈BG)) + log(P(x∈BG)) = log(P(x∈BG|x)) + const
To keep it simple, let us assume log(P(x|x∈BG)) = 30; log(P(x∈BG)) is log(prior), so my cost function can be rewritten as
Cost(x)=30+log(prior(x))
The problem is that prior lies in [0,1], so its logarithm is -Inf wherever prior is 0. As chepner said, we can add eps:
log(prior+eps)
However, log(eps) is a very large negative number, so it dominates my cost function (which also becomes a very large negative number), and the first term of the cost (30) becomes irrelevant. Based on my assumption, prior(x)=0 means pixel x is BG and prior(x)=1 means it is FG. How should I handle log(prior) when computing my cost function?
The correct thing to do, before fiddling with Matlab, is to try to understand your problem. Ask yourself "what does it mean for the prior probability to vanish?". The answer is given by Bayes theorem, one form of which is:
posterior = likelihood * prior / normalization
So places where the prior is nil are, by definition, places where you are certain that your events (the things whose probabilities you are computing) cannot happen, regardless of their apparent likelihood (i.e. "cost"). So they are not interesting for you. You just recognize that and skip them.
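Following that advice, a minimal Python sketch (not MATLAB code) evaluates the cost only where prior > 0 and marks the remaining pixels as excluded (None here) instead of inventing a finite log value for them. The constant 30 stands in for the log-likelihood term, as in the question; representing excluded pixels with None is an arbitrary choice of this sketch.

```python
import math

def cost_where_possible(prior, log_likelihood=30.0):
    """Cost(x) = log_likelihood + log(prior(x)) where prior(x) > 0; None where the prior rules x out."""
    return [[log_likelihood + math.log(p) if p > 0 else None for p in row]
            for row in prior]

prior = [[0.0, 0.0, 0.5],
         [1.0, 1.0, 0.2],
         [0.4, 0.0, 0.0]]
cost = cost_where_possible(prior)
# (row, column) indices of the pixels that remain possible
candidates = [(r, c) for r, row in enumerate(cost) for c, v in enumerate(row) if v is not None]
print(candidates)
```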
Even with a simple classifier like the nearest neighbour I cannot seem to judge its accuracy and thus cannot improve it.
For example with the code below:
IDX = knnsearch(train_image_feats, test_image_feats);
predicted_categories = cell([size(test_image_feats, 1), 1]);
for i=1:size(IDX,1)
predicted_categories{i}=train_labels(IDX(i));
end
Here train_image_feats is a 300-by-256 matrix where each row represents an image; test_image_feats has the same structure. train_labels holds the label corresponding to each row of the training matrix.
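For reference, what the knnsearch call above does (with its default Euclidean distance and a single neighbour) can be sketched in Python: each test row receives the label of its closest training row. The tiny feature matrices and labels below are hypothetical stand-ins for the 300-by-256 ones, and ties are resolved by taking the first closest row, a choice of this sketch.

```python
def nearest_neighbour_labels(train_feats, train_labels, test_feats):
    """1-NN: label of the training row with the smallest squared Euclidean distance."""
    predicted = []
    for t in test_feats:
        dists = [sum((a - b) ** 2 for a, b in zip(t, row)) for row in train_feats]
        best = min(range(len(dists)), key=dists.__getitem__)  # first minimum on ties
        predicted.append(train_labels[best])
    return predicted

train_feats = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_labels = ["forest", "forest", "city"]
test_feats = [[0.2, 0.1], [4.0, 4.5]]
print(nearest_neighbour_labels(train_feats, train_labels, test_feats))
```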
The book I am following simply says that the above method achieves an accuracy of 19%.
How did the author come to this conclusion? Is there any way to judge the accuracy of my results, with this classifier or any other?
The author then uses another method of feature extraction and says it improved accuracy by 30%.
How can I find the accuracy, either graphically or just as a simple percentage?
Accuracy in machine learning and classification is usually calculated by comparing the outputs predicted by your classifier with the ground truth. When you evaluate the classification accuracy of your classifier, you will have already created a predictive model using a training set with known inputs and outputs. At this point, you will have a test set with inputs and outputs that were not used to train the classifier. For the purposes of this post, let's call this the ground truth data set. It helps assess the accuracy of your classifier when you provide inputs that it has not seen before. You take the inputs from your test set and run them through your classifier. You get an output for each input, and we call the collection of these outputs the predicted values.
For each predicted value, you compare it to the associated ground truth value and check whether they are the same. You count all of the instances where the predicted and ground truth outputs match. Dividing this count by the total number of points in your test set yields the fraction of instances where your model predicted the result correctly.
In MATLAB, this is really simple to calculate. Suppose that the categories for your model are enumerated from 1 to N, where N is the total number of labels you are classifying with. Let groundTruth be your vector of ground truth labels and predictedLabels be the labels generated by your classifier. The accuracy is simply calculated by:
accuracy = sum(groundTruth == predictedLabels) / numel(groundTruth);
accuracyPercentage = 100*accuracy;
The first line of code calculates the accuracy of your model as a fraction. The second line expresses it as a percentage by multiplying the first result by 100. You can use either one to assess accuracy; one is normalized to [0,1], while the other is a percentage from 0% to 100%. groundTruth == predictedLabels compares each element of groundTruth with the corresponding element of predictedLabels. If the ith value in groundTruth matches the ith value in predictedLabels, it outputs a 1; if not, it outputs a 0. The result is a vector of 0s and 1s, so we simply sum up all of the 1s, which is elegantly encapsulated in the sum operation. We then divide by the total number of points in our test set to obtain the final accuracy of the classifier.
As a toy example, suppose I had 4 labels, and my groundTruth and predictedLabels vectors were:
groundTruth = [1 2 3 2 3 4 1 1 2 3 3 4 1 2 3];
predictedLabels = [1 2 2 4 4 4 1 2 3 3 4 1 2 3 3];
The accuracy using the above vectors gives us:
>> accuracy
accuracy =
0.4000
>> accuracyPercentage
accuracyPercentage =
40
This means that we have an accuracy of 40%, or 0.40. In this example, the predictive model correctly classified only 40% of the test set when each test input was put through the classifier. This makes sense, because only 6 of the 15 predicted outputs (40%) match the ground truth: the 1st, 2nd, 6th, 7th, 10th and 15th elements. There are other ways of assessing a classifier, like ROC curves, but when calculating accuracy in machine learning, this is what is usually done.
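The same computation in Python, using the toy vectors above (variable names mirror the MATLAB ones), also lists the 1-based positions where prediction and ground truth agree:

```python
groundTruth = [1, 2, 3, 2, 3, 4, 1, 1, 2, 3, 3, 4, 1, 2, 3]
predictedLabels = [1, 2, 2, 4, 4, 4, 1, 2, 3, 3, 4, 1, 2, 3, 3]

# element-wise comparison, like groundTruth == predictedLabels in MATLAB
matches = [g == p for g, p in zip(groundTruth, predictedLabels)]
accuracy = sum(matches) / len(groundTruth)
accuracyPercentage = 100 * accuracy
matched_positions = [i + 1 for i, m in enumerate(matches) if m]  # 1-based, like MATLAB

print(accuracy)            # 0.4
print(accuracyPercentage)  # 40.0
print(matched_positions)   # [1, 2, 6, 7, 10, 15]
```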
I have trained a multi-class SVM classifier with 5 classes, i.e. svm(1)...svm(5).
I then used 5 images that were not used during the training of these classifiers for testing.
These 5 images are tested with their respective classifier, i.e. if the 5 images were taken from class one, they are tested against the same class.
predict = svmclassify(svm(i_t),test_features);
The predict variable is a 5-by-1 vector showing the result:
-1
1
1
1
-1
I sum these and then insert the sum into a diagonal matrix.
Ideally, when all images are correctly classified, it should be a diagonal matrix with 5 on every diagonal entry. But the result is very poor; in some cases I even get negative values. I just want to verify whether this poor result is because my confusion matrix is not accurate or whether I should use some other feature extractor.
Here is the code I wrote:
svm_table = [];
for i_t = 1:numel(svm)
test_folder = [Path_training folders(i_t).name '\']; %select writer
feature_count = 1; %Initialize count for feature vector accumulation
for j_t = 6:10 %these 5 images that were not used for training
[img,map] = imread([test_folder imlist(j_t).name]);
test_img = imresize(img, [100 100]);
test_img = imcomplement(test_img);
%Features extracted here for each image.
%The feature vector for each image is a 1 x 16 vector.
test_features(feature_count,:) = Features_extracted;
%The feature vectors are accumulated in a single matrix. Each row is an image
feature_count = feature_count + 1; % increment the count
end
test_features(isnan(test_features)) = 0; %locate Nan and replace with 0
%I was getting NaN in some images, which was causing problems with svm, so just replaced with 0
predict = svmclassify(svm(i_t),test_features); %produce column vector of predictions
svm_table(end+1,end+1) = sum(predict); %sum them and add to matrix diagonally
end
This is what I am getting. It looks like a confusion matrix, but the result is very poor:
-1 0 0 0 0
0 -1 0 0 0
0 0 3 0 0
0 0 0 1 0
0 0 0 0 1
So I just want to know what is at fault here: my implementation of the confusion matrix, my way of testing the SVMs, or my selection of features?
I would like to point out some issues:
You mention that: << These 5 images are then tested with their respective classifier. i.e. If 5 images were taken from class one they are tested against the same class. >>
You are never supposed to know the class (category) of test images. Of course, you need to know the test category labels for calculating various metrics such as accuracy, precision, confusion matrix etc. Apart from that, when you are using SVM to determine which class the example belongs to, you have to try all the SVMs.
There are two popular ways of training and testing multi-class SVMs, namely one-vs-all and one-vs-one approach. Read this answer and its corresponding question to understand them in detail.
I don't know if MATLAB's SVM is capable of multiclass classification, but if you use LIBSVM, it uses the one-vs-one approach. It will also do the testing for you correctly. However, if you want to design your own one-vs-one classifier, this is how you should proceed:
Say you have 5 classes; then train a model for every possible pair of classes, 5C2 = 10 pairs ({1,2}, {1,3}, ..., {1,5}, {2,3}, ..., {4,5}). While testing, apply all 10 models and count the votes to decide the final result. For example, suppose we train models for 4 pairs ({1 vs 2}, {1 vs 3}, {1 vs 4}, {2 vs 3}) and their outputs are {1,1,0,1} respectively, where 1 means the first class of the pair wins and 0 means the second. Then the 4 predicted classes are {1,1,4,2}, so the final class, with the most votes, is 1.
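The voting step can be sketched in Python. Each pairwise model is represented only by its output, using the 1/0 convention described above (1 means the first class of the pair wins); how the pairwise models are trained is left out, and breaking ties in favour of the smallest class label is an arbitrary choice of this sketch.

```python
from collections import Counter
from itertools import combinations

def all_pairs(n_classes):
    """All unordered class pairs, e.g. 10 pairs for 5 classes."""
    return list(combinations(range(1, n_classes + 1), 2))

def one_vs_one_vote(pairs, outputs):
    """pairs[i] = (a, b); outputs[i] = 1 if model i picks a, 0 if it picks b."""
    votes = Counter(a if out == 1 else b for (a, b), out in zip(pairs, outputs))
    top = max(votes.values())
    # ties broken by the smallest class label (an arbitrary choice)
    return min(c for c, v in votes.items() if v == top), votes

print(len(all_pairs(5)))  # 10
pairs = [(1, 2), (1, 3), (1, 4), (2, 3)]  # a subset of pairs, for illustration
winner, votes = one_vs_one_vote(pairs, [1, 1, 0, 1])
print(winner, dict(votes))
```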
Once you get all the predicted labels, then you can actually use the command confusionmat to get the confusion matrix. If you want to make your own, then make a 5x5 matrix of zeros. Add a 1 to the position (actual label, predicted label) i.e. if the actual class was 2 and you predicted it as 3, then add 1 at the position (2nd row, 3rd col) in the matrix.
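Building the matrix by hand, as just described, looks like this in Python (rows are actual labels, columns are predicted labels, classes numbered from 1; the label vectors are illustrative):

```python
def confusion_matrix(actual, predicted, n_classes):
    """cm[i][j] counts samples of actual class i+1 predicted as class j+1."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        cm[a - 1][p - 1] += 1
    return cm

actual = [1, 1, 2, 2, 3, 3, 4, 5]
predicted = [1, 2, 3, 2, 3, 3, 4, 1]
cm = confusion_matrix(actual, predicted, 5)
for row in cm:
    print(row)
```

Correct predictions land on the diagonal, so the diagonal sum divided by the total count gives the accuracy.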
Several issues that I can see...
1) What you're using is not really a multi-class SVM. You're taking several different SVM models and applying them to the same test data (not really the same thing). Look at the documentation for svmtrain. When you use it, you give it two kinds of data: the training data (a parameter vector for each training image) and the Group data (the vector of classes of the images associated with those vectors). What you get is one SVM model that decides between the options. (I usually use libsvm, so I'm not that familiar with MATLAB's SVM implementation, but that should be the gist of it.)
2) Your confusion matrix is derived incorrectly (see: http://en.wikipedia.org/wiki/Confusion_matrix). Start by making a 5x5 zeros matrix to hold the confusion matrix. Loop through each of your test images and let the SVM model classify the image (it should pick 1 of the 5 possibilities). Add 1 at the proper position of the confusion matrix: if the image should be classified as a 3 and the SVM classifies it as a 4, add 1 at position (3,4).
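On why the diagonal in the question can be negative: svmclassify returned +1/-1 labels there, so summing 5 outputs gives (number of +1) minus (number of -1), not the number of correct predictions. Assuming +1 means "belongs to this class", the count of correct predictions is (sum + 5) / 2, as this small Python check on the question's numbers shows:

```python
def correct_from_sum(total, n):
    """With +1/-1 outputs, (#plus) - (#minus) = total and (#plus) + (#minus) = n."""
    return (total + n) // 2

predict = [-1, 1, 1, 1, -1]            # the vector shown in the question
print(sum(predict), predict.count(1))  # the sum is 1, but 3 images got +1
diagonal = [-1, -1, 3, 1, 1]           # diagonal of the matrix in the question
print([correct_from_sum(d, 5) for d in diagonal])
```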