How to predict a label in svm for real data? [duplicate] - matlab

Could you give an example of classification of 4 classes using Support Vector Machines (SVM) in MATLAB, something like:
attribute_1  attribute_2  attribute_3  attribute_4  class
     1            2            3            4         0
     1            2            3            5         0
     0            2            6            4         1
     0            3            3            8         1
     7            2            6            4         2
     9            1            7           10         3

SVMs were originally designed for binary classification and were later extended to handle multi-class problems. The idea is to decompose the problem into many binary classification problems and then combine their outputs to obtain the final prediction.
One approach, called one-against-all, builds as many binary classifiers as there are classes, each trained to separate one class from the rest. To predict a new instance, we choose the classifier with the largest decision-function value.
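To make the max-wins selection concrete, here is a minimal one-against-all sketch; it uses the newer fitcsvm/predict interface (R2014a+) rather than the svmtrain used in the example further down, and it trains and scores on the full dataset purely for illustration:
%# minimal one-against-all sketch (assumes fitcsvm, R2014a+)
load fisheriris
g = grp2idx(species);                  %# nominal class to numeric (1..3)
classes = unique(g);
models = cell(numel(classes),1);
for c = 1:numel(classes)
    models{c} = fitcsvm(meas, g == classes(c));  %# class c vs. the rest
end
scores = zeros(size(meas,1), numel(classes));
for c = 1:numel(classes)
    [~,s] = predict(models{c}, meas);
    scores(:,c) = s(:,2);              %# decision value for "is class c"
end
[~,best] = max(scores, [], 2);         %# largest decision value wins
pred = classes(best);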
Another approach, called one-against-one (which I believe is what LibSVM uses), builds k(k-1)/2 binary classifiers, each trained to separate one pair of classes, and uses a majority-voting scheme (max-wins strategy) to determine the final prediction.
There are also other approaches, such as using Error-Correcting Output Codes (ECOC) to build many somewhat redundant binary classifiers and exploiting this redundancy to obtain more robust classifications (the same idea behind Hamming codes).
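As an aside, newer MATLAB releases (R2014b and later) ship fitcecoc in the Statistics and Machine Learning Toolbox, which wraps these decompositions behind a single call; a minimal sketch:
%# multiclass SVM via ECOC decomposition (assumes fitcecoc, R2014b+)
load fisheriris
Mdl = fitcecoc(meas, species, 'Coding','onevsone');  %# or 'onevsall'
pred = predict(Mdl, meas(1:5,:))                     %# predicted class labels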
Example (one-against-one):
%# load dataset
load fisheriris
[g gn] = grp2idx(species);              %# nominal class to numeric

%# split training/testing sets
[trainIdx testIdx] = crossvalind('HoldOut', species, 1/3);

pairwise = nchoosek(1:length(gn),2);    %# 1-vs-1 pairwise models
svmModel = cell(size(pairwise,1),1);    %# store binary classifiers
predTest = zeros(sum(testIdx),numel(svmModel)); %# store binary predictions

%# classify using one-against-one approach, SVM with 3rd-degree poly kernel
for k=1:numel(svmModel)
    %# get only training instances belonging to this pair
    idx = trainIdx & any( bsxfun(@eq, g, pairwise(k,:)) , 2 );

    %# train
    svmModel{k} = svmtrain(meas(idx,:), g(idx), ...
        'BoxConstraint',2e-1, 'Kernel_Function','polynomial', 'Polyorder',3);

    %# test
    predTest(:,k) = svmclassify(svmModel{k}, meas(testIdx,:));
end
pred = mode(predTest,2);   %# voting: classify as the class receiving most votes

%# performance
cmat = confusionmat(g(testIdx),pred);
acc = 100*sum(diag(cmat))./sum(cmat(:));
fprintf('SVM (1-against-1):\naccuracy = %.2f%%\n', acc);
fprintf('Confusion Matrix:\n'), disp(cmat)
Here is a sample output:
SVM (1-against-1):
accuracy = 93.75%
Confusion Matrix:
    16     0     0
     0    14     2
     0     1    15

MATLAB's built-in svmtrain handles only two classes, so there is no direct multiclass SVM support. You could combine several binary classifiers yourself to achieve this, but it would be much easier to use a standard SVM package.
I have used LIBSVM and can confirm that it's very easy to use. (One caveat: LIBSVM's MATLAB interface also provides a function named svmtrain, which clashes with the toolbox function of the same name, so make sure the one you intend to call comes first on your path.)
%# Your data
D = [
    1  2  3  4  0
    1  2  3  5  0
    0  2  6  4  1
    0  3  3  8  1
    7  2  6  4  2
    9  1  7 10  3];

%# For clarity
Attributes = D(:,1:4);
Classes = D(:,5);
train = [1 3 5 6];
test = [2 4];

%# Train (LIBSVM options: -s 0 = C-SVC, -t 2 = RBF kernel)
model = svmtrain(Classes(train), Attributes(train,:), '-s 0 -t 2');

%# Test
[predict_label, accuracy, prob_estimates] = svmpredict(Classes(test), Attributes(test,:), model);
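To answer the title question directly: predicting labels for genuinely new, unlabeled data works the same way. LIBSVM's svmpredict still expects a label vector, so you can pass a placeholder of zeros (the reported accuracy is then meaningless). The instance values below are made up for illustration:
%# Sketch: labeling new, unseen data with the trained model
newData = [2 2 4 5; 8 1 6 9];             %# hypothetical new instances
placeholder = zeros(size(newData,1),1);   %# dummy labels, ignored
predicted = svmpredict(placeholder, newData, model);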

Related

4 Dimensional Meshgrid - matlab

I want to create a 4-dimensional meshgrid.
I know I need to use the ndgrid function. However, the outputs of meshgrid and ndgrid are not exactly the same unless one permutes dimensions.
To illustrate, a three-dimensional meshgrid seems to be equivalent to a three-dimensional ndgrid if the following permutations are done:
[X_ndgrid,Y_ndgrid,Z_ndgrid] = ndgrid(1:3,4:6,7:9);
[X,Y,Z] = meshgrid(1:3,4:6,7:9);  % the meshgrid outputs being compared against
X_meshgrid = permute(X_ndgrid,[2,1,3]);
Y_meshgrid = permute(Y_ndgrid,[2,1,3]);
Z_meshgrid = permute(Z_ndgrid,[2,1,3]);
sum(sum(sum(X == X_meshgrid))) == 27
sum(sum(sum(Y == Y_meshgrid))) == 27
sum(sum(sum(Z == Z_meshgrid))) == 27
I was wondering what the right permutations are for a 4-D meshgrid.
[X_ndgrid,Y_ndgrid,Z_ndgrid, K_ndgrid] = ndgrid(1:3,4:6,7:9,10:12 )
Edit: EBH, thanks for your answer below. Just one more quick question: if the end goal is to create a grid in order to use interpn, what would be the difference between creating the grid with meshgrid or with ndgrid (assuming a 3-dimensional problem)?
The difference between meshgrid and ndgrid is that meshgrid orders the first input vector along the columns and the second along the rows, so:
>> [X,Y] = meshgrid(1:3,4:6)
X =
1 2 3
1 2 3
1 2 3
Y =
4 4 4
5 5 5
6 6 6
while ndgrid orders them the other way around, like:
>> [X,Y] = ndgrid(1:3,4:6)
X =
1 1 1
2 2 2
3 3 3
Y =
4 5 6
4 5 6
4 5 6
Beyond the first 2 dimensions there is no difference between them, so permuting only the first 2 dimensions is enough. For 4 dimensions you just write:
[X_ndgrid,Y_ndgrid,Z_ndgrid,K_ndgrid] = ndgrid(1:3,4:6,7:9,10:12);
[X_meshgrid,Y_meshgrid,Z_meshgrid] = meshgrid(1:3,4:6,7:9);
X_meshgrid_p = permute(X_meshgrid,[2,1,3]);
Y_meshgrid_p = permute(Y_meshgrid,[2,1,3]);
% the transposes below only matter for the comparison, not for the result:
all(X_ndgrid(1:27).' == X_meshgrid_p(:))
all(Y_ndgrid(1:27).' == Y_meshgrid_p(:))
all(Z_ndgrid(1:27).' == Z_meshgrid(:))
and it will return:
ans =
1
ans =
1
ans =
1
If you want to use it as an input for interpn, you should use the ndgrid format.
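To make that concrete, here is a minimal interpn sketch on an ndgrid-style grid; the value array V is made up for illustration:
% Sketch: interpn expects ndgrid-style coordinate arrays
[X,Y,Z] = ndgrid(1:3, 4:6, 7:9);
V = X + Y + Z;                        % illustrative values sampled on the grid
vq = interpn(X, Y, Z, V, 2.5, 5, 8)   % interpolated value at a query point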

Using bin counts as weights for random number selection

I have a set of data that I wish to approximate via random sampling in a non-parametric manner, e.g.:
eventl=
4
5
6
8
10
11
12
24
32
In order to accomplish this, I initially bin the data up to a certain value:
binsize = 5;
nbins = 20;
[bincounts,ind] = histc(eventl,1:binsize:binsize*nbins);
Then I populate a vector with all the possible numbers covered by the bins, from which the approximation can choose:
sizes = transpose(1:binsize*nbins);
To use the bin counts as weights for selection (e.g. the count for bin 1-5 is 2, so the weight for choosing 1, 2, 3, 4, or 5 is 2, while the count for bin 16-20 is 0, so 16, 17, 18, 19, or 20 can never be chosen), I simply replicate the bin counts across the bin size:
w = repelem(bincounts,binsize);
To then perform weighted number selection, I use:
[~,R] = histc(rand(1,1),cumsum([0;w(:)./sum(w)]));
R = sizes(R);
For some reason this approach is unable to approximate the data. It was my understanding that, with sufficient sampling depth, the binned version of R would be identical to the binned version of eventl; however, there is significant variation, and data are often found in bins whose weights were 0.
Could anybody suggest a better method to do this or point out the error?
For a better method, I suggest randsample:
values = [1 2 3 4 5 6 7 8]; %# values from which you want to pick
numberOfElements = 1000; %# how many values you want to pick
weights = [2 2 2 2 2 1 1 1]; %# weights given to the values (1-5 are twice as likely as 6-8)
sample = randsample(values, numberOfElements, true, weights);
Note that even with 1000 samples, the distribution does not exactly correspond to the weights, so if you only pick 20 samples, the histogram may look rather different.
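Applied to the setup in the question, the same idea might look like this sketch (variable names follow the question; repelem requires R2015a or later):
%# Sketch: randsample driven by the question's bin counts as weights
eventl = [4 5 6 8 10 11 12 24 32];
binsize = 5;
nbins = 20;
bincounts = histc(eventl, 1:binsize:binsize*nbins);
w = repelem(bincounts, binsize);       %# one weight per integer value
sizes = 1:binsize*nbins;
R = randsample(sizes, 1000, true, w);  %# 1000 weighted draws with replacement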

Leave one out crossvalind in Matlab

I have extracted HOG features for male and female pictures, and now I'm trying to use the leave-one-out method to classify my data.
Since the standard way to write it in MATLAB is:
[Train, Test] = crossvalind('LeaveMOut', N, M);
what should I write instead of N and M?
Also, should I write the above statement inside or outside a loop?
This is my code, where I have a training folder with male (80 images) and female (80 images) pictures, and another one for testing (10 random images):
for i = 1:10
    [Train, Test] = crossvalind('LeaveMOut', N, 1);
    SVMStruct = svmtrain(Training_Set(Train), train_label(Train));
    Gender = svmclassify(SVMStruct, Test_Set_MF(Test));
end
Notes:
Training_Set: an array consisting of the HOG features of the training-folder images.
Test_Set_MF: an array consisting of the HOG features of the test-folder images.
N: total number of images in the training folder.
The SVM should detect which images are male and which are female.
I will focus on how to use crossvalind for the leave-one-out method.
I assume you want to select random sets inside a loop. N is the length of your data vector. M is the number of randomly selected observations in Test, and correspondingly the number of observations left out of Train. This means you have to set N to the length of your training set. With M you specify how many values you want in your Test output, i.e. how many you want left out of your Train output.
Here is an example selecting M = 2 observations out of the dataset:
dataset = [1 2 3 4 5 6 7 8 9 10];
N = length(dataset);
M = 2;
for i = 1:5
    [Train, Test] = crossvalind('LeaveMOut', N, M);
    % do whatever you want with Train and Test
    dataset(Test) % display the test entries
end
This outputs (it is generated randomly, so you won't get the same result):
ans =
1 9
ans =
6 8
ans =
7 10
ans =
4 5
ans =
4 7
As you have it in your code, you need to adjust it for a matrix of features (index the rows, keep all columns):
Training_Set = rand(10,3); % 10 samples with 3 features each
N = size(Training_Set,1);
M = 2;
for i = 1:5
    [Train, Test] = crossvalind('LeaveMOut', N, M);
    Training_Set(Train,:) % displays the data to train on
end
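For actual leave-one-out (M = 1) on your gender data, the loop might look like the sketch below; it assumes Training_Set is N-by-F and train_label is N-by-1, as in the question, and tallies accuracy over the random folds:
% Sketch: leave-one-out with crossvalind, using the question's variable names
N = size(Training_Set,1);   % e.g. 160 training images
correct = 0;
for i = 1:N                 % one random fold per iteration
    [Train, Test] = crossvalind('LeaveMOut', N, 1);
    SVMStruct = svmtrain(Training_Set(Train,:), train_label(Train));
    pred = svmclassify(SVMStruct, Training_Set(Test,:));
    correct = correct + isequal(pred, train_label(Test));
end
accuracy = correct / N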

Apply custom filter with a custom window matlab

I'm trying to solve the following problem:
I have a kernel made of 0's and 1's, e.g. a cross-like kernel:
kernel =
0 1 0
1 1 1
0 1 0
and I need to apply it to a given matrix like
D =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
For simplicity, let's assume we start from element D(2,2), which is 11, to avoid padding (which I can handle with padarray).
I should superimpose the kernel and extract only the elements where kernel==1, i.e. [2,5,11,10,7], then apply to them a custom filter, like the median or average, replacing the central element with the result.
Then I would like to pass through all the other elements (neglecting edge elements for simplicity) and do the same.
Right now I'm using tempS = ordfilt2(Z, order, kernel, 'symmetric');
which performs exactly that operation with a median filter, but I would like to use a different criterion (e.g. the average, or some weird operation).
Use blockproc. This also handles border effects automatically (see the documentation). For example, to compute the median of the values masked by the kernel:
mask = logical(kernel);
R = blockproc(D, [1 1], @(d) median(d.data(mask)), ...
    'BorderSize', [1 1], 'TrimBorder', false);
The first [1 1] indicates the step; the second [1 1] indicates how many elements to take around the central one.
With your example D, the result is
R =
2 3 3 3
9 7 8 10
5 9 10 6
4 7 6 1
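To apply a different criterion, you only need to swap the function handle; for example, the average of the masked values:
R = blockproc(D, [1 1], @(d) mean(d.data(mask)), ...
    'BorderSize', [1 1], 'TrimBorder', false);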
This should do what you want:
D = rand(10,20);
kernel = [0,1,0; 1,1,1; 0,1,0];
[dy,dx] = find(kernel==1);
% offsets relative to the kernel center; should be calculated from the kernel size
dy = dy - 2;
dx = dx - 2;
% start and stop indices should also be calculated from the kernel size
result = zeros(size(D));
for y = 2:(size(D,1)-1)
    for x = 2:(size(D,2)-1)
        elements = D(sub2ind(size(D), y+dy, x+dx));
        result(y,x) = weirdOperation(elements); % your custom criterion goes here
    end
end
Nevertheless, this will perform very poorly in terms of speed. You should consider using built-in functions instead: conv2 or filter2 for linear filtering, and ordfilt2 for order-statistic functionality.
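For the averaging case specifically, a local mean under the cross-shaped mask is a linear filter, so conv2 does it in one call; a minimal sketch:
% Sketch: local average under the cross-shaped mask via conv2
D = magic(4);
kernel = [0 1 0; 1 1 1; 0 1 0];
avg = conv2(D, kernel/sum(kernel(:)), 'same')  % normalized kernel = mean filter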
