Feature extraction for classification in MATLAB

I have a 977x3 feature matrix
features = rand(977,3);
where each row is an observation and each column is a feature.
I calculate the pairwise distances between points with
dissimilarities = pdist(features);
and then I scale it with
feature_transf = mdscale(dissimilarities,3);
I then use the new set of features (feature_transf) for classification.
Does this make sense? I get good classification accuracy when training an SVM with 10-fold cross-validation.
Can you please tell me whether this is methodologically incorrect?
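For completeness, the full pipeline looks roughly like this; the fitcsvm/crossval part is only meant to illustrate the SVM-with-10-fold-cross-validation step, and labels stands for my 977x1 vector of class labels:
features = rand(977,3);                        % 977 observations, 3 features
dissimilarities = pdist(features);             % pairwise distances between observations
feature_transf = mdscale(dissimilarities,3);   % MDS embedding back into 3 dimensions
% illustrative classification step: 'labels' is the 977x1 vector of class labels
svm = fitcsvm(feature_transf, labels, 'KernelFunction', 'rbf', 'Standardize', true);
cv  = crossval(svm, 'KFold', 10);
accuracy = 1 - kfoldLoss(cv);                  % 10-fold cross-validated accuracy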
Thanks,
Gabriele

Related

How to find the training accuracy using fitcsvm?

I would like to find the predicted labels of the training data points while training the classifier. I am using Mdl = fitcsvm(train_data,train_labels) in MATLAB. Mdl exposes several properties, but none of them corresponds to the training accuracy. Is there any way to find it?
You can apply cross-validation:
xval = crossval(Mdl,'KFold',10);
kfoldLoss(xval)
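If you specifically want the accuracy on the training data itself (the resubstitution accuracy), a minimal sketch would be the following; note that the resubstitution estimate is usually optimistic compared with the cross-validated one:
Mdl = fitcsvm(train_data, train_labels);
% accuracy on the training set itself (resubstitution accuracy)
train_acc = 1 - resubLoss(Mdl);
% equivalently, predict on the training data and compare with the true labels
% (assuming numeric or logical labels; use strcmp for cellstr labels)
pred = predict(Mdl, train_data);
train_acc2 = mean(pred == train_labels);
% cross-validated accuracy, a less biased estimate of generalization
xval = crossval(Mdl, 'KFold', 10);
cv_acc = 1 - kfoldLoss(xval);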

Dimensionality reduction in HOG feature vector

I computed the HOG feature vector of the following image in MATLAB.
Input Image
I used the following code.
I = imread('input.jpg');
I = rgb2gray(I);
[features, visualization] = extractHOGFeatures(I,'CellSize',[16 16]);
features comes out to be a 1x1944 vector, and I need to reduce the dimensionality of this vector (say to 1x100). What method should I use?
I thought of Principal Component Analysis and ran the following in MATLAB.
prinvec = pca(features);
prinvec comes out to be an empty matrix (1944x0). Am I doing it wrong? If not PCA, what other methods can I use to reduce the dimension?
You can't do PCA on this, since you have only a single observation and far more features than observations. Get more observations, on the order of 10,000, and then you can do PCA.
See PCA in MATLAB selecting top n components for a more detailed, mathematical explanation of why this is the case.
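A minimal sketch of that, assuming you have a folder of same-sized training images (the folder name and the number of kept components are placeholders):
imgFiles = dir('train_images/*.jpg');        % hypothetical folder of training images
nImages  = numel(imgFiles);
X = zeros(nImages, 1944);                    % one HOG vector per row (all images same size)
for i = 1:nImages
    I = imread(fullfile('train_images', imgFiles(i).name));
    if size(I,3) == 3
        I = rgb2gray(I);
    end
    X(i,:) = extractHOGFeatures(I, 'CellSize', [16 16]);
end
% PCA now works because there are many observations (rows);
% you need more than 100 images for 100 components to exist
[coeff, score] = pca(X);
X_reduced = score(:, 1:100);                 % keep the first 100 principal components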

Multi-Data of K-means and SVM

I generate multivariate data with mvnrnd. I would like to use k-means to cluster those data into 2 groups, and I also want to know the accuracy of k-means, but I don't know how to calculate it. How do I know the correct cluster assignment to compare against the k-means result and get the accuracy?
I also have the multivariate data and the class labels, so I know I can use an SVM. However, the accuracy of the SVM is too low, about 72% to 83%. I might have made some mistakes. I would like to hear some feedback. Thanks.
% training data: four 3-D Gaussian clusters (means 0, 1, 2, 3; strongly correlated)
mu1 = [0,0,0]; mu2 = [1,1,1]; mu3 = [2,2,2]; mu4 = [3,3,3];
m = 0.9;  s = [1 m m; m 1 m; m m 1];
data1 = mvnrnd(mu1,s,1000); data2 = mvnrnd(mu2,s,1000);
data3 = mvnrnd(mu3,s,1000); data4 = mvnrnd(mu4,s,1000);
all_data = [data1; data2; data3; data4];
% cluster into 2 groups, then train libsvm's svmtrain on the cluster indices
[idx,ctrs,sumD,D] = kmeans(all_data,2,'distance','sqEuclidean','start','sample');
model = svmtrain(idx,all_data);
% test data drawn from the same four distributions
mu7 = [0,0,0]; mu8 = [1,1,1]; mu9 = [2,2,2]; mu10 = [3,3,3];
data7 = mvnrnd(mu7,s,1000); data8 = mvnrnd(mu8,s,1000);
data9 = mvnrnd(mu9,s,1000); data10 = mvnrnd(mu10,s,1000);
test_data = [data7; data8; data9; data10];
value = svmpredict(idx,test_data,model);
I want to know where my mistakes are or what is wrong with my code. I don't know why my accuracy is so low. I really want to improve my code. Thanks!
To calculate the accuracy of the k-means algorithm you need a priori knowledge, i.e. a reference class vector, just like you have for the SVM.
Even though k-means is an unsupervised learning algorithm and you don't need a class vector to cluster the data, you do need one to calculate the accuracy.
I also have doubts about the way you build the SVM model. You use the cluster indices computed by the k-means algorithm, which has its own (currently unknown) accuracy, as training labels. Since you generated the data yourself, why not create the label vector from the true classes instead? A sketch of how to score the clustering against those true classes is below.
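As a sketch of what I mean (the data come from four known distributions, so I use k = 4 here and map each cluster to the majority true class among its members):
% reference class vector: the true class of every row of all_data is known
true_labels = [ones(1000,1); 2*ones(1000,1); 3*ones(1000,1); 4*ones(1000,1)];
k = 4;                                      % one cluster per generating distribution
idx = kmeans(all_data, k, 'Distance', 'sqEuclidean', 'Start', 'sample');
% map each cluster index to the majority true class inside that cluster
mapped = zeros(size(idx));
for c = 1:k
    members = (idx == c);
    mapped(members) = mode(true_labels(members));
end
clustering_accuracy = mean(mapped == true_labels);   % fraction of correctly grouped points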

Improve the accuracy performance on SVM

I am working on people detection using two different features, HOG and LBP, and I use an SVM trained on positive and negative samples. I want to ask how to improve the accuracy of the SVM itself, because every time I add more positive and negative samples, the accuracy decreases. Currently I have 1500 positive samples and 700 negative samples.
%extract features
[fpos,fneg] = features(pathPos, pathNeg);
%train SVM
HOG_featV = loadingV(fpos,fneg); % loading and labeling each training example
fprintf('Training SVM..\n');
%L = ones(length(SV),1);
T = cell2mat(HOG_featV(2,:));
HOGtP = HOG_featV(3,:)';
C = cell2mat(HOGtP); % each row corresponds to a training example
%extract features from LBP
[LBPpos,LBPneg] = LBPfeatures(pathPos, pathNeg);
LBP_featV = loadingV(LBPpos, LBPneg);
LBPlabel = cell2mat(LBP_featV(2,:));
LBPtP = LBP_featV(3,:);
M = cell2mat(LBPtP)'; % each row corresponds to a training example
featureVector = [C M];
model = svmlearn(featureVector, T','-t 2 -g 0.3 -c 0.5');
Does anyone know how to find the best C and gamma values to improve SVM accuracy?
Thank you.
To find the best C and gamma values for improving SVM accuracy you typically perform cross-validation. In short, you can leave one sample out and test the SVM on that sample for each combination of the parameters (the 2 parameters define a 2D grid). Typically you would test each decade of the parameters over a certain range, for example C = [0.01, 0.1, 1, ..., 10^9] and gamma = [10^-5, 10^-4, ..., 1000]. Optimizing the hyper-parameters this way should improve your SVM accuracy.
Looking again at your question, it seems you are using svmlearn. MATLAB's Statistics Toolbox also provides built-in SVM and cross-validation functions; have a look at: http://www.mathworks.co.uk/help/stats/support-vector-machines-svm.html
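For illustration, a minimal grid search with fitcsvm and 10-fold cross-validation could look roughly like this (the grid values are examples only, fitcsvm parameterizes the RBF kernel through 'KernelScale' rather than gamma directly, and featureVector and T are the variables from your code):
Cs     = 10.^(-2:2:8);            % candidate box-constraint (C) values
scales = 10.^(-2:3);              % candidate kernel scales (roughly 1/gamma)
bestLoss = Inf;
for boxC = Cs
    for ks = scales
        mdl  = fitcsvm(featureVector, T', 'KernelFunction', 'rbf', ...
                       'BoxConstraint', boxC, 'KernelScale', ks);
        loss = kfoldLoss(crossval(mdl, 'KFold', 10));
        if loss < bestLoss
            bestLoss = loss;  bestC = boxC;  bestScale = ks;
        end
    end
end
fprintf('best C = %g, best kernel scale = %g, CV error = %g\n', bestC, bestScale, bestLoss);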
I followed ASantosRibeiro's method to optimize the parameters before and it works well.
In addition, you could try adding more negative samples until the ratio of negative to positive samples reaches 2:1. The reason is that in a real-time application you scan the whole image window by window, and the extracted negative (background) windows are usually far more numerous than the windows that contain people.
Thus, adding more negative training samples is a straightforward but effective way to improve overall accuracy (both the false positive and the true negative rates).

How to use SVM in Matlab?

I am new to MATLAB. Is there any sample code for classifying some data (with 41 features) with an SVM and then visualizing the result? I want to classify a data set (which has five classes) using the SVM method.
I read the "A Practical Guide to Support Vector Classification" article and saw some examples. My dataset is KDD99. I wrote the following code:
%% Load Data
[data,colNames] = xlsread('TarainingDataset.xls');
groups = ismember(colNames(:,42),'normal.');
TrainInputs = data;
TrainTargets = groups;
%% Design SVM
C = 100;
svmstruct = svmtrain(TrainInputs,TrainTargets,...
'boxconstraint',C,...
'kernel_function','rbf',...
'rbf_sigma',0.5,...
'showplot','false');
%% Test SVM
[dataTest,colNamesTest] = xlsread('TestDataset.xls');
TestInputs = dataTest;
groups = ismember(colNamesTest(:,42),'normal.');
TestOutputs = svmclassify(svmstruct,TestInputs,'showplot','false');
but I don't know how to get the accuracy or MSE of my classification. Also, when I set showplot to true in svmclassify, I get this warning:
The display option can only plot 2D training data
Could anyone please help me?
I recommend using another SVM toolbox, libsvm. The link is as follows:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
After adding it to the MATLAB path, you can train and use your model like this:
model = svmtrain(train_label, train_feature, '-c 1 -g 0.07 -h 0');
% the parameters can be modified
[label, accuracy, probability] = svmpredict(test_label, test_feature, model);
train_label must be a vector; if there are more than two classes (not just 0/1), libsvm will train a multi-class SVM automatically.
train_feature is an n*L matrix for n samples. You should preprocess (e.g. scale) the features before using them, and the test features must be preprocessed in the same way.
The accuracy you want will be shown when the test is finished, but it is only for the whole dataset.
If you need the accuracy for the positive and negative samples separately, you still have to calculate it yourself from the predicted labels.
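For example, something along these lines (a rough sketch assuming 0/1 class labels in test_label; label is the vector returned by svmpredict above):
pos = (test_label == 1);
neg = (test_label == 0);
acc_pos = mean(label(pos) == test_label(pos));   % accuracy on the positive samples
acc_neg = mean(label(neg) == test_label(neg));   % accuracy on the negative samples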
Hope this will help you!
Your feature space has 41 dimensions; plotting more than 3 dimensions is impossible.
A good way to better understand your data and the way the SVM works is to begin with a linear SVM. This type of SVM is interpretable, which means that each of your 41 features has a weight (or 'importance') associated with it after training. You can then use plot3() with your data on 3 of the 'best' features from the linear SVM. Note how well your data are separated with those features and choose a kernel (basis function) and other parameters accordingly.
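A rough sketch of that idea, assuming the binary normal/attack labelling built in the question's code (TrainInputs, TrainTargets) and using fitcsvm:
% train an interpretable linear SVM (linear kernel => one weight per feature)
linMdl = fitcsvm(TrainInputs, TrainTargets, 'KernelFunction', 'linear', ...
                 'Standardize', true);
% rank the 41 features by the magnitude of their weights
[~, order] = sort(abs(linMdl.Beta), 'descend');
top3 = order(1:3);                         % indices of the three 'best' features
% visualize both classes on those three features
plot3(TrainInputs(TrainTargets==1, top3(1)), TrainInputs(TrainTargets==1, top3(2)), ...
      TrainInputs(TrainTargets==1, top3(3)), 'b.');
hold on
plot3(TrainInputs(TrainTargets==0, top3(1)), TrainInputs(TrainTargets==0, top3(2)), ...
      TrainInputs(TrainTargets==0, top3(3)), 'r.');
grid on; hold off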