I would like to find the predicted labels of the training data points after training the classifier. I am using Mdl = fitcsvm(train_data,train_labels) in MATLAB. The returned Mdl object has many properties, but none of them corresponds to the training accuracy. Is there any way to find it?
You can apply cross-validation, which estimates the classification error on held-out data:
xval = crossval(Mdl,'KFold',10);
kfoldLoss(xval)
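Note that the cross-validation loss estimates the error on unseen data rather than the training accuracy itself. If the training (resubstitution) accuracy and the predicted training labels are what you are after, a minimal sketch using resubPredict and resubLoss could look like this. The toy data below are an assumption standing in for train_data and train_labels.

```matlab
% Toy two-class data standing in for train_data / train_labels (assumption).
rng(1);                                         % for reproducibility
train_data   = [randn(50,2) + 2; randn(50,2) - 2];
train_labels = [ones(50,1); -ones(50,1)];

Mdl = fitcsvm(train_data, train_labels);

% Labels the trained model assigns to its own training points.
trainPredictions = resubPredict(Mdl);

% Resubstitution loss is the misclassification rate on the training set.
trainAccuracy = 1 - resubLoss(Mdl);

% The same number computed by hand from the predicted labels.
manualAccuracy = mean(trainPredictions == train_labels);
```

The training accuracy is usually optimistic, so it is best read alongside the cross-validation loss rather than instead of it.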
I have a training dataset (50000 x 16) and a test dataset (5000 x 16). The 16th column in both datasets is the decision label (response); the label in the test dataset is used to check the classification accuracy of the trained classifier. I am using my training data to train and validate a cross-validated kNN classifier, which I created with the following code:
X = Dataset2(1:50000,:); % Use some data for fitting
Y = Training_Label(1:50000,:); % Response of training data
%Create a KNN Classifier model
rng(10); % For reproducibility
Mdl = fitcknn(X,Y,'Distance', 'Cosine', 'Exponent', '', 'NumNeighbors', 10,'DistanceWeight', 'Equal', 'StandardizeData', 1);
%Construct a cross-validated classifier from the model.
CVMdl = crossval(Mdl,'KFold', 10);
%Examine the cross-validation loss, which is the average loss of each cross-validation model when predicting on data that is not used for training.
kloss = kfoldLoss(CVMdl, 'LossFun', 'ClassifError')
% Compute validation accuracy
validationAccuracy = 1 - kloss;
Now I want to classify my test data using this cross-validated kNN classifier, but I can't figure out how to do that. I have gone through the available examples in MATLAB but couldn't find a suitable function or example.
I know I can use the "predict" function to predict the class labels of my test data if my classifier is not cross-validated. The code is as follows:
X = Dataset2(1:50000,:); % Use some data for fitting
Y = Training_Label(1:50000,:); % Response of training data
%Create a KNN Classifier model
rng(10); % For reproducibility
Mdl = fitcknn(X,Y,'Distance', 'Cosine', 'Exponent', '', 'NumNeighbors', 10,'DistanceWeight', 'Equal', 'StandardizeData', 1);
%Classification using Test Data
Classifier_Output_Labels = predict(Mdl,TestDataset2(1:5000,:));
But I could not find a similar function (like "predict") for a cross-validated kNN classifier. I found the "kfoldPredict" function in the MATLAB documentation, but it says the function is used to evaluate the trained model:
http://www.mathworks.com/help/stats/classificationpartitionedmodel.kfoldpredict.html
It does not seem to accept new data as input.
So could anyone please advise me how to use the cross-validated kNN classifier model to predict labels of new data? Any help is appreciated and badly needed. :(
Let's say you used 10-fold cross-validation while learning the model. You can use the kfoldLoss function to get the CV loss for each fold, and then pick the trained model with the lowest CV loss. Here CVMdl is the cross-validated model returned by crossval:
modelLosses = kfoldLoss(CVMdl,'mode','individual');
The above code gives you a vector of length 10 (one CV error value per fold) if you used 10-fold cross-validation. If the trained model with the lowest CV error is the k-th one, you would then use:
[~,k] = min(modelLosses);
testSetPredictions = predict(CVMdl.Trained{k}, testSetFeatures);
You seem to be confusing things here. Cross-validation is a tool for model selection and evaluation. It is not a training procedure per se, so you cannot "use" a cross-validated object for prediction. You predict using a trained object. Cross-validation is a way of estimating the generalization capability of a given model; it has nothing to do with the actual training. It is rather a small statistical experiment to assess a particular property.
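In practice that separation could be sketched as follows: cross-validate to estimate how well the configuration generalizes, and predict the test data with the model trained on all of the training data. The toy data and the reduced set of fitcknn options below are assumptions made only to keep the sketch self-contained.

```matlab
% Toy data standing in for the training and test sets (assumption).
rng(10);                                        % for reproducibility
X     = [randn(100,3) + 1; randn(100,3) - 1];
Y     = [ones(100,1); zeros(100,1)];
Xtest = [randn(20,3) + 1; randn(20,3) - 1];
Ytest = [ones(20,1); zeros(20,1)];

% 1) Train on all of the training data.
Mdl = fitcknn(X, Y, 'NumNeighbors', 10);

% 2) Cross-validate only to estimate how well this configuration generalizes.
CVMdl = crossval(Mdl, 'KFold', 10);
validationAccuracy = 1 - kfoldLoss(CVMdl);

% 3) Predict new data with the model trained on all of the training data.
testPredictions = predict(Mdl, Xtest);
testAccuracy    = mean(testPredictions == Ytest);
```

If validationAccuracy is unsatisfactory, change the options (number of neighbors, distance, and so on), repeat steps 1 and 2, and touch the test set only once at the end.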
I have a feature matrix 977x3
features = rand(977,3);
where each row is an observation and each column is a feature.
I calculate the pairwise distances between points with
dissimilarities = pdist(features);
and then I scale it with
feature_transf = mdscale(dissimilarities,3);
I use the new set of features (feature_transf) for classification.
Does this make sense? I get good classification accuracy when training an SVM with 10-fold cross-validation.
Can you please tell me whether this is methodologically incorrect?
Thanks,
Gabriele
I would like to draw learning curves for a given SVM classifier. Thus, in order to do this, I would like to compute the training, cross-validation and test error, and then plot them while varying some parameter (e.g., number of instances m).
How do I compute the training, cross-validation and test error with libsvm when it is used from MATLAB?
I have seen other answers (see example) that suggest solutions for other languages.
Isn't there a compact way of doing it?
Given a set of instances described by:
a set of feature vectors, featureVectors;
their corresponding labels (e.g., either 0 or 1),
if a model was previously trained with libsvm, the mean squared error (MSE) can be computed as follows:
[predictedLabels, accuracy, ~] = svmpredict(labels, featureVectors, model,'-q');
MSE = accuracy(2);
Notice that predictedLabels contains the labels that were predicted by the classifier for the given instances.
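Building on this, a learning curve could be sketched as below. This is a rough sketch under several assumptions: the libsvm MATLAB interface (its svmtrain and svmpredict) is on the path ahead of any toolbox function with the same name, the data are toy data, and the classification error is taken as 100 minus the accuracy percentage in accuracy(1). With the -v option, libsvm's svmtrain returns the cross-validation accuracy instead of a model.

```matlab
% Assumes libsvm's svmtrain/svmpredict are on the MATLAB path.
rng(0);                                        % for reproducibility
X = [randn(200,2) + 1; randn(200,2) - 1];      % toy features (assumption)
y = [ones(200,1); zeros(200,1)];               % toy labels (assumption)
perm = randperm(400);
X = X(perm,:);  y = y(perm);
Xtrain = X(1:300,:);    ytrain = y(1:300);
Xtest  = X(301:end,:);  ytest  = y(301:end);

sizes    = 50:50:300;                          % training set sizes m
trainErr = zeros(size(sizes));
cvErr    = zeros(size(sizes));
testErr  = zeros(size(sizes));

for i = 1:numel(sizes)
    m  = sizes(i);
    Xm = Xtrain(1:m,:);  ym = ytrain(1:m);

    model = svmtrain(ym, Xm, '-q');                        % fit on m instances
    [~, accTrain, ~] = svmpredict(ym, Xm, model, '-q');    % training accuracy
    [~, accTest, ~]  = svmpredict(ytest, Xtest, model, '-q');
    accCV = svmtrain(ym, Xm, '-v 5 -q');                   % 5-fold CV accuracy

    trainErr(i) = 100 - accTrain(1);
    cvErr(i)    = 100 - accCV;
    testErr(i)  = 100 - accTest(1);
end

plot(sizes, trainErr, sizes, cvErr, sizes, testErr);
xlabel('number of training instances m');
ylabel('classification error (%)');
legend('training', 'cross-validation', 'test');
```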
I have 300 data samples, each with a feature vector of around 4000 dimensions. Each input has a 5-dimensional output whose values are in the range -2 to 2. I am trying to fit a lasso model to it. I went through a few posts that discuss cross-validation strategies, like this one: Leave one out cross validation algorithm in matlab
But I saw that lasso does not support leaveout in Matlab! http://www.mathworks.com/help/stats/lasso.html
How can I train a model using leave one out cross validation and fit a model using lasso on my dataset? I am trying to do this in matlab. I would like to get a set of weights which I will be able to use for future predictions on other data.
I tried using glmnet: http://www.stanford.edu/~hastie/glmnet_matlab/intro.html but I couldn't compile it on my machine due to lack of proper mex compiler.
Any solutions to my problem? Thanks :)
EDIT
I am also trying to use the lasso function built into MATLAB. It has an option to perform cross-validation. It outputs B and FitInfo (fit statistics), where B holds the fitted coefficients as a p-by-L matrix, p is the number of predictors (columns) in X, and L is the number of Lambda values.
Now given a new test sample, how can I calculate the output using this model?
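You can use a leave-one-out approach regardless of your training method. As explained here, you can use crossvalind to split the data into training and test sets, where N is the number of observations and M is the number left out for testing (M = 1 gives leave-one-out):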
[Train, Test] = crossvalind('LeaveMOut', N, M)
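For the follow-up question about predicting a new sample from B and the fit statistics, a minimal sketch could look like this. It assumes 10-fold cross-validation via the 'CV' option, toy data, and a single output column; with a 5-dimensional output you would fit one lasso model per output column. The variable names coef, coef0 and Xnew are hypothetical.

```matlab
% Toy regression data with one output column (assumption).
rng(0);                                   % for reproducibility
n = 60;  p = 20;
X = randn(n, p);
trueB = [3; -2; zeros(p-2, 1)];
y = X * trueB + 0.1 * randn(n, 1);

% Fit the lasso path and cross-validate over the Lambda values.
[B, FitInfo] = lasso(X, y, 'CV', 10);

% Pick the column of B at the Lambda with the smallest cross-validated MSE.
idx   = FitInfo.IndexMinMSE;
coef  = B(:, idx);                        % p-by-1 weights
coef0 = FitInfo.Intercept(idx);           % matching intercept

% Predict new samples: linear model plus intercept.
Xnew = randn(5, p);
yhat = Xnew * coef + coef0;
```

The pair coef and coef0 is the set of weights to keep for future predictions on other data.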
I am new to MATLAB. Is there any sample code for classifying some data (with 41 features) with an SVM and then visualizing the result? I want to classify a data set (which has five classes) using the SVM method.
I read the article "A Practical Guide to Support Vector Classification" and looked at some examples. My dataset is kdd99. I wrote the following code:
%% Load Data
[data,colNames] = xlsread('TarainingDataset.xls');
groups = ismember(colNames(:,42),'normal.');
TrainInputs = data;
TrainTargets = groups;
%% Design SVM
C = 100;
svmstruct = svmtrain(TrainInputs,TrainTargets,...
'boxconstraint',C,...
'kernel_function','rbf',...
'rbf_sigma',0.5,...
'showplot','false');
%% Test SVM
[dataTset,colNamesTest] = xlsread('TestDataset.xls');
TestInputs = dataTset;
groups = ismember(colNamesTest(:,42),'normal.');
TestOutputs = svmclassify(svmstruct,TestInputs,'showplot','false');
But I don't know how to get the accuracy or MSE of my classification. I also tried the showplot option in svmclassify, but when it is true I get this warning:
The display option can only plot 2D training data
Could anyone please help me?
I recommend that you use another SVM toolbox, libsvm. The link is as follows:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
After adding it to the MATLAB path, you can train and use your model like this:
model = svmtrain(train_label, train_feature, '-c 1 -g 0.07 -h 0');
% the parameters can be modified
[label, accuracy, probability] = svmpredict(test_label, test_feature, model);
train_label must be a vector. If it contains more than two distinct labels (not just 0/1), libsvm trains a multi-class SVM automatically.
train_feature is an n-by-L matrix for n samples with L features each. You should preprocess (e.g., scale) the features before using them, and the test features must be preprocessed in the same way.
The accuracy you want is displayed when the test is finished, but it is only reported for the whole dataset.
If you need the accuracy for positive and negative samples separately, you have to calculate it yourself from the predicted labels.
Hope this will help you!
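Computing these numbers by hand from the predicted labels could be sketched as follows. The label vectors are hypothetical and numeric (1 = positive, 0 = negative); logical outputs such as those of svmclassify would first be converted with double().

```matlab
% Hypothetical true and predicted labels (1 = positive, 0 = negative).
trueLabels      = [1 1 1 0 0 0 0 1]';
predictedLabels = [1 0 1 0 0 1 0 1]';

% Overall accuracy: fraction of labels predicted correctly.
accuracy = mean(predictedLabels == trueLabels);

% Mean squared error between predicted and true labels.
mse = mean((predictedLabels - trueLabels).^2);

% Accuracy on the positive and on the negative samples separately.
posAccuracy = mean(predictedLabels(trueLabels == 1) == 1);
negAccuracy = mean(predictedLabels(trueLabels == 0) == 0);
```

For 0/1 labels the MSE is simply 1 minus the accuracy, so the per-class figures are usually the more informative ones.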
Your feature space has 41 dimensions, and more than 3 dimensions cannot be plotted directly.
To better understand your data and the way an SVM works, begin with a linear SVM. This type of SVM is interpretable: after training, each of your 41 features has a weight (or 'importance') associated with it. You can then use plot3() to plot your data on the 3 'best' features from the linear SVM. Note how well your data is separated by those features, and choose a basis function (kernel) and other parameters accordingly.
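A minimal sketch of that idea, assuming fitcsvm from the Statistics and Machine Learning Toolbox, a two-class problem, and toy data with 6 features in place of the 41 kdd99 features:

```matlab
% Toy two-class data; the label depends mostly on features 5, 2 and 1 (assumption).
rng(2);                                   % for reproducibility
n = 200;
X = randn(n, 6);
y = (X(:,2) - 2*X(:,5) + 0.5*X(:,1)) > 0;

% Linear SVM; standardizing makes the weights comparable across features.
Mdl = fitcsvm(X, y, 'KernelFunction', 'linear', 'Standardize', true);

% Rank features by the magnitude of their linear weight.
[~, order] = sort(abs(Mdl.Beta), 'descend');
top3 = order(1:3);

% Plot both classes on the three highest-weighted features.
figure; hold on; grid on;
plot3(X(y,  top3(1)), X(y,  top3(2)), X(y,  top3(3)), 'b.');
plot3(X(~y, top3(1)), X(~y, top3(2)), X(~y, top3(3)), 'r.');
view(3);
xlabel(sprintf('feature %d', top3(1)));
ylabel(sprintf('feature %d', top3(2)));
zlabel(sprintf('feature %d', top3(3)));
legend('class 1', 'class 0');
```

A large weight only indicates importance within the linear model, so treat the ranking as a starting point for exploration rather than a definitive feature selection.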