I am trying to get the 5-fold cross-validation error of a model created with TreeBagger using the function crossval, but I keep getting this error:
Error using crossval>evalFun
The function 'regrTree' generated the following error:
Too many input arguments.
My code is below. Can anyone point me in the right direction? Thanks
%Random Forest
%%XX is training data matrix, Y is training labels vector
XX=X_Tbl(:,2:end);
Forest_Mdl = TreeBagger(1000,XX,Y,'Method','regression');
err_std = crossval('mse',XX,Y,'Predfun',@regrTree,'kFold',5);
function yfit_std = regrTree(Forest_Mdl,XX)
yfit_std = predict(Forest_Mdl,XX);
end
The crossval documentation covers this. The function passed as 'Predfun' has to be defined as follows (note that it takes 3 arguments, not 2):
function yfit = myfunction(Xtrain,ytrain,Xtest)
% Calculate predicted response
...
end
Xtrain — Subset of the observations in X used as training predictor
data. The function uses Xtrain and ytrain to construct a
classification or regression model.
ytrain — Subset of the responses in y used as training response data.
The rows of ytrain correspond to the same observations in the rows of
Xtrain. The function uses Xtrain and ytrain to construct a
classification or regression model.
Xtest — Subset of the observations in X used as test predictor data.
The function uses Xtest and the model trained on Xtrain and ytrain to
compute the predicted values yfit.
yfit — Set of predicted values for observations in Xtest. The yfit
values form a column vector with the same number of rows as Xtest.
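Applied to the TreeBagger case from the question, a minimal sketch could look like the following. The synthetic X and Y, the tree count of 50, and the name predfun are assumptions for illustration; an anonymous function is used here, but a named function with the same three arguments would work equally well. The key point is that the forest is trained inside the function, on the training fold only.

```matlab
% Synthetic regression data (assumption: stands in for XX and Y)
rng(1);                                  % for reproducibility
X = randn(200, 3);
Y = 2*X(:,1) + 0.1*randn(200, 1);

% Three-argument prediction function: train on the fold, predict on the rest
predfun = @(Xtrain, ytrain, Xtest) ...
    predict(TreeBagger(50, Xtrain, ytrain, 'Method', 'regression'), Xtest);

% 5-fold cross-validated mean squared error
err_std = crossval('mse', X, Y, 'Predfun', predfun, 'KFold', 5);
```

Note that the forest trained beforehand on all the data (Forest_Mdl in the question) is not used by crossval at all.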
I have to use an SVM classifier on a digits dataset. The dataset consists of 28x28 images of digits, 2000 images in total.
I tried to use svmtrain, but MATLAB gave an error saying that svmtrain has been removed, so now I am using fitcsvm.
My code is as below:
labelData = zeros(2000,1);
for i=1:1000
labelData(i,1)=1;
end
for j=1001:2000
labelData(j,1)=1;
end
SVMStruct =fitcsvm(trainingData,labelData)
%where training data is the set of images of digits.
I need to know how I can predict the outputs for test data using the SVM. Also, is my code correct?
The function that you are looking for is predict. It takes the SVM-object as input followed by a data-matrix and returns the predicted labels.
Make sure that you do not train your model on all data but on a reasonable subset (usually 70%). You can use the cross-validation preparation:
% create cross-validation object
cvp = cvpartition(Lbl,'HoldOut',0.3);
% extract logical vectors for training and testing data
lgTrn = cvp.training;
lgTst = cvp.test;
% train SVM
mdl = fitcsvm(Dat(lgTrn,:),Lbl(lgTrn));
% test / predict SVM
Lbl_prd = predict(mdl,Dat(lgTst,:));
Note that your labeling produces a single vector of ones.
MathWorks replaced svmtrain with fitcsvm for clarity: the name now shows whether the model does classification (fitcsvm) or regression (fitrsvm).
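Putting the pieces together, here is a hedged sketch with the labeling fixed so that the second block of 1000 observations gets a different class. The random trainingData and the label values 1 and 2 are assumptions, since the real digit images and classes are not shown in the question.

```matlab
% Stand-in data (assumption): two well-separated groups of 1000 rows each
rng(0);                                   % for reproducibility
trainingData = [randn(1000, 4) + 1; randn(1000, 4) - 1];

% Two distinct labels instead of a vector of ones
labelData = [ones(1000, 1); 2*ones(1000, 1)];

% Hold out 30% of the observations for testing
cvp = cvpartition(labelData, 'HoldOut', 0.3);
isTrain = training(cvp);
isTest  = test(cvp);

% Train on the training part, predict on the held-out part
mdl = fitcsvm(trainingData(isTrain, :), labelData(isTrain));
predictedLabels = predict(mdl, trainingData(isTest, :));
accuracy = mean(predictedLabels == labelData(isTest));
```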
I am trying to use a Support Vector Machine to classify my data in 3 classes. I used this Matlab function to train and cross-validate the SVM:
Mdl = fitcecoc(XTrain, yTrain, 'Learners', 'svm', 'ObservationsIn', 'rows', ...
'ScoreTransform', 'invlogit','Crossval','on', 'Holdout', 0.2);
where XTrain contains all of my data and yTrain is a cell containing the names of each class to be assigned to the input data in XTrain.
The function above returns to me:
Mdl --> 1x1 ClassificationPartitionedECOC
My question is, what function do I have to use in order to make predictions using new data? In the case of binary classification, I build the SVM with 'fitcsvm' and then I predicted the labels with:
[label, score] = predict(Mdl, XTest);
However, if I feed the ClassificationPartitionedECOC to the 'predict' function, it gives me this error:
No valid system or dataset was specified.
I haven't been able to find a function that allows me to perform prediction starting from the model format that I have, ClassificationPartitionedECOC.
Thanks for any help you may provide!
You can access binary learner i through:
Mdl.BinaryLearners{i}
This works because fitcecoc just trains binary classifiers, as you would with fitcsvm, in a one-versus-one fashion.
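For a cross-validated model like the ClassificationPartitionedECOC in the question, as far as I know the fitted compact models are stored in the Trained property, and each of those carries its own BinaryLearners. A minimal sketch, assuming the built-in fisheriris sample data in place of XTrain and yTrain:

```matlab
% Sample data shipped with the Statistics and Machine Learning Toolbox
load fisheriris                           % meas (150x4), species (150x1 cell)
rng(1);                                   % for reproducibility

% Holdout validation returns a partitioned (cross-validated) ECOC model
CVMdl = fitcecoc(meas, species, 'Learners', 'svm', 'Holdout', 0.2);

% The single model trained on the 80% training part
compactMdl = CVMdl.Trained{1};

% Predict on the held-out observations (or on any new data matrix)
isTest = test(CVMdl.Partition);
[label, negLoss] = predict(compactMdl, meas(isTest, :));

% Binary SVMs inside the ECOC model (one-versus-one coding by default)
nLearners = numel(compactMdl.BinaryLearners);
```

The second output of predict for an ECOC model is the negated average binary loss per class rather than a posterior score.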
I am trying to do regression with fitrtree model. It works fine without the validation but with the validation the predict function returns an error.
%works fine
tree = fitrtree(trainingData,target,'MinLeafSize',2, 'Leaveout','off');
y_hat = predict(tree, xNew);
%Returns error
tree = fitrtree(trainingData,target,'MinLeafSize',2, 'Leaveout','on');
y_hat = predict(tree, xNew);
Error: Systems of classreg.learning.partition.RegressionPartitionedModel class cannot be used with the "predict"
command. Convert the system to an identified model first, such as by using the "idss" command.
Update: I figured out that when we use cross-validation of any sort, the model is in the Trained attribute of tree rather than in tree itself. What is this Trained attribute (tree.Trained{1}), and what information do we get from it?
If you choose a cross-validation method when calling fitrtree(), the output of the function is a RegressionPartitionedModel instead of a RegressionTree.
As you said, you can access objects of type RegressionTree stored in tree.Trained in your case. The number and meaning of the trees you find under this attribute depend on the cross-validation method. In your case, using leave-one-out cross-validation (LOOCV), the Trained attribute contains N RegressionTree objects, where N is the number of data points in your training set. Each of these regression trees is obtained by training on all but one of your data points. The left-out data point is used for testing.
For example, if you want to access the first and last trees obtained from cross-validation, and use them for separate predictions, you can do:
%Returns RegressionPartitionedModel
cv_trees = fitrtree(trainingData,target,'MinLeafSize',2, 'Leaveout','on');
%This is the number of regression trees stored in cv_trees for LOOCV
[N, ~] = size(trainingData);
%Use one of the models from the cross-validation as a predictor
y_hat = predict(cv_trees.Trained{1}, xNew);
y_hat_2 = predict(cv_trees.Trained{N}, xNew);
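A self-contained version of the same idea, as a sketch on synthetic data (trainingData, target, and xNew below are made-up stand-ins). It also shows kfoldLoss, which returns the cross-validated mean squared error that the partitioned model is mainly meant to provide.

```matlab
% Synthetic stand-in data (assumption)
rng(0);                                   % for reproducibility
trainingData = rand(40, 2);
target = 3*trainingData(:,1) - trainingData(:,2) + 0.05*randn(40, 1);
xNew = [0.5 0.5];

% Leave-one-out cross-validation returns a RegressionPartitionedModel
cv_trees = fitrtree(trainingData, target, 'MinLeafSize', 2, 'Leaveout', 'on');

% One trained tree per left-out observation
N = size(trainingData, 1);
nTrees = numel(cv_trees.Trained);

% Cross-validated mean squared error over all left-out points
cvMSE = kfoldLoss(cv_trees);

% Any single fold model can be used as a predictor
y_hat = predict(cv_trees.Trained{1}, xNew);
```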
I have a training dataset (50000 x 16) and a test dataset (5000 x 16). The 16th column in both datasets is the decision label (response); the label in the test dataset is used to check the classification accuracy of the trained classifier. I am using my training data to train and validate a cross-validated k-NN classifier, which I created with the following code:
X = Dataset2(1:50000,:); % Use some data for fitting
Y = Training_Label(1:50000,:); % Response of training data
%Create a KNN Classifier model
rng(10); % For reproducibility
Mdl = fitcknn(X,Y,'Distance', 'Cosine', 'Exponent', '', 'NumNeighbors', 10,'DistanceWeight', 'Equal', 'StandardizeData', 1);
%Construct a cross-validated classifier from the model.
CVMdl = crossval(Mdl,'KFold', 10);
%Examine the cross-validation loss, which is the average loss of each cross-validation model when predicting on data that is not used for training.
kloss = kfoldLoss(CVMdl, 'LossFun', 'ClassifError')
% Compute validation accuracy
validationAccuracy = 1 - kloss;
Now I want to classify my test data using this cross-validated k-NN classifier, but I can't figure out how to do that. I have gone through the available MATLAB examples but couldn't find a suitable function or example.
I know I can use the "predict" function to predict the class labels of my test data if my classifier is not cross-validated. The code is as follows:
X = Dataset2(1:50000,:); % Use some data for fitting
Y = Training_Label(1:50000,:); % Response of training data
%Create a KNN Classifier model
rng(10); % For reproducibility
Mdl = fitcknn(X,Y,'Distance', 'Cosine', 'Exponent', '', 'NumNeighbors', 10,'DistanceWeight', 'Equal', 'StandardizeData', 1);
%Classification using Test Data
Classifier_Output_Labels = predict(Mdl,TestDataset2(1:5000,:));
But I could not find a similar function (like "predict") for a cross-validated k-NN classifier. I found the "kfoldPredict" function in the MATLAB documentation, but it says the function is used to evaluate the trained model:
http://www.mathworks.com/help/stats/classificationpartitionedmodel.kfoldpredict.html
It does not accept new data as input.
So could anyone please advise me how to use the cross-validated k-NN classifier to predict labels of new data? Any help is appreciated.
Let's say you are doing 10-fold cross-validation while learning the model. You can then use the kfoldLoss function to get the CV loss for each fold and choose the trained model with the lowest CV loss, in the following way (CVMdl is the cross-validated object returned by crossval):
modelLosses = kfoldLoss(CVMdl,'Mode','individual');
The above code gives you a vector of length 10 (10 CV error values) if you have done 10-fold cross-validation while learning. Assuming the trained model with the lowest CV error is the k-th one, you would then use:
testSetPredictions = predict(CVMdl.Trained{k}, testSetFeatures);
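As a runnable sketch of this selection step, assuming the built-in fisheriris sample data instead of the asker's dataset, a plain k-NN model with default distance settings, and a cross-validated object named CVMdl:

```matlab
% Sample data shipped with the Statistics and Machine Learning Toolbox
load fisheriris                           % meas (150x4), species (150x1 cell)
rng(10);                                  % for reproducibility

% Cross-validated k-NN classifier with 10 folds
Mdl = fitcknn(meas, species, 'NumNeighbors', 10, 'Standardize', true);
CVMdl = crossval(Mdl, 'KFold', 10);

% One loss value per fold, then the index of the fold with the lowest loss
modelLosses = kfoldLoss(CVMdl, 'Mode', 'individual');
[~, k] = min(modelLosses);

% Predict with the model trained for that fold
testSetFeatures = meas(1:5, :);           % stand-in for genuinely new data
testSetPredictions = predict(CVMdl.Trained{k}, testSetFeatures);
```

Keep in mind that each fold model was trained on only nine tenths of the data, and a low loss on one fold can be partly luck.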
You seem to be confusing things here. Cross-validation is a tool for model selection and evaluation. It is not a training procedure per se, so you cannot "use" a cross-validated object for prediction; you predict using a trained object. Cross-validation estimates the generalization capability of a given model. It is a small statistical experiment to assess that one property, not a way of training.
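In code, that separation could look like the following sketch: cross-validation only produces an accuracy estimate, and the predictions for new data come from the model trained on the whole training set. The fisheriris sample data and the 120/30 split are assumptions for illustration.

```matlab
% Sample data, split into a training part and a held-back test part
load fisheriris                           % meas (150x4), species (150x1 cell)
rng(10);                                  % for reproducibility
shuffled = randperm(150);
trainIdx = shuffled(1:120);
testIdx  = shuffled(121:end);

% The trained object: fitted once on all training data
Mdl = fitcknn(meas(trainIdx, :), species(trainIdx), 'NumNeighbors', 10);

% Cross-validation as an experiment: estimate how well Mdl generalizes
CVMdl = crossval(Mdl, 'KFold', 10);
validationAccuracy = 1 - kfoldLoss(CVMdl);

% Prediction on new data uses Mdl, not CVMdl
predictedLabels = predict(Mdl, meas(testIdx, :));
testAccuracy = mean(strcmp(predictedLabels, species(testIdx)));
```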
I do not understand what the function "crossval" in MATLAB takes as its first parameter. I understand that it is a function that performs a regression, but I don't get what is meant by "some criterion testval". I need to use it with a k-NN regressor, but the examples do not make this clear to me.
vals = crossval(fun,X)
Each time it is called, fun should use XTRAIN to fit a model, then
return some criterion testval computed on XTEST using that fitted
model.
Here is where I am reading: Matlab reference
It should be similar to optimization functions, where the returned value from your fitting function fun should be an indication of how well it fits the data. As the documentation states, fun takes two arguments, a training data set XTRAIN and a testing data set XTEST.
If your data, X, comprises a column of known results X(:,1) and further columns of features X(:, 2:end), and you fit your model on XTRAIN, then your return value could be as simple as the sum-squared error of the fitted model:
testval = sum( (model(XTEST(:, 2:end)) - XTEST(:, 1)).^2 );
where model(XTEST(:, 2:end)) is the result of your fitted model on the features of the testing data set, XTEST, and XTEST(:, 1) are the known results for those feature sets.
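For the k-NN regressor in the question, one possible sketch is below. It assumes the data layout just described (response in column 1, features in the remaining columns) and builds the regressor by averaging the responses of the k nearest training points found with knnsearch. The synthetic data, k = 5, and the helper names pick and knnPredict are all made up for illustration.

```matlab
% Synthetic data (assumption): column 1 is the response, columns 2:3 are features
rng(0);                                   % for reproducibility
features = rand(200, 2);
response = sin(2*pi*features(:,1)) + features(:,2) + 0.05*randn(200, 1);
X = [response, features];
k = 5;                                    % number of neighbours

% Index a vector with an n-by-k index matrix and keep the n-by-k shape
pick = @(v, idx) reshape(v(idx), size(idx));

% k-NN regression: mean response of the k nearest training points
knnPredict = @(XTRAIN, XTEST) mean(pick(XTRAIN(:,1), ...
    knnsearch(XTRAIN(:,2:end), XTEST(:,2:end), 'K', k)), 2);

% The criterion testval: sum-squared error on the test fold
fun = @(XTRAIN, XTEST) sum((knnPredict(XTRAIN, XTEST) - XTEST(:,1)).^2);

% One testval per fold; their sum over all rows gives the cross-validated MSE
vals = crossval(fun, X, 'KFold', 10);
cvMSE = sum(vals) / size(X, 1);
```

Here crossval never needs to know what model is being fitted; it only splits X into XTRAIN and XTEST and collects whatever fun returns.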