How to predict labels for new data (test set) with the PartitionedEnsemble model in MATLAB?

I trained an ensemble model (RUSBoost) for a binary classification problem with the function fitensemble() in MATLAB 2014a. Training used 10-fold cross-validation, set through the "kfold" input parameter of fitensemble().
However, the model returned by this function cannot be used to predict the labels of new data with predict(model, Xtest). I checked the MATLAB documentation, which says the kfoldPredict() function can be used to evaluate the trained model, but I did not find any way to pass new data to this function. I also noticed that the structure of a model trained with cross-validation differs from that of a model trained without it. So could anyone please advise me how to use a model trained with cross-validation to predict labels for new data? Thanks!

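kfoldPredict() needs a cross-validated model object as input, such as a RegressionPartitionedModel or, in your case, a ClassificationPartitionedEnsemble. This object already contains the models and the data used for k-fold cross-validation, which is why kfoldPredict() takes no new data: it only returns predictions for the held-out training observations.
The partitioned object has a property Trained, a cell array that stores the learner trained in each fold.
You can take any of these learners and use it like predict(learner, Xdata), for example predict(model.Trained{1}, Xtest).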
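A minimal sketch of this, assuming the Statistics Toolbox is available. The data is synthetic and the variable names (X, Y, Xtest, foldLabels, newLabels) and the choice of 50 learners are made up for illustration. The second option, retraining on all data once cross-validation has given you an error estimate, is my suggestion and not something the toolbox does for you:

```matlab
% Synthetic, imbalanced two-class data (illustrative only).
rng(1);
X     = [randn(200,2); randn(30,2) + 2];
Y     = [zeros(200,1); ones(30,1)];
Xtest = [randn(5,2);   randn(5,2) + 2];

% Cross-validated ensemble: returns a ClassificationPartitionedEnsemble.
CVMdl = fitensemble(X, Y, 'RUSBoost', 50, 'Tree', 'kfold', 10);

% Option 1: predict with the learner that was trained in the first fold.
foldLabels = predict(CVMdl.Trained{1}, Xtest);

% Option 2 (assumption: you want one final model): use the cross-validated
% object only to estimate the error, then retrain on all data without 'kfold'.
cvError   = kfoldLoss(CVMdl);
Mdl       = fitensemble(X, Y, 'RUSBoost', 50, 'Tree');
newLabels = predict(Mdl, Xtest);
```

Each learner in Trained has seen only nine tenths of the data, so option 2 is usually the more natural choice when the goal is labelling new samples.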
Edit:
If k is too large, the held-out fold in one or more iterations may contain too little meaningful data, so the evaluation for that iteration is less reliable.
There are no general rules for k, but k=10, as in the MATLAB default, is a good starting point to experiment with.
Maybe this is also interesting for you: https://stats.stackexchange.com/questions/27730/choice-of-k-in-k-fold-cross-validation
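If you want to play around with k, something along these lines might do. It is only a sketch on synthetic data; the candidate values in ks and the number of learners are arbitrary assumptions:

```matlab
% Synthetic, imbalanced two-class data (illustrative only).
rng(1);
X = [randn(200,2); randn(30,2) + 2];
Y = [zeros(200,1); ones(30,1)];

ks     = [5 10 20];            % candidate numbers of folds (arbitrary choice)
cvLoss = zeros(size(ks));
for i = 1:numel(ks)
    CVMdl     = fitensemble(X, Y, 'RUSBoost', 50, 'Tree', 'kfold', ks(i));
    cvLoss(i) = kfoldLoss(CVMdl);   % misclassification rate over all folds
    fprintf('k = %2d: cross-validated loss = %.3f\n', ks(i), cvLoss(i));
end
```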

Related

How can I choose the best model in cross validation in matlab?

I have two datasets and I want to train an SVM classification model (fitcsvm) on one of them and then predict labels for the other one. I use 10-fold cross-validation (crossval) to train my model, so I have 10 different models. My question is: which one of these models is the best for prediction, and how can I find it?
Here is my code:
Mdl = fitcsvm(trainingData,labels);
CVMdl = crossval(Mdl);
You may have mixed something up here. The function fitcsvm trains a single model on all of the training data, and crossval validates that model: it returns a cross-validated (partitioned) model holding the ten fold models, from which kfoldLoss computes an evaluation value. The fold models exist only to estimate how well Mdl generalizes, so none of them is "the best"; use Mdl itself for prediction.
In general, cross-validation does not train a final model (as the name says, it is a validation technique). However, you can use cross-validation to find settings that give good models.
What you are looking for is a form of hyperparameter optimization: methods that automatically train multiple models on a given data set to find the best tuning values for the SVM. Have a look at the fitcsvm docs.
You can turn it on like this:
Mdl = fitcsvm(trainingData,labels,'OptimizeHyperparameters','auto')
You may also want to use cross-validation to train multiple models with the same tuning parameters, but I guess you'll have to write that yourself. Perhaps this already helps.
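To make the roles of fitcsvm and crossval concrete, here is a rough sketch on synthetic data (newData, cvError and predicted are names I made up; trainingData and labels mirror the question). It assumes the Statistics and Machine Learning Toolbox:

```matlab
% Synthetic two-class data (illustrative only).
rng(1);
trainingData = [randn(50,2) - 1; randn(50,2) + 1];
labels       = [-ones(50,1);     ones(50,1)];
newData      = [randn(5,2) - 1;  randn(5,2) + 1];

Mdl   = fitcsvm(trainingData, labels);  % single model trained on all data
CVMdl = crossval(Mdl);                  % 10 fold models, used for evaluation only

cvError   = kfoldLoss(CVMdl);           % estimated misclassification rate
predicted = predict(Mdl, newData);      % labels for the second dataset
```

The ten models in CVMdl.Trained are a by-product of the evaluation; the prediction for the other dataset comes from Mdl.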

Matlab cross validation and K-NN

I am trying to build a kNN classifier with cross-validation in MATLAB. Because of my MATLAB version I have used knnclassify() to build the classifier (classKNN = knnclassify(sample_test, sample_training, training_label)).
I am not able to use crossval() with that.
Thanks in advance.
There are two ways to run k-nearest neighbours in MATLAB. The first is knnclassify(), as you did. However, this function returns only the predicted labels, so you cannot use crossval() with it: cross-validation is performed on a model, not on its results. In MATLAB, the model is described by an object.
crossval() only works with classifier objects (be it k-NN, SVM and so on). To create the nearest-neighbor classification object, use the fitcknn() function. Given the training set and the training labels as input (in this order), it returns the object, which you can pass to crossval().
There's only one thing left: how do you predict the labels for your validation set? For this, use the predict() function. Given the model (the k-NN object) and the validation set as input (again, in this order), it returns the vector of predicted labels, as knnclassify() does.
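Put together, the steps above could look roughly like this. The data is synthetic, and the variable names and the choice of 5 neighbours and 10 folds are assumptions for illustration; fitcknn needs a MATLAB release that ships it:

```matlab
% Synthetic two-class data (illustrative only).
rng(1);
sample_training = [randn(60,2); randn(60,2) + 2];
training_label  = [ones(60,1);  2*ones(60,1)];
sample_test     = [randn(5,2);  randn(5,2) + 2];

% Model object: training set first, then training labels.
Mdl = fitcknn(sample_training, training_label, 'NumNeighbors', 5);

% Cross-validate the model object and read off the error estimate.
CVMdl   = crossval(Mdl, 'KFold', 10);
cvError = kfoldLoss(CVMdl);

% Predict labels: model first, then the data to classify.
classKNN = predict(Mdl, sample_test);
```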

Parameter selection of SVM

I have a dataset which I use for classification with libSVM in MATLAB. The dataset consists of 4 classes.
For parameter selection of the SVM I can do nested cross-validation. The problem is that I also need the values of the best parameters in the end.
After doing the nested cross-validation and obtaining the final accuracy, I want the values of the best parameters. Then I will train an SVM for each class (one-vs-all) with the best parameters to select the most important features (according to highest weight), i.e. a feature importance map.
How can I do this? Should I skip nested cross-validation and just loop over all parameters, doing cross-validation for each?
Second, if I use a linear SVM, then using the weight vector w to assign importance to features works, but does it also work for a non-linear SVM (e.g. RBF kernel)?
To find the "best" parameters for your kernel of choice, you have to loop over all parameter combinations, a so-called "grid search". LIBSVM's MATLAB interface has no built-in grid-search mechanism.
Regarding your second question, I would suggest performing feature selection (e.g. Information Gain, Mutual Information, ...) as a pre-processing step before the actual work with the SVM, and taking the weight vectors into consideration in a second step (but I am not sure whether this will work with RBF or Gaussian kernels...).
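A grid search of this kind might be sketched as follows. It assumes the LIBSVM MATLAB interface is on the path (so svmtrain here is LIBSVM's function, not the old toolbox one), uses synthetic 4-class data, and the grid ranges and fold count are arbitrary choices of mine. Note this is a plain grid search, not nested cross-validation, so the best accuracy it reports is optimistic:

```matlab
% Synthetic 4-class data (illustrative only); LIBSVM expects double inputs.
rng(1);
centers = [0 0; 3 0; 0 3; 3 3];
X = []; y = [];
for c = 1:4
    X = [X; bsxfun(@plus, randn(40,2), centers(c,:))]; %#ok<AGROW>
    y = [y; c * ones(40,1)];                           %#ok<AGROW>
end

bestAcc = -Inf; bestC = NaN; bestG = NaN;
for log2c = -5:4:15            % coarse grid over C (assumed range)
    for log2g = -15:4:3        % coarse grid over gamma (assumed range)
        opts = sprintf('-q -t 2 -c %g -g %g -v 5', 2^log2c, 2^log2g);
        acc  = svmtrain(y, X, opts);   % with -v, returns CV accuracy in percent
        if acc > bestAcc
            bestAcc = acc; bestC = 2^log2c; bestG = 2^log2g;
        end
    end
end

% Keep the best parameter values and train the final model on all data.
finalModel = svmtrain(y, X, sprintf('-q -t 2 -c %g -g %g', bestC, bestG));
fprintf('best C = %g, best gamma = %g, CV accuracy = %.1f%%\n', bestC, bestG, bestAcc);
```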

how to save libsvm model in matlab

I'm using libsvm in MATLAB, and it seems there is no existing method to save the model produced by svmtrain. Instead, the functions provided force me to retrain every time. Just saving the svmtrain model variable in a .mat file does not work. What should I be doing?
The nice thing about SVMs, unlike NNs, is that training is fast and in some cases can even be done online. I have a multiclass SVM, and I save the training vectors, classes and kernel parameters to a txt file so the model can be retrained when needed.
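As a rough sketch of that idea: store whatever is needed to retrain, then rebuild the model on demand. I use a .mat file here instead of the txt file mentioned above, purely for brevity; the file name, variable names and option string are hypothetical, and svmtrain is assumed to be LIBSVM's:

```matlab
% Synthetic 3-class data (illustrative only).
rng(1);
X    = [randn(30,2); randn(30,2) + 3; bsxfun(@plus, randn(30,2), [0 3])];
y    = [ones(30,1); 2*ones(30,1); 3*ones(30,1)];
opts = '-q -t 2 -c 1 -g 0.5';      % kernel parameters (assumed values)

% Save the training vectors, classes and parameters.
save('svm_training_inputs.mat', 'X', 'y', 'opts');
clear X y opts

% Later session: load them back and retrain.
S     = load('svm_training_inputs.mat');
model = svmtrain(S.y, S.X, S.opts);
```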

Usage of Libsvm model

I've developed a model using Libsvm in MATLAB. I chose the best parameters using CV and obtained the model by training on the whole dataset. I use normalization to get better results:
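maximum = max(TR) + 0.00001;   % small offset avoids division by zero for constant columns
minimum = min(TR);
training = zeros(size(TR));
for i = 1:size(TR,2)
    % scales column i into the range [-1, 0)
    training(:,i) = double(TR(:,i) - maximum(i)) / (maximum(i) - minimum(i));
end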
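Now, how can I use my model directly to classify new data, i.e. records that have no class label? Do I have to build functions manually from the model information?
Are you using libsvmtrain to train on your training data? If so, its output argument is a model structure that you can use to classify test or future data: pass that structure to svmpredict along with the new data. You do not have to build any functions from the model information yourself. Scale the new data with the maximum and minimum computed from the training set, and since svmpredict also expects a label vector, pass arbitrary values for records whose labels are unknown.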
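For instance, assuming the LIBSVM MATLAB interface (called svmtrain/svmpredict here; your build may name the training function libsvmtrain), it could look like this. The data, labelsTR, newData and the scale helper are made up for illustration; the scaling itself is the one from the question:

```matlab
% Synthetic two-class data (illustrative only).
rng(1);
TR       = [randn(50,3); randn(50,3) + 2];
labelsTR = [ones(50,1);  2*ones(50,1)];
newData  = randn(4,3) + 1;            % records without class labels

% Scaling parameters come from the TRAINING data only.
maximum = max(TR) + 0.00001;
minimum = min(TR);
scale   = @(A) bsxfun(@rdivide, bsxfun(@minus, double(A), maximum), maximum - minimum);

training = scale(TR);
model    = svmtrain(labelsTR, training, '-q');

% svmpredict needs a label vector; the values are arbitrary for unlabeled data,
% so the accuracy it prints is meaningless here.
dummyLabels = zeros(size(newData,1), 1);
predicted   = svmpredict(dummyLabels, scale(newData), model);
```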