Libsvm vs Weka (WLSVM) - matlab

I have to deal with an unbalanced dataset (95% of the records belong to the negative class and 5% to the positive class). I developed a model using a decision tree in the Weka framework. Now I'd like to try SVM with Libsvm to get better results. I'm trying to use both Libsvm for Matlab and the Libsvm Weka wrapper, and I'd like to know how to compare the results I get from them. In Weka, a model is built from the whole dataset and then a 10-fold cross-validation is performed. How can I do the same with Libsvm? From the Libsvm FAQ I learned that CV is used only to find the best kernel parameters, not during train/predict. So what is the exact sequence of steps I should follow in Matlab to obtain results that are comparable with Weka's?

Related

support vector machine regression & prediction using MATLAB fitrsvm function

I'm relatively new to using SVM and I have a question regarding how to use the results of SVM regression. I have found plenty of easy-to-understand documentation on SVM classification, and I understand how to use the result of an SVM for binary classification (i.e. data on one side of the separating hyperplane gets one label, data on the other side gets the other label), but I have not been able to find such hints for SVM regression, which is why I have run into the following question:
Using both the libsvm package and the fitrsvm function in MATLAB, I was able to generate models that fit the abalone data set. The libsvm model (from the svmtrain function) was used with svmpredict to predict on new input data as follows:
model=svmtrain(age_train,X_train,['-s 3 -t 2 -c 2 -g 2']);
[prediction,accuracy,~]=svmpredict(age_eval,X_eval,model);
Also, as I've said, I was able to achieve the same results using the fitrsvm function as follows:
model1=fitrsvm(X_train,age_train,'OptimizeHyperparameters','auto',...
'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',...
'expected-improvement-plus'),'KernelFunction','rbf');
age_predict1=predict(model1,X_eval);
Now my question is, how do the svmpredict function (in the case of libsvm package) and the predict function (in the case of fitrsvm function in MATLAB) take the values within the trained models and apply them to the new input data? For example, is there a mathematical equation in which I apply the parameters of the trained model (such as the 'Mu' and the 'Sigma' parameters in the fitrsvm result) to the new input data to obtain the results?
It would be greatly appreciated if someone could help me with this or refer me to someone/somewhere who can help me, thank you very much in advance.
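As a rough sketch of the kind of equation being asked about: for kernel SVR, a prediction is a weighted sum of kernel values between the new input and the support vectors, plus an offset, f(x) = sum_i coef_i * K(sv_i, x) + b. In a LIBSVM model struct the pieces are model.SVs, model.sv_coef and model.rho (LIBSVM stores rho = -b), with gamma in model.Parameters(4). In a fitrsvm model the analogous properties are SupportVectors, Alpha, Bias and KernelParameters.Scale; Mu and Sigma are the means and standard deviations used to standardize the predictors before the kernel is applied, and they only matter when standardization is switched on. The struct below is a made-up toy stand-in, not a trained model, and the code assumes an RBF kernel as in the '-t 2' call above.

```matlab
% Toy stand-in for a LIBSVM epsilon-SVR model (all values are invented for illustration).
model.SVs     = [0 0; 1 1];     % support vectors, one per row
model.sv_coef = [0.5; -0.25];   % alpha_i - alpha_i^* for each support vector
model.rho     = 0.1;            % LIBSVM stores rho = -b
gamma         = 2;              % the -g value passed to svmtrain

x = [1 0];                      % one new input row

% RBF kernel between x and every support vector: exp(-gamma * ||sv_i - x||^2)
sqDist = sum(bsxfun(@minus, model.SVs, x).^2, 2);
k = exp(-gamma * sqDist);

% Prediction: f(x) = sum_i sv_coef(i) * K(sv_i, x) - rho
f = model.sv_coef' * k - model.rho;
```

With a real LIBSVM model, model.SVs is sparse, so full(model.SVs) would be needed before the subtraction; for fitrsvm with standardization, x would first be transformed as (x - Mu) ./ Sigma.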

How to predict labels for new data (test set) by the PartitionedEnsemble model in Matlab?

I trained an ensemble model (RUSBoost) for a binary classification problem with the function fitensemble() in Matlab 2014a. The training is performed with 10-fold cross-validation through the "kfold" input parameter of fitensemble().
However, the model returned by this function cannot be used to predict the labels of new data with predict(model, Xtest). I checked the Matlab documentation, which says we can use the kfoldPredict() function to evaluate the trained model, but I did not find any way to pass new data to that function. I also found that the structure of the model trained with cross-validation is different from that of a model trained without it. So, could anyone please advise me how to use the model trained with cross-validation to predict labels of new data? Thanks!
kfoldPredict() needs a RegressionPartitionedModel or ClassificationPartitionedEnsemble object as input. This object already contains the models and data for k-fold cross-validation.
The partitioned object has a property Trained, in which the learners trained for the individual cross-validation folds are stored.
You can take any of these learners and use it like predict(learner, Xdata).
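A minimal sketch of that suggestion, on synthetic data invented for illustration (the variable names, the 100 learners and the 'Tree' weak learner are assumptions, not taken from the question). It also shows the common alternative of retraining once on all data without 'kfold' to get a single model for new data.

```matlab
% Synthetic unbalanced two-class data, for illustration only.
rng(1);
X = [randn(200, 2); randn(20, 2) + 2];
Y = [zeros(200, 1); ones(20, 1)];
Xtest = [randn(5, 2); randn(5, 2) + 2];

% Cross-validated ensemble: returns a ClassificationPartitionedEnsemble.
cvModel = fitensemble(X, Y, 'RUSBoost', 100, 'Tree', 'kfold', 10);

% Each cell of Trained holds the compact ensemble fitted on one fold's training part.
foldModel = cvModel.Trained{1};
labelsFromFold = predict(foldModel, Xtest);

% Alternative: use the cross-validated model only to estimate the error,
% then fit one final model on all data for predicting new observations.
cvError = kfoldLoss(cvModel);
finalModel = fitensemble(X, Y, 'RUSBoost', 100, 'Tree');
labelsFromFinal = predict(finalModel, Xtest);
```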
Edit:
If k is too large, each held-out fold may contain too little meaningful data, so the error estimate for that iteration becomes less reliable.
There are no general rules for choosing k, but k=10, the MATLAB default, is a good starting point to experiment with.
Maybe this is also interesting for you: https://stats.stackexchange.com/questions/27730/choice-of-k-in-k-fold-cross-validation

How to perform multi-class cross-validation for LIBSVM in MatLab

I want to use LIBSVM in MatLab to do some multi-class classification. I have read that LIBSVM uses One vs. One by default when provided with multiple labels, and I am fine with that.
My question is about the parameter search and the model validation. When doing a 2-class validation to find the parameters C and gamma (when using RBF as kernel), I would use the built-in cross validation to find the best (C,gamma)-pair, using a simple grid search. I have read the LIBSVM documentation but I have no idea how validation works for multiclass SVM.
Does the built-in option return the multi-class accuracy? How can I provide the best parameters to each of the OvO models it will automatically build?
The answer is given here: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f507. I had not read the LibSVM FAQ carefully enough.
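For reference, a simple grid search with the built-in cross-validation might look like the sketch below. It assumes the LIBSVM MATLAB interface is on the path (so that svmtrain is LIBSVM's function and not another one of the same name); the data, the grid ranges and the fold count are made up for illustration. With the '-v' option, svmtrain returns the overall cross-validation accuracy over all classes instead of a model, and a single (C, gamma) pair is shared by all the pairwise classifiers.

```matlab
% Synthetic three-class data, for illustration only.
rng(0);
data = [randn(40, 2); randn(40, 2) + 3; bsxfun(@plus, randn(40, 2), [3 -3])];
labels = [ones(40, 1); 2 * ones(40, 1); 3 * ones(40, 1)];

bestAcc = -Inf;
for log2c = -1:2:5
    for log2g = -5:2:1
        % C-SVC with RBF kernel, 5-fold cross-validation, quiet mode.
        opts = sprintf('-s 0 -t 2 -c %g -g %g -v 5 -q', 2^log2c, 2^log2g);
        acc = svmtrain(labels, data, opts);  % scalar CV accuracy in percent
        if acc > bestAcc
            bestAcc = acc;
            bestC = 2^log2c;
            bestG = 2^log2g;
        end
    end
end

% Retrain on all data with the selected parameters (no -v, so a model is returned).
model = svmtrain(labels, data, sprintf('-s 0 -t 2 -c %g -g %g -q', bestC, bestG));
```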

how to save libsvm model in matlab

I'm using libsvm in matlab, and it seems there is no existing method to save the model produced by svmtrain. Instead, the functions provided force me to retrain every time. Just saving the svmtrain model variable in a .mat file does not work. What should I be doing?
The nice thing about SVM, unlike neural networks, is that training is fast and in some cases can even be done online. I have a multiclass SVM, and I save the training vectors, classes and kernel parameters to a txt file.

libsvm differences in accuracy between cmd line, weka and matlab

I'm doing some preliminary testing with 2 classes of vectors, trying to separate them with libsvm. I get a 78.2% correct ID rate in Matlab and at the cmd line (using libsvm), but in Weka I get around 95%.
No cross-validation was done in Weka; I just trained the model, then read in the test dataset and classified it.
Can anyone offer an explanation? Thanks in advance.
If you didn't provide a separate test set, the number of validation folds should be set to 10 or another desired value. Also make sure that the same SVM type and kernel type are being used in both programs. By default, Weka uses C-SVC with a radial basis function kernel.