How do I create a linear regression model in Weka without training?

Suppose the linear model I want is sales = (0.4 * age) + (0.05 * income). How do I create this linear regression model in Weka without training on any data? I just want to save a model file that contains the linear relationship that I already know. No training is necessary. Is this possible in the Weka GUI or through the Java API? If so, how?

The MathExpressionClassifier classifier that comes with the ADAMS framework allows you to do that: you only have to supply the part of the formula after the = as the expression.
An alternative, if you don't want to switch to ADAMS, is the mxexpression-weka-package library. However, you will need to convert the attribute names in your formula to attX (where X is the 1-based attribute index). This package is also part of ADAMS.
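A hedged Java sketch of the ADAMS route (the class name, package, and -E expression option are assumptions on my part; check the ADAMS Javadoc for the exact API):

```java
// Sketch only: assumes the ADAMS and Weka jars are on the classpath and that
// MathExpressionClassifier accepts its formula via the -E option.
import weka.classifiers.AbstractClassifier;
import weka.core.SerializationHelper;

public class SaveFixedModel {
    public static void main(String[] args) throws Exception {
        // build the classifier from its options instead of training it
        weka.classifiers.Classifier cls = AbstractClassifier.forName(
            "weka.classifiers.functions.MathExpressionClassifier", // assumed class name
            new String[]{"-E", "(0.4 * age) + (0.05 * income)"});  // assumed option flag
        // serialize the configured classifier so it can be loaded
        // later like any other saved Weka model
        SerializationHelper.write("sales.model", cls);
    }
}
```

The saved file can then be loaded in the Explorer or via SerializationHelper.read() like any trained model.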

Related

Vlfeat Matlab SVM

I'm trying to build an application for image processing; the purpose is to take a thermal image and decide whether it contains a human or not.
My thought was to try Matlab (actually Octave). For that task I'm trying to use the Vlfeat package, and I'm really confused about how I should use this library.
I'm trying to use the SVM trainer after extracting HOG features, but I couldn't figure out how to test the data.
After I have trained the SVM, how do I test a new image?
*If there are better solutions, I'm open to suggestions.
From the first paragraph of the link you provided:
(...) Y = W'*X(:,i)+B has the same sign as LABELS(i) for all i.
So Y = W'*X(:,i)+B is the value assigned to the feature vector X(:,i); for any given feature vector x you want to test, just evaluate W.' * x + B.
EDIT: A feature vector x for some test data is generated the same way as for the training data, using your feature extraction method. To classify this vector, you evaluate the linear function given by the SVM to get the classification "value" c = W.' * x + B. Then you just need to consider the sign of c as the classification into one class or the other.
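In Matlab/Octave terms, assuming W and B are the outputs of vl_svmtrain and extract_hog_features stands in for whatever feature-extraction code you already use at training time, the test step is just:

```matlab
% Sketch: classify one test image with a trained linear SVM (W, B).
% extract_hog_features is a placeholder for your own HOG pipeline.
x = extract_hog_features(test_image);  % must match the training features
c = W.' * x + B;                       % SVM decision value
if c > 0
    label = 1;     % one class
else
    label = -1;    % the other class
end
```

The magnitude of c can additionally be read as a rough confidence: vectors far from the decision boundary get larger absolute values.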

Matlab cross validation and K-NN

I am trying to build a k-NN classifier with cross-validation in Matlab. Because of my MATLAB version I have used knnclassify() to build the classifier (classKNN = knnclassify(sample_test, sample_training, training_label)).
I am not able to use crossval() with that.
Thanks in advance.
There are two ways to perform K-Nearest Neighbour classification in Matlab. The first is using knnclassify(), as you did. However, this function returns the predicted labels, and you cannot use crossval() with it. Cross-validation is performed on a model, not on its results; in Matlab, the model is described by an object.
crossval() only works with objects (classifier objects, be it K-NN, SVM and so on). In order to create the so-called nearest-neighbor classification object, you need to use the fitcknn() function. Given the Training Set and the Training Labels as input (in this order), this function returns your object, which you can pass as input to crossval().
There's only one thing left, though: how do I predict the labels for my validation set? To do this, you need the predict() function. Given the model (the k-NN object) and the Validation Set as input (again, in this order), this function returns (as knnclassify() does) the vector of predicted labels.
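Put together, a minimal sketch (Xtrain, ytrain, Xval are placeholders for your data, and the 'NumNeighbors' value is arbitrary):

```matlab
mdl   = fitcknn(Xtrain, ytrain, 'NumNeighbors', 5);  % k-NN classifier object
cvmdl = crossval(mdl, 'KFold', 10);                  % cross-validated model
cvErr = kfoldLoss(cvmdl);                            % CV misclassification estimate
yhat  = predict(mdl, Xval);                          % predicted labels for the validation set
```

kfoldLoss() on the cross-validated model gives you the generalization estimate that knnclassify() alone could never provide.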

How to predict labels for new data (test set) by the PartitionedEnsemble model in Matlab?

I trained an ensemble model (RUSBoost) for a binary classification problem with the function fitensemble() in Matlab 2014a. The training is performed with 10-fold cross-validation through the input parameter "kfold" of fitensemble().
However, the output model trained by this function cannot be used to predict the labels of new data with predict(model, Xtest). I checked the Matlab documentation, which says we can use the kfoldPredict() function to evaluate the trained model, but I did not find any way to pass new data to that function. Also, I found that the structure of a model trained with cross-validation is different from that of a model trained without it. So, could anyone please advise me how to use a model trained with cross-validation to predict labels of new data? Thanks!
kfoldPredict() needs a RegressionPartitionedModel or ClassificationPartitionedEnsemble object as input. This object already contains the models and data for k-fold cross-validation, which is why it takes no new data: it predicts each training observation with the fold model that did not see it.
The partitioned object has a field Trained, in which the learners trained during cross-validation are stored.
You can take any of these learners and use it like predict(learner, Xdata).
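For instance, under the assumption that the ensemble was built roughly as in the question (X, Y, Xtest are placeholders):

```matlab
% Cross-validated RUSBoost ensemble, as in the question
cvens = fitensemble(X, Y, 'RUSBoost', 100, 'Tree', 'KFold', 10);
yCV   = kfoldPredict(cvens);    % out-of-fold predictions on the training data
mdl1  = cvens.Trained{1};       % one of the 10 trained fold models
yNew  = predict(mdl1, Xtest);   % labels for genuinely new data
```

Note that each Trained{i} learner saw only 9/10 of the data; if you want a single model trained on everything, call fitensemble() again without the 'KFold' argument and keep the cross-validated object purely for evaluation.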
Edit:
If k is too large, it is possible that there is too little meaningful data in one or more iterations, so the model for that iteration is less accurate.
There are no general rules for choosing k, but k=10, the MATLAB default, is a good starting point to play around with.
Maybe this is also interesting for you: https://stats.stackexchange.com/questions/27730/choice-of-k-in-k-fold-cross-validation

Usage of Libsvm model

I've developed a model using Libsvm in Matlab. I chose the best parameters using CV and obtained the model by training on the whole dataset. I use normalization to get better results:
maximum = max(TR) + 0.00001;
minimum = min(TR);
for i = 1:size(TR,2)
    training(:,i) = double(TR(:,i) - maximum(i)) / (maximum(i) - minimum(i));
end
Now how can I use my model directly to obtain classifications for new data, i.e. for records that have no class label? Do I have to build functions manually from the model information?
Are you using libsvmtrain to train on your training data? If so, it has an output argument (a model structure) that you can use to classify test/future data: pass that structure to svmpredict along with the test data.
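A minimal sketch (the exact function names, svmtrain/libsvmtrain and svmpredict, depend on how your libsvm build is named; the -c/-g values stand in for the parameters you found via CV):

```matlab
model = libsvmtrain(ytrain, Xtrain, '-c 1 -g 0.07');  % train on normalized data
% svmpredict expects a label vector even for unlabeled records: pass dummies
dummy = zeros(size(Xnew, 1), 1);
predicted = svmpredict(dummy, Xnew, model);           % predicted class labels
```

Remember to normalize the new records with the same maximum/minimum vectors computed on the training set, not with statistics recomputed on the new data.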

Libsvm vs Weka (WLSVM)

I've got to deal with an unbalanced dataset (95% of the records in the negative class and 5% in the positive). I developed a model using a decision tree in the Weka framework. Now I'd like to try SVM with Libsvm to get better results. I'm trying to use Libsvm for Matlab and the Libsvm Weka wrapper. I'd like to know how to compare the results I get from them. In Weka, a model is built from the whole dataset and afterwards a 10-fold cross-validation is performed. How can I do that with Libsvm? From the Libsvm FAQ I discovered that CV is used only to find the best kernel parameters, not during train/predict. So what is the exact sequence of actions I should perform in Matlab to obtain results comparable with Weka's?
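One way to mimic Weka's evaluation is to run the 10 folds yourself, sketched here under the assumption that X/y hold your features and labels and the -c/-g values come from your earlier parameter search:

```matlab
cv  = cvpartition(y, 'KFold', 10);  % stratified 10-fold partition
acc = zeros(cv.NumTestSets, 1);
for i = 1:cv.NumTestSets
    trIdx = cv.training(i);         % logical index of the training fold
    teIdx = cv.test(i);             % logical index of the held-out fold
    model = svmtrain(y(trIdx), X(trIdx,:), '-c 1 -g 0.07');
    [~, stats, ~] = svmpredict(y(teIdx), X(teIdx,:), model);
    acc(i) = stats(1);              % libsvm reports fold accuracy first
end
cvAccuracy = mean(acc);             % comparable to Weka's 10-fold estimate
```

Training one final model on the whole dataset afterwards mirrors what Weka does when it reports a model alongside the CV estimate. With a 95/5 class imbalance, also compare per-class recall, not just accuracy, since a trivial all-negative classifier already scores 95%.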