SVM Matlab classification - matlab

I'm approaching a 4-class classification problem. It's not particularly unbalanced, there are no missing features, and I have a lot of observations. Everything seems fine, but when I approach the classification with fitcecoc it classifies everything as part of the first class. I tried using fitclinear and fitcsvm on one-vs-all decomposed data, but I get the same results. Do you have any clue about the reason for this problem?

Here are a few recommendations (a small sketch follows the list):
- Have you normalized your data? SVM is sensitive to features being on different scales.
- Save the mean and standard deviation you obtain during training, and use those values during the prediction phase to normalize the test samples.
- Change the C value (the box constraint) and see if that changes the results.
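A minimal MATLAB sketch of the normalization and C points, assuming Xtrain, Ytrain, and Xtest are your (hypothetical) data matrices:

% Standardize using statistics computed on the training set only,
% then reuse the same mu/sigma on the test set.
mu = mean(Xtrain);
sigma = std(Xtrain);
sigma(sigma == 0) = 1;                 % guard against constant features
XtrainN = (Xtrain - mu) ./ sigma;      % implicit expansion (R2016b+)
XtestN  = (Xtest  - mu) ./ sigma;

% Try different box constraint (C) values for the binary learners.
t = templateSVM('BoxConstraint', 1);   % vary this, e.g. 0.1, 1, 10
mdl = fitcecoc(XtrainN, Ytrain, 'Learners', t);
pred = predict(mdl, XtestN);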
I hope these help.

Related

Why is it important to transform the data into normal / Gaussian distribution when creating a linear regression model

I'm currently building my first regression model, and as we know, owing to the limitations of the algorithm, we need to remove outliers and transform the distribution into a normal one.
I know that it's important, and I know the ways to do it, but can someone please help me understand why exactly we need to do so? Why can't I work with a highly skewed distribution? Why does linear regression mandate this transformation in the processing stage?
[Figure: a classifier fit with and without outliers in the data.]
Hopefully the figure above clears up your doubt.
Since a linear regression model is optimized by following the path of minimum squared error, outliers (abnormal data points or noise) can make the fit deviate and perform poorly on test data (more general data).
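As a quick toy illustration of that sensitivity (not from the original answer; the data here is made up):

% A single outlier noticeably shifts a least-squares fit.
rng(0);                               % for reproducibility
x = (1:20)';
y = 2*x + randn(20,1);                % clean linear data
yOut = y;
yOut(20) = yOut(20) + 50;             % inject one outlier
mdlClean = fitlm(x, y);
mdlOut   = fitlm(x, yOut);
disp([mdlClean.Coefficients.Estimate, mdlOut.Coefficients.Estimate])
% The estimates differ because squared error penalizes the outlier heavily.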

When to use PCA for dimensionality reduction?

I am using the Matlab Classification Learner app to test different classifiers over a training set (size = 700). My response variable is a categorical label with 5 possible values. I have 7 numerical features and 2 categorical ones. I found a Cubic SVM to have the highest accuracy of 83%. But the performance goes down considerably when I enable PCA with 95% explained variance (accuracy = 40.5%). I am a student and this is the first time I am using PCA.
Why do I see such a result?
Could it be because of a small / unbalanced data set?
When is it useful to apply PCA? When we say "reduce dimensionality", is there a minimum number of features (dimensionality) in the original set?
Any help is appreciated. Thanks in advance!
I want to share my opinion.
A training set of 700 means your data is under 1k samples.
I'm actually surprised that the SVM reaches 83%.
Even the MNIST dataset is considered small (60,000 training, 10,000 test images). Your data is much, much smaller.
You are trying to make your already small data even smaller using PCA, so what is left for the SVM to learn? There may be no discriminating information left.
If I were you, I would try a random-forest classifier. A random forest might even perform better.
Even if you balance your data, it is still small data.
I believe using SMOTE will not improve the result. If your data consists of images, then you could use ImageDataGenerator (a Keras utility) for augmenting your data, though I'm not sure MATLAB has an equivalent.
You would use PCA when you have lots of samples, yet the individual features are not all directly affecting the accuracy; they are just the components of the data.
For instance, consider handwritten digit classification data.
[Figure: sample handwritten digit images.]
Looking at the image above, can we say each pixel directly affects the accuracy? The answer is no: the black background pixels are not important for the accuracy, and removing that kind of redundancy is what we use PCA for.
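In MATLAB, a minimal PCA sketch along those lines (X is a hypothetical n-by-p feature matrix):

% Keep only the components needed to explain 95% of the variance.
[coeff, score, ~, ~, explained] = pca(X);
k = find(cumsum(explained) >= 95, 1);  % number of components for 95%
Xreduced = score(:, 1:k);              % projected data
% With few samples, aggressive reduction can discard discriminative
% information, which may explain the accuracy drop you observed.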
If you want a detailed explanation with a Python example, check out my other answer.

One class learning to make predictions using MATLAB

I am using MATLAB to build a prediction model whose target is binary.
The problem is that the negative observations in my training data may actually be positives that were simply not detected.
I started with a logistic regression model assuming the data is accurate, and the results were less than satisfactory. After some research, I moved to one-class learning, hoping that I can focus on only the part of the data (the positives) that I am certain about.
I looked up the related materials from MATLAB documentation and found that I can use fitcsvm to proceed.
My current problem is:
Am I on the right path? Can one class learning solve my problem?
I tried to use fitcsvm to create a ClassificationSVM using all the positive observations that I have.
model = fitcsvm(Instance,Label,'KernelScale','auto','Standardize',true)
However, when I try to use the model to predict
[label,score] = predict(model,Test)
All the labels predicted for my Test cases are 1. I think I did something wrong. So should I feed the SVM only the positive observations that I have?
If not what should I do?
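For reference, if I recall the documented one-class behavior correctly, with a single training class predict can only ever return that class label, so the score is what carries the information. A sketch along those lines (Instance and Test as in the post above; the 0.05 outlier fraction is an illustrative assumption):

% One-class SVM: train on positives only, all labels identical.
model = fitcsvm(Instance, ones(size(Instance,1),1), ...
    'KernelScale','auto', 'Standardize',true, 'OutlierFraction',0.05);
[~, score] = predict(model, Test);
isPositiveLike = score > 0;   % negative scores mark likely outliers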

Combining an image classifier and an expert system

Would it be appropriate to include an expert system in an image-classifying application? (I am working with MATLAB, have some experience with image processing, and no experience with expert systems.)
What I'm planning on doing is adding an extra feature vector that is actually an answer to a question. Is this fine?
For example: assume I have two questions that I want the answers to: Question 1 and Question 2. Knowing the answers to these 2 questions should help classify the test image more accurately. I understand expert systems are coded differently from an image classifier, but my question is: would it be wrong to include the answers to these 2 questions in numerical form (1 can be yes and 0 can be no) and pass this information, along with the other feature vectors, into a classifier?
If it matters, my current classifier is an SVM.
Regarding training images: yes, they too will be trained with the 2 extra feature values.
Converting a set of comments to an answer:
A similar question on Cross Validated already explains that it can be done as long as the data is properly preprocessed.
In short: you can combine them as long as the training (and testing) data is properly preprocessed (e.g., standardized). Standardization improves the performance of most linear classifiers because it scales the variables so they have a similar weight in the learning process, and it improves numerical stability (and performance) when variables are sampled from Gaussian-like distributions (which standardization helps achieve).
With that, if continuous variables are standardized and categorical variables are encoded as (-1, +1), the SVM should work well. Whether it improves the performance of the classifier depends on the quality of those categorical variables.
Answering the other question in the comment: when using a kernel SVM with, for example, a chi-squared kernel, the rows of the training data are supposed to behave like histograms (all positive and usually L1-normalized), and therefore introducing a (-1, +1) feature breaks the kernel. With an RBF kernel the rows of the data are supposed to be L2-normalized, and again, introducing (-1, +1) features might introduce unexpected behaviour (I'm not very sure what exactly the effect would be).
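A minimal sketch of that combination (Ximg, ans1, ans2, and Y are hypothetical names for the image features, the two answers, and the labels):

% Standardize continuous image features, encode yes/no answers as +1/-1,
% and concatenate everything into one feature matrix for the SVM.
mu = mean(Ximg); sd = std(Ximg); sd(sd == 0) = 1;
XimgStd = (Ximg - mu) ./ sd;
q = 2*double([ans1, ans2]) - 1;       % yes -> +1, no -> -1
X = [XimgStd, q];
mdl = fitcsvm(X, Y);                  % linear kernel by default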
I worked on a similar problem. If multiple features can be extracted from your images, then you can train a different classifier for each feature set. You can think of these classifiers as experts that answer questions based on the features they were trained on. Instead of using labels as outputs, it is better to use confidence values; uncertainty can be very important here. You can use these experts to generate values, and these values can be combined and used to train another classifier.
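A rough sketch of that idea (F1 and F2 are hypothetical per-expert feature sets; in practice you would generate the meta-features with cross-validation to avoid leakage):

% Train two 'expert' SVMs with probability outputs, then a combiner on top.
m1 = fitPosterior(fitcsvm(F1, Y));      % expert 1
m2 = fitPosterior(fitcsvm(F2, Y));      % expert 2
[~, p1] = predict(m1, F1);              % posterior probabilities per class
[~, p2] = predict(m2, F2);
meta = fitcsvm([p1(:,2), p2(:,2)], Y);  % combiner trained on confidences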

SOM toolbox + predicting missing values and outliers

I want to use the SOM Toolbox (http://www.cis.hut.fi/somtoolbox/theory/somalgorithm.shtml) for predicting missing values or outliers, but I can't find any function for it.
I wrote code for visualization and for getting the BMU (best matching unit), but I don't know how to use it for prediction. Could you help me?
Thank you in advance.
If this still interests you, here is one solution.
Train your network on a training set containing all the inputs that you will later analyze. After learning, you give the network the new test data to classify, with only the inputs that you have. The network gives you back the best matching unit (for the features you do have), and from it you can look up the values the BMU holds for the features you are missing or suspect to be outliers.
This of course leads to a different learning and prediction implementation. The learning you implement straightforwardly, as suggested in many tutorials. For the prediction, you need to make the SOM ignore NaNs and compute the BMU based only on the other values. After that, with the BMU you can get the corresponding features and use them to predict the missing values or outliers.
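A rough sketch of that idea with the SOM Toolbox (assuming, as its documentation suggests, that som_bmus ignores NaN components when computing distances; trainData and testData are hypothetical names):

% Train the map, find each test sample's BMU, and fill missing entries
% from the BMU's codebook (prototype) vector.
sMap = som_make(trainData);            % train on the full input set
bmus = som_bmus(sMap, testData);       % BMU index per test row (NaNs skipped)
proto = sMap.codebook(bmus, :);        % prototype vector of each BMU
filled = testData;
missing = isnan(testData);
filled(missing) = proto(missing);      % impute from the BMU prototype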