Can you suggest a MATLAB implementation of a multi-class classification algorithm for a large database? I tried libsvm, which is good except on large databases, and I can't use liblinear for multi-class classification.
If you want to use liblinear for multi-class classification, you can use the one-vs-all technique. For more information, look at this.
But if you have a large database, then using a kernel SVM is not recommended, as its training run time is roughly O(N * N * m), where
N = number of samples in the data
m = number of features in the data
So, alternatively, you can use a neural network. You can start with nntool, available in MATLAB.
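The one-vs-all technique mentioned above can be sketched in MATLAB as follows. This is a minimal sketch, not liblinear itself: it assumes a recent Statistics and Machine Learning Toolbox and swaps in the built-in linear learner (templateLinear) wrapped by fitcecoc, with the bundled fisheriris data standing in for a real database. The variable names are illustrative.

```matlab
% One-vs-all multi-class classification with linear SVM learners.
% Assumption: Statistics and Machine Learning Toolbox (fitcecoc, templateLinear).
load fisheriris                          % meas: 150x4 features, species: 150x1 labels

t = templateLinear('Learner', 'svm');    % one linear SVM per class
model = fitcecoc(meas, species, ...
    'Learners', t, ...
    'Coding', 'onevsall');               % each class against all the others

predicted = predict(model, meas);        % cell array of predicted class names
trainAcc = mean(strcmp(predicted, species));
fprintf('Training accuracy: %.3f\n', trainAcc);
```

Training accuracy is only a sanity check here; on real data, measure accuracy on held-out examples.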
Related
I am currently performing classification on the Iris dataset. I used both LDA and kNN to classify the data. I found both to be highly accurate and cannot decide which one is more appropriate to use. My first thought is kNN, since LDA assumes the data to have a multivariate normal distribution. However, I would love to know more of the theory behind which is better.
k-NN should scale better than LDA as you add more dimensions to your problem.
Also, the k-NN time complexity is pretty much insensitive to the number of classes in most implementations. LDA, on the other hand, depends directly on it.
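When both methods look accurate, one way to compare them on equal terms is to cross-validate them on the same folds. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox (fitcdiscr, fitcknn, cvpartition) and the bundled fisheriris data. The choice of 5 neighbors and 10 folds is arbitrary, for illustration only.

```matlab
% Compare LDA and kNN on the Iris data with identical cross-validation folds.
load fisheriris                               % meas: 150x4, species: 150x1

rng(0);                                       % reproducible fold assignment
cvp = cvpartition(species, 'KFold', 10);      % stratified 10-fold partition

ldaModel = fitcdiscr(meas, species, 'CVPartition', cvp);    % linear discriminant
knnModel = fitcknn(meas, species, 'NumNeighbors', 5, ...
    'CVPartition', cvp);

ldaErr = kfoldLoss(ldaModel);                 % out-of-sample misclassification rate
knnErr = kfoldLoss(knnModel);
fprintf('LDA CV error: %.3f, kNN CV error: %.3f\n', ldaErr, knnErr);
```

On a dataset as small and clean as Iris, the two errors are likely to be close, so the choice may come down to interpretability and prediction cost rather than accuracy.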
I have a large feature dataset of around 111 MB for classification, with 217000 data points, each of which has 1760000 features. Training an SVM on it in MATLAB takes a lot of time.
How can this data be processed in MATLAB?
It depends on what sort of SVM you are building.
As a rule of thumb, with such big feature sets you need to look at linear classifiers, such as an SVM with no kernel (that is, a linear kernel) or logistic regression with various regularizations.
If you're training an SVM with a Gaussian kernel, the training algorithm has O(max(n,d) min(n,d)^2) complexity, where n is the number of examples and d is the number of features. In your case n < d, so it ends up being O(d n^2), which is quite big.
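As a rough sketch of the linear route, assuming a recent Statistics and Machine Learning Toolbox where fitclinear is available: it trains a linear SVM directly on a sparse matrix. The data below is synthetic and deliberately small; n, d, X, w and y are illustrative names, not the asker's variables. Note that fitclinear handles two classes only.

```matlab
% Linear SVM on sparse, high-dimensional data (synthetic stand-in).
% Assumption: fitclinear from the Statistics and Machine Learning Toolbox.
rng(0);
n = 2000;                          % number of examples (illustrative)
d = 5000;                          % number of features (illustrative)
X = sprandn(n, d, 0.01);           % sparse feature matrix, about 1% nonzeros
w = randn(d, 1);                   % hidden linear rule used to generate labels
y = full(X * w) > 0;               % logical class labels

model = fitclinear(X, y, 'Learner', 'svm');   % linear SVM, no kernel
trainErr = loss(model, X, y);                 % in-sample misclassification rate
fprintf('Training error: %.3f\n', trainErr);
```

Keeping the matrix sparse avoids materializing the full n-by-d array, which matters far more at the scale described in the question than in this toy example.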
I am classifying gender using a KNN classifier.
I want to use an SVM classifier instead of the KNN classifier, with the same labels of 0 and 1 (0 for women and 1 for men).
I have a matrix of test examples, sample; a matrix of training examples, training; and a vector with the labels for the training examples, group. I want class, a vector of the labels for the test examples.
class = knnclassify(sample, training, group);
if class==1
x='Male';
else
x='Female';
end
How can I change this code to find class using an SVM?
To train an SVM, you will need the Statistics and Machine Learning Toolbox.
The biggest difference between knnclassify and an SVM classifier is that training and classifying new examples are two separate steps.
1. Train your SVM : fitcsvm
This step teaches the classifier how to distinguish between your two classes. It is learning a linear separator (or a weighted combination of the features) which has the largest margin between positive and negative examples. All the examples you give it need to have ground truth labels.
SVMs have many tunable parameters that you can adjust during the training step. There are several good tutorials in the MATLAB documentation which describe the differences, but for the most basic version, you can just use your training examples:
model = fitcsvm(training,group);
This model will be used in the next step.
2. Classify new examples : predict
To classify your new example, run
class = predict(model, sample);
Notes:
Using your model, you can also run k-fold cross-validation, which is useful for accuracy analysis.
cvModel = crossval(model);
classError = kfoldLoss(cvModel);
You can also save your model, like any other MATLAB variable, for future use.
save('model.mat', 'model');
knnclassify comes from the Bioinformatics Toolbox. In the Statistics and Machine Learning Toolbox, there is also a KNN model, which you train with fitcknn and classify with predict. The benefit is that you can reuse your KNN model with several sets of data, compare cross-validation results, and save it for future use.
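Put together, the train and predict steps above might look like the following. This is a minimal sketch on synthetic two-feature data (the real training, group and sample come from the question and are not known here), assuming the Statistics and Machine Learning Toolbox. The names predicted, names and x are illustrative. Note that the original if class==1 test only makes sense for a single test example, so the sketch maps the whole label vector at once.

```matlab
% Train an SVM, classify new examples, and map 0/1 labels to names.
% Synthetic stand-in data: two well-separated clusters.
rng(1);
training = [randn(50, 2) + 2; randn(50, 2) - 2];   % 100 training examples
group    = [ones(50, 1); zeros(50, 1)];            % 1 = male, 0 = female
sample   = [2 2; -2 -2];                           % two test examples

model = fitcsvm(training, group);        % step 1: train
predicted = predict(model, sample);      % step 2: classify (model goes first)

names = {'Female', 'Male'};
x = names(predicted + 1);                % label 0 -> 'Female', label 1 -> 'Male'
disp(x);

cvModel = crossval(model);               % 10-fold cross-validation by default
classError = kfoldLoss(cvModel);         % estimated misclassification rate
fprintf('Cross-validated error: %.3f\n', classError);
```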
I am trying to find a way to visualize the data with high-dimensional input for two-class classification in SVM, before analysis to decide which kernel to use. In documents online, the visualization of data is given only for two dimensional inputs (I mean two attributes).
Another question arises: what if I have multiple classes and more than two attributes?
To visualize the data, it should be represented in three or fewer dimensions.
You can simply apply PCA to reduce the dimension.
Alternatively, use a pre-image method based on MDS.
Refer to the paper "The pre-image problem in kernel methods" and its MATLAB code at http://www.cse.ust.hk/~jamesk/publication.html
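The PCA suggestion can be sketched as below for a case with more than two attributes and more than two classes. This assumes the Statistics and Machine Learning Toolbox (pca, zscore, gscatter) and uses the bundled fisheriris data (four attributes, three classes) as a stand-in. It does not cover the MDS pre-image approach.

```matlab
% Project multi-attribute, multi-class data onto two principal components.
load fisheriris                                   % meas: 150x4, species: 150x1

% Standardize first so that no attribute dominates because of its units.
[~, score, ~, ~, explained] = pca(zscore(meas));

figure;
gscatter(score(:, 1), score(:, 2), species);      % one color per class
xlabel(sprintf('PC 1 (%.0f%% of variance)', explained(1)));
ylabel(sprintf('PC 2 (%.0f%% of variance)', explained(2)));
title('Iris data projected onto the first two principal components');
```

If the classes already look linearly separable in this projection, a linear kernel may be enough. Heavy overlap in the plot is not conclusive, though, since PCA discards directions that may still separate the classes.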
I have a data set to classify. Using the kNN algorithm I get an accuracy of 90%, whereas using SVM I am only able to get just over 70%. Is SVM not better than kNN? I know this might be a stupid question, but which SVM parameters will give results close to those of the kNN algorithm? I am using the libsvm package on MATLAB R2008.
kNN and SVM represent different approaches to learning. Each approach implies a different model for the underlying data.
SVM assumes there exists a hyperplane separating the data points (quite a restrictive assumption), while kNN attempts to approximate the underlying distribution of the data in a non-parametric fashion (a crude approximation of a Parzen-window estimator).
You'll have to look at the specifics of your scenario to make a better decision as to what algorithm and configuration are best used.
It really depends on the dataset you are using. If you have something like the first line of this image ( http://scikit-learn.org/stable/_images/plot_classifier_comparison_1.png ), kNN will work really well and a linear SVM really badly.
If you want SVM to perform better, you can use a kernel-based SVM like the one in the picture (it uses an RBF kernel).
If you are using scikit-learn for Python, you can play a bit with the code here to see how to use the kernel SVM: http://scikit-learn.org/stable/modules/svm.html
kNN basically says "if you're close to coordinate x, then the classification will be similar to observed outcomes at x." In SVM, a close analog would be using a high-dimensional kernel with a "small" bandwidth parameter, since this will cause SVM to overfit more. That is, SVM will be closer to "if you're close to coordinate x, then the classification will be similar to those observed at x."
I recommend that you start with a Gaussian kernel and check the results for different parameters. From my own experience (which is, of course, focused on certain types of datasets, so your mileage may vary), tuned SVM outperforms tuned kNN.
Questions for you:
1) How are you selecting k in kNN?
2) What parameters have you tried for SVM?
3) Are you measuring accuracy in-sample or out-of-sample?
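The three questions above suggest tuning both methods out-of-sample before comparing them. The following is a minimal sketch of that idea, assuming the Statistics and Machine Learning Toolbox (fitcsvm, fitcknn) rather than the libsvm package used in the question. The grids ks, Cs and scales are arbitrary illustrative values, and a two-class subset of fisheriris stands in for the real data.

```matlab
% Tune kNN (over k) and a Gaussian-kernel SVM (over C and kernel scale)
% on the same cross-validation folds, then compare the best errors.
load fisheriris
keep = ~strcmp(species, 'setosa');       % fitcsvm handles two classes only
X = meas(keep, :);
y = species(keep);

rng(0);
cvp = cvpartition(y, 'KFold', 5);        % identical folds for every candidate

ks = [1 3 5 7 9];                        % candidate neighbor counts
knnErr = zeros(size(ks));
for i = 1:numel(ks)
    cvKnn = fitcknn(X, y, 'NumNeighbors', ks(i), 'CVPartition', cvp);
    knnErr(i) = kfoldLoss(cvKnn);        % out-of-sample error for this k
end

Cs = [0.1 1 10 100];                     % candidate box constraints
scales = [0.1 1 10];                     % candidate kernel scales (bandwidths)
svmErr = zeros(numel(Cs), numel(scales));
for i = 1:numel(Cs)
    for j = 1:numel(scales)
        cvSvm = fitcsvm(X, y, 'KernelFunction', 'rbf', ...
            'BoxConstraint', Cs(i), 'KernelScale', scales(j), ...
            'Standardize', true, 'CVPartition', cvp);
        svmErr(i, j) = kfoldLoss(cvSvm); % out-of-sample error for this pair
    end
end

fprintf('Best kNN CV error: %.3f\n', min(knnErr));
fprintf('Best SVM CV error: %.3f\n', min(svmErr(:)));
```

With libsvm, the analogous knobs are the -c (cost) and -g (gamma) options of svmtrain; the same grid search over held-out folds applies.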