Can we use a single-layer perceptron on a multiclass classification problem?
How can we show that the classes are non-linearly separable if we have 30 features?
Yes, you can use a single-layer perceptron (SLP) for multi-class classification by employing a one-vs-all or one-vs-one strategy. An SLP is like a logistic classifier in that its decision boundary is linear, so if the dataset is not linearly separable you might want to consider a multi-layer perceptron instead.
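As a minimal sketch of the one-vs-all strategy, assuming Python with scikit-learn (the iris dataset here is just an illustrative stand-in, not from the question):

```python
# One-vs-rest single-layer perceptrons for a 3-class problem.
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One binary perceptron per class; prediction picks the class whose
# perceptron returns the largest decision score.
clf = OneVsRestClassifier(Perceptron(max_iter=1000, tol=1e-3))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```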
I am not sure exactly what you are asking, but to my understanding: if we can separate 2 classes by a straight line, they are linearly separable. With 30 features the "line" becomes a hyperplane in 30-dimensional space, so you cannot check this by eye; instead you can test it, e.g. by fitting a linear classifier and checking whether it classifies the training set perfectly. It is not usually the case in real datasets, though.
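Here is one practical way to run that test in 30 dimensions, sketched under the assumption that scikit-learn is available (the synthetic dataset is a stand-in for yours): fit a linear SVM with a very large margin penalty, which approximates a hard margin, and check whether it separates the training set perfectly.

```python
# Separability check: if even an (approximately) hard-margin linear SVM
# cannot reach 100% training accuracy, the classes are not linearly separable.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=30, random_state=0)

clf = LinearSVC(C=1e6, max_iter=100_000)  # large C ~ hard-margin SVM
clf.fit(X, y)
train_acc = clf.score(X, y)
print("training accuracy:", train_acc)
print("linearly separable:", train_acc == 1.0)
```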
I really liked this blog post on the topic; you might want to check it out.
I have calculated the following metrics after applying four algorithms to a dataset from Kaggle:

[results table was posted as an image and is not reproduced here]

In the above case, the linear model gives the best results.
Are those results correct, and can a linear model actually give better results than the other three in any case?
Or am I missing something?
According to the AUC criterion, this classification is perfect (1 is the theoretical maximum), which indicates a clear separation in the data. In this case it makes no sense to talk about differences between the methods' results. Another point is that you can play with the methods' parameters (you will likely get slightly different results), and other methods may then come out on top, but the real results will be indistinguishable. Sophisticated methods are invented for sophisticated data; that is not the case here.
"All models are wrong, but some are useful." - George Box
In terms of classification, a model is effective as long as it can fit the classification boundaries well.
For the binary case, suppose your data is perfectly linearly separable; then a linear model will do the job, and in fact the "best" job, since any more complicated model won't perform better.
If your +'s and -'s are scattered such that they cannot be separated by a line (more precisely, a hyperplane), then the linear model can be beaten by a decision tree, simply because decision trees can produce classification boundaries of more complex shapes (axis-aligned boxes).
A random forest may in turn beat a single decision tree, as the classification boundary of a random forest is even more flexible.
However, as mentioned earlier, the linear model still has its moments; the sketch below illustrates this on a nearly separable dataset.
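To make this concrete, a hedged sketch comparing the three model families by cross-validated AUC, assuming scikit-learn (the easy, nearly separable synthetic dataset stands in for the real data; on such data the linear model typically matches or beats the rest):

```python
# Compare a linear model, a decision tree, and a random forest by mean AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# class_sep=2.0 makes the classes nearly linearly separable.
X, y = make_classification(n_samples=1000, n_features=20,
                           class_sep=2.0, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```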
I have read this line about neural networks:

"Although the perceptron rule finds a successful weight vector when the training examples are linearly separable, it can fail to converge if the examples are not linearly separable."

My data distribution is like this: the features are production of rubber, consumption of rubber, production of synthetic rubber, and the exchange rate; all values are scaled.
My question is: since the data is not linearly separable, should I apply an ANN to it or not? Is there a rule that ANNs may only be applied to linearly separable data? I am getting good results with one (0.09% MAPE). I have also applied SVM regression (the fitrsvm function in MATLAB), so I have to ask: can SVM be used for forecasting/prediction, or is it only for classification? I haven't read anywhere about using SVM to forecast, and the results for SVM are also not good; what could be the reason?
Neural networks are not perceptrons. The perceptron is one of the oldest ideas in the field, and at most a single building block of a neural network. The perceptron is designed for binary, linear classification, and your problem is neither binary classification nor linearly separable. You are looking at regression here, for which neural networks are a good fit.
"Can SVM be used for forecasting/prediction, or is it only for classification? I haven't read anywhere about using SVM to forecast, and the results for SVM are also not good; what could be the reason?"
SVM has a regression "clone" called SVR, which can be used for any task a neural network (as a regressor) can be used for. Each has its typical characteristics (e.g. SVR is a non-parametric estimator), but for the task at hand both approaches, as well as any other regressor (there are dozens of them!), are fine.
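A minimal sketch of the two regressors side by side, assuming scikit-learn (the noisy sine series is a synthetic stand-in for the rubber production/consumption features):

```python
# Both SVR and a small neural network produce continuous outputs,
# so both can be used for forecasting-style regression.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

svr = SVR(kernel="rbf", C=10.0).fit(X, y)
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000,
                   random_state=0).fit(X, y)

print("SVR R^2:", svr.score(X, y))
print("MLP R^2:", mlp.score(X, y))
```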
I am trying to find a way to visualize high-dimensional input data for two-class classification with an SVM, before analysis, in order to decide which kernel to use. The documents I have found online show visualizations only for two-dimensional inputs (I mean two attributes).
Another question arises: what if I have multiple classes and more than two attributes?
To visualize, the data should be represented in 3 or fewer dimensions.
PCA can simply be applied to reduce the dimensionality (see the sketch below).
Alternatively, use the pre-image technique with MDS.
Refer to the paper "The pre-image problem in kernel methods" and its MATLAB code at http://www.cse.ust.hk/~jamesk/publication.html
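For the PCA route, a minimal sketch, assuming Python with scikit-learn and matplotlib (the built-in wine dataset is only a stand-in: 13 attributes, 3 classes):

```python
# Project multi-class, high-dimensional data to 2 principal components
# and scatter-plot it, colouring points by class.
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

plt.scatter(X2[:, 0], X2[:, 1], c=y, cmap="viridis", s=15)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("PCA projection for visual inspection")
plt.show()
```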
I want to classify a data set (which has four classes) using an SVM. I've done it using the code below (one-against-all). It isn't terribly accurate, but I'm thankful for anything at this stage.
http://www.mathworks.co.uk/matlabcentral/fileexchange/39352-multi-class-svm
I was wondering if there is a way to plot the support vectors and the training points. I've managed this for a 2-class SVM classification but can't find a way of doing it with more than 2 classes.
Any help or advice on how to achieve a semi-pretty graph would be very much appreciated!
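The question uses MATLAB, but here is a hedged sketch of the same idea in Python with scikit-learn: project the data to 2-D (e.g. with PCA), fit a multi-class SVC in that plane, and overlay its support vectors on the training points (the iris dataset stands in for the four-class data):

```python
# Plot training points and highlight the fitted model's support vectors.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)  # visualize in the PCA plane

clf = SVC(kernel="rbf").fit(X2, y)  # one-vs-one internally for >2 classes

plt.scatter(X2[:, 0], X2[:, 1], c=y, s=15, label="training points")
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            facecolors="none", edgecolors="k", s=60, label="support vectors")
plt.legend()
plt.show()
```

Note this plots the support vectors of an SVM fitted in the projected 2-D space, not a projection of the original high-dimensional support vectors; for a quick visual check that is usually good enough.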
I'm trying to build an app that detects which images on a webpage are advertisements; once detected, I won't allow them to be displayed on the client side.
Basically, I'm using the back-propagation algorithm to train a neural network on the dataset given here: http://archive.ics.uci.edu/ml/datasets/Internet+Advertisements.
But that dataset has a very large number of attributes. In fact, one of the project mentors told me that training the neural network with that many attributes will take a long time. So is there a way to optimize the input dataset, or do I just have to use all the attributes?
1558 is actually a modest number of features/attributes, and the number of instances (3279) is also small. The problem is not on the dataset side but on the training-algorithm side.
ANNs are slow to train, so I'd suggest using logistic regression or an SVM instead; both are very fast to train, and SVMs in particular have many fast training algorithms.
Also, in this dataset you are actually analyzing text, not images, so I think a classifier from the linear family, i.e. logistic regression or a linear SVM, is better suited to the job.
If this is for production and you cannot use open-source code, logistic regression is much easier to implement from scratch than a good ANN or SVM.
If you decide to use logistic regression or an SVM, I can further recommend some articles or source code for you to refer to; a minimal sketch of the idea follows below.
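A hedged sketch of that suggestion, assuming scikit-learn and SciPy; the random sparse matrix and labels are stand-ins for the UCI ad data, so the accuracy numbers are meaningless, but the training-speed point stands:

```python
# With ~1558 features and ~3279 rows, linear classifiers train in well
# under a second, even on sparse input.
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = sparse.random(3279, 1558, density=0.01, random_state=0, format="csr")
y = rng.integers(0, 2, size=3279)  # fake ad / non-ad labels

for clf in (LogisticRegression(max_iter=1000), LinearSVC()):
    acc = cross_val_score(clf, X, y, cv=3).mean()
    print(type(clf).__name__, "CV accuracy:", round(acc, 3))
```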
If you're actually using a backpropagation network with 1558 input nodes and only 3279 samples, then the training time is the least of your problems: Even if you have a very small network with only one hidden layer containing 10 neurons, you have 1558*10 weights between the input layer and the hidden layer. How can you expect to get a good estimate for 15580 degrees of freedom from only 3279 samples? (And that simple calculation doesn't even take the "curse of dimensionality" into account)
You have to analyze your data to find out how to optimize it. Try to understand your input data: which (tuples of) features are (jointly) statistically significant? (Use standard statistical methods for this.) Are some features redundant? (Principal component analysis is a good starting point for this; see the sketch after this paragraph.) Don't expect the artificial neural network to do that work for you.
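A hedged sketch of that analysis, assuming scikit-learn (the synthetic dataset is a stand-in for the ad data): score each feature's relevance to the label, keep the most informative ones, and use PCA to gauge redundancy.

```python
# Feature relevance and redundancy analysis before training any network.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=3000, n_features=200,
                           n_informative=15, random_state=0)

# Which features carry information about the label?
selector = SelectKBest(mutual_info_classif, k=20).fit(X, y)
X_small = selector.transform(X)  # keep the 20 most informative features
print("reduced shape:", X_small.shape)

# How redundant are the features overall?
pca = PCA().fit(X)
n_90 = int((pca.explained_variance_ratio_.cumsum() < 0.9).sum()) + 1
print("components explaining 90% of variance:", n_90)
```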
Also: remember the famous "no free lunch" theorem: no classification algorithm works best for every problem, and for any classification algorithm X there is a problem where flipping a coin leads to better results than X. If you take this into account, deciding which algorithm to use before analyzing your data might not be a smart idea; you might well have picked the algorithm that actually performs worse than blind guessing on your specific problem. (By the way, Duda, Hart & Stork's book Pattern Classification, which discusses this theorem, is a great starting point if you haven't read it yet.)
Apply a separate ANN to each category of features.
For example:
457 inputs, 1 output, for URL terms (ANN1)
495 inputs, 1 output, for origurl (ANN2)
...
Then train all of them,
and use another, main ANN to join the results (see the sketch below).
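A hedged sketch of this architecture, assuming scikit-learn. The group boundaries (457 and 495 features) come from the answer above; the data is a random stand-in, and a logistic regression stands in for the "main ANN" that joins the sub-networks:

```python
# One small network per feature group, joined by a simple meta-model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 952))                # stand-in feature matrix
y = rng.integers(0, 2, size=1000)          # stand-in ad / non-ad labels
groups = [slice(0, 457), slice(457, 952)]  # url terms (ANN1), origurl (ANN2)

# Train one sub-network per feature group.
subnets = [MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                         random_state=0).fit(X[:, g], y) for g in groups]

# Feed each sub-network's predicted probability into the joining model.
# (A proper setup would use held-out predictions here to avoid overfitting.)
meta_X = np.column_stack([net.predict_proba(X[:, g])[:, 1]
                          for net, g in zip(subnets, groups)])
joiner = LogisticRegression().fit(meta_X, y)
print("combined training accuracy:", joiner.score(meta_X, y))
```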