I would like to train a neural network using pybrain and i use
trainer=BackpropTrainer(n,ds, learningrate=0.01, momentum=0.9,verbose=True)
trainer.trainUntilConvergence(ds)
The training using trainUntilConvergence(ds) takes time to converge even for few examples as it uses gradient descent. Is there a function or wrapper to use BFGS quasi newton algorithm for my training in using pybrain?
Thank you
Related
I am reading a lot of articles about neural networks and I found very different information. I understand that the supervised neural network can be also regression and classification. In both cases I can use the sigmoid function but what is the difference?
A single-layer neural network is essentially the same thing as linear regression. That's because of how neural networks work: Each input gets weighted with a weight factor to produce an output, and the weight factors are iteratively chosen in such a way that the error (the discrepancy between the outputs produced by the model and the correct output that should be produced for a given input) is minimised. Linear regression does the same thing. But in a neural network, you can stack several of such layers on top of each other.
Classification is a potential, but by far not the only, use case for neural networks. Conversely, there are classification algorithms that don't use neural networks (e.g. K-nearest neighbours). The sigmoid function is often used as an activation function for the last layer in a classifier neural network.
I have read this line about neural networks :
"Although the perceptron rule finds a successful weight vector when
the training examples are linearly separable, it can fail to converge
if the examples are not linearly separable.
My data distribution is like this :The features are production of rubber ,consumption of rubber , production of synthetic rubber and exchange rate all values are scaled
My question is that the data is not linearly separable so should i apply ANN on it or not? is this a rule that it should be applied on linerly separable data only ? as i am getting good results using it (0.09% MAPE error) . I have also applied SVM regression (fitrsvm function in MATLAB)so I have to ask can SVM be used in forecasting /prediction or it is used only for classification I haven't read anywhere about using SVM to forecast , and the results for SVM are also not good what can be the possible reason?
Neural networks are not perceptrons. Perceptron is on of the oldest ideas, which is at most a single building block of neural networks. Perceptron is designed for binary, linear classification and your problem is neither the binary classification nor linearly separable. You are looking at regression here, where neural networks are a good fit.
can SVM be used in forecasting /prediction or it is used only for classification I haven't read anywhere about using SVM to forecast , and the results for SVM are also not good what can be the possible reason?
SVM has regression "clone" called SVR which can be used for any task NN (as a regressor) can be used. There are of course some typical characteristics of both (like SVR being non parametric estimator etc.). For the task at hand - both approaches (as well as any another regressor, there are dozens of them!) is fine.
I am trying to use a standard fully-connected neural net as the basis for action values in Q-Learning. I am using http://deeplearning.net/tutorial/mlp.html#mlp as a reference specifically this line:
gparams = [T.grad(cost, param) for param in classifier.params]
I would like to calculate the error for my output unit associated with the last action using the Q-Learning policy control method (as described in http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node65.html) and set the other output errors to zero.
How can I use Theano's grad function to back-propagate the errors to the rest of the neural network?
Neural nets are just one possible way of parametrizing your Q-function. The way you perform gradient descent in this case is explained in this section of Sutton and Barto's book. Just treat the weights of your neural net as a vector of parameters.
I don't quite understand why a sigmoid function is seen as more useful (for neural networks) than a step function... hoping someone can explain this for me. Thanks in advance.
The (Heaviside) step function is typically only useful within single-layer perceptrons, an early type of neural networks that can be used for classification in cases where the input data is linearly separable.
However, multi-layer neural networks or multi-layer perceptrons are of more interest because they are general function approximators and they are able to distinguish data that is not linearly separable.
Multi-layer perceptrons are trained using backpropapagation. A requirement for backpropagation is a differentiable activation function. That's because backpropagation uses gradient descent on this function to update the network weights.
The Heaviside step function is non-differentiable at x = 0 and its derivative is 0 elsewhere. This means gradient descent won't be able to make progress in updating the weights and backpropagation will fail.
The sigmoid or logistic function does not have this shortcoming and this explains its usefulness as an activation function within the field of neural networks.
It depends on the problem you are dealing with. In case of simple binary classification, a step function is appropriate. Sigmoids can be useful when building more biologically realistic networks by introducing noise or uncertainty. Another but compeletely different use of sigmoids is for numerical continuation, i.e. when doing bifurcation analysis with respect to some parameter in the model. Numerical continuation is easier with smooth systems (and very tricky with non-smooth ones).
I have implemented a neural network in Matlab R2013a for character recognition. I have used trainbr function for nn training. 80% samples were used for training and the rest for testing. When i plot the confusion matrix, i get 100% accuracy on the training set. But for the test set the accuracy is very low(around 60%). What could be possibly wrong?