I am trying to solve the 3-bit parity problem using the functional link neural network (Pao,1988). I am performing backpropagation to update the weights and extended the input using the outerproduct model proposed by Pao, ie. x1,x1x2,x1x3,x2x3,x1x2x3 is the input as shown below:
Learning rate 0.01, momentum 0.1, transfer function log-sigmoid.
But still after 1000 iteration the weights are not able to classify properly. The FLNN fails for 1,1,1, and 0,0,0 input. If there is any idea to improve the result I'd be highly appreciate it.
This is from example 2 of the paper: Klassen, Myungsook, Yoh Han Pao, and Victor Chen. "Characteristics of the functional link net: a higher order delta rule net." Neural Networks, 1988., IEEE International Conference on. IEEE, 1988.
The problem was solved using the perceptron learning rule
W_new = W_old + learning_rate * error * input
, instead of using the generalized delta rule as mentioned in the paper. With learning rate .9 it converged within 100 iterations, whereas the neural network with hidden layer took about 1000 iterations.
Related
I'm currently working through the SciML tutorials workshop exercises for the Julia language (https://tutorials.sciml.ai/html/exercises/01-workshop_exercises.html). Specifically, I'm stuck on exercise 6 part 3, which involves training a neural network to approximate the system of equations
function lotka_volterra(du,u,p,t)
x, y = u
α, β, δ, γ = p
du[1] = dx = α*x - β*x*y
du[2] = dy = -δ*y + γ*x*y
end
The goal is to replace the equation for du[2] with a neural network: du[2] = NN(u, p)
where NN is a neural net with parameters p and inputs u.
I have a set of sample data that the network should try to match. The loss function is the squared difference between the network model's output and that sample data.
I defined my network with
NN = Chain(Dense(2,30), Dense(30, 1)). I can get Flux.train! to run, but the problem is that sometimes the initial parameters for the neural network result in a loss on the order of 10^20 and so training never converges. My best try got the loss down from about 2000 initially to about 20 using the ADAM optimizer over about 1000 iterations, but I can't seem to do any better.
How can I make sure my network is consistently trainable, and is there a way to get better convergence?
How can I make sure my network is consistently trainable, and is there a way to get better convergence?
See the FAQ page on techniques for improving convergence. In a nutshell, the single shooting approach of most ML papers is very unstable and does not work on most practical problems, but there are a litany of techniques to help out. One of the best ones is multiple shooting, which optimizes only short bursts (in parallel) along the time series.
But training on a small interval and growing the interval works, also using more stable optimizers (BFGS) can work. You can also weigh the loss function so that earlier times mean more. Lastly, you can minibatch in a way similar to multiple shooting, i.e. start from a data point and only solve to the next (in fact, if you actually look at the original neural ODE paper NumPy code, they do not do the algorithm as explained but instead do this form of sampling to stabilize the spiral ODE training).
I have read this line about neural networks :
"Although the perceptron rule finds a successful weight vector when
the training examples are linearly separable, it can fail to converge
if the examples are not linearly separable.
My data distribution is like this :The features are production of rubber ,consumption of rubber , production of synthetic rubber and exchange rate all values are scaled
My question is that the data is not linearly separable so should i apply ANN on it or not? is this a rule that it should be applied on linerly separable data only ? as i am getting good results using it (0.09% MAPE error) . I have also applied SVM regression (fitrsvm function in MATLAB)so I have to ask can SVM be used in forecasting /prediction or it is used only for classification I haven't read anywhere about using SVM to forecast , and the results for SVM are also not good what can be the possible reason?
Neural networks are not perceptrons. Perceptron is on of the oldest ideas, which is at most a single building block of neural networks. Perceptron is designed for binary, linear classification and your problem is neither the binary classification nor linearly separable. You are looking at regression here, where neural networks are a good fit.
can SVM be used in forecasting /prediction or it is used only for classification I haven't read anywhere about using SVM to forecast , and the results for SVM are also not good what can be the possible reason?
SVM has regression "clone" called SVR which can be used for any task NN (as a regressor) can be used. There are of course some typical characteristics of both (like SVR being non parametric estimator etc.). For the task at hand - both approaches (as well as any another regressor, there are dozens of them!) is fine.
I'm reading Neural Networks and Deep Learning (first two chapters), and I'm trying to follow along and build my own ANN to classify digits from the MNIST data set.
I've been scratching my head for several days now, since my implementation peaks out at ~57% accuracy at classifying digits from the test set (some 5734/10000) after 10 epochs (accuracy for the training set stagnates after the tenth epoch, and accuracy for the test set deteriorates presumably because of over-fitting).
I'm using nearly the same configuration as in the book: 2-layer feedforward ANN (784-30-10) with all layers fully connected; standard sigmoid activation functions; quadratic cost function; weights are initialized the same way (taken from a Gaussian distribution with mean 0 and standard deviation 1)
The only differences being that I'm using online training instead of batch/mini-batch training and a learning rate of 1.0 instead of 3.0 (I have tried mini-batch training + learning rate of 3.0 though)
And yet, my implementation doesn't pass the 60% percentile after a bunch of epochs where as in the book the ANN goes above %90 just after the first epoch with pretty much the exact same configuration.
At first I messed up implementing the backpropagation algorithm, but after reimplementing backpropagation differently three times, with the exactly the same results in each reimplementation, I'm stumped...
An example of the results the backpropagation algorithm is producing:
With a simpler feedforward network with the same configuration mentioned above (online training + learning rate of 1.0): 3 input neurons, 2 hidden neurons and 1 output neuron.
The initial weights are initialized as follows:
Layer #0 (3 neurons)
Layer #1 (2 neurons)
- Neuron #1: weights=[0.1, 0.15, 0.2] bias=0.25
- Neuron #2: weights=[0.3, 0.35, 0.4] bias=0.45
Layer #2 (1 neuron)
- Neuron #1: weights=[0.5, 0.55] bias=0.6
Given an input of [0.0, 0.5, 1.0], the output is 0.78900331.
Backpropagating for the same input and with the desired output of 1.0 gives the following partial derivatives (dw = derivative wrt weight, db = derivative wrt bias):
Layer #0 (3 neurons)
Layer #1 (2 neurons)
- Neuron #1: dw=[0, 0.0066968054, 0.013393611] db=0.013393611
- Neuron #2: dw=[0, 0.0061298212, 0.012259642] db=0.012259642
Layer #2 (1 neuron)
- Neuron #1: dw=[0.072069918, 0.084415339] db=0.11470326
Updating the network with those partial derivatives yields a corrected output value of 0.74862305.
If anyone would be kind enough to confirm the above results, it would help me tremendously as I've pretty much ruled out backpropagation being faulty as the reason for the problem.
Did anyone tackling the MNIST problem ever come across this problem?
Even suggestions for things I should check would help since I'm really lost here.
Doh..
Turns out nothing was wrong with my backpropagation implementation...
The problem was that I read the images into a signed char (in C++) array, and the pixel values overflowed, so that when I divided by 255.0 to normalize the input vectors into the range of 0.0-1.0, I actually got negative values... ;-;
So basically I spent some four days debugging and reimplementing the same thing when the problem was somewhere else entirely.
I am trying to differentiate between two classes of data for forecasting. Basically the dependent variables are features of a signal that I want to forecast. I want to predict whether the signal will have a positive or negative slope in the near future (1 time step ahead). I have tried with different time series analysis, such as Fourier analysis, fitting using neural networks, auto-regressive models, and classification with neural nets (using patternet in Matlab).
The function is continuous, so the most logical assumption is to use some regression analysis tool to determine what's going to happen. However, since I only care whether the slope is going to positive or negative, I changed the signal to a binary signal (1 if the slope is positive, -1 if the slope is 0 or negative).
This is by the far the best results I have gotten! However, for some unknown reason a neural net designed for classification did not work (the confusion matrix stated that there was a precision of around 50%). So I decided to try with a regular feedforward neural net...
Since the neural network outputs continuous data, I didn't know what to do... But then I remembered about Logistic regression, and since its transfer function is a log function (bounded by 0 and 1), it can be interpreted as a probability. So I basically did the same, defined a threshold (e.g above 0 is 1, below 0 is -1), and voila! The precision sky-rocked! I am getting a precision of around 70-80%.
Since I am using a sigmoid transfer function, the neural network wll have a continuous output just as logistic regression (but on this case between -1 and 1), so I am assuming my approach is technically still regression and not classification. My question is... Which is better? For my specific problem where fitting did not give really good results but I had to convert this to a binary problem... Which should give better results? Classification or regression?
Should I try a different configuration of a neural net (with a different transfer function), should I try with support vector machine or any other classification algorithm? Or should I stick with regression but defining a threshold myself just as I would do with logistic regression?
Recently, I am trying to using Matlab build-in neural networks toolbox to accomplish my classification problem. However, I have some questions about the parameter settings.
a. The number of neurons in the hidden layer:
The example on this page Matlab neural networks classification example shows a two-layer (i.e. one-hidden-layer and one-output-layer) feed forward neural networks. In this example, it uses 10 neurons in the hidden layer
net = patternnet(10);
My first question is how to define the best number of neurons for my classification problem? Should I use cross-validation method to get the best performed number of neurons using a training data set?
b. Is there a method to choose three-layer or more multi-layer neural networks?
c. There are many different training method we can use in the neural networks toolbox. A list can be found at Training methods list. The page mentioned that the fastest training function is generally 'trainlm'; however, generally speaking, which one will perform best? Or it totally depends on the data set I am using?
d. In each training method, there is a parameter called 'epochs', which is the training iteration for my understanding. For each training method, Matlab defined the maximum number of epochs to train. However, from the example, it seems like 'epochs' is another parameter we can tune. Am I right? Or we just set the maximum number of epochs or leave it as default?
Any experience with Matlab neural networks toolbox is welcome and thanks very much for your reply. A.
a. You can refer to How to choose number of hidden layers and nodes in neural network? and ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hu
Surely you can do cross-validation to determine the parameter of best number of neurons. But it's not recommended as it's more suitable to use it in the stage of weights training of a certain network.
b. Refer to ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hl
And for more layers of neural network, you can refer to Deep Learning, which is very hot in recent years and gets state-of-the-art performances in many of the pattern recognition tasks.
c. It depends on your data. trainlm performs better on function fitting (nonlinear regression) problems than on pattern recognition problems while training large networks and pattern recognition networks, trainscg and trainrp are good choices. Generally, Gradient Descent and Resilient Backpropagation is recommended. More detailed comparison can be found here: http://www.mathworks.cn/cn/help/nnet/ug/choose-a-multilayer-neural-network-training-function.html
d. Yes, you're right. We can tune the epochs parameter. Generally you can output the recognition results/accuracy at every epoch and you will see that it is promoting more and more slowly, and the more epochs the more computing time. You can make a compromise between the accuracy and computation time.
For part b of your question:
You can use like this code:
net = patternnet([10 15 20]);
This script create a network with 3 hidden layer that first layer has 10 neurons, second layer has 15 neurons and 3th layer has 20 neurons.