Early Detection of peaks with Neural Network

I am using a neural network technique (backward learning, i.e. backpropagation). As output, for example, I give the value 18 points ahead, and as input I give the latest 5 points for training. (I have tried many input sizes: 5, 10, 20, 30, ...)
For example, the way I trained my data:
t, t+1, t+2, t+3, t+4... => t+22 (4+18)
t+1, t+2, t+3, t+4, t+5... => t+23
Exponential inputs:
t, t+1, t+2, t+4, t+8... => t+26 (8+18)
t+1, t+2, t+3, t+8, t+9... => t+27
After training, I ran the network forward using the trained weights. I have observed that the neural network is not able to catch sudden peaks: most of the time, when I predict 18 seconds ahead, it produces the correct result 17 seconds too late.
Do you have any advice on how I could predict sudden peaks (that will happen t seconds later) with a neural network?
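For concreteness, the window layout described above could be built roughly like this (a plain-NumPy sketch; the function name, sizes, and the synthetic series are assumptions, not the original code):

import numpy as np

def make_windows(series, n_in=5, horizon=18):
    X, y = [], []
    for i in range(len(series) - n_in - horizon + 1):
        X.append(series[i:i + n_in])               # inputs t, t+1, ..., t+4
        y.append(series[i + n_in - 1 + horizon])   # target t+4+18 = t+22
    return np.array(X), np.array(y)

series = np.random.rand(1000) * 100                # stand-in for the real series
X, y = make_windows(series)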

I work with backpropagation and I observe the same behaviour. If I understood correctly, you don't have true forecasting: peaks, when present in the data, can only be "predicted" after they appear in the series, so you observe an apparent prediction with a delay.
I think you have to use a recurrent network.
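A minimal sketch of that suggestion (Python/Keras rather than whatever toolkit the question uses; the layer sizes, window length, and stand-in data are all assumptions): a small LSTM that maps a window of the latest 5 points to the value 18 steps ahead.

import numpy as np
import tensorflow as tf

# Stand-in data: 500 windows of the latest 5 points and, for each, the value 18 steps ahead.
X = np.random.rand(500, 5, 1).astype("float32")
y = np.random.rand(500).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5, 1)),
    tf.keras.layers.LSTM(32),    # recurrent layer reads the window step by step
    tf.keras.layers.Dense(1),    # regression output: the 18-step-ahead value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

Whether a recurrent model actually anticipates the peaks still depends on whether anything in the input window precedes them.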

Related

How to guarantee convergence when training a neural differential equation?

I'm currently working through the SciML tutorials workshop exercises for the Julia language (https://tutorials.sciml.ai/html/exercises/01-workshop_exercises.html). Specifically, I'm stuck on exercise 6 part 3, which involves training a neural network to approximate the system of equations
function lotka_volterra(du, u, p, t)
    x, y = u                      # prey and predator populations
    α, β, δ, γ = p                # model parameters
    du[1] = dx = α*x - β*x*y
    du[2] = dy = -δ*y + γ*x*y
end
The goal is to replace the equation for du[2] with a neural network: du[2] = NN(u, p)
where NN is a neural net with parameters p and inputs u.
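For intuition only, here is a NumPy/SciPy sketch of the same structure (not the Julia/Flux code from the exercise): the first equation keeps its mechanistic form, while du[2] is produced by a small, randomly initialized two-layer network, so this only shows the wiring; training would adjust W1, b1, W2, b2 against the sample data. The tanh hidden layer and all sizes are assumptions.

import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.standard_normal((30, 2)), np.zeros(30)   # hidden layer, 30 units
W2, b2 = 0.1 * rng.standard_normal((1, 30)), np.zeros(1)    # linear output layer

def NN(u):
    return (W2 @ np.tanh(W1 @ u + b1) + b2)[0]              # scalar stand-in for du[2]

def hybrid_lotka_volterra(t, u, alpha, beta):
    x, y = u
    return [alpha * x - beta * x * y,   # known mechanistic term for du[1]
            NN(u)]                      # learned term replacing -δ*y + γ*x*y

sol = solve_ivp(hybrid_lotka_volterra, (0.0, 3.0), [1.0, 1.0],
                args=(1.5, 1.0), t_eval=np.linspace(0.0, 3.0, 30))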
I have a set of sample data that the network should try to match. The loss function is the squared difference between the network model's output and that sample data.
I defined my network with
NN = Chain(Dense(2, 30), Dense(30, 1))
I can get Flux.train! to run, but the problem is that sometimes the initial parameters for the neural network result in a loss on the order of 10^20, so training never converges. My best attempt got the loss down from about 2000 initially to about 20 using the ADAM optimizer over about 1000 iterations, but I can't seem to do any better.
How can I make sure my network is consistently trainable, and is there a way to get better convergence?
See the FAQ page on techniques for improving convergence. In a nutshell, the single shooting approach of most ML papers is very unstable and does not work on most practical problems, but there are a litany of techniques to help out. One of the best ones is multiple shooting, which optimizes only short bursts (in parallel) along the time series.
But training on a small interval and then growing the interval also works, and using more stable optimizers (BFGS) can help. You can also weight the loss function so that earlier times count more. Lastly, you can minibatch in a way similar to multiple shooting, i.e. start from a data point and only solve to the next one (in fact, if you look at the original neural ODE paper's NumPy code, they do not do the algorithm as explained but instead use this form of sampling to stabilize the spiral ODE training).
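As a tiny illustration of one of these ideas, not taken from the FAQ: weighting the squared error so that earlier times count more, sketched in plain NumPy with an assumed exponential decay rate tau.

import numpy as np

def time_weighted_sse(pred, target, t, tau=1.0):
    w = np.exp(-t / tau)                          # earlier times receive larger weights
    return np.sum(w * (pred - target) ** 2) / np.sum(w)

# Example with made-up arrays: 30 time points on [0, 3].
t = np.linspace(0.0, 3.0, 30)
print(time_weighted_sse(np.zeros(30), np.ones(30), t))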

How to improve digit recognition prediction in Neural Networks in Matlab?

I've built a digit recognizer (56x56 digits) using neural networks, but I'm getting 89.5% accuracy on the test set and 100% on the training set. I know that it's possible to get >95% on the test set using this training set. Is there any way to improve my training so I can get better predictions? Changing the number of iterations from 300 to 1000 gave me +0.12% accuracy. I'm also limited by file size, so increasing the number of nodes may not be possible, but if that's the case maybe I could cut some pixels/nodes from the input layer.
To train I'm using:
input layer: 3136 nodes
hidden layer: 220 nodes
labels: 36
regularized cost function with lambda=0.1
fmincg to calculate weights (1000 iterations)
As mentioned in the comments, the easiest and most promising way is to switch to a Convolutional Neural Network. But with your current model you can:
Add more layers with fewer neurons each, which increases learning capacity and should increase accuracy a bit. The problem is that you might start overfitting; use regularization to counter this.
Use batch normalization (BN). While you are already using regularization, BN accelerates training and also acts as a regularizer, and it is an NN-specific technique that might work better.
Make an ensemble. Train several NNs on the same dataset, but with a different initialization. This will produce slightly different classifiers and you can combine their output to get a small increase in accuracy.
Cross-entropy loss. You don't mention what loss function you are using; if it's not cross-entropy, you should start using it (a sketch follows this list). All the high-accuracy classifiers use cross-entropy loss.
Switch to backpropagation with stochastic gradient descent. I do not know the effect of using a different optimization algorithm, but it might outperform the optimizer you are currently using (fmincg), and you could combine it with other optimizers such as Adagrad or ADAM.
Other small changes that might increase accuracy are changing the activation function (e.g. to ReLU), shuffling the training samples after every epoch, and doing data augmentation.
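A minimal NumPy sketch of the cross-entropy and ensemble points above (the question's code is Matlab; this only shows the formulas, and the sample sizes are made up):

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def ensemble_predict(logits_list):
    probs = np.mean([softmax(l) for l in logits_list], axis=0)   # average the members' probabilities
    return probs.argmax(axis=1)

# Made-up example: 8 samples, 36 classes (as in the question), two ensemble members.
a, b = np.random.randn(8, 36), np.random.randn(8, 36)
labels = np.random.randint(0, 36, size=8)
print(cross_entropy(a, labels), ensemble_predict([a, b]))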

Non-linear classification vs regression with FFANN

I am trying to differentiate between two classes of data for forecasting. Basically, the dependent variables are features of a signal that I want to forecast. I want to predict whether the signal will have a positive or negative slope in the near future (1 time step ahead). I have tried different time series analysis approaches, such as Fourier analysis, fitting with neural networks, auto-regressive models, and classification with neural nets (using patternet in Matlab).
The function is continuous, so the most logical assumption is to use some regression analysis tool to determine what's going to happen. However, since I only care whether the slope is going to be positive or negative, I changed the signal to a binary signal (1 if the slope is positive, -1 if the slope is 0 or negative).
These are by far the best results I have gotten! However, for some unknown reason a neural net designed for classification did not work (the confusion matrix showed a precision of around 50%). So I decided to try a regular feedforward neural net...
Since the neural network outputs continuous data, I didn't know what to do... But then I remembered logistic regression: since its transfer function is the logistic (sigmoid) function, bounded by 0 and 1, its output can be interpreted as a probability. So I basically did the same, defined a threshold (e.g. above 0 is 1, below 0 is -1), and voila! The precision skyrocketed! I am getting a precision of around 70-80%.
Since I am using a sigmoid transfer function, the neural network will have a continuous output just as logistic regression does (but in this case between -1 and 1), so I am assuming my approach is technically still regression and not classification. My question is... which is better? For my specific problem, where fitting did not give really good results and I had to convert it to a binary problem... which should give better results: classification or regression?
Should I try a different configuration of the neural net (with a different transfer function)? Should I try a support vector machine or some other classification algorithm? Or should I stick with regression but define a threshold myself, just as I would with logistic regression?
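A plain-NumPy sketch of the thresholding described above (the scores are made up and stand in for the network's continuous outputs in [-1, 1]):

import numpy as np

def to_labels(scores, threshold=0.0):
    return np.where(scores > threshold, 1, -1)    # above the threshold -> +1, otherwise -> -1

scores = np.array([0.8, -0.3, 0.05, -0.9])        # stand-in for the network's outputs
print(to_labels(scores))                          # [ 1 -1  1 -1]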

Oscillation in neural network training

I've programmed a fully connected recurrent network (based on Williams and Zipser) in Octave, and I successfully trained it using BPTT to compute an XOR as a toy example. The learning process was relatively uneventful:
[plot: XOR training]
So, I thought I'd try training the network to compute the XOR of the first two inputs, and the OR of the last two inputs. However, this failed to converge the first time I ran it; instead, the error just oscillated continually. I tried decreasing the learning rate, and turning off momentum entirely, but it didn't help. When I ran it again this morning, it did end up converging, but not without more oscillations in the process:
[plot: XOR/OR training]
So, my question: could this indicate a problem with my gradient computation, or is this just something that happens when training recurrent networks?
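One common way to answer the first half of that question is a finite-difference gradient check; here is a NumPy sketch (the original code is Octave), under the assumption that the loss and the analytic gradient can be evaluated for a flat parameter vector:

import numpy as np

def gradient_check(loss_fn, grad_fn, theta, eps=1e-5):
    numeric = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus.flat[i] += eps
        minus.flat[i] -= eps
        numeric.flat[i] = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)   # central difference
    analytic = grad_fn(theta)
    return np.max(np.abs(numeric - analytic) / (np.abs(numeric) + np.abs(analytic) + 1e-12))

# Self-test on a quadratic, where the analytic gradient is exact:
theta0 = np.array([1.0, -2.0, 0.5])
print(gradient_check(lambda t: np.sum(t**2), lambda t: 2*t, theta0))     # should be ~1e-10

A relative error far above roughly 1e-4 usually points to a bug in the gradient; if the check passes, persistent oscillation more often points to a learning rate that is still too large for BPTT.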

Neural Network Implementation to forecast future behaviour can't detect sudden peaks [duplicate]

This question already has an answer here:
Early Detection of peaks with Neural Network
(1 answer)
Closed 9 years ago.
I am using a neural network to predict the future behaviour of a time series. My only feature is based on workload, which lies in the range [0-100].
I am using backward learning (backpropagation). As output, for example, I give the value 18 points ahead, and as input I give the latest 5 points for training. (I have tried many input sizes: 5, 10, 20, 30, ...)
For example, the way I trained my data:
t, t+1, t+2, t+3, t+4... => t+22 (4+18)
t+1, t+2, t+3, t+4, t+5... => t+23
Exponential inputs:
t, t+1, t+2, t+4, t+8... => t+26 (8+18)
t+1, t+2, t+3, t+8, t+9... => t+27
After training, I ran the network forward using the trained weights. I have observed that the neural network is not able to catch sudden peaks: most of the time, when I predict 18 seconds ahead, it produces the correct result 17 seconds too late.
Do you have any advice on how I could predict sudden peaks (that will happen t seconds later) with a neural network? Or should I implement some other solution (like AdaBoost) to fix this situation?
Example of late prediction: at line 18 it was finally able to make the correct prediction, because by then the series had already reached the peak value.
Neural networks aren't magical. They just let you associate inputs with outputs based on a training data set. If you train a network with noisy data, your model will be noisy. If you de-emphasize the noisy data in its learned model, it won't be able to predict the noisy data.
The stock market and other domains are still hard to predict because it's hard to build a model through all the noise. Eliminate the noise and you essentially have a moving average, which will tend to predict values close to your previously observed values, not the sudden peaks you were hoping for.
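To make that last point concrete, a tiny NumPy illustration with made-up numbers: a moving average only reflects a sudden peak after it has been observed.

import numpy as np

series = np.array([10., 11, 10, 12, 11, 10, 90, 88, 12, 11])   # sudden peak at index 6
smoothed = np.convolve(series, np.ones(3) / 3, mode="valid")    # 3-point moving average
print(smoothed)   # the jump appears only after the peak is already inside the window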